Sinharay, Sandip; Haberman, Shelby J.; Jia, Helena – Educational Testing Service, 2011
Standard 3.9 of the "Standards for Educational and Psychological Testing" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) demands evidence of model fit when an item response theory (IRT) model is used to make inferences from a data set. We applied two recently…
Descriptors: Item Response Theory, Goodness of Fit, Statistical Analysis, Language Tests
Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2011
Subscores are reported for several operational assessments. Haberman (2008) suggested a method based on classical test theory to determine if the true subscore is predicted better by the corresponding subscore or the total score. Researchers are often interested in learning how different subgroups perform on subtests. Stricker (1993) and…
Descriptors: True Scores, Test Theory, Prediction, Group Membership
Sinharay, Sandip; Haberman, Shelby – Educational Testing Service, 2011
Recently, the literature has seen increasing interest in subscores for their potential diagnostic values; for example, one study suggested the report of weighted averages of a subscore and the total score, whereas others showed, for various operational and simulated data sets, that weighted averages, as compared to subscores, lead to more accurate…
Descriptors: Equated Scores, Weighted Scores, Tests, Statistical Analysis
Sinharay, Sandip; Dorans, Neil J.; Liang, Longjuan – Educational Testing Service, 2009
To ensure fairness, it is important to better understand the relationship of language proficiency to standard psychometric analysis procedures. This paper examines how results of differential item functioning (DIF) analysis are affected by an increase in the proportion of examinees who report that English is not their first language in the…
Descriptors: Test Bias, Language Proficiency, English (Second Language), Measurement
Liang, Longjuan; Dorans, Neil J.; Sinharay, Sandip – Educational Testing Service, 2009
To ensure fairness, it is important to better understand the relationship of language proficiency with the standard procedures of psychometric analysis. This paper examines how equating results are affected by an increase in the proportion of examinees who report that English is not their first language, using the analysis samples for a…
Descriptors: Equated Scores, English (Second Language), Reading Tests, Mathematics Tests
von Davier, Matthias; Sinharay, Sandip – Educational Testing Service, 2009
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
Descriptors: Item Response Theory, Regression (Statistics), Models, Methods
Haberman, Shelby J.; Sinharay, Sandip; Lee, Yi-Hsuan – Educational Testing Service, 2011
Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement (Carroll, 1993). Scale anchoring (Beaton & Allen, 1992), a technique that describes what students at different points on a score scale know and can do,…
Descriptors: Statistical Analysis, Scores, Regression (Statistics), Item Response Theory
Sinharay, Sandip – Educational Testing Service, 2010
Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman (2008) suggested a method based on classical test theory to determine whether subscores have added value over total scores. This paper provides a literature review and reports when subscores were found to have added value for…
Descriptors: Scores, Correlation, Reliability, Item Response Theory
Sinharay, Sandip; Holland, Paul W. – Educational Testing Service, 2008
The nonequivalent groups with anchor test (NEAT) design involves missing data that are missing by design. Three popular equating methods that can be used with a NEAT design are the poststratification equating method, the chain equipercentile equating method, and the item-response-theory observed-score-equating method. These three methods each…
Descriptors: Equated Scores, Test Items, Item Response Theory, Data
Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Feigenbaum, Miriam; Curley, Edward – Educational Testing Service, 2009
This study explores the use of a different type of anchor, a "midi anchor", that has a smaller spread of item difficulties than the tests to be equated, and then contrasts its use with the use of a "mini anchor". The impact of different anchors on observed score equating was evaluated and compared with respect to systematic…
Descriptors: Equated Scores, Test Items, Difficulty Level, Error of Measurement
Sinharay, Sandip; Almond, Russell; Yan, Duanli – Educational Testing Service, 2004
Model checking is a crucial part of any statistical analysis. As educators tie models for testing to cognitive theory of the domains, there is a natural tendency to represent participant proficiencies with latent variables representing the presence or absence of the knowledge, skills, and proficiencies to be tested (Mislevy, Almond, Yan, &…
Descriptors: Statistical Analysis, Epistemology, Educational Assessment, Item Response Theory