ERIC - Search Results

Showing all 11 results

Fit of Item Response Theory Models: A Survey of Data from Several Operational Tests. Research Report. ETS RR-11-29

Sinharay, Sandip; Haberman, Shelby J.; Jia, Helena – Educational Testing Service, 2011

Standard 3.9 of the "Standards for Educational and Psychological Testing" (American Educational Research Association, American Psychological Association, & National Council for Measurement in Education, 1999) demands evidence of model fit when an item response theory (IRT) model is used to make inferences from a data set. We applied two recently…

Descriptors: Item Response Theory, Goodness of Fit, Statistical Analysis, Language Tests

How Does the Knowledge of Subgroup Membership of Examinees Affect the Prediction of True Subscores? Research Report. ETS RR-11-43

Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2011

Subscores are reported for several operational assessments. Haberman (2008) suggested a method based on classical test theory to determine if the true subscore is predicted better by the corresponding subscore or the total score. Researchers are often interested in learning how different subgroups perform on subtests. Stricker (1993) and…

Descriptors: True Scores, Test Theory, Prediction, Group Membership

Equating of Subscores and Weighted Averages under the NEAT Design. Research Report. ETS RR-11-01

Sinharay, Sandip; Haberman, Shelby – Educational Testing Service, 2011

Recently, the literature has seen increasing interest in subscores for their potential diagnostic values; for example, one study suggested the report of weighted averages of a subscore and the total score, whereas others showed, for various operational and simulated data sets, that weighted averages, as compared to subscores, lead to more accurate…

Descriptors: Equated Scores, Weighted Scores, Tests, Statistical Analysis

First Language of Examinees and Its Relationship to Differential Item Functioning. Research Report. ETS RR-09-11

Sinharay, Sandip; Dorans, Neil J.; Liang, Longjuan – Educational Testing Service, 2009

To ensure fairness, it is important to better understand the relationship of language proficiency to standard psychometric analysis procedures. This paper examines how results of differential item functioning (DIF) analysis are affected by an increase in the proportion of examinees who report that English is not their first language in the…

Descriptors: Test Bias, Language Proficiency, English (Second Language), Measurement

First Language of Examinees and Its Relationship to Equating. Research Report. ETS RR-09-05

Liang, Longjuan; Dorans, Neil J.; Sinharay, Sandip – Educational Testing Service, 2009

To ensure fairness, it is important to better understand the relationship of language proficiency with the standard procedures of psychometric analysis. This paper examines how equating results are affected by an increase in the proportion of examinees who report that English is not their first language, using the analysis samples for a…

Descriptors: Equated Scores, English (Second Language), Reading Tests, Mathematics Tests

Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

von Davier, Matthias; Sinharay, Sandip – Educational Testing Service, 2009

This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

Descriptors: Item Response Theory, Regression (Statistics), Models, Methods

Statistical Procedures to Evaluate Quality of Scale Anchoring. Research Report. ETS RR-11-02

Haberman, Shelby J.; Sinharay, Sandip; Lee, Yi-Hsuan – Educational Testing Service, 2011

Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement (Carroll, 1993). Scale anchoring (Beaton & Allen, 1992), a technique that describes what students at different points on a score scale know and can do,…

Descriptors: Statistical Analysis, Scores, Regression (Statistics), Item Response Theory

When Can Subscores Be Expected to Have Added Value? Results from Operational and Simulated Data. Research Report. ETS RR-10-16

Sinharay, Sandip – Educational Testing Service, 2010

Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman (2008) suggested a method based on classical test theory to determine whether subscores have added value over total scores. This paper provides a literature review and reports when subscores were found to have added value for…

Descriptors: Scores, Correlation, Reliability, Item Response Theory

The Missing Data Assumptions of the Nonequivalent Groups with Anchor Test (NEAT) Design and Their Implications for Test Equating. Research Report. ETS RR-09-16

Sinharay, Sandip; Holland, Paul W. – Educational Testing Service, 2008

The nonequivalent groups with anchor test (NEAT) design involves missing data that are missing by design. Three popular equating methods that can be used with a NEAT design are the poststratification equating method, the chain equipercentile equating method, and the item-response-theory observed-score-equating method. These three methods each…

Descriptors: Equated Scores, Test Items, Item Response Theory, Data

The Effects of Different Types of Anchor Tests on Observed Score Equating. Research Report. ETS RR-09-41

Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Feigenbaum, Miriam; Curley, Edward – Educational Testing Service, 2009

This study explores the use of a different type of anchor, a "midi anchor," that has a smaller spread of item difficulties than the tests to be equated, and then contrasts its use with the use of a "mini anchor." The impact of different anchors on observed score equating was evaluated and compared with respect to systematic…

Descriptors: Equated Scores, Test Items, Difficulty Level, Error of Measurement

Assessing Fit of Models with Discrete Proficiency Variable in Educational Assessment. Research Report. RR-04-07

Sinharay, Sandip; Almond, Russell; Yan, Duanli – Educational Testing Service, 2004

Model checking is a crucial part of any statistical analysis. As educators tie models for testing to cognitive theory of the domains, there is a natural tendency to represent participant proficiencies with latent variables representing the presence or absence of the knowledge, skills, and proficiencies to be tested (Mislevy, Almond, Yan, &…

Descriptors: Statistical Analysis, Epistemology, Educational Assessment, Item Response Theory
