ERIC - Search Results

Showing all 8 results

The Sensitivity of Value-Added Estimates to Test Scoring Decisions. EdWorkingPaper No. 25-1226

Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025

Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…

Descriptors: Value Added Models, Tests, Testing, Scoring

Evaluation of Two Methods for Modeling Measurement Errors When Testing Interaction Effects with Observed Composite Scores

Peer reviewed

Hsiao, Yu-Yu; Kwok, Oi-Man; Lai, Mark H. C. – Educational and Psychological Measurement, 2018

Path models with observed composites based on multiple items (e.g., mean or sum score of the items) are commonly used to test interaction effects. Under this practice, researchers generally assume that the observed composites are measured without errors. In this study, we reviewed and evaluated two alternative methods within the structural…

Descriptors: Error of Measurement, Testing, Scores, Models

Sensitivity of Achievement Estimation to Conditioning Model Misclassification

Peer reviewed

Rutkowski, Leslie – Applied Measurement in Education, 2014

Large-scale assessment programs such as the National Assessment of Educational Progress (NAEP), Trends in International Mathematics and Science Study (TIMSS), and Programme for International Student Assessment (PISA) use a sophisticated assessment administration design called matrix sampling that minimizes the testing burden on individual…

Descriptors: Measurement, Testing, Item Sampling, Computation

Testing Mixture Models of Transitive Preference: Comment on Regenwetter, Dana, and Davis-Stober (2011)

Peer reviewed

Birnbaum, Michael H. – Psychological Review, 2011

This article contrasts 2 approaches to analyzing transitivity of preference and other behavioral properties in choice data. The approach of Regenwetter, Dana, and Davis-Stober (2011) assumes that on each choice, a decision maker samples randomly from a mixture of preference orders to determine whether "A" is preferred to "B." In contrast, Birnbaum…

Descriptors: Evidence, Testing, Computation, Probability

On the Use, the Misuse, and the Very Limited Usefulness of Cronbach's Alpha

Peer reviewed

Sijtsma, Klaas – Psychometrika, 2009

This discussion paper argues that both the use of Cronbach's alpha as a reliability estimate and as a measure of internal consistency suffer from major problems. First, alpha always has a value, which cannot be equal to the test score's reliability given the inter-item covariance matrix and the usual assumptions about measurement error. Second, in…

Descriptors: Measurement, Error of Measurement, Scores, Computation

Ramsay-Curve Differential Item Functioning

Peer reviewed

Woods, Carol M. – Applied Psychological Measurement, 2011

Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…

Descriptors: Simulation, Item Response Theory, Testing, Questionnaires

Causal Indicator Models: Identification, Estimation, and Testing

Peer reviewed

Bollen, Kenneth A.; Davis, Walter R. – Structural Equation Modeling: A Multidisciplinary Journal, 2009

We discuss the identification, estimation, and testing of structural equation models that have causal indicators. We first provide 2 rules of identification that are particularly helpful in models with causal indicators--the 2C emitted paths rule and the exogenous X rule. We demonstrate how these rules can help us distinguish identified from…

Descriptors: Structural Equation Models, Testing, Identification, Statistical Significance

Language, Dialect, and Register: Sociolinguistics and the Estimation of Measurement Error in the Testing of English Language Learners

Peer reviewed

Solano-Flores, Guillermo – Teachers College Record, 2006

This article examines the intersection of psychometrics and sociolinguistics in the testing of English language learners (ELLs); it discusses language, dialect, and register as sources of measurement error. Research findings show that the dialect of the language in which students are tested (e.g., local or standard English) is as important as…

Descriptors: Second Language Learning, Test Construction, Sociolinguistics, Psychometrics