ERIC - Search Results

Publication Date

In 2025	2
Since 2024	5
Since 2021 (last 5 years)	8
Since 2016 (last 10 years)	19
Since 2006 (last 20 years)	42

Descriptor

Error of Measurement	42
Scaling	42
Item Response Theory	23
Test Items	10
Test Reliability	9
Computation	8
Equated Scores	8
Foreign Countries	8
Achievement Tests	7
Scores	7
Scoring	7
Simulation	7
Statistical Analysis	7
Test Validity	7
Language Tests	6
Mathematics Tests	6
Test Bias	6
Academic Achievement	5
Comparative Analysis	5
Correlation	5
Models	5
Statistical Bias	5
Test Construction	5
Data Analysis	4
Data Collection	4

Publication Type

Journal Articles	35
Reports - Research	28
Reports - Descriptive	8
Reports - Evaluative	5
Numerical/Quantitative Data	3
Dissertations/Theses -…	1
Speeches/Meeting Papers	1

Education Level

Secondary Education	9
Elementary Secondary Education	6
Junior High Schools	6
Middle Schools	6
Elementary Education	4
Grade 3	4
Grade 5	4
Grade 7	4
High Schools	4
Higher Education	4
Early Childhood Education	3
Grade 4	3
Grade 6	3
Grade 8	3
Intermediate Grades	3
Postsecondary Education	3
Primary Education	3
Grade 9	2

Location

New York	3
Indonesia	2
Germany	1
Japan	1
United Kingdom	1

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

National Assessment of…	3
ACT Assessment	1
Iowa Tests of Basic Skills	1
Iowa Tests of Educational…	1
Program for International…	1
SAT (College Admission Test)	1
Trends in International…	1

Showing 1 to 15 of 42 results

Combining Mokken Scale Analysis with Rasch Measurement Theory to Explore Differences in Measurement Quality between Subgroups

Peer reviewed

Stefanie A. Wind; Benjamin Lugu; Yurou Wang – International Journal of Testing, 2025

Mokken Scale Analysis (MSA) is a nonparametric approach that offers exploratory tools for understanding the nature of item responses while emphasizing invariance requirements. MSA is often discussed as it relates to Rasch measurement theory, which also emphasizes invariance, but uses parametric models. Researchers who have compared and combined…

Descriptors: Item Response Theory, Scaling, Surveys, Evaluation Methods

Comparing Measurement Reliability Estimation Techniques: Correlation Coefficient vs. Bland-Altman Plot

Peer reviewed

Tülin Otbiçer Acar – Measurement: Interdisciplinary Research and Perspectives, 2024

The aim of this study is to compare the results of correlation coefficient estimation of reliability with those obtained through the Bland-Altman plot technique. The scale was first divided into two halves using three different approaches. A linear and high-level relationship was found between the scale scores obtained from the halved forms.…

Descriptors: High School Students, Measurement Techniques, Psychometrics, Comparative Testing

Scale-Invariance, Equivariance and Dependency of Structural Equation Models

Peer reviewed

Ke-Hai Yuan; Ling Ling; Zhiyong Zhang – Grantee Submission, 2024

Data in social and behavioral sciences typically contain measurement errors and do not have predefined metrics. Structural equation modeling (SEM) is widely used for the analysis of such data, where the scales of the manifest and latent variables are often subjective. This article studies how the model, parameter estimates, their standard errors…

Descriptors: Structural Equation Models, Computation, Social Science Research, Error of Measurement

Scale-Invariance, Equivariance and Dependency of Structural Equation Models

Peer reviewed

Ke-Hai Yuan; Ling Ling; Zhiyong Zhang – Structural Equation Modeling: A Multidisciplinary Journal, 2024

Descriptors: Structural Equation Models, Computation, Social Science Research, Error of Measurement

On the Merits of Longitudinal Multiple Group Modelling: An Alternative to Multilevel Modelling for Intervention Evaluations

Peer reviewed

Little, Todd D.; Bontempo, Daniel; Rioux, Charlie; Tracy, Allison – International Journal of Research & Method in Education, 2022

Multilevel modelling (MLM) is the most frequently used approach for evaluating interventions with clustered data. MLM, however, has some limitations that are associated with numerous obstacles to model estimation and valid inferences. Longitudinal multiple-group (LMG) modelling is a longstanding approach for testing intervention effects using…

Descriptors: Longitudinal Studies, Hierarchical Linear Modeling, Alternative Assessment, Intervention

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Growth across Grades and Common Item Grade Alignment in Vertical Scaling Using the Rasch Model

Peer reviewed

Sanford R. Student; Derek C. Briggs; Laurie Davis – Educational Measurement: Issues and Practice, 2025

Vertical scales are frequently developed using common item nonequivalent group linking. In this design, one can use upper-grade, lower-grade, or mixed-grade common items to estimate the linking constants that underlie the absolute measurement of growth. Using the Rasch model and a dataset from Curriculum Associates' i-Ready Diagnostic in math in…

Descriptors: Elementary School Mathematics, Elementary School Students, Middle School Mathematics, Middle School Students

Modeling of Item Response Functions under the D-Scoring Method

Peer reviewed

Dimitrov, Dimiter M. – Educational and Psychological Measurement, 2020

This study presents new models for item response functions (IRFs) in the framework of the D-scoring method (DSM) that is gaining attention in the field of educational and psychological measurement and large-scale assessments. In a previous work on DSM, the IRFs of binary items were estimated using a logistic regression model (LRM). However, the LRM…

Descriptors: Item Response Theory, Scoring, True Scores, Scaling

Variance Estimation in Evaluations with No-Shows: A Comparison of Methods

Peer reviewed

Litwok, Daniel; Peck, Laura R. – American Journal of Evaluation, 2019

In experimental evaluations of policy interventions, the so-called Bloom adjustment is commonly used to estimate the impact of the treatment on the treated. It does so by rescaling the estimated impact of the intention to treat--that is, the overall treatment-control group difference in outcomes for the entire experimental sample--by the…

Descriptors: Computation, Outcomes of Treatment, Program Evaluation, Scaling

Conditioning: How Background Variables Can Influence PISA Scores

Peer reviewed

Zieger, Laura Raffaella; Jerrim, J.; Anders, J.; Shure, N. – Assessment in Education: Principles, Policy & Practice, 2022

The OECD's Programme for International Student Assessment (PISA) has become one of the key studies for evidence-based education policymaking across the globe. PISA has, however, received a lot of methodological criticism, including over how the test scores are created. The aim of this paper is to investigate the so-called 'conditioning model', where…

Descriptors: Foreign Countries, Achievement Tests, International Assessment, Secondary School Students

Validation Methods for Aggregate-Level Test Scale Linking: A Case Study Mapping School District Test Score Distributions to a Common Scale. CEPA Working Paper No. 16-09

Reardon, Sean F.; Ho, Andrew D.; Kalogrides, Demetra – Stanford Center for Education Policy Analysis, 2019

Linking score scales across different tests is considered speculative and fraught, even at the aggregate level (Feuer et al., 1999; Thissen, 2007). We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that…

Descriptors: Test Validity, Evaluation Methods, School Districts, Scores

Grouping Effects on Jackknifed Variance Estimation for Item Response Theory Scaling and Equating with Cluster-Based Assessment Data. Research Report. ETS RR-18-16

Peer reviewed

Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2018

Educational assessment data are often collected from a set of test centers across various geographic regions, and therefore the data samples contain clusters. Such cluster-based data may result in clustering effects in variance estimation. However, in many grouped jackknife variance estimation applications, jackknife groups are often formed by a…

Descriptors: Item Response Theory, Scaling, Equated Scores, Cluster Grouping

Polytomous Rasch Models in Counseling Assessment

Peer reviewed

Willse, John T. – Measurement and Evaluation in Counseling and Development, 2017

This article provides a brief introduction to the Rasch model. Motivation for using Rasch analyses is provided. Important Rasch model concepts and key aspects of result interpretation are introduced, with major points reinforced using a simulation demonstration. Concrete guidelines are provided regarding sample size and the evaluation of items.

Descriptors: Item Response Theory, Test Results, Test Interpretation, Simulation

Do Indonesian Children's Experiences with Large Currency Units Facilitate Magnitude Estimation of Long Temporal Periods?

Peer reviewed

Cheek, Kim A. – Research in Science Education, 2017

Ideas about temporal (and spatial) scale impact students' understanding across science disciplines. Learners have difficulty comprehending the long time periods associated with natural processes because they have no referent for the magnitudes involved. When people have a good "feel" for quantity, they estimate cardinal number magnitude…

Descriptors: Foreign Countries, Scientific Concepts, Science Education, Spatial Ability

How Does Polytomous Item Bias Affect Total-Group Survey Score Comparisons?

Peer reviewed

Hidalgo, Ma Dolores; Benítez, Isabel; Padilla, Jose-Luis; Gómez-Benito, Juana – Sociological Methods & Research, 2017

The growing use of scales in survey questionnaires makes it necessary to address how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the type I error and effect size of the independent samples t-test on the observed total scale scores. A…

Descriptors: Test Items, Test Bias, Item Response Theory, Surveys

Source

ETS Research Report Series	5
Educational Measurement:…	3
Educational and Psychological…	3
International Journal of…	3
New York State Education…	3
Journal of Educational and…	2
Psychometrika	2
ACT, Inc.	1
American Journal of Evaluation	1
Applied Measurement in…	1
Applied Psychological…	1
Assessment	1
Assessment in Education:…	1
EURASIA Journal of…	1
Grantee Submission	1
International Journal of…	1
Language Assessment Quarterly	1
Measurement and Evaluation in…	1
Measurement:…	1
Multivariate Behavioral…	1
National Center for Education…	1
ProQuest LLC	1
Research Synthesis Methods	1
Research in Science Education	1
Social Indicators Research	1

Author

Bentler, Peter M.	2
Guo, Hongwen	2
Ke-Hai Yuan	2
Ling Ling	2
Moses, Tim	2
Zhiyong Zhang	2
Abbott, Rosemary A.	1
Anders, J.	1
Arce, Alvaro J.	1
Attali, Yigal	1
Baker, Rose D.	1
Benjamin Lugu	1
Benítez, Isabel	1
Bontempo, Daniel	1
Carstensen, Claus H.	1
Chafouleas, Sandra M.	1
Cheek, Kim A.	1
Christ, Theodore J.	1
Cole, Russell	1
Croon, Marcel A.	1
Croudace, Tim J.	1
Cui, Zhongmin	1
Curley, Edward	1
Daud, Muslem	1
Derek C. Briggs	1