Showing 1 to 15 of 16 results
Peer reviewed
Yuanfang Liu; Mark H. C. Lai; Ben Kelcey – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Measurement invariance holds when a latent construct is measured in the same way across different levels of background variables (continuous or categorical) while controlling for the true value of that construct. Using Monte Carlo simulation, this paper compares the multiple indicators, multiple causes (MIMIC) model and MIMIC-interaction to a…
Descriptors: Classification, Accuracy, Error of Measurement, Correlation
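For readers unfamiliar with the setup, a minimal NumPy sketch of a MIMIC-style data-generating model with one noninvariant indicator may help; the coefficient values (gamma, lam, direct_effect) are invented for illustration and are not taken from the article.

import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Background covariate (e.g., a group membership coded 0/1)
x = rng.integers(0, 2, size=n)

# Structural part: the latent construct regressed on the covariate
gamma = 0.5                      # covariate effect on the latent trait (assumed)
eta = gamma * x + rng.normal(0, 1, size=n)

# Measurement part: five indicators loading on the latent trait
lam = np.array([0.8, 0.7, 0.75, 0.6, 0.65])
eps = rng.normal(0, 0.5, size=(n, 5))
y = eta[:, None] * lam + eps

# Noninvariance: the covariate also affects indicator 1 directly,
# over and above the latent trait (a uniform, intercept-type effect)
direct_effect = 0.4
y[:, 0] += direct_effect * x

# Group mean differences per indicator: indicator 1's gap exceeds
# what the latent path (gamma * lam) alone would imply
print((y[x == 1].mean(axis=0) - y[x == 0].mean(axis=0)).round(2))
print("implied by the latent path alone:", (gamma * lam).round(2))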
Peer reviewed
Liu, Yixing; Thompson, Marilyn S. – Journal of Experimental Education, 2022
A simulation study was conducted to explore the impact of differential item functioning (DIF) on general factor difference estimation for bifactor, ordinal data. Common analysis misspecifications, namely fitting the generated bifactor data with DIF using models that constrain noninvariant item parameters to equality, were compared under data…
Descriptors: Comparative Analysis, Item Analysis, Sample Size, Error of Measurement
Peer reviewed
Jones, Andrew T.; Kopp, Jason P.; Ong, Thai Q. – Educational Measurement: Issues and Practice, 2020
Studies investigating invariance have often been limited to measurement or prediction invariance. Selection invariance, wherein the use of test scores for classification results in equivalent classification accuracy between groups, has received comparatively little attention in the psychometric literature. Previous research suggests that some form…
Descriptors: Test Construction, Test Bias, Classification, Accuracy
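A rough illustration of the selection-invariance idea, with an entirely hypothetical two-group setup: the same cut score yields different classification accuracy when one group's observed scores carry a measurement bias. The bias value, error variance, and cut score are assumptions made only for this sketch.

import numpy as np

rng = np.random.default_rng(7)
n = 50_000

def accuracy(bias):
    # True status: "qualified" if the latent criterion exceeds 0
    true_score = rng.normal(0, 1, size=n)
    qualified = true_score > 0
    # Observed test score = true score + measurement error + group-specific bias
    observed = true_score + rng.normal(0, 0.6, size=n) + bias
    selected = observed > 0          # same cut score applied to both groups
    return np.mean(selected == qualified)

# Group A measured without bias, group B with a small negative bias
print("classification accuracy, group A:", round(accuracy(0.0), 3))
print("classification accuracy, group B:", round(accuracy(-0.3), 3))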
Peer reviewed
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
Peer reviewed
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the Key Stage 2 (KS2) National Curriculum test in science in England, previously taken by the whole national cohort, has been replaced with a sampling test administered annually to pupils aged 11 from a nationally representative sample of schools. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
Peer reviewed
Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul – Psychometrika, 2012
The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, θ, to underlie the ordinal item scores (Takane & de Leeuw in…
Descriptors: Simulation, Regression (Statistics), Psychometrics, Item Response Theory
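The derivation alluded to can be sketched directly: assuming an underlying continuous response Z = lambda*theta + e with normal error, cut at thresholds tau, yields normal-ogive graded-response category probabilities. The loading, thresholds, and theta value below are invented for illustration.

import numpy as np
from scipy.stats import norm

def grm_category_probs(theta, lam, tau, sigma=1.0):
    """Category probabilities implied by Z = lam*theta + e, e ~ N(0, sigma^2),
    with the ordinal score determined by which thresholds in tau the latent Z exceeds."""
    tau = np.asarray(tau, dtype=float)
    # P(Z > tau_c | theta) = Phi((lam*theta - tau_c) / sigma): cumulative probabilities
    cum = norm.cdf((lam * theta - tau) / sigma)
    cum = np.concatenate(([1.0], cum, [0.0]))
    return cum[:-1] - cum[1:]        # adjacent differences give category probabilities

# Example: one 4-category item with lam = 1.2 and thresholds at -1, 0, 1
print(grm_category_probs(theta=0.5, lam=1.2, tau=[-1.0, 0.0, 1.0]))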
Peer reviewed
Jiao, Hong; Kamata, Akihito; Wang, Shudong; Jin, Ying – Journal of Educational Measurement, 2012
The applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet-based assessment, both local item dependence and local person dependence are likely to be induced.…
Descriptors: Item Response Theory, Test Items, Markov Processes, Monte Carlo Methods
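A quick simulation of how a shared testlet effect induces local item dependence beyond the latent trait; this Rasch-type sketch with invented parameters is far simpler than the multilevel testlet model the article develops.

import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 5000, 6                    # items 0-2 form one testlet, 3-5 another

theta = rng.normal(0, 1, size=n_persons)
b = np.linspace(-1, 1, n_items)                 # item difficulties
testlet = np.repeat([0, 1], 3)                  # testlet membership of each item
u = rng.normal(0, 0.8, size=(n_persons, 2))     # person-by-testlet random effects

logit = theta[:, None] - b + u[:, testlet]
resp = (rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-logit))).astype(int)

# Residuals after removing the trait-only (Rasch) expectation: correlations are
# noticeably larger within a testlet than between testlets
resid = resp - 1 / (1 + np.exp(-(theta[:, None] - b)))
corr = np.corrcoef(resid, rowvar=False)
print("within-testlet:", round(corr[0, 1], 3), " between-testlet:", round(corr[0, 3], 3))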
Peer reviewed
Kaplan, David; Depaoli, Sarah – Structural Equation Modeling: A Multidisciplinary Journal, 2011
This article examines the problem of specification error in two models for categorical latent variables: the latent class model and the latent Markov model. For the latent class model, the focus is on the impact of incorrectly specifying the number of latent classes of the categorical latent variable on measures of model adequacy as…
Descriptors: Markov Processes, Longitudinal Studies, Probability, Item Response Theory
Peer reviewed
Bramley, Tom – Educational Research, 2010
Background: A recent article published in "Educational Research" on the reliability of results in National Curriculum testing in England (Newton, "The reliability of results from national curriculum testing in England," "Educational Research" 51, no. 2: 181-212, 2009) suggested that: (1) classification accuracy can be…
Descriptors: National Curriculum, Educational Research, Testing, Measurement
Peer reviewed
Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007
Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…
Descriptors: Psychiatry, Patients, Error of Measurement, Test Length
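The question the abstract raises can be put in back-of-the-envelope form under a simple binomial error model (the cutoff and true-score values below are hypothetical): how often does a test of n items place a patient on the correct side of the cutoff?

from scipy.stats import binom

def p_correct_classification(true_prop, n_items, cut_prop):
    """Probability that the observed number-correct falls on the same side
    of the cutoff as the true proportion, under a binomial error model."""
    cut = round(cut_prop * n_items)
    p_above = 1 - binom.cdf(cut - 1, n_items, true_prop)
    return p_above if true_prop >= cut_prop else 1 - p_above

# A patient whose true level (0.60) sits just above a 0.55 cutoff
for n in (10, 15, 40):
    print(n, "items:", round(p_correct_classification(0.60, n, 0.55), 3))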
Peer reviewed
Full text (PDF) available on ERIC
Mapuranga, Raymond; Dorans, Neil J.; Middleton, Kyndra – ETS Research Report Series, 2008
In many practical settings, essentially the same differential item functioning (DIF) procedures have been in use since the late 1980s. Since then, examinee populations have become more heterogeneous, and tests have included more polytomously scored items. This paper summarizes and classifies new DIF methods and procedures that have appeared since…
Descriptors: Test Bias, Educational Development, Evaluation Methods, Statistical Analysis
Peer reviewed
Rudner, Lawrence M. – Practical Assessment, Research & Evaluation, 2001
Provides and illustrates a method to compute the expected number of misclassifications of examinees using three-parameter item response theory and two state classifications (mastery or nonmastery). The method uses the standard error and the expected examinee ability distribution. (SLD)
Descriptors: Ability, Classification, Computation, Error of Measurement
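The computation described can be sketched numerically: obtain the conditional standard error from 3PL test information, convert it to a misclassification probability at each ability level, and average over an assumed N(0, 1) ability distribution. The item parameters, cut score, and candidate count below are invented for illustration, not Rudner's values.

import numpy as np
from scipy.stats import norm

D = 1.7
a = np.array([1.0, 1.2, 0.8, 1.5, 0.9] * 8)     # 40 hypothetical 3PL items
b = np.linspace(-2, 2, a.size)
c = np.full(a.size, 0.2)

def sem(theta):
    """Conditional standard error of ability from 3PL test information."""
    P = c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))
    info = (D * a) ** 2 * ((1 - P) / P) * ((P - c) / (1 - c)) ** 2
    return 1.0 / np.sqrt(info.sum())

theta_cut, n_examinees = 0.0, 10_000
grid = np.linspace(-4, 4, 401)
weights = norm.pdf(grid)
weights /= weights.sum()                         # discretized N(0, 1) ability distribution

# P(misclassified | theta): the ability estimate lands on the wrong side of the cut
p_wrong = np.array([norm.cdf(-abs(t - theta_cut) / sem(t)) for t in grid])
print("expected misclassifications:", round(n_examinees * (weights * p_wrong).sum(), 1))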
Peer reviewed
Wang, Tianyou; Kolen, Michael J.; Harris, Deborah J. – Journal of Educational Measurement, 2000
Describes procedures for calculating the conditional standard error of measurement (CSEM) and reliability of scale scores and the classification consistency of performance levels. Applied these procedures to data from the American College Testing Program's Work Keys Writing Assessment with sample sizes of 7,097, 1,035, and 1,793. Results show that the…
Descriptors: Adults, Classification, Error of Measurement, Item Response Theory
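One concrete reading of "classification consistency of performance levels": simulate two parallel administrations under a simple binomial error model and count how often the same level is assigned. The cut scores and true-score distribution below are assumptions, not the Work Keys values.

import numpy as np

rng = np.random.default_rng(11)
n_items, n_examinees = 30, 20_000
cuts = np.array([12, 18, 24])                    # hypothetical raw-score cut points

true_prop = rng.beta(6, 4, size=n_examinees)     # assumed true proportion-correct distribution
form1 = rng.binomial(n_items, true_prop)         # first administration
form2 = rng.binomial(n_items, true_prop)         # independent "parallel" administration

level1 = np.searchsorted(cuts, form1, side="right")
level2 = np.searchsorted(cuts, form2, side="right")
print("classification consistency:", round(np.mean(level1 == level2), 3))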
Wang, Tianyou; And Others – 1996
M. J. Kolen, B. A. Hanson, and R. L. Brennan (1992) presented a procedure for assessing the conditional standard error of measurement (CSEM) of scale scores using a strong true-score model. They also investigated ways of using nonlinear transformations from number-correct raw scores to scale scores to equalize the conditional standard error along…
Descriptors: Ability, Classification, Error of Measurement, Goodness of Fit
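A compact sketch of the two ingredients mentioned, under Lord's binomial error model: the raw-score CSEM varies along the score scale, and a nonlinear (arcsine) transformation of number-correct scores makes the conditional error roughly constant. This is a generic illustration of the idea, not the article's exact procedure; the test length is arbitrary.

import numpy as np

n = 40                                    # test length
x = np.arange(1, n)                       # interior number-correct raw scores

# Lord's binomial-model CSEM for raw scores: largest mid-scale, smallest at the extremes
csem_raw = np.sqrt(x * (n - x) / (n - 1))

# Arcsine transformation of proportion-correct; by the delta method its CSEM is
# approximately constant, about 1 / (2 * sqrt(n - 1)) across the score range
g = np.arcsin(np.sqrt(x / n))             # scores on the transformed (arcsine) metric
csem_arcsine = (csem_raw / n) / (2 * np.sqrt((x / n) * (1 - x / n)))

print("raw-score CSEM, min/max:     ", round(csem_raw.min(), 2), round(csem_raw.max(), 2))
print("arcsine-scale CSEM, min/max: ", round(csem_arcsine.min(), 3), round(csem_arcsine.max(), 3))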
Karkee, Thakur B.; Wright, Karen R. – Online Submission, 2004
Different item response theory (IRT) models may be employed for item calibration. Change of testing vendors, for example, may result in the adoption of a different model than that previously used with a testing program. To provide scale continuity and preserve cut score integrity, item parameter estimates from the new model must be linked to the…
Descriptors: Measures (Individuals), Evaluation Criteria, Testing, Integrity
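For the linking step itself, one commonly used option (not necessarily the criterion this study recommends) is mean/sigma linking on the common items' difficulty estimates; a minimal sketch with made-up parameter values follows.

import numpy as np

# Difficulty estimates of the same anchor items under the old and new calibrations
b_old = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_new = np.array([-1.0, -0.1, 0.4, 1.2, 1.9])
a_new = np.array([1.1, 0.9, 1.3, 0.8, 1.0])

# Mean/sigma transformation placing the new estimates on the old scale
A = b_old.std(ddof=1) / b_new.std(ddof=1)
B = b_old.mean() - A * b_new.mean()

b_linked = A * b_new + B
a_linked = a_new / A
print("A =", round(A, 3), " B =", round(B, 3))
print("linked difficulties:", np.round(b_linked, 2))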