Huan Liu – ProQuest LLC, 2024
In many large-scale testing programs, examinees are frequently categorized into different performance levels. These classifications are then used to make high-stakes decisions about examinees in contexts such as licensure, certification, and educational assessment. Numerous approaches to estimating the consistency and accuracy of this…
Descriptors: Classification, Accuracy, Item Response Theory, Decision Making
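The decision consistency this entry studies can be illustrated with a small simulation: under an assumed 2PL IRT model, classify simulated examinees as pass/fail on two parallel administrations and count how often the decisions agree. A minimal sketch; all parameter values and the cut score are illustrative, not taken from the cited dissertation:

```python
import numpy as np

def p_correct(theta, a, b):
    # 2PL probability of a correct response for each examinee-item pair
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def classification_consistency(theta, a, b, cut, seed=0):
    # Proportion of examinees classified the same way (pass/fail at `cut`)
    # on two independently simulated administrations of the same form.
    rng = np.random.default_rng(seed)
    p = p_correct(theta, a, b)
    x1 = (rng.random(p.shape) < p).sum(axis=1)  # total score, form 1
    x2 = (rng.random(p.shape) < p).sum(axis=1)  # total score, form 2
    return np.mean((x1 >= cut) == (x2 >= cut))

theta = np.random.default_rng(1).normal(size=500)  # latent abilities
a = np.full(20, 1.2)                               # discriminations
b = np.linspace(-2, 2, 20)                         # difficulties
cc = classification_consistency(theta, a, b, cut=12)
```

Classification accuracy is estimated analogously, by comparing the simulated pass/fail decision against the decision implied by the true ability.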
Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023
Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the widespread use of multiple measures, there is little research on their classification consistency and accuracy. Accordingly, this study introduces an…
Descriptors: Testing, Computation, Classification, Accuracy
Madeline A. Schellman; Matthew J. Madison – Grantee Submission, 2024
Diagnostic classification models (DCMs) have grown in popularity as stakeholders increasingly desire actionable information related to students' skill competencies. Longitudinal DCMs offer a psychometric framework for providing estimates of students' proficiency status transitions over time. For both cross-sectional and longitudinal DCMs, it is…
Descriptors: Diagnostic Tests, Classification, Models, Psychometrics
Erik Forsberg; Anders Sjöberg – Measurement: Interdisciplinary Research and Perspectives, 2025
This paper reports a validation study based on descriptive multidimensional item response theory (DMIRT), implemented in the R package "D3mirt" by using the ERS-C, an extended version of the Relevance subscale from the Moral Foundations Questionnaire including two new items for collectivism (17 items in total). Two latent models are…
Descriptors: Evaluation Methods, Programming Languages, Altruism, Collectivism
Yang Du; Susu Zhang – Journal of Educational and Behavioral Statistics, 2025
Item compromise has long posed challenges in educational measurement, jeopardizing both test validity and test security of continuous tests. Detecting compromised items is therefore crucial to address this concern. The present literature on compromised item detection reveals two notable gaps: First, the majority of existing methods are based upon…
Descriptors: Item Response Theory, Item Analysis, Bayesian Statistics, Educational Assessment
Yuanfang Liu; Mark H. C. Lai; Ben Kelcey – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Measurement invariance holds when a latent construct is measured in the same way across different levels of background variables (continuous or categorical) while controlling for the true value of that construct. Using Monte Carlo simulation, this paper compares the multiple indicators, multiple causes (MIMIC) model and MIMIC-interaction to a…
Descriptors: Classification, Accuracy, Error of Measurement, Correlation
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Chung, Seungwon; Houts, Carrie – Measurement: Interdisciplinary Research and Perspectives, 2020
Advanced modeling of item response data through the item response theory (IRT) or item factor analysis frameworks is becoming increasingly popular. In the social and behavioral sciences, the underlying structure of tests/assessments is often multidimensional (i.e., more than 1 latent variable/construct is represented in the items). This review…
Descriptors: Item Response Theory, Evaluation Methods, Models, Factor Analysis
Malec, Wojciech; Krzeminska-Adamek, Malgorzata – Practical Assessment, Research & Evaluation, 2020
The main objective of the article is to compare several methods of evaluating multiple-choice options through classical item analysis. The methods subjected to examination include the tabulation of choice distribution, the interpretation of trace lines, the point-biserial correlation, the categorical analysis of trace lines, and the investigation…
Descriptors: Comparative Analysis, Evaluation Methods, Multiple Choice Tests, Item Analysis
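The point-biserial correlation examined in this entry has a simple closed form: the difference between the mean total scores of correct and incorrect responders, divided by the standard deviation of total scores and scaled by sqrt(p(1-p)). A minimal sketch of that computation; it is illustrative, not the authors' implementation:

```python
import numpy as np

def point_biserial(item, total):
    # Point-biserial correlation between a dichotomous item score (0/1)
    # and the total test score; equals the Pearson correlation when the
    # population standard deviation is used.
    item = np.asarray(item, dtype=float)
    total = np.asarray(total, dtype=float)
    p = item.mean()                   # proportion answering correctly
    m1 = total[item == 1].mean()      # mean total of correct responders
    m0 = total[item == 0].mean()      # mean total of incorrect responders
    s = total.std()                   # population SD of total scores
    return (m1 - m0) / s * np.sqrt(p * (1 - p))

rng = np.random.default_rng(0)
item = rng.integers(0, 2, size=200)
total = item + rng.normal(0, 1, size=200)
r = point_biserial(item, total)
```

Because the formula is algebraically identical to the Pearson correlation for a binary variable, the result can be checked against `np.corrcoef`.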
Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics
Andrich, David – Educational and Psychological Measurement, 2013
Assessments in response formats with ordered categories are ubiquitous in the social and health sciences. Although the assumption that the ordering of the categories is working as intended is central to any interpretation that arises from such assessments, testing that this assumption is valid is not standard in psychometrics. This is surprising…
Descriptors: Item Response Theory, Classification, Statistical Analysis, Models
Dewhurst, Stephen A.; Howe, Mark L.; Berry, Donna M.; Knott, Lauren M. – Journal of Experimental Child Psychology, 2012
The effect of test-induced priming on false recognition was investigated in children aged 5, 7, 9, and 11 years using lists of semantic associates, category exemplars, and phonological associates. In line with effects previously observed in adults, nine- and eleven-year-olds showed increased levels of false recognition when critical lures were…
Descriptors: Priming, Semantics, Classification, Semiotics
Keller, Lisa A.; Keller, Robert R.; Parker, Pauline A. – Journal of Experimental Education, 2011
This study investigates the comparability of two item response theory based equating methods: true score equating (TSE), and estimated true equating (ETE). Additionally, six scaling methods were implemented within each equating method: mean-sigma, mean-mean, two versions of fixed common item parameter, Stocking and Lord, and Haebara. Empirical…
Descriptors: Scaling, Program Effectiveness, Classification, True Scores
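Among the scaling methods this entry compares, mean-sigma is the simplest: it places one form's item difficulties on another form's scale via a linear transformation whose slope and intercept are estimated from the common (anchor) items. A minimal sketch with illustrative data, not the study's own procedure:

```python
import numpy as np

def mean_sigma(anchor_old, anchor_new):
    # Mean-sigma linking constants A, B such that difficulties on the
    # new form map to the old form's scale: b_old ~ A * b_new + B.
    anchor_old = np.asarray(anchor_old, dtype=float)
    anchor_new = np.asarray(anchor_new, dtype=float)
    A = anchor_old.std() / anchor_new.std()
    B = anchor_old.mean() - A * anchor_new.mean()
    return A, B

# If the new-form anchor difficulties are an exact linear rescaling of
# the old-form values, mean-sigma recovers the transformation exactly.
old = np.array([-1.0, 0.0, 0.5, 1.5])
new = (old - 0.3) / 0.8          # so old = 0.8 * new + 0.3
A, B = mean_sigma(old, new)
```

The mean-mean method mentioned in the same abstract differs only in how the slope A is estimated (from mean discriminations rather than difficulty spread).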
Kubinger, Klaus D.; Rasch, Dieter; Yanagida, Takuya – Educational Research and Evaluation, 2011
Though calibration of achievement tests in psychological and educational contexts is very often carried out with the Rasch model, data sampling is rarely designed according to statistical principles. However, Kubinger, Rasch, and Yanagida (2009) recently suggested an approach for the determination of sample size according to a given Type I and…
Descriptors: Sample Size, Simulation, Testing, Achievement Tests
Henson, Robert; Roussos, Louis; Douglas, Jeff; He, Xuming – Applied Psychological Measurement, 2008
Cognitive diagnostic models (CDMs) model the probability of correctly answering an item as a function of an examinee's attribute mastery pattern. Because estimation of the mastery pattern involves more than a continuous measure of ability, reliability concepts introduced by classical test theory and item response theory do not apply. The cognitive…
Descriptors: Diagnostic Tests, Classification, Probability, Item Response Theory
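The DINA model is one widely used CDM of the kind this entry describes: the probability of a correct answer depends on whether the examinee has mastered every attribute the item requires, moderated by slip and guess parameters. A minimal sketch with illustrative values; the abstract does not single out DINA among CDMs:

```python
import numpy as np

def dina_probability(alpha, q, slip, guess):
    # DINA-model probability of a correct response.
    # alpha: examinee's attribute-mastery vector (0/1 per attribute)
    # q:     the item's Q-matrix row (1 = attribute required)
    # eta is 1 only if every required attribute is mastered.
    eta = int(np.all(alpha[q == 1] == 1))
    return (1 - slip) ** eta * guess ** (1 - eta)

alpha = np.array([1, 0, 1])          # masters attributes 1 and 3
q = np.array([1, 0, 1])              # item requires attributes 1 and 3
p_master = dina_probability(alpha, q, slip=0.1, guess=0.2)   # -> 0.9
```

An examinee missing any required attribute answers correctly only by guessing, with probability `guess`.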