Huan Liu – ProQuest LLC, 2024
In many large-scale testing programs, examinees are frequently categorized into different performance levels. These classifications are then used to make high-stakes decisions about examinees in contexts such as licensure, certification, and educational assessment. Numerous approaches to estimating the consistency and accuracy of this…
Descriptors: Classification, Accuracy, Item Response Theory, Decision Making
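The decision consistency this entry studies can be illustrated with a small simulation: under an assumed 2PL IRT model, classify simulated examinees as pass/fail on two parallel administrations and count how often the decisions agree. A minimal sketch; all parameter values and the cut score are illustrative, not taken from the cited dissertation:

```python
import numpy as np

def p_correct(theta, a, b):
    # 2PL probability of a correct response for each examinee-item pair
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def classification_consistency(theta, a, b, cut, seed=0):
    # Proportion of examinees classified the same way (pass/fail at `cut`)
    # on two independently simulated administrations of the same form.
    rng = np.random.default_rng(seed)
    p = p_correct(theta, a, b)
    x1 = (rng.random(p.shape) < p).sum(axis=1)  # total score, form 1
    x2 = (rng.random(p.shape) < p).sum(axis=1)  # total score, form 2
    return np.mean((x1 >= cut) == (x2 >= cut))

theta = np.random.default_rng(1).normal(size=500)  # latent abilities
a = np.full(20, 1.2)                               # discriminations
b = np.linspace(-2, 2, 20)                         # difficulties
cc = classification_consistency(theta, a, b, cut=12)
```

Classification accuracy is estimated analogously, by comparing the simulated pass/fail decision against the decision implied by the true ability.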
Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023
Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the widespread use of multiple measures, there is little research on their classification consistency and accuracy. Accordingly, this study introduces an…
Descriptors: Testing, Computation, Classification, Accuracy
Madeline A. Schellman; Matthew J. Madison – Grantee Submission, 2024
Diagnostic classification models (DCMs) have grown in popularity as stakeholders increasingly desire actionable information related to students' skill competencies. Longitudinal DCMs offer a psychometric framework for providing estimates of students' proficiency status transitions over time. For both cross-sectional and longitudinal DCMs, it is…
Descriptors: Diagnostic Tests, Classification, Models, Psychometrics
Erik Forsberg; Anders Sjöberg – Measurement: Interdisciplinary Research and Perspectives, 2025
This paper reports a validation study based on descriptive multidimensional item response theory (DMIRT), implemented in the R package "D3mirt" by using the ERS-C, an extended version of the Relevance subscale from the Moral Foundations Questionnaire including two new items for collectivism (17 items in total). Two latent models are…
Descriptors: Evaluation Methods, Programming Languages, Altruism, Collectivism
Yang Du; Susu Zhang – Journal of Educational and Behavioral Statistics, 2025
Item compromise has long posed challenges in educational measurement, jeopardizing both test validity and test security of continuous tests. Detecting compromised items is therefore crucial to address this concern. The present literature on compromised item detection reveals two notable gaps: First, the majority of existing methods are based upon…
Descriptors: Item Response Theory, Item Analysis, Bayesian Statistics, Educational Assessment
Yuanfang Liu; Mark H. C. Lai; Ben Kelcey – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Measurement invariance holds when a latent construct is measured in the same way across different levels of background variables (continuous or categorical) while controlling for the true value of that construct. Using Monte Carlo simulation, this paper compares the multiple indicators, multiple causes (MIMIC) model and MIMIC-interaction to a…
Descriptors: Classification, Accuracy, Error of Measurement, Correlation
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Chung, Seungwon; Houts, Carrie – Measurement: Interdisciplinary Research and Perspectives, 2020
Advanced modeling of item response data through the item response theory (IRT) or item factor analysis frameworks is becoming increasingly popular. In the social and behavioral sciences, the underlying structure of tests/assessments is often multidimensional (i.e., more than 1 latent variable/construct is represented in the items). This review…
Descriptors: Item Response Theory, Evaluation Methods, Models, Factor Analysis
Malec, Wojciech; Krzeminska-Adamek, Malgorzata – Practical Assessment, Research & Evaluation, 2020
The main objective of the article is to compare several methods of evaluating multiple-choice options through classical item analysis. The methods subjected to examination include the tabulation of choice distribution, the interpretation of trace lines, the point-biserial correlation, the categorical analysis of trace lines, and the investigation…
Descriptors: Comparative Analysis, Evaluation Methods, Multiple Choice Tests, Item Analysis
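The point-biserial correlation examined in this entry has a simple closed form: the difference between the mean total scores of correct and incorrect responders, divided by the standard deviation of total scores and scaled by sqrt(p(1-p)). A minimal sketch of that computation; it is illustrative, not the authors' implementation:

```python
import numpy as np

def point_biserial(item, total):
    # Point-biserial correlation between a dichotomous item score (0/1)
    # and the total test score; equals the Pearson correlation when the
    # population standard deviation is used.
    item = np.asarray(item, dtype=float)
    total = np.asarray(total, dtype=float)
    p = item.mean()                   # proportion answering correctly
    m1 = total[item == 1].mean()      # mean total of correct responders
    m0 = total[item == 0].mean()      # mean total of incorrect responders
    s = total.std()                   # population SD of total scores
    return (m1 - m0) / s * np.sqrt(p * (1 - p))

rng = np.random.default_rng(0)
item = rng.integers(0, 2, size=200)
total = item + rng.normal(0, 1, size=200)
r = point_biserial(item, total)
```

Because the formula is algebraically identical to the Pearson correlation for a binary variable, the result can be checked against `np.corrcoef`.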
Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics
Andrich, David – Educational and Psychological Measurement, 2013
Assessments in response formats with ordered categories are ubiquitous in the social and health sciences. Although the assumption that the ordering of the categories is working as intended is central to any interpretation that arises from such assessments, testing that this assumption is valid is not standard in psychometrics. This is surprising…
Descriptors: Item Response Theory, Classification, Statistical Analysis, Models
Dewhurst, Stephen A.; Howe, Mark L.; Berry, Donna M.; Knott, Lauren M. – Journal of Experimental Child Psychology, 2012
The effect of test-induced priming on false recognition was investigated in children aged 5, 7, 9, and 11 years using lists of semantic associates, category exemplars, and phonological associates. In line with effects previously observed in adults, nine- and eleven-year-olds showed increased levels of false recognition when critical lures were…
Descriptors: Priming, Semantics, Classification, Semiotics
Keller, Lisa A.; Keller, Robert R.; Parker, Pauline A. – Journal of Experimental Education, 2011
This study investigates the comparability of two item response theory based equating methods: true score equating (TSE), and estimated true equating (ETE). Additionally, six scaling methods were implemented within each equating method: mean-sigma, mean-mean, two versions of fixed common item parameter, Stocking and Lord, and Haebara. Empirical…
Descriptors: Scaling, Program Effectiveness, Classification, True Scores
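Among the scaling methods this entry compares, mean-sigma is the simplest: it places one form's item difficulties on another form's scale via a linear transformation whose slope and intercept are estimated from the common (anchor) items. A minimal sketch with illustrative data, not the study's own procedure:

```python
import numpy as np

def mean_sigma(anchor_old, anchor_new):
    # Mean-sigma linking constants A, B such that difficulties on the
    # new form map to the old form's scale: b_old ~ A * b_new + B.
    anchor_old = np.asarray(anchor_old, dtype=float)
    anchor_new = np.asarray(anchor_new, dtype=float)
    A = anchor_old.std() / anchor_new.std()
    B = anchor_old.mean() - A * anchor_new.mean()
    return A, B

# If the new-form anchor difficulties are an exact linear rescaling of
# the old-form values, mean-sigma recovers the transformation exactly.
old = np.array([-1.0, 0.0, 0.5, 1.5])
new = (old - 0.3) / 0.8          # so old = 0.8 * new + 0.3
A, B = mean_sigma(old, new)
```

The mean-mean method mentioned in the same abstract differs only in how the slope A is estimated (from mean discriminations rather than difficulty spread).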
Kubinger, Klaus D.; Rasch, Dieter; Yanagida, Takuya – Educational Research and Evaluation, 2011
Though calibration of achievement tests in psychological and educational contexts is very often carried out with the Rasch model, data sampling is rarely designed according to statistical principles. However, Kubinger, Rasch, and Yanagida (2009) recently suggested an approach for the determination of sample size according to a given Type I and…
Descriptors: Sample Size, Simulation, Testing, Achievement Tests
Henson, Robert; Roussos, Louis; Douglas, Jeff; He, Xuming – Applied Psychological Measurement, 2008
Cognitive diagnostic models (CDMs) model the probability of correctly answering an item as a function of an examinee's attribute mastery pattern. Because estimation of the mastery pattern involves more than a continuous measure of ability, reliability concepts introduced by classical test theory and item response theory do not apply. The cognitive…
Descriptors: Diagnostic Tests, Classification, Probability, Item Response Theory
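The DINA model is one widely used CDM of the kind this entry describes: the probability of a correct answer depends on whether the examinee has mastered every attribute the item requires, moderated by slip and guess parameters. A minimal sketch with illustrative values; the abstract does not single out DINA among CDMs:

```python
import numpy as np

def dina_probability(alpha, q, slip, guess):
    # DINA-model probability of a correct response.
    # alpha: examinee's attribute-mastery vector (0/1 per attribute)
    # q:     the item's Q-matrix row (1 = attribute required)
    # eta is 1 only if every required attribute is mastered.
    eta = int(np.all(alpha[q == 1] == 1))
    return (1 - slip) ** eta * guess ** (1 - eta)

alpha = np.array([1, 0, 1])          # masters attributes 1 and 3
q = np.array([1, 0, 1])              # item requires attributes 1 and 3
p_master = dina_probability(alpha, q, slip=0.1, guess=0.2)   # -> 0.9
```

An examinee missing any required attribute answers correctly only by guessing, with probability `guess`.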