ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	5
Since 2006 (last 20 years)	20

Descriptor

Reliability	57
Error of Measurement	16
Item Response Theory	13
Scores	12
Classification	8
Comparative Analysis	8
Models	8
True Scores	8
Statistical Analysis	7
Computation	6
Correlation	6
Estimation (Mathematics)	6
Higher Education	6
Measurement Techniques	6
Test Items	6
Accuracy	5
Measurement	5
Sampling	5
Academic Achievement	4
Equations (Mathematics)	4
Generalizability Theory	4
Item Analysis	4
Mathematical Models	4
Research Methodology	4
Scaling	4

Source

Journal of Educational…

Publication Type

Journal Articles	47
Reports - Research	23
Reports - Evaluative	16
Reports - Descriptive	6
Guides - Non-Classroom	2
Information Analyses	2
Book/Product Reviews	1
Numerical/Quantitative Data	1
Speeches/Meeting Papers	1

Education Level

Elementary Secondary Education	2
Higher Education	1
Postsecondary Education	1
Secondary Education	1

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

SAT (College Admission Test)	3
ACT Assessment	1
Kaufman Assessment Battery…	1
Metropolitan Achievement Tests	1
National Longitudinal Study…	1
United States Medical…	1
Work Keys (ACT)	1

Showing 1 to 15 of 57 results

A Computationally Simple Method for Estimating Decision Consistency

Peer reviewed

Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021

Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…

Descriptors: Decision Making, Reliability, Classification, Scores
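The abstract is truncated before it describes the proposed method, so the sketch below does not reproduce it. As a point of reference, it computes the classical two-administration definition of decision consistency: the proportion of examinees who receive the same pass/fail decision on two parallel forms (p0), and Cohen's kappa, which corrects p0 for chance agreement. The function name, the score lists, and the cut score are illustrative assumptions.

```python
from typing import Sequence, Tuple


def decision_consistency(form_a: Sequence[float], form_b: Sequence[float],
                         cut: float) -> Tuple[float, float]:
    """Return (p0, kappa) for pass/fail decisions on two parallel administrations."""
    if len(form_a) != len(form_b) or not form_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(form_a)
    pass_a = [score >= cut for score in form_a]
    pass_b = [score >= cut for score in form_b]
    # p0: proportion of examinees given the same decision on both forms
    p0 = sum(a == b for a, b in zip(pass_a, pass_b)) / n
    # chance agreement implied by the two marginal pass rates
    rate_a, rate_b = sum(pass_a) / n, sum(pass_b) / n
    p_chance = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    # kappa is undefined when chance agreement is 1 (everyone passes, or everyone fails, on both)
    kappa = (p0 - p_chance) / (1 - p_chance) if p_chance < 1 else float("nan")
    return p0, kappa
```

For the hypothetical scores `[55, 62, 70, 48, 81, 59]` and `[58, 60, 73, 52, 79, 61]` with a cut of 60, five of six examinees are classified the same way (p0 = 5/6) and kappa is 2/3.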

Detecting Nonadditivity in Single-Facet Generalizability Theory Applications: Tukey's Test

Peer reviewed

Lin, Chih-Kai; Zhang, Jinming – Journal of Educational Measurement, 2018

Under the generalizability-theory (G-theory) framework, the estimation precision of variance components (VCs) is of significant importance in that they serve as the foundation of estimating reliability. Zhang and Lin advanced the discussion of nonadditivity in data from a theoretical perspective and showed the adverse effects of nonadditivity on…

Descriptors: Generalizability Theory, Reliability, Computation, Statistical Analysis
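For a single-facet persons × items (p × i) design with one observation per cell, the variance components can be estimated from the two-way ANOVA mean squares, and Tukey's one-degree-of-freedom statistic splits the residual sum of squares into a nonadditivity part and a remainder. The sketch below assumes a complete score matrix (rows are persons, columns are items), truncates negative variance estimates at zero (one common convention), and returns the F statistic rather than a p-value to stay dependency-free; it is an illustration, not the authors' code.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PxIResult:
    var_p: float       # person (universe-score) variance
    var_i: float       # item variance
    var_pi_e: float    # person-by-item interaction confounded with residual error
    g_coef: float      # relative generalizability coefficient
    ss_nonadd: float   # Tukey's one-degree-of-freedom sum of squares
    f_nonadd: float    # F statistic on (1, df_remainder) degrees of freedom
    df_remainder: int


def p_by_i_analysis(x: List[List[float]], n_items_decision: Optional[int] = None) -> PxIResult:
    """Variance components and Tukey's nonadditivity test for a persons x items matrix."""
    n_p = len(x)
    n_i = len(x[0]) if n_p else 0
    if n_p < 2 or n_i < 2 or any(len(row) != n_i for row in x):
        raise ValueError("need a complete matrix with at least 2 persons and 2 items")
    grand = sum(map(sum, x)) / (n_p * n_i)
    row_dev = [sum(row) / n_i - grand for row in x]                        # person effects
    col_dev = [sum(row[i] for row in x) / n_p - grand for i in range(n_i)]  # item effects
    ss_p = n_i * sum(d * d for d in row_dev)
    ss_i = n_p * sum(d * d for d in col_dev)
    ss_tot = sum((v - grand) ** 2 for row in x for v in row)
    ss_res = ss_tot - ss_p - ss_i
    df_res = (n_p - 1) * (n_i - 1)
    ms_p, ms_i, ms_res = ss_p / (n_p - 1), ss_i / (n_i - 1), ss_res / df_res
    # Expected-mean-square solutions; negative estimates truncated at zero.
    var_pi_e = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)
    n_dec = n_items_decision if n_items_decision is not None else n_i
    denom = var_p + var_pi_e / n_dec
    g_coef = var_p / denom if denom > 0 else math.nan
    # Tukey: SS_N = [sum x_pi * a_p * b_i]^2 / (sum a_p^2 * sum b_i^2)
    cross = sum(x[p][i] * row_dev[p] * col_dev[i] for p in range(n_p) for i in range(n_i))
    norm = sum(d * d for d in row_dev) * sum(d * d for d in col_dev)
    ss_n = cross ** 2 / norm if norm > 0 else 0.0
    df_rem = df_res - 1
    ss_rem = ss_res - ss_n
    if df_rem < 1:
        f_nonadd = math.nan          # 2 x 2 design: no df left for the test
    elif ss_rem <= 1e-12 * max(ss_tot, 1.0):
        f_nonadd = math.inf          # residual is entirely nonadditivity
    else:
        f_nonadd = ss_n / (ss_rem / df_rem)
    return PxIResult(var_p, var_i, var_pi_e, g_coef, ss_n, f_nonadd, df_rem)
```

For the hypothetical matrix `[[1, 2], [2, 4], [3, 6]]`, where the item effect grows with the person effect, the whole residual sum of squares is attributed to nonadditivity and the F statistic is infinite. When `var_p` is not truncated and `n_items_decision` is left at the number of items, `g_coef` coincides with coefficient alpha for the same data.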

A Comparison of Procedures for Estimating Person Reliability Parameters in the Graded Response Model

Peer reviewed

LaHuis, David M.; Bryant-Lees, Kinsey B.; Hakoyama, Shotaro; Barnes, Tyler; Wiemann, Andrea – Journal of Educational Measurement, 2018

Person reliability parameters (PRPs) model temporary changes in individuals' attribute level perceptions when responding to self-report items (higher levels of PRPs represent less fluctuation). PRPs could be useful in measuring careless responding and traitedness. However, it is unclear how well current procedures for estimating PRPs can recover…

Descriptors: Comparative Analysis, Reliability, Error of Measurement, Measurement Techniques

IRT Approaches to Modeling Scores on Mixed-Format Tests

Peer reviewed

Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020

This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…

Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests

Attribute-Level and Pattern-Level Classification Consistency and Accuracy Indices for Cognitive Diagnostic Assessment

Peer reviewed

Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015

Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…

Descriptors: Classification, Reliability, Accuracy, Cognitive Tests

Development of Information Functions and Indices for the GGUM-RANK Multidimensional Forced Choice IRT Model

Peer reviewed

Joo, Seang-Hwane; Lee, Philseok; Stark, Stephen – Journal of Educational Measurement, 2018

This research derived information functions and proposed new scalar information indices to examine the quality of multidimensional forced choice (MFC) items based on the RANK model. We also explored how GGUM-RANK information, latent trait recovery, and reliability varied across three MFC formats: pairs (two response alternatives), triplets (three…

Descriptors: Item Response Theory, Models, Item Analysis, Reliability

IRT-Estimated Reliability for Tests Containing Mixed Item Formats

Peer reviewed

Shu, Lianghua; Schwarz, Richard D. – Journal of Educational Measurement, 2014

As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt-Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…

Descriptors: Item Response Theory, Reliability, Models, Computation
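The article derives IRT-based versions of these coefficients; for comparison, the sketch below computes the ordinary observed-score versions of two of them, Cronbach's α and stratified α, from an examinee-by-item score matrix. The strata (for example, the multiple-choice and free-response parts of a mixed-format test) are passed as hypothetical lists of column indices that are assumed to partition all items, with at least two items per stratum.

```python
from statistics import pvariance
from typing import List, Sequence


def cronbach_alpha(scores: List[Sequence[float]]) -> float:
    """Coefficient alpha; rows are examinees, columns are items."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def stratified_alpha(scores: List[Sequence[float]], strata: List[List[int]]) -> float:
    """Stratified alpha: 1 - sum_j var(X_j) * (1 - alpha_j) / var(X)."""
    total_var = pvariance([sum(row) for row in scores])
    penalty = 0.0
    for cols in strata:
        part = [[row[j] for j in cols] for row in scores]  # item scores in this stratum
        part_var = pvariance([sum(r) for r in part])       # variance of the part score X_j
        penalty += part_var * (1 - cronbach_alpha(part))
    return 1 - penalty / total_var
```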

The Accuracy and Consistency of a Series of IRT True Score Equatings

Peer reviewed

Li, Deping; Jiang, Yanlin; von Davier, Alina A. – Journal of Educational Measurement, 2012

This study investigates a sequence of item response theory (IRT) true score equatings based on various scale transformation approaches and evaluates equating accuracy and consistency over time. The results show that the biases and sample variances for the IRT true score equating (both direct and indirect) are quite small (except for the mean/sigma…

Descriptors: True Scores, Equated Scores, Item Response Theory, Accuracy
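The core step of IRT true score equating can be sketched as follows: find the ability θ at which the form X test characteristic curve equals a given true score, then report the form Y true score at that θ. The sketch assumes 3PL items with positive discriminations whose parameters are already on a common scale (the scale transformation step the study compares is not shown), uses the scaling constant D = 1.7, and solves by bisection; the item tuples are hypothetical.

```python
import math
from typing import Sequence, Tuple

Item = Tuple[float, float, float]  # (a, b, c) 3PL parameters, a > 0


def p3pl(theta: float, a: float, b: float, c: float, d: float = 1.7) -> float:
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-d * a * (theta - b)))


def true_score(theta: float, items: Sequence[Item]) -> float:
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p3pl(theta, *item) for item in items)


def equate_true_score(tau_x: float, form_x: Sequence[Item], form_y: Sequence[Item],
                      lo: float = -8.0, hi: float = 8.0,
                      tol: float = 1e-10) -> Tuple[float, float]:
    """Return (theta, form Y equivalent) for a form X true score tau_x."""
    # With a > 0 the curve is strictly increasing in theta, so bisection applies.
    if not true_score(lo, form_x) < tau_x < true_score(hi, form_x):
        raise ValueError("tau_x is not attainable on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if true_score(mid, form_x) < tau_x:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2
    return theta, true_score(theta, form_y)
```

True scores at or below the sum of the guessing parameters have no θ solution, which is why the function raises an error instead of extrapolating.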

Estimating Classification Consistency and Accuracy for Cognitive Diagnostic Assessment

Peer reviewed

Cui, Ying; Gierl, Mark J.; Chang, Hua-Hua – Journal of Educational Measurement, 2012

This article introduces procedures for the computation and asymptotic statistical inference for classification consistency and accuracy indices specifically designed for cognitive diagnostic assessments. The new classification indices can be used as important indicators of the reliability and validity of classification results produced by…

Descriptors: Classification, Accuracy, Cognitive Tests, Diagnostic Tests

Psychometric Equivalence of Ratings for Repeat Examinees on a Performance Assessment for Physician Licensure

Peer reviewed

Raymond, Mark R.; Swygert, Kimberly A.; Kahraman, Nilufer – Journal of Educational Measurement, 2012

Although a few studies report sizable score gains for examinees who repeat performance-based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single-take examinees and 4,030 repeat examinees who completed a 6-hour clinical…

Descriptors: Physicians, Licensing Examinations (Professions), Performance Based Assessment, Repetition

Relationships of Measurement Error and Prediction Error in Observed-Score Regression

Peer reviewed

Moses, Tim – Journal of Educational Measurement, 2012

The focus of this paper is assessing the impact of measurement errors on the prediction error of an observed-score regression. Measures are presented and described for decomposing the linear regression's prediction error variance into parts attributable to the true score variance and the error variances of the dependent variable and the predictor…

Descriptors: Error of Measurement, Prediction, Regression (Statistics), True Scores
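Under classical test theory assumptions (X = T_X + E_X, Y = T_Y + E_Y, errors uncorrelated with each other and with true scores, so Cov(X, Y) = Cov(T_X, T_Y)), the residual variance of the linear regression of Y on X, σ²_Y − Cov(X, Y)²/σ²_X, can be split into three additive parts: the error variance of Y, the residual variance of the true-score regression, and an extra part caused by error in the predictor. The sketch below shows one such decomposition under those assumptions; it is not necessarily the set of measures proposed in the article, and all inputs are assumed population values.

```python
from typing import Dict


def prediction_error_parts(var_tx: float, var_ex: float, var_ty: float,
                           var_ey: float, cov_t: float) -> Dict[str, float]:
    """Split the residual variance of Y on X into Y-error, true-score, and X-error parts."""
    var_x = var_tx + var_ex
    parts = {
        # error variance of the dependent variable passes straight into the residual
        "y_error": var_ey,
        # residual variance if true Y were regressed on true X
        "true_score": var_ty - cov_t ** 2 / var_tx,
        # extra residual variance from attenuation caused by predictor error
        "x_error": cov_t ** 2 * var_ex / (var_tx * var_x),
    }
    parts["total"] = sum(parts.values())  # equals var_y - cov_t**2 / var_x
    return parts
```

With hypothetical values var_tx = 80, var_ex = 20, var_ty = 60, var_ey = 15, and a true-score covariance of 40, the parts are 15, 40, and 4, summing to the directly computed 75 − 40²/100 = 59.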

Validating the Interpretations and Uses of Test Scores

Peer reviewed

Kane, Michael T. – Journal of Educational Measurement, 2013

To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based…

Descriptors: Test Interpretation, Validity, Scores, Test Use

How Often Do Subscores Have Added Value? Results from Operational and Simulated Data

Peer reviewed

Sinharay, Sandip – Journal of Educational Measurement, 2010

Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman suggested a method based on classical test theory to determine whether subscores have added value over total scores. In this article I first provide a rich collection of results regarding when subscores were found to have added…

Descriptors: Scores, Test Theory, Simulation, Reliability
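Haberman's criterion compares two proportional reductions in mean squared error (PRMSE) for predicting the true subscore: one from the observed subscore, which equals the subscore's reliability, and one from the observed total score. The subscore is judged to have added value when the first exceeds the second. The sketch below computes both from summary statistics, assuming the total score is the subscore plus the remaining items (x = s + r) and that measurement errors in s and r are uncorrelated, so Cov(τ_s, r) = Cov(s, r); the argument names are illustrative.

```python
from typing import Tuple


def subscore_added_value(var_s: float, rel_s: float, var_r: float,
                         cov_sr: float) -> Tuple[float, float, bool]:
    """Return (PRMSE_s, PRMSE_x, added_value) for subscore s and total x = s + r."""
    var_true_s = rel_s * var_s              # true subscore variance
    cov_true_s_x = var_true_s + cov_sr      # Cov(tau_s, s + r) under uncorrelated errors
    var_x = var_s + var_r + 2 * cov_sr      # variance of the total score
    prmse_s = rel_s                         # squared correlation of tau_s with s
    prmse_x = cov_true_s_x ** 2 / (var_true_s * var_x)
    return prmse_s, prmse_x, prmse_s > prmse_x
```

With a hypothetical subscore variance of 16, reliability 0.70, remaining-score variance 100, and covariance 24, PRMSE_x is about 0.675, so the subscore adds value; raising the covariance to 30 pushes PRMSE_x to about 0.861, and it no longer does.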

Classification Consistency and Accuracy for Complex Assessments Using Item Response Theory

Peer reviewed

Lee, Won-Chan – Journal of Educational Measurement, 2010

In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…

Descriptors: Classification, Item Response Theory, Comparative Analysis, Models
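A single-administration IRT approach of this general kind can be sketched for dichotomous items: at each ability level, the Lord-Wingersky recursion gives the conditional distribution of the summed score, from which the probability of scoring at or above the cut follows. Consistency treats two administrations as independent given θ, and accuracy compares the observed classification with the one implied by the true score. The sketch assumes 3PL items with D = 1.7, a standard normal ability distribution approximated on an evenly spaced grid, and a single number-correct cut; it does not cover the polytomous items or model combinations examined in the article.

```python
import math
from typing import List, Sequence, Tuple

Item = Tuple[float, float, float]  # (a, b, c) 3PL parameters


def p3pl(theta: float, a: float, b: float, c: float, d: float = 1.7) -> float:
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-d * a * (theta - b)))


def lord_wingersky(probs: Sequence[float]) -> List[float]:
    """Distribution of the number-correct score given each item's P(correct)."""
    dist = [1.0]  # before any items, a score of 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, weight in enumerate(dist):
            new[score] += weight * (1 - p)   # item answered incorrectly
            new[score + 1] += weight * p     # item answered correctly
        dist = new
    return dist


def classification_indices(items: Sequence[Item], cut: int, n_quad: int = 41,
                           lo: float = -4.0, hi: float = 4.0) -> Tuple[float, float]:
    """Return (consistency, accuracy) for a pass/fail cut on the summed score."""
    step = (hi - lo) / (n_quad - 1)
    thetas = [lo + k * step for k in range(n_quad)]
    dens = [math.exp(-t * t / 2) for t in thetas]   # unnormalized N(0, 1) density
    total = sum(dens)
    consistency = accuracy = 0.0
    for theta, w in zip(thetas, dens):
        w /= total                                  # quadrature weight
        probs = [p3pl(theta, *item) for item in items]
        p_pass = sum(lord_wingersky(probs)[cut:])
        # same decision on two independent administrations given theta
        consistency += w * (p_pass ** 2 + (1 - p_pass) ** 2)
        # observed decision agrees with the true-score decision
        true_pass = sum(probs) >= cut
        accuracy += w * (p_pass if true_pass else 1 - p_pass)
    return consistency, accuracy
```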

Statistical Process Control Charts for Measuring and Monitoring Temporal Consistency of Ratings

Peer reviewed

Omar, M. Hafidz – Journal of Educational Measurement, 2010

Methods of statistical process control were briefly investigated in the field of educational measurement as early as 1999. However, only the use of a cumulative sum chart was explored. In this article other methods of statistical quality control are introduced and explored. In particular, methods in the form of Shewhart mean and standard deviation…

Descriptors: Charts, Quality Control, Measurement, Test Items
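Shewhart mean (X̄) and standard deviation (S) charts place three-sigma control limits around the average of the subgroup means and the average subgroup standard deviation, using the bias-correction constant c4. The sketch below assumes equal-sized subgroups (for example, a hypothetical set of ratings drawn from each scoring session) and estimates the limits from the data themselves rather than from known process standards.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Limits = Tuple[float, float, float]  # (lower limit, center line, upper limit)


def c4(n: int) -> float:
    """Bias-correction constant: E[s] = c4(n) * sigma for normal samples of size n."""
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def xbar_s_limits(subgroups: List[Sequence[float]]) -> Tuple[Limits, Limits]:
    """Three-sigma limits for Shewhart X-bar and S charts from equal-sized subgroups."""
    n = len(subgroups[0])
    if n < 2 or any(len(g) != n for g in subgroups):
        raise ValueError("need equal-sized subgroups with at least 2 values each")
    xbar_bar = mean([mean(g) for g in subgroups])   # center line of the X-bar chart
    s_bar = mean([stdev(g) for g in subgroups])     # center line of the S chart
    k = c4(n)
    half_width = 3 * s_bar / (k * math.sqrt(n))     # A3 * s_bar
    s_spread = 3 * s_bar * math.sqrt(1 - k * k) / k
    xbar_limits = (xbar_bar - half_width, xbar_bar, xbar_bar + half_width)
    # B3/B4 form; a negative lower limit is replaced by zero
    s_limits = (max(0.0, s_bar - s_spread), s_bar, s_bar + s_spread)
    return xbar_limits, s_limits
```

Subgroup means or standard deviations falling outside these limits would flag sessions whose ratings drifted in level or spread.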

Author

Kolen, Michael J.	3
Brennan, Robert L.	2
Cui, Ying	2
Gierl, Mark J.	2
Hanson, Bradley A.	2
Lee, Won-Chan	2
Livingston, Samuel A.	2
Smith, Philip L.	2
Zimmerman, Donald W.	2
Barnes, Tyler	1
Bausell, R. Barker	1
Bergan, John R.	1
Betebenner, Damian W.	1
Bryant-Lees, Kinsey B.	1
Camilli, Gregory	1
Cardinet, Jean	1
Caruso, John C.	1
Chang, Hua-Hua	1
Chen, Ping	1
Choi, Jiwon	1
Culpepper, Steven A.	1
Davenport, Ernest C.	1
DeMars, Christine E.	1
Ding, Shuliang	1