ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	1
Since 2006 (last 20 years)	8

Descriptor

Comparative Analysis	8
Computation	8
Test Theory	8
Item Response Theory	5
Reliability	4
Scores	4
Scaling	3
Test Items	3
Bias	2
College Entrance Examinations	2
Error of Measurement	2
Methods	2
Monte Carlo Methods	2
Prediction	2
Scoring	2
Accuracy	1
Anxiety	1
Classification	1
Cognitive Tests	1
Computer Software	1
Correlation	1
Educational Research	1
Educational Testing	1
Efficiency	1
Elementary School Students	1

Source

Applied Measurement in…	2
Applied Psychological…	2
ACT, Inc.	1
Educational and Psychological…	1
Journal of Educational and…	1
ProQuest LLC	1

Author

Almehrizi, Rashid S.	1
Beretvas, S. Natasha	1
Cui, Zhongmin	1
Culpepper, Steven Andrew	1
Deng, Nina	1
Fang, Yu	1
Haberman, Shelby	1
Larkin, Kevin	1
Murphy, Daniel L.	1
Puhan, Gautam	1
Ramsay, James O.	1
Sinharay, Sandip	1
Stone, Clement A.	1
Traynor, Anne	1
Wiberg, Marie	1
Woodruff, David	1
Xu, Ting	1

Publication Type

Journal Articles	6
Reports - Research	4
Reports - Evaluative	2
Dissertations/Theses -…	1
Reports - Descriptive	1

Education Level

Higher Education	3
Postsecondary Education	3
Elementary Education	2
Early Childhood Education	1
Grade 2	1
Grade 4	1
Grade 5	1
Grade 6	1
Grade 7	1
Grade 8	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Primary Education	1
Secondary Education	1

Location

Colorado	1
Florida	1
New York	1
North Carolina	1
Sweden	1
Tennessee	1
Texas	1
United States	1

Assessments and Surveys

ACT Assessment	1
National Assessment of…	1

Showing all 8 results

A Strategy for Replacing Sum Scoring

Peer reviewed

Ramsay, James O.; Wiberg, Marie – Journal of Educational and Behavioral Statistics, 2017

This article promotes the use of modern test theory in testing situations where sum scores for binary responses are now used. It directly compares the efficiencies and biases of classical and modern test analyses and finds an improvement in the root mean squared error of ability estimates of about 5% for two designed multiple-choice tests and…

Descriptors: Scoring, Test Theory, Computation, Maximum Likelihood Statistics

The Reliability and Precision of Total Scores and IRT Estimates as a Function of Polytomous IRT Parameters and Latent Trait Distribution

Peer reviewed

Culpepper, Steven Andrew – Applied Psychological Measurement, 2013

A classic topic in the fields of psychometrics and measurement has been the impact of the number of scale categories on test score reliability. This study builds on previous research by further articulating the relationship between item response theory (IRT) and classical test theory (CTT). Equations are presented for comparing the reliability and…

Descriptors: Item Response Theory, Reliability, Scores, Error of Measurement

A Comparison of Three Methods for Computing Scale Score Conditional Standard Errors of Measurement. ACT Research Report Series, 2013 (7)

Woodruff, David; Traynor, Anne; Cui, Zhongmin; Fang, Yu – ACT, Inc., 2013

Professional standards for educational testing recommend that both the overall standard error of measurement and the conditional standard error of measurement (CSEM) be computed on the score scale used to report scores to examinees. Several methods have been developed to compute scale score CSEMs. This paper compares three methods, based on…

Descriptors: Comparative Analysis, Error of Measurement, Scores, Scaling

A Comparison of Teacher Effectiveness Measures Calculated Using Three Multilevel Models for Raters Effects

Peer reviewed

Murphy, Daniel L.; Beretvas, S. Natasha – Applied Measurement in Education, 2015

This study examines the use of cross-classified random effects models (CCrem) and cross-classified multiple membership random effects models (CCMMrem) to model rater bias and estimate teacher effectiveness. Effect estimates are compared using CTT versus item response theory (IRT) scaling methods and three models (i.e., conventional multilevel…

Descriptors: Teacher Effectiveness, Comparative Analysis, Hierarchical Linear Modeling, Test Theory

Using IRT Trait Estimates versus Summated Scores in Predicting Outcomes

Peer reviewed

Xu, Ting; Stone, Clement A. – Educational and Psychological Measurement, 2012

It has been argued that item response theory trait estimates should be used in analyses rather than number right (NR) or summated scale (SS) scores. Thissen and Orlando postulated that IRT scaling tends to produce trait estimates that are linearly related to the underlying trait being measured. Therefore, IRT trait estimates can be more useful…

Descriptors: Educational Research, Monte Carlo Methods, Measures (Individuals), Item Response Theory

Coefficient Alpha and Reliability of Scale Scores

Peer reviewed

Almehrizi, Rashid S. – Applied Psychological Measurement, 2013

The majority of large-scale assessments develop various score scales that are either linear or nonlinear transformations of raw scores for better interpretations and uses of assessment results. The current formula for coefficient alpha (α; the commonly used reliability coefficient) only provides internal consistency reliability estimates of raw…

Descriptors: Raw Scores, Scaling, Reliability, Computation
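For reference, a minimal sketch of the conventional raw-score coefficient alpha that this abstract calls the current formula: α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_X), where k is the number of items, σ²ᵢ is the variance of item i, and σ²_X is the variance of the raw total score. It uses only the Python standard library; the function name `coefficient_alpha` and the persons-by-items list layout are illustrative assumptions, and the sketch does not implement the scale-score extension the article proposes.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Conventional raw-score coefficient alpha (hypothetical helper).

    `responses` is a persons-by-items matrix: one inner list per examinee,
    each holding that examinee's k item scores (assumed layout).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")

    # Variance of each item's scores across examinees. Sample variance is used
    # for items and total alike, so the n/(n - 1) factor cancels in the ratio.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]

    # Variance of the raw (sum) score across examinees.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("raw total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Four examinees answering three binary (0/1) items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(coefficient_alpha(data), 4))  # 0.75
```

Because the formula works only with item and raw-total variances, it describes raw-score reliability; a nonlinear scale transformation changes those variances, which is the gap the abstract points to.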

The Utility of Augmented Subscores in a Licensure Exam: An Evaluation of Methods Using Empirical Data

Peer reviewed

Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – Applied Measurement in Education, 2010

Will subscores provide more information than what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or total score. To answer the second…

Descriptors: Licensing Examinations (Professions), Scores, Computation, Methods

Evaluating IRT- and CTT-Based Methods of Estimating Classification Consistency and Accuracy Indices from Single Administrations

Deng, Nina – ProQuest LLC, 2011

Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true"…

Descriptors: Item Response Theory, Test Theory, Computation, Classification