Showing 1 to 15 of 27 results
Peer reviewed
Direct link
William C. M. Belzak; Daniel J. Bauer – Journal of Educational and Behavioral Statistics, 2024
Testing for differential item functioning (DIF) has seen rapid statistical development in recent years. Moderated nonlinear factor analysis (MNLFA) allows for simultaneous testing of DIF across multiple categorical and continuous covariates (e.g., sex, age, ethnicity), and regularization has shown promising results for identifying DIF among…
Descriptors: Test Bias, Algorithms, Factor Analysis, Error of Measurement
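For readers who want the shape of the model the abstract refers to, here is a schematic of MNLFA's moderated parameters — the general form, not necessarily the authors' exact specification. Each item's intercept and loading are allowed to depend on the covariate vector x, and DIF testing asks which covariate effects are nonzero; a lasso-type penalty shrinks those effects toward zero so that the surviving ones flag DIF:

\nu_j(\mathbf{x}) = \nu_{0j} + \boldsymbol{\beta}_j^{\top}\mathbf{x} \quad (\text{uniform DIF}), \qquad \lambda_j(\mathbf{x}) = \lambda_{0j} + \boldsymbol{\gamma}_j^{\top}\mathbf{x} \quad (\text{nonuniform DIF}),

with a regularized criterion of the form -2\log L + \tau \sum_j \big(\lVert\boldsymbol{\beta}_j\rVert_1 + \lVert\boldsymbol{\gamma}_j\rVert_1\big).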
Peer reviewed
Direct link
Grimm, Kevin J.; Fine, Kimberly; Stegmann, Gabriela – International Journal of Behavioral Development, 2021
Modeling within-person change over time and between-person differences in change over time is a primary goal in prevention science. When modeling change in an observed score over time with multilevel or structural equation modeling approaches, each observed score counts toward the estimation of model parameters equally. However, observed scores…
Descriptors: Error of Measurement, Weighted Scores, Accuracy, Item Response Theory
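The truncated sentence presumably turns on the fact that observed scores differ in measurement precision. One standard device for letting more precise scores count more — offered as an illustration, not as the authors' method — is to carry each score's standard error into the growth model as a known error variance:

y_{ti} = \eta_{ti} + e_{ti}, \qquad e_{ti} \sim N\!\big(0, SE_{ti}^{2}\big),

so that occasion t for person i effectively receives weight proportional to 1/SE_{ti}^{2}, and imprecisely estimated scores (e.g., extreme IRT trait estimates) influence the growth parameters less.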
Peer reviewed
Direct link
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
Peer reviewed
Direct link
Culpepper, Steven Andrew – Applied Psychological Measurement, 2013
A classic topic in the fields of psychometrics and measurement has been the impact of the number of scale categories on test score reliability. This study builds on previous research by further articulating the relationship between item response theory (IRT) and classical test theory (CTT). Equations are presented for comparing the reliability and…
Descriptors: Item Response Theory, Reliability, Scores, Error of Measurement
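One compact point of contact between IRT and CTT is the marginal reliability of the trait estimates, shown here as a generic relation rather than the article's own equations:

\rho \;=\; \frac{\sigma_\theta^{2}}{\sigma_\theta^{2} + \overline{\sigma}_E^{2}}, \qquad \overline{\sigma}_E^{2} = E_\theta\!\left[\frac{1}{I(\theta)}\right],

where I(\theta) is the test information function. All else equal, items with more response categories typically carry more information, which is the mechanism behind the category-count effect on reliability.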
Peer reviewed
PDF on ERIC Download full text
Liu, Sha; Kunnan, Antony John – CALICO Journal, 2016
This study investigated the application of "WriteToLearn" on Chinese undergraduate English majors' essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university located in Sichuan province who wrote 326 essays from two writing prompts. Each paper was…
Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning
Peer reviewed
Direct link
Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan – International Journal of Testing, 2012
Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…
Descriptors: Mathematics Tests, Achievement Tests, College Entrance Examinations, Error of Measurement
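The basic dependence of composite error on the components can be written down directly; this is the standard result for a weighted composite with conditionally independent errors, and the article's treatment of scale scores goes well beyond it:

C = \sum_j w_j X_j, \qquad \sigma_E^{2}(C \mid \boldsymbol{\tau}) = \sum_j w_j^{2}\, \sigma_E^{2}(X_j \mid \tau_j),

so the conditional standard error of measurement of the composite varies across the score range whenever the components' conditional error variances do.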
Peer reviewed
Direct link
Fife, Dustin A.; Mendoza, Jorge L.; Terry, Robert – Educational and Psychological Measurement, 2012
Though much research and attention have been directed at assessing the correlation coefficient under range restriction, the assessment of reliability under range restriction has been largely ignored. This article uses item response theory to simulate dichotomous item-level data to assess the robustness of KR-20 (α), ω, and test-retest…
Descriptors: Reliability, Computation, Comparative Analysis, Item Response Theory
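A minimal sketch of the kind of simulation the abstract describes: dichotomous 2PL data, with KR-20 computed before and after direct selection on the total score. All parameter values and the selection rule here are illustrative assumptions, not the article's design.

import numpy as np

rng = np.random.default_rng(1)

# Simulate dichotomous responses from a 2PL IRT model
# (all parameter values below are illustrative assumptions).
n_persons, n_items = 5000, 20
theta = rng.normal(0.0, 1.0, n_persons)              # latent trait
a = rng.uniform(0.8, 2.0, n_items)                   # discriminations
b = rng.normal(0.0, 1.0, n_items)                    # difficulties
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))  # response probabilities
x = rng.binomial(1, p)                               # 0/1 item responses

def kr20(resp):
    """KR-20 (coefficient alpha for dichotomous items)."""
    k = resp.shape[1]
    pj = resp.mean(axis=0)
    total_var = resp.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - (pj * (1.0 - pj)).sum() / total_var)

# Compare the full-range sample with a range-restricted sample
# (direct selection: keep the top 30% on total score).
total = x.sum(axis=1)
restricted = x[total >= np.quantile(total, 0.70)]

print("KR-20, full sample:      ", round(kr20(x), 3))
print("KR-20, restricted sample:", round(kr20(restricted), 3))

Restricting the range of the total score reduces observed-score variance, so the restricted-sample KR-20 comes out lower than the full-sample value, which is the phenomenon the article examines across estimators.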
Peer reviewed
Direct link
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the Key Stage 2 (KS2) National Curriculum test in science in England, previously taken by the whole national cohort, has been replaced with a sampling test taken annually by pupils aged 11 from a nationally representative sample of schools. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
Peer reviewed
Direct link
Dimitrov, Dimiter M. – Mid-Western Educational Researcher, 2010
The focus of this presidential address is on the contemporary treatment of reliability and validity in educational assessment. Highlights on reliability are provided under the classical true-score model using tools from latent trait modeling to clarify important assumptions and procedures for reliability estimation. In addition to reliability,…
Descriptors: Educational Assessment, Validity, Item Response Theory, Reliability
Peer reviewed
Direct link
Thomas, Michael L. – Assessment, 2011
Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical…
Descriptors: Item Response Theory, Psychological Evaluation, Reliability, Error of Measurement
Peer reviewed
Direct link
Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M. – International Journal of Testing, 2010
Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…
Descriptors: Monte Carlo Methods, Simulation, Computer Assisted Testing, Adaptive Testing
Peer reviewed
PDF on ERIC Download full text
Haberman, Shelby J. – ETS Research Report Series, 2008
The reliability of a scaled score can be computed by use of item response theory. Estimated reliability can be obtained even if the item response model selected is not valid.
Descriptors: Reliability, Scores, Item Response Theory, Computation
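The computation the report refers to rests on a variance decomposition of the scale score. As a generic statement (the report supplies the estimation details), under the fitted IRT model

\rho(S) \;=\; \frac{\operatorname{Var}_\theta\!\big(E[S \mid \theta]\big)}{\operatorname{Var}(S)},

that is, the variance of the model-implied true scale score divided by the observed scale-score variance. The quantity is computable from the fitted model whether or not that model is correct, which appears to be the sense of the abstract's second sentence.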
Peer reviewed
Direct link
Bramley, Tom – Educational Research, 2010
Background: A recent article published in "Educational Research" on the reliability of results in National Curriculum testing in England (Newton, "The reliability of results from national curriculum testing in England," "Educational Research" 51, no. 2: 181-212, 2009) suggested that (1) classification accuracy can be…
Descriptors: National Curriculum, Educational Research, Testing, Measurement
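Although the quoted claim is cut off, the quantities at issue in this exchange are standard. As a generic statement rather than Newton's or Bramley's formulation:

\text{accuracy} = P\big(L(X) = L(T)\big), \qquad \text{consistency} = P\big(L(X) = L(X')\big),

where T is a pupil's true score, X and X' are observed scores on parallel administrations, and L(\cdot) maps a score to a National Curriculum level.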
Peer reviewed
Direct link
Raju, Nambury S.; Price, Larry R.; Oshima, T. C.; Nering, Michael L. – Applied Psychological Measurement, 2007
An examinee-level (or conditional) reliability is proposed for use in both classical test theory (CTT) and item response theory (IRT). The well-known group-level reliability is shown to be the average of conditional reliabilities of examinees in a group or a population. This relationship is similar to the known relationship between the square of…
Descriptors: Item Response Theory, Error of Measurement, Reliability, Test Theory
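One way to define an examinee-level reliability so that the averaging relationship stated in the abstract holds exactly — a sketch consistent with the abstract rather than the authors' notation:

\rho(\theta) = 1 - \frac{\sigma_E^{2}(\theta)}{\sigma_X^{2}}, \qquad E_\theta\big[\rho(\theta)\big] = 1 - \frac{E_\theta\big[\sigma_E^{2}(\theta)\big]}{\sigma_X^{2}} = \rho_{XX'},

where \sigma_E^{2}(\theta) is the conditional error variance at trait level \theta and \sigma_X^{2} is the group's observed-score variance.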
Peer reviewed
Direct link
Raykov, Tenko – Measurement and Evaluation in Counseling and Development, 2007
A method is outlined for evaluating the reliability and criterion validity of weighted scales based on sets of unidimensional measures. The approach is developed within the framework of latent variable modeling methodology and is useful for point and interval estimation of these measurement quality coefficients in counseling and education…
Descriptors: Predictive Validity, Computation, Reliability, Item Response Theory
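For a weighted sum of congeneric (unidimensional) measures with uncorrelated errors, the target coefficient has the familiar omega-type form — shown as a sketch of what is being estimated, assuming X_i = \lambda_i \eta + \varepsilon_i, not as the article's estimation procedure:

W = \sum_i w_i X_i, \qquad \rho_W = \frac{\big(\sum_i w_i \lambda_i\big)^{2}\operatorname{Var}(\eta)}{\big(\sum_i w_i \lambda_i\big)^{2}\operatorname{Var}(\eta) + \sum_i w_i^{2}\operatorname{Var}(\varepsilon_i)}.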