Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated, and the deflation may be profound: 0.40-0.60 units of reliability, or 46-71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation
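The abstract's list of eight sources is truncated above; as a rough illustration of the first source (errors in the measurement model), the sketch below, with assumptions that are ours rather than the article's (four congeneric items with unequal loadings and unit item variances), shows Cronbach's alpha landing below the true reliability of the sum score:

```python
# Minimal sketch (assumptions ours, not the article's simulation design):
# when items are congeneric (unequal loadings) rather than tau-equivalent,
# Cronbach's alpha understates the true reliability of the sum score.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
loadings = np.array([0.9, 0.7, 0.5, 0.3])  # unequal loadings -> alpha biased low
psi = 1.0 - loadings**2                    # unique variances (unit item variances)

true_score = rng.standard_normal(n)
errors = rng.standard_normal((n, loadings.size)) * np.sqrt(psi)
items = true_score[:, None] * loadings + errors

lam = loadings.sum()
rel_true = lam**2 / (lam**2 + psi.sum())   # true reliability of the sum score

k = loadings.size
cov = np.cov(items, rowvar=False)
alpha = k / (k - 1) * (1 - np.trace(cov) / cov.sum())  # Cronbach's alpha

print(f"true reliability = {rel_true:.3f}, alpha = {alpha:.3f}")  # ~0.71 vs ~0.68
```

This is only one deflation mechanism; the article quantifies eight.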
Ellis, Jules L. – Educational and Psychological Measurement, 2021
This study develops a theoretical model for the costs of an exam as a function of its duration. Two kinds of costs are distinguished: (1) the costs of measurement errors and (2) the costs of the measurement itself. Both costs are expressed in time of the student. Based on a classical test theory model, enriched with assumptions on the context, the costs…
Descriptors: Test Length, Models, Error of Measurement, Measurement
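Ellis's actual cost functions are not reproduced in the snippet; as a toy instantiation under assumptions of ours (error cost proportional to error variance, time cost linear in duration, reliability following the Spearman-Brown relation), an optimal duration can be found numerically:

```python
# Toy instantiation (assumptions ours, not Ellis's model): total student-time
# cost of an exam = cost of measurement error + cost of sitting the exam,
# as a function of duration t in units of a reference test.
import numpy as np

rho1 = 0.70     # reliability of a 1-unit exam (assumed)
c_error = 20.0  # student hours that one unit of error variance costs (assumed)

def total_cost(t):
    rho_t = t * rho1 / (1 + (t - 1) * rho1)  # Spearman-Brown prophecy formula
    return c_error * (1 - rho_t) + t         # error cost + time cost (hours)

ts = np.linspace(0.25, 10, 400)
t_opt = ts[np.argmin(total_cost(ts))]
print(f"cost-minimizing duration: about {t_opt:.2f} units")  # ~2.5 here
```

Longer exams buy reliability at a diminishing rate while time costs grow linearly, so an interior optimum exists; that is the shape of trade-off the article formalizes.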
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
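The abstract's argument can be made concrete with the standard Kish design effect for cluster samples (our illustration; not necessarily the article's exact formulation). With n students sampled in clusters of average size m-bar and intraclass correlation rho:

```latex
\mathrm{DEFF} = 1 + (\bar{m} - 1)\rho, \qquad
n_{\mathrm{eff}} = \frac{n}{\mathrm{DEFF}}, \qquad
\mathrm{SE}_{\mathrm{cluster}} = \sqrt{\mathrm{DEFF}} \cdot \mathrm{SE}_{\mathrm{srs}}
```

For classrooms of 25 students with rho = 0.2, DEFF = 5.8: treating the data as a simple random sample understates standard errors by a factor of about 2.4.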
Hutchison, Dougal – Oxford Review of Education, 2008
There is a degree of instability in any measurement, so that if it is repeated, a different result may be obtained. Such instability, generally described as "measurement error", may affect the conclusions drawn from an investigation, and methods exist for allowing for it. It is less widely known that different disciplines, and…
Descriptors: Measurement Techniques, Data Analysis, Error of Measurement, Test Reliability
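One classical method for allowing for measurement error, of the kind the abstract alludes to, is Spearman's correction for attenuation, which recovers the true-score correlation from an observed correlation and the two reliabilities:

```latex
r_{T_X T_Y} = \frac{r_{XY}}{\sqrt{r_{XX'}\, r_{YY'}}}
```

For example, an observed correlation of 0.42 between measures with reliabilities 0.80 and 0.70 implies a true-score correlation of roughly 0.42 / sqrt(0.56), about 0.56.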

Whitely, Susan E.; Dawis, Rene V. – Journal of Educational Measurement, 1974
Descriptors: Error of Measurement, Item Analysis, Matrices, Measurement Techniques

Zimmerman, Donald W. – Educational and Psychological Measurement, 1985
A computer program simulated guessing on multiple-choice test items and calculated deviation IQs from observed scores that contained a guessing component. Extensive variability in deviation IQs due entirely to chance was found. (Author/LMO)
Descriptors: Computer Simulation, Error of Measurement, Guessing (Tests), Intelligence Quotient
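Zimmerman's program itself is not shown; a minimal re-creation of the idea (parameters assumed) gives a feel for how much deviation-IQ spread guessing alone injects:

```python
# Minimal sketch (parameters assumed, not Zimmerman's program): how much
# deviation-IQ variability blind guessing alone can produce.
import numpy as np

rng = np.random.default_rng(1)
n_examinees = 10_000
n_unknown, n_choices = 40, 4  # items answered by pure guessing, 4 options each
sd_norm = 10.0                # raw-score SD in the norming sample (assumed)

# Chance component of each observed score
guess_hits = rng.binomial(n_unknown, 1.0 / n_choices, size=n_examinees)

# Express the guessing noise in deviation-IQ units (IQ = 100 + 15 * z)
iq_noise = 15 * (guess_hits - guess_hits.mean()) / sd_norm
print(f"SD of IQ points due to guessing alone: {iq_noise.std():.1f}; "
      f"spread: {iq_noise.min():+.0f} to {iq_noise.max():+.0f}")
```

With these assumed numbers, chance alone contributes an IQ standard deviation of about 4 points, i.e. swings of well over 10 points between lucky and unlucky guessers.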
Livingston, Samuel A. – 1976
A distinction is made between reliability of measurement and reliability of classification; the "criterion-referenced reliability coefficient" describes the former. Application of this coefficient to the probability distribution of possible scores for a single student yields a meaningful way to describe the reliability of a single score. (Author)
Descriptors: Classification, Criterion Referenced Tests, Error of Measurement, Measurement
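The coefficient itself is not printed in the abstract; in its usual form, Livingston's criterion-referenced reliability coefficient augments the classical reliability with the squared distance between the mean score and the cut score C:

```latex
K^2 = \frac{\rho_{XX'}\,\sigma_X^2 + (\mu_X - C)^2}{\sigma_X^2 + (\mu_X - C)^2}
```

It reduces to classical reliability when the group mean equals the cut score and approaches 1 as the mean moves away from it, matching the intuition that classifications are more dependable far from the cut.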
Frary, Robert B.; Zimmerman, Donald W. – Educational and Psychological Measurement, 1970
Descriptors: Error of Measurement, Guessing (Tests), Multiple Choice Tests, Probability
van der Linden, Wim J. – 1982
A latent trait method is presented to investigate the possibility that Angoff or Nedelsky judges specify inconsistent probabilities in standard setting techniques for objectives-based instructional programs. It is suggested that judges frequently specify a low probability of success for an easy item but a high probability for a hard item. The…
Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Interrater Reliability
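A simplified sketch of the latent-trait logic (ours; not necessarily van der Linden's exact method): under a Rasch model, a judged success probability p_i for an item with difficulty b_i implies a borderline ability theta_i = b_i + ln(p_i / (1 - p_i)), and a consistent judge should yield roughly the same theta_i across items:

```python
# Simplified sketch (ours, not necessarily van der Linden's exact method):
# each judged probability p_i, with Rasch difficulty b_i, implies a
# borderline ability theta_i = b_i + ln(p_i / (1 - p_i)); scattered
# theta_i values flag an inconsistent judge.
import math

items = [  # (rasch_difficulty, judged_probability) -- illustrative values
    (-1.0, 0.40),  # easy item given a LOW success probability ...
    ( 1.5, 0.80),  # ... hard item given a HIGH one: inconsistent
]

thetas = [b + math.log(p / (1 - p)) for b, p in items]
print("implied borderline abilities:", [f"{t:+.2f}" for t in thetas])
# Here -1.41 vs +2.89: far apart, so the judgments are mutually inconsistent.
```

This is exactly the pattern the abstract describes: a low probability for an easy item alongside a high probability for a hard one.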
Rentz, R. Robert; Bashaw, W. L. – 1975
In order to determine whether Rasch Model procedures have any utility for equating pre-existing tests, this study reanalyzed data from the equating phase of the Anchor Test Study, which used a variety of equipercentile and linear model methods. The tests involved included seven reading test batteries, each having from one to three levels and two…
Descriptors: Comparative Analysis, Elementary Education, Equated Scores, Error of Measurement
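The Rasch logic being tested can be stated compactly (standard mean-shift form; the study's exact procedure is not shown in the snippet): difficulties of items common to two forms A and B differ only by a translation constant, so

```latex
c = \bar{b}^{(A)}_{\mathrm{common}} - \bar{b}^{(B)}_{\mathrm{common}}, \qquad
\theta^{(B \to A)} = \theta^{(B)} + c
```

which puts abilities, and hence scores, from both forms on one scale, in contrast to the equipercentile and linear methods used in the original Anchor Test Study equating.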
Rentz, R. Robert; Bashaw, W. L. – 1975
This volume contains tables of item analysis results obtained by following procedures associated with the Rasch Model for those reading tests used in the Anchor Test Study. Appendix I gives the test names and their corresponding analysis code numbers. Section I (Basic Item Analyses) presents data for the item analysis of each test in a two part…
Descriptors: Comparative Analysis, Elementary Education, Equated Scores, Error of Measurement