ERIC - Search Results

Showing all 14 results

Accuracy of a Classical Test Theory-Based Procedure for Estimating the Reliability of a Multistage Test. Research Report. ETS RR-17-02

Peer reviewed

Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017

The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…

Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing

Demographically Adjusted Groups for Equating Test Scores. Research Report. ETS RR-14-30

Peer reviewed

Livingston, Samuel A. – ETS Research Report Series, 2014

In this study, I investigated 2 procedures intended to create test-taker groups of equal ability by poststratifying on a composite variable created from demographic information. In one procedure, the stratifying variable was the composite variable that best predicted the test score. In the other procedure, the stratifying variable was the…

Descriptors: Demography, Equated Scores, Cluster Grouping, Ability Grouping

Methods of Linking with Small Samples in a Common-Item Design: An Empirical Comparison. Research Report. ETS RR-09-38

Peer reviewed

Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2009

A series of resampling studies was conducted to compare the accuracy of equating in a common item design using four different methods: chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating, and the circle-arc method. Four operational test forms, each containing more than 100 items, were used for…

Descriptors: Sampling, Sample Size, Accuracy, Test Items

An Evaluation of the Kernel Equating Method: A Special Study with Pseudotests Constructed from Real Test Data. Research Report. ETS RR-06-02

Peer reviewed

von Davier, Alina A.; Holland, Paul W.; Livingston, Samuel A.; Casabianca, Jodi; Grant, Mary C.; Martin, Kathleen – ETS Research Report Series, 2006

This study examines how closely the kernel equating (KE) method (von Davier, Holland, & Thayer, 2004a) approximates the results of other observed-score equating methods--equipercentile and linear equatings. The study used pseudotests constructed of item responses from a real test to simulate three equating designs: an equivalent groups (EG)…

Descriptors: Equated Scores, Statistical Analysis, Simulation, Tests

Estimation of the Conditional Standard Error of Measurement for Stratified Tests.

Peer reviewed

Livingston, Samuel A. – Journal of Educational Measurement, 1982

For tests used to make pass/fail decisions, the relevant standard error of measurement (SEM) is the SEM at the passing score. If the test is highly stratified, this SEM should be estimated by a split-halves approach. A formula and its derivation are provided. (Author)

Descriptors: Cutting Scores, Error of Measurement, Estimation (Mathematics), Mathematical Formulas

Estimating the Consistency and Accuracy of Classifications Based on Test Scores.

Livingston, Samuel A.; Lewis, Charles – 1993

This paper presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including the formation of a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate its effective test length in terms of…

Descriptors: Classification, Error of Measurement, Estimation (Mathematics), Reliability

Estimation of the Conditional Standard Error of Measurement for Stratified Tests.

Livingston, Samuel A. – 1981

The standard error of measurement (SEM) is a measure of the inconsistency in the scores of a particular group of test-takers. It is largest for test-takers with scores ranging in the 50 percent correct bracket; with nearly perfect scores, it is smaller. On tests used to make pass/fail decisions, the test-takers' scores tend to cluster in the range…

Descriptors: Error of Measurement, Estimation (Mathematics), Mathematical Formulas, Pass Fail Grading
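
The pattern this abstract describes (largest error near 50 percent correct, smaller near perfect scores) is what a simple binomial error model predicts. The sketch below illustrates that model only; it is not the split-halves estimator the report derives, and the function name `binomial_sem` and the 100-item test length are hypothetical choices made for the example.

```python
import math


def binomial_sem(raw_score: int, n_items: int) -> float:
    """Conditional SEM at a raw score under a binomial error model.

    Assumes number-correct scoring on n_items dichotomous items and uses
    the textbook form sqrt(x * (n - x) / (n - 1)). Illustration only.
    """
    if n_items < 2:
        raise ValueError("n_items must be at least 2")
    if not 0 <= raw_score <= n_items:
        raise ValueError("raw_score must lie between 0 and n_items")
    return math.sqrt(raw_score * (n_items - raw_score) / (n_items - 1))


if __name__ == "__main__":
    n = 100  # hypothetical test length
    for x in (50, 75, 95, 100):
        # The value shrinks as the score moves away from 50 percent correct.
        print(x, round(binomial_sem(x, n), 2))
```

Under this model the SEM at a passing score depends only on the score and the test length, which is why a stratified test calls for a different estimator such as the one the report proposes.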

The Criterion-Referenced Reliability of a Single Score. Report 76-01.

Livingston, Samuel A. – 1976

A distinction is made between reliability of measurement and reliability of classification; the "criterion-referenced reliability coefficient" describes the former. Application of this coefficient to the probability distribution of possible scores for a single student yields a meaningful way to describe the reliability of a single score. (Author)

Descriptors: Classification, Criterion Referenced Tests, Error of Measurement, Measurement

Reliability of Tests Used to Make Pass/Fail Decisions: Answering the Right Questions.

Livingston, Samuel A. – 1978

The traditional reliability coefficient and standard error of measurement are not adequate measures of reliability for tests used to make pass/fail decisions. Answering the important reliability questions requires estimation of the joint distribution of true and observed scores. Lord's "Method 20" estimates this distribution without the…

Descriptors: Cutting Scores, Decision Making, Efficiency, Error of Measurement

Correcting Four Similar Correlational Measures for Attenuation Due to Errors of Measurement in the Dependent Variable: Eta, Epsilon, Omega, and Intraclass r.

Stanley, Julian C.; Livingston, Samuel A. – 1971

Besides the ubiquitous Pearson product-moment r, there are a number of other measures of relationship that are attenuated by errors of measurement and for which the relationship between true measures can be estimated. Among these are the correlation ratio (eta squared), Kelley's unbiased correlation ratio (epsilon squared), Hays' omega squared,…

Descriptors: Analysis of Variance, Cluster Grouping, Correlation, Data Analysis

A Reply to Harris's "An Interpretation of Livingston's Reliability Coefficient for Criterion-Referenced Tests"

Peer reviewed

Livingston, Samuel A. – Journal of Educational Measurement, 1972

This article is a reply to a previous paper (see TM 500 488) interpreting Livingston's original article (see TM 500 487). (CK)

Descriptors: Criterion Referenced Tests, Error of Measurement, Norm Referenced Tests, Test Construction

Assessing the Reliability of Tests Used to Make Pass/Fail Decisions.

Peer reviewed

Livingston, Samuel A.; Wingersky, Marilyn A. – Journal of Educational Measurement, 1979

Procedures are described for studying the reliability of decisions based on specific passing scores with tests made up of discrete items and designed to measure continuous rather than categorical traits. These procedures are based on the estimation of the joint distribution of true scores and observed scores. (CTM)

Descriptors: Cutting Scores, Decision Making, Efficiency, Error of Measurement

Criterion-Referenced Applications of Classical Test Theory

Peer reviewed

Livingston, Samuel A. – Journal of Educational Measurement, 1972

A reliability coefficient for criterion-referenced tests is developed from the assumptions of classical test theory. The coefficient is based on deviations of scores from the criterion score, rather than from the mean. (Author/CK)

Descriptors: Criterion Referenced Tests, Error of Measurement, Mathematical Applications, Norm Referenced Tests
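
As a hedged illustration of a coefficient "based on deviations of scores from the criterion score, rather than from the mean," the sketch below uses the form in which this coefficient is commonly cited: (reliability × variance + (mean − criterion)²) / (variance + (mean − criterion)²). The abstract does not state the formula, so treat this as an assumption; the function name `livingston_k2` and the numbers are hypothetical.

```python
def livingston_k2(reliability: float, variance: float,
                  mean: float, criterion: float) -> float:
    """Criterion-referenced reliability in its commonly cited form.

    reliability: conventional (norm-referenced) reliability of the scores
    variance:    observed-score variance
    mean:        observed-score mean
    criterion:   criterion (cut) score
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    squared_gap = (mean - criterion) ** 2
    return (reliability * variance + squared_gap) / (variance + squared_gap)


if __name__ == "__main__":
    # Hypothetical values: reliability 0.80, variance 100, mean 50.
    for cut in (50, 60, 70):
        print(cut, round(livingston_k2(0.80, 100.0, 50.0, cut), 3))
```

When the criterion equals the mean, the expression reduces to the conventional reliability; it increases as the criterion moves away from the mean.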

Adjusting Scores on Examinations Offering a Choice of Questions.

Livingston, Samuel A. – 1986

This paper deals with test fairness regarding a test consisting of two parts: (1) a "common" section, taken by all students; and (2) a "variable" section, in which some students may answer a different set of questions from other students. For example, a test taken by several thousand students each year contains a common multiple-choice portion and…

Descriptors: Difficulty Level, Error of Measurement, Essay Tests, Mathematical Models