ERIC - Search Results

Showing all 14 results

Accuracy of a Classical Test Theory-Based Procedure for Estimating the Reliability of a Multistage Test. Research Report. ETS RR-17-02

Peer reviewed

Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017

The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…

Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing

Scaling: An Items Module

Peer reviewed

Tong, Ye; Kolen, Michael J. – Educational Measurement: Issues and Practice, 2010

"Scaling" is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of…

Descriptors: Test Results, Scaling, Measures (Individuals), Raw Scores

A Simple Proof of the Spearman-Brown Formula for Continuous Test Lengths

Peer reviewed

Allison, Paul A. – Psychometrika, 1976

A direct proof is given for the generalized Spearman-Brown formula for any real multiple of test length. (Author)

Descriptors: Correlation, Error of Measurement, Raw Scores, Test Length
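For orientation, the generalized Spearman-Brown formula named in this abstract predicts the reliability of a test whose length is multiplied by any positive real factor k: r_k = k r / (1 + (k - 1) r). The sketch below is a minimal illustration of that standard formula, not the proof given in the article; the function name `spearman_brown` and its parameter names are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after multiplying test length by length_factor.

    Hypothetical helper illustrating the standard Spearman-Brown formula;
    it assumes the added (or removed) parts are parallel to the original test.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if length_factor <= 0.0:
        raise ValueError("length_factor must be a positive real number")
    k = length_factor
    return k * reliability / (1.0 + (k - 1.0) * reliability)


# Doubling a test with reliability 0.50, then halving one with reliability 0.80.
doubled = spearman_brown(0.50, 2.0)
halved = spearman_brown(0.80, 0.5)
```

Because k is any positive real number, the same call covers lengthening (k > 1), shortening (0 < k < 1), and the identity case (k = 1).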

The Stability Coefficient

Peer reviewed

Cureton, Edward E. – Educational and Psychological Measurement, 1971

A derivation of a formula for the stability coefficient is presented and discussed in terms of test reliability over time. (PR)

Descriptors: Error of Measurement, Raw Scores, Statistical Analysis, Test Reliability

Bayesian Inference and the Classical Test Theory Model: Reliability and True Scores

Peer reviewed

Novick, Melvin R.; And Others – Psychometrika, 1971

Descriptors: Analysis of Variance, Bayesian Statistics, Error of Measurement, Mathematical Models

NCME Instructional Module: Standard Error of Measurement.

Peer reviewed

Harvill, Leo M. – Educational Measurement: Issues and Practice, 1991

This paper discusses standard error of measurement (SEM), the amount of variation or spread in the measurement errors for a test, and gives information needed to interpret test scores using SEMs. SEMs at various score levels should be used in calculating score bands rather than a single SEM value. (SLD)

Descriptors: Definitions, Equations (Mathematics), Error of Measurement, Estimation (Mathematics)
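As a rough illustration of the quantity this module discusses, the classical single-value estimate is SEM = SD × sqrt(1 − reliability), and a score band is the observed score plus or minus a multiple of the SEM. The sketch below assumes that textbook formula; the function names are hypothetical, and it uses one overall SEM for simplicity, whereas the abstract recommends SEMs estimated at various score levels.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical single-value SEM: SD * sqrt(1 - reliability).

    Hypothetical helper; a simplification, since the abstract recommends
    score-level (conditional) SEMs rather than one value for all scores.
    """
    if sd < 0.0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.0) -> tuple[float, float]:
    """Band of observed score plus or minus z SEMs (z = 1.0 is an assumed default)."""
    return (observed - z * sem, observed + z * sem)


# Example with assumed values: SD = 10, reliability = 0.91, observed score = 50.
sem = standard_error_of_measurement(10.0, 0.91)
band = score_band(50.0, sem)
```

To follow the abstract's recommendation more closely, the `sem` argument passed to `score_band` would be the SEM estimated at the examinee's score level instead of the overall value.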

An Investigation of the Relationship between Reliability, Power, and the Type I Error Rate of the Mantel-Haenszel and Simultaneous Item Bias Detection Procedures.

Ackerman, Terry A.; Evans, John A. – 1992

The relationship between levels of reliability and the power of two bias and differential item functioning (DIF) detection methods is examined. Both methods, the Mantel-Haenszel (MH) procedure of P. W. Holland and D. T. Thayer (1988) and the Simultaneous Item Bias (SIB) procedure of R. Shealy and W. Stout (1991), use examinees' raw scores as a…

Descriptors: Comparative Analysis, Equations (Mathematics), Error of Measurement, Item Bias

New Mexico Standards-Based Assessment Technical Report: Spring 2007 Administration

New Mexico Public Education Department, 2007

The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2007 NMSBA. The 2007 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Summary of student performance; (4) Statistical analyses of item and…

Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring

Analysis of Covariance: Is It the Appropriate Model to Study Change?

Marston, Paul T.; Borich, Gary D. – 1977

The four main approaches to measuring treatment effects in schools (raw gain, residual gain, covariance, and true scores) were compared. A simulation study showed that true score analysis produced a large number of Type I errors. When corrected for this error, this method showed the least power of the four. This outcome was clearly the result of the…

Descriptors: Achievement Gains, Analysis of Covariance, Comparative Analysis, Error of Measurement

Equating Reading Tests With the Rasch Model. Volume I, Final Report.

Rentz, R. Robert; Bashaw, W. L. – 1975

In order to determine if Rasch Model procedures have any utility for equating pre-existing tests, this study reanalyzed the data from the equating phase of the Anchor Test Study which used a variety of equipercentile and linear model methods. The tests involved included seven reading test batteries, each having from one to three levels and two…

Descriptors: Comparative Analysis, Elementary Education, Equated Scores, Error of Measurement

Equating Reading Tests With the Rasch Model. Volume II, Technical Reference Tables.

Rentz, R. Robert; Bashaw, W. L. – 1975

This volume contains tables of item analysis results obtained by following procedures associated with the Rasch Model for those reading tests used in the Anchor Test Study. Appendix I gives the test names and their corresponding analysis code numbers. Section I (Basic Item Analyses) presents data for the item analysis of each test in a two part…

Descriptors: Comparative Analysis, Elementary Education, Equated Scores, Error of Measurement

Issues of Reliability and Directional Bias in Standardized Achievement Tests: The Case of Mat70. P-5689.

Barker, Pierce; Pelavin, Sol H. – 1976

This study was mounted to assess the validity of standard score transformations of raw test scores and test bias on the 1970 edition of the Metropolitan Achievement Test Battery, in the context of a controversial federally funded compensatory education program, the Educational Voucher Demonstration (EVD). On an individual level the validity of the…

Descriptors: Achievement Gains, Achievement Tests, Educationally Disadvantaged, Elementary Education

Interpreting Standardized Test Scores.

Perry, Dallis – 1971

Principles of test administration, test validity, and accuracy of measurement underlying interpretation of standardized test scores in educational administration, instruction, and guidance are presented. Types of norm-referenced score transformations, including percentiles, standard scores, and grade equivalents, and of criterion referenced…

Descriptors: Criterion Referenced Tests, Error of Measurement, Evaluation, Expectancy Tables

New Mexico Standards Based Assessment (NMSBA) Technical Report: 2006 Spring Administration

Griph, Gerald W. – New Mexico Public Education Department, 2006

The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2006 NMSBA. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…

Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring