ERIC - Search Results

Showing 1 to 15 of 49 results

Analyzing Complete Generalizability Theory Designs Using Structural Equation Models

Peer reviewed

Walter P. Vispoel; Hyeri Hong; Hyeryung Lee; Terrence D. Jorgensen – Applied Measurement in Education, 2023

We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software ("lavaan" in R), compare results to those obtained from numerous ANOVA-based packages, and apply those results in practical ways using data obtained from a large sample of respondents, who completed the Self-Perception…

Descriptors: Generalizability Theory, Design, Structural Equation Models, Error of Measurement

Generalizability Theory as a Unifying Framework of Measurement Reliability in Adolescent Research

Peer reviewed

Fan, Xitao; Sun, Shaojing – Journal of Early Adolescence, 2014

In adolescence research, the treatment of measurement reliability is often fragmented, and it is not always clear how different reliability coefficients are related. We show that generalizability theory (G-theory) is a comprehensive framework of measurement reliability, encompassing all other reliability methods (e.g., Pearson "r,"…

Descriptors: Generalizability Theory, Measurement, Reliability, Correlation

Guessing and the Rasch Model

Peer reviewed

Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016

Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…

Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests

The Absence of Underprediction Does Not Imply the Absence of Measurement Bias

Peer reviewed

Wicherts, Jelte M.; Millsap, Roger E. – American Psychologist, 2009

Sackett, Borneman, and Connelly recently discussed several criticisms that are often raised against the use of cognitive tests in selection. One criticism concerns the issue of measurement bias in cognitive ability tests with respect to specific groups in society. Sackett et al. (2008) stated that "absent additional information, one cannot…

Descriptors: Prediction, Cognitive Tests, Cognitive Ability, Statistical Bias

Responses to Issues Raised about Validity, Bias, and Fairness in High-Stakes Testing

Peer reviewed

Sackett, Paul R.; Borneman, Matthew J.; Connelly, Brian S. – American Psychologist, 2009

We are pleased that our article prompted this series of four commentaries and that we have this opportunity to respond. We address each in turn. Duckworth and Kaufman and Agars discussed, respectively, two broad issues concerning the validity of selection systems, namely, the expansion of the predictor domain to include noncognitive predictors of…

Descriptors: High Stakes Tests, Reader Response, Error of Measurement, Test Bias

Reliability and Validity Evidence for the GED[R] English as a Second Language Test. GED Testing Service[R] Research Studies, 2009-4

Setzer, J. Carl – GED Testing Service, 2009

The GED[R] English as a Second Language (GED ESL) Test was designed to serve as an adjunct to the GED test battery when an examinee takes either the Spanish- or French-language version of the tests. The GED ESL Test is a criterion-referenced, multiple-choice instrument that assesses the functional, English reading skills of adults whose first…

Descriptors: Language Tests, High School Equivalency Programs, Psychometrics, Reading Skills

An Investigation of Full- and Subscale Reliabilities of Criterion-Referenced Tests.

Haladyna, Thomas M. – 1974

Classical test theory has been rejected for application to criterion-referenced (CR) tests by most psychometricians due to an expected lack of variance in scores and other difficulties. The present study was conceived to resolve the variance problem and explore the possibility that classical test theory is both appropriate and desirable for some…

Descriptors: Criterion Referenced Tests, Error of Measurement, Sampling, Test Construction

Consequences of (Mis)use of the Texas Assessment of Academic Skills (TAAS) for High-Stakes Decisions: A Comment on Haney and the Texas Miracle in Education.

Peer reviewed

Kellow, J. Thomas; Willson, Victor L. – Practical Assessment, Research & Evaluation, 2001

Explores the consequence of failing to incorporate measurement error in the development of cut scores in criterion-referenced measures, using the example of Texas and the Texas Assessment of Academic Skills to illustrate the impact of measurement error on false negative decisions. Findings support those of W. Haney (2000). (SLD)

Descriptors: Criterion Referenced Tests, Cutting Scores, Decision Making, Error of Measurement
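
To illustrate what incorporating measurement error into a cut-score decision can look like, here is a minimal Python sketch. It assumes the classical test theory formula for the standard error of measurement (SEM = SD × sqrt(1 − reliability)) and a simple error band around the cut score. The function names, the band width `z`, and the example numbers are hypothetical; none of this is taken from the article or from TAAS data.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def classify_with_error_band(score: float, cut_score: float,
                             sem: float, z: float = 1.0) -> str:
    """Label a score relative to a cut score, allowing for measurement error.

    Scores within z * sem of the cut score are labelled 'indeterminate'
    rather than being forced into pass or fail (an illustrative rule only).
    """
    margin = z * sem
    if score >= cut_score + margin:
        return "pass"
    if score < cut_score - margin:
        return "fail"
    return "indeterminate"


# Hypothetical test: SD of 10 points, reliability of 0.84, cut score of 70.
sem = standard_error_of_measurement(sd=10.0, reliability=0.84)
for observed in (60, 68, 75):
    print(observed, classify_with_error_band(observed, 70, sem))
```

An examinee scoring just below the cut score falls inside the band, which is the group most exposed to false negative decisions when measurement error is ignored.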

The Role of Reliability in Criterion-Referenced Tests.

Peer reviewed

Kane, Michael T. – Journal of Educational Measurement, 1986

These analyses suggest that if a criterion-referenced test had a reliability (defined in terms of internal consistency) below 0.5, a simple a priori procedure would provide better estimates of students' universe scores than would individual observed scores. (Author/LMO)

Descriptors: Criterion Referenced Tests, Educational Research, Error of Measurement, Generalizability Theory
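
One way to see why 0.5 can act as a threshold is the following classical test theory sketch. It assumes an observed score X = T + E with error E uncorrelated with the true score T and with mean zero, and it takes the population mean μ as the a priori estimate. The article frames its result in terms of universe scores, so this is an illustration under those assumptions and not necessarily the article's own derivation.

```latex
\begin{align*}
X &= T + E, \qquad \rho = \frac{\sigma_T^2}{\sigma_X^2}, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2 \\
\mathrm{MSE}(X) &= \mathbb{E}\big[(X - T)^2\big] = \sigma_E^2 = (1 - \rho)\,\sigma_X^2 \\
\mathrm{MSE}(\mu) &= \mathbb{E}\big[(\mu - T)^2\big] = \sigma_T^2 = \rho\,\sigma_X^2 \\
\mathrm{MSE}(\mu) < \mathrm{MSE}(X) &\iff \rho < 0.5
\end{align*}
```

Under these assumptions, assigning every examinee the same a priori value has a smaller mean squared error than using each observed score whenever the reliability is below 0.5.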

A Bayesian Procedure for Mastery Decisions Based on Multivariate Normal Test Data.

Peer reviewed

Huynh, Huynh – Psychometrika, 1982

A Bayesian framework for making mastery/nonmastery decisions based on multivariate test data is described. Overall, mastery is granted if the posterior expected loss associated with such action is smaller than the one incurred by denying mastery. (Author/JKS)

Descriptors: Bayesian Statistics, Criterion Referenced Tests, Cutting Scores, Error of Measurement
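
The decision rule in this abstract (grant mastery when the posterior expected loss of granting is smaller than that of denying) can be sketched in a few lines of Python. This is a deliberately simplified two-state version with constant losses. The posterior probability `p_master` is taken as given, and the function and parameter names are hypothetical. It does not implement the multivariate normal procedure described in the article.

```python
def grant_mastery(p_master: float,
                  loss_false_pass: float,
                  loss_false_fail: float) -> bool:
    """Grant mastery if doing so has the smaller posterior expected loss.

    p_master        -- posterior probability that the examinee is a true master
    loss_false_pass -- loss from granting mastery to a nonmaster
    loss_false_fail -- loss from denying mastery to a master
    Correct decisions are assumed to carry zero loss.
    """
    if not 0.0 <= p_master <= 1.0:
        raise ValueError("p_master must be in [0, 1]")
    expected_loss_grant = (1.0 - p_master) * loss_false_pass
    expected_loss_deny = p_master * loss_false_fail
    return expected_loss_grant < expected_loss_deny


# With equal losses the rule reduces to p_master > 0.5.
print(grant_mastery(0.8, loss_false_pass=1.0, loss_false_fail=1.0))
# Making a false pass five times as costly raises the bar for mastery.
print(grant_mastery(0.8, loss_false_pass=5.0, loss_false_fail=1.0))
```

Changing the ratio of the two losses moves the posterior probability needed before mastery is granted, which is the practical lever such a framework offers.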

Passing Scores and Test Lengths for Domain-Referenced Measures.

Millman, Jason – 1972

Procedures for establishing standards and determining the number of items needed in criterion-referenced measures are reviewed. The discussion of setting a passing score is organized around five factors: performance of others, item content, educational consequences, psychological and financial costs, and measurement error. Classical test theory,…

Descriptors: Academic Achievement, Criterion Referenced Tests, Error of Measurement, Models

The Use of Aggregate Scoring for a Recertifying Examination.

Peer reviewed

Norcini, John J.; And Others – Evaluation and the Health Professions, 1990

Aggregate scoring was applied to a recertifying examination for medical professionals to generate an answer key and allow comparison of peer examinees. Results for 1,927 candidates for recertification indicate considerable agreement between the traditional answer key and the aggregate answer key. (TJH)

Descriptors: Answer Keys, Criterion Referenced Tests, Error of Measurement, Generalizability Theory

The Criterion-Referenced Reliability of a Single Score. Report 76-01.

Livingston, Samuel A. – 1976

A distinction is made between reliability of measurement and reliability of classification; the "criterion-referenced reliability coefficient" describes the former. Application of this coefficient to the probability distribution of possible scores for a single student yields a meaningful way to describe the reliability of a single score. (Author)

Descriptors: Classification, Criterion Referenced Tests, Error of Measurement, Measurement

Errors of Measurement and Standard Setting in Mastery Testing.

Kane, Michael; Wilson, Jennifer – 1982

This paper evaluates the magnitude of the total error in estimates of the difference between an examinee's domain score and the cutoff score. An observed score based on a random sample of items from the domain, and an estimated cutoff score derived from a judgmental standard setting procedure are assumed. The work of Brennan and Lockwood (1980) is…

Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Mastery Tests

A Monte Carlo Comparison of Phi and Kappa as Measures of Criterion-Referenced Reliability.

Reid, Jerry B.; Roberts, Dennis M. – 1978

Comparisons of corresponding values of phi and kappa coefficients were made for 270 instances of data generated by a Monte Carlo technique to simulate a test-retest situation. Data were generated for distributions with the same mean but three different levels of standard deviation, standard error of measurement and cutting score. Ten samples of…

Descriptors: Comparative Analysis, Correlation, Criterion Referenced Tests, Cutting Scores
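
For readers unfamiliar with the two coefficients being compared, the sketch below computes phi and Cohen's kappa from a 2×2 table of mastery classifications on two test administrations. It uses the standard textbook formulas. The cell labels, function name, and example counts are hypothetical and are not data from the study.

```python
import math


def phi_and_kappa(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Phi and Cohen's kappa for a 2x2 test-retest classification table.

    a -- master on both administrations
    b -- master on the first, nonmaster on the second
    c -- nonmaster on the first, master on the second
    d -- nonmaster on both administrations
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d      # first-administration marginals
    col1, col2 = a + c, b + d      # second-administration marginals
    denom = row1 * row2 * col1 * col2
    if n == 0 or denom == 0:
        raise ValueError("every marginal total must be non-zero")

    phi = (a * d - b * c) / math.sqrt(denom)

    p_observed = (a + d) / n
    p_chance = (row1 * col1 + row2 * col2) / (n * n)
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return phi, kappa


# Hypothetical counts for 100 examinees.
phi, kappa = phi_and_kappa(a=40, b=10, c=10, d=40)
print(f"phi = {phi:.3f}, kappa = {kappa:.3f}")
```

When the marginal proportions of masters are identical on both administrations, as in these example counts, the two coefficients coincide; they diverge as the marginals differ.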
