Showing all 13 results
Peer reviewed
Walter P. Vispoel; Hyeri Hong; Hyeryung Lee; Terrence D. Jorgensen – Applied Measurement in Education, 2023
We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software ("lavaan" in R), compare results to those obtained from numerous ANOVA-based packages, and apply those results in practical ways using data obtained from a large sample of respondents, who completed the Self-Perception…
Descriptors: Generalizability Theory, Design, Structural Equation Models, Error of Measurement
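A minimal sketch of the GT-as-SEM idea in lavaan, under assumptions of ours (a one-facet persons x items design with four hypothetical items item1..item4 in a data frame spp_data); this is not the authors' published syntax:

library(lavaan)

# One-facet (persons x items) G study expressed as a CFA: unit loadings and
# equal residual variances recover the person and residual variance components.
# Item names and the data frame are hypothetical.
model <- '
  person =~ 1*item1 + 1*item2 + 1*item3 + 1*item4   # loadings fixed to 1
  person ~~ vp*person                               # person variance component
  item1 ~~ ve*item1                                 # residual (pi,e) variance,
  item2 ~~ ve*item2                                 # constrained equal across items
  item3 ~~ ve*item3
  item4 ~~ ve*item4
  g_coef := vp / (vp + ve/4)                        # relative G coefficient for a 4-item composite
'
fit <- cfa(model, data = spp_data)
summary(fit)   # vp and ve should mirror the ANOVA-based variance components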
Peer reviewed
Fan, Xitao; Sun, Shaojing – Journal of Early Adolescence, 2014
In adolescence research, the treatment of measurement reliability is often fragmented, and it is not always clear how different reliability coefficients are related. We show that generalizability theory (G-theory) is a comprehensive framework of measurement reliability, encompassing all other reliability methods (e.g., Pearson "r,"…
Descriptors: Generalizability Theory, Measurement, Reliability, Correlation
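For readers who want the algebra behind the claim, the generalizability coefficient for a persons x items design can be written as follows (notation ours, not the authors'):

\[
  E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta},
  \qquad
  \sigma^2_\delta = \frac{\sigma^2_{pi,e}}{n_i}
\]

With essentially tau-equivalent items this coefficient coincides with coefficient alpha, and with two parallel forms it reduces to the Pearson correlation between forms, which is the sense in which G-theory subsumes the familiar reliability coefficients.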
Peer reviewed
Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016
Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests
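For context, the model at issue can be stated compactly (notation ours): the three-parameter logistic (3PL) model gives item j a lower asymptote c_j, the "guessing" parameter,

\[
  P(X_{ij}=1 \mid \theta_i) = c_j + (1-c_j)\,\frac{\exp\bigl(a_j(\theta_i-b_j)\bigr)}{1+\exp\bigl(a_j(\theta_i-b_j)\bigr)},
\]

and the Rasch model used by Beglar is the special case a_j = 1, c_j = 0.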
Peer reviewed
Wicherts, Jelte M.; Millsap, Roger E. – American Psychologist, 2009
Sackett, Borneman, and Connelly recently discussed several criticisms that are often raised against the use of cognitive tests in selection. One criticism concerns the issue of measurement bias in cognitive ability tests with respect to specific groups in society. Sackett et al. (2008) stated that "absent additional information, one cannot…
Descriptors: Prediction, Cognitive Tests, Cognitive Ability, Statistical Bias
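The formal notion of measurement bias at stake here can be stated as a measurement-invariance condition (our paraphrase in symbols, not a quotation from the article): a test X is unbiased with respect to groups g when

\[
  f(X \mid \eta, g) = f(X \mid \eta) \quad \text{for all } g,
\]

i.e., the conditional distribution of observed scores given the latent attribute \eta does not depend on group membership.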
Peer reviewed
Sackett, Paul R.; Borneman, Matthew J.; Connelly, Brian S. – American Psychologist, 2009
We are pleased that our article prompted this series of four commentaries and that we have this opportunity to respond. We address each in turn. Duckworth and Kaufman and Agars discussed, respectively, two broad issues concerning the validity of selection systems, namely, the expansion of the predictor domain to include noncognitive predictors of…
Descriptors: High Stakes Tests, Reader Response, Error of Measurement, Test Bias
Peer reviewed
Kellow, J. Thomas; Willson, Victor L. – Practical Assessment, Research & Evaluation, 2001
Explores the consequence of failing to incorporate measurement error in the development of cut scores in criterion-referenced measures, using the example of Texas and the Texas Assessment of Academic Skills to illustrate the impact of measurement error on false negative decisions. Findings support those of W. Haney (2000). (SLD)
Descriptors: Criterion Referenced Tests, Cutting Scores, Decision Making, Error of Measurement
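As an illustration of the mechanism (our sketch under a normal error model, not the authors' derivation): an examinee whose true score \tau lies above the cut score c still fails with probability

\[
  P(\text{false negative}) = \Phi\!\left(\frac{c - \tau}{\mathrm{SEM}}\right),
\]

so fixing c without regard to the SEM understates the false-negative rate for examinees whose true scores sit just above the standard.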
Peer reviewed
Kane, Michael T. – Journal of Educational Measurement, 1986
These analyses suggest that if a criterion-referenced test had a reliability (defined in terms of internal consistency) below 0.5, a simple a priori procedure would provide better estimates of students' universe scores than would individual observed scores. (Author/LMO)
Descriptors: Criterion Referenced Tests, Educational Research, Error of Measurement, Generalizability Theory
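A sketch of the underlying logic (our notation, not the author's derivation): comparing mean squared errors of two estimators of a student's universe score,

\[
  \mathrm{MSE}(\mu_X) = \sigma^2_\tau = \rho\,\sigma^2_X,
  \qquad
  \mathrm{MSE}(X_p) = \sigma^2_E = (1-\rho)\,\sigma^2_X,
\]

so when the reliability \rho drops below 0.5, the a priori group mean has smaller expected squared error than the individual observed score.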
Peer reviewed
Huynh, Huynh – Psychometrika, 1982
A Bayesian framework for making mastery/nonmastery decisions based on multivariate test data is described. Overall, mastery is granted if the posterior expected loss associated with such action is smaller than the one incurred by denying mastery. (Author/JKS)
Descriptors: Bayesian Statistics, Criterion Referenced Tests, Cutting Scores, Error of Measurement
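The decision rule summarized above, written out (notation ours): mastery is granted when

\[
  \mathbb{E}\bigl[L(\text{grant mastery}, \tau)\mid \mathbf{x}\bigr] < \mathbb{E}\bigl[L(\text{deny mastery}, \tau)\mid \mathbf{x}\bigr],
\]

with the expectation taken over the posterior distribution of the examinee's latent status \tau given the multivariate test data \mathbf{x}.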
Peer reviewed
Norcini, John J.; And Others – Evaluation and the Health Professions, 1990
Aggregate scoring was applied to a recertifying examination for medical professionals to generate an answer key and allow comparison of peer examinees. Results for 1,927 candidates for recertification indicate considerable agreement between the traditional answer key and the aggregate answer key. (TJH)
Descriptors: Answer Keys, Criterion Referenced Tests, Error of Measurement, Generalizability Theory
Peer reviewed
Cangelosi, James S. – Educational Measurement: Issues and Practice, 1984
Test development procedures and six methods for determining cut-off scores are briefly described. An alternate method, appropriate when the test developer also determines the cut-off score, is suggested. Unlike other methods, the standard is set during the test development stage. Its computations are intelligible to nonstatistically-oriented…
Descriptors: Criterion Referenced Tests, Cutting Scores, Elementary Secondary Education, Error of Measurement
Peer reviewed
Schaeffer, Gary A.; And Others – Evaluation Review, 1986
The reliability of criterion-referenced tests (CRTs) used in health program evaluation can be conceptualized in different ways. Formulas are presented for estimating appropriate standard error of measurement (SEM) for CRTs. The SEM can be used in computing confidence intervals for domain score estimates and for a cut-score. (Author/LMO)
Descriptors: Accountability, Criterion Referenced Tests, Cutting Scores, Error of Measurement
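One standard form such an SEM can take is the binomial error model (an illustrative assumption of ours; the article's own formulas may differ): for an observed proportion correct \hat{\pi}_p on an n-item test,

\[
  \mathrm{SEM}(\hat{\pi}_p) = \sqrt{\frac{\hat{\pi}_p(1-\hat{\pi}_p)}{n}},
  \qquad
  \hat{\pi}_p \pm z_{1-\alpha/2}\,\mathrm{SEM}(\hat{\pi}_p),
\]

with the same interval logic applicable around the cut score.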
Peer reviewed
Haladyna, Thomas M.; Roid, Gale H. – Journal of Educational Measurement, 1983
The present study showed that Rasch-based adaptive tests--when item domains were finite and specifiable--had greater precision in domain score estimation than test forms created by random sampling of items. Results were replicated across four data sources representing a variety of criterion-referenced, domain-based tests varying in length.…
Descriptors: Adaptive Testing, Criterion Referenced Tests, Error of Measurement, Estimation (Mathematics)
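How a Rasch ability estimate yields a domain score estimate when the domain is finite and specifiable (a standard construction we assume here, not a quotation from the article): with domain items j = 1..N of difficulty b_j,

\[
  \hat{\pi}_p = \frac{1}{N}\sum_{j=1}^{N}\frac{\exp(\hat{\theta}_p - b_j)}{1+\exp(\hat{\theta}_p - b_j)},
\]

where \hat{\theta}_p comes from the adaptive test; targeting items near \theta_p is what buys the extra precision over a same-length random sample of items.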
Peer reviewed
Berk, Ronald A. – Review of Educational Research, 1986
Thirty-eight methods are presented for either setting standards or adjusting them based on an analysis of classification error rates. A trilevel classification scheme is used to categorize the methods, and 10 criteria of technical adequacy and practicability are proposed to evaluate them. (Author/LMO)
Descriptors: Criterion Referenced Tests, Cutting Scores, Elementary Secondary Education, Error of Measurement