Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 6 |
Descriptor
Criterion Referenced Tests | 49 |
Error of Measurement | 49 |
Test Reliability | 30 |
Test Construction | 19 |
Cutting Scores | 16 |
Norm Referenced Tests | 15 |
Test Interpretation | 13 |
Scores | 11 |
Test Validity | 11 |
Item Analysis | 9 |
True Scores | 9 |
Author
Haladyna, Tom | 4 |
Brennan, Robert L. | 3 |
Livingston, Samuel A. | 3 |
Roid, Gale | 3 |
Haladyna, Thomas M. | 2 |
Harris, Chester W. | 2 |
Kane, Michael T. | 2 |
Schaeffer, Gary A. | 2 |
Bateman, Andrea | 1 |
Belcher, Marcia | 1 |
Berk, Ronald A. | 1 |
Publication Type
Reports - Research | 24 |
Journal Articles | 13 |
Speeches/Meeting Papers | 12 |
Reports - Evaluative | 5 |
Reports - Descriptive | 3 |
Information Analyses | 2 |
Opinion Papers | 2 |
Guides - Non-Classroom | 1 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 2 |
Postsecondary Education | 2 |
High Schools | 1 |
Audience
Researchers | 4 |
Assessments and Surveys
Alabama High School… | 1 |
College Level Academic Skills… | 1 |
General Educational… | 1 |
Lexile Scale of Reading | 1 |
Texas Assessment of Academic… | 1 |
Walter P. Vispoel; Hyeri Hong; Hyeryung Lee; Terrence D. Jorgensen – Applied Measurement in Education, 2023
We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software ("lavaan" in R), compare results to those obtained from numerous ANOVA-based packages, and apply those results in practical ways using data obtained from a large sample of respondents, who completed the Self-Perception…
Descriptors: Generalizability Theory, Design, Structural Equation Models, Error of Measurement
Fan, Xitao; Sun, Shaojing – Journal of Early Adolescence, 2014
In adolescence research, the treatment of measurement reliability is often fragmented, and it is not always clear how different reliability coefficients are related. We show that generalizability theory (G-theory) is a comprehensive framework of measurement reliability, encompassing all other reliability methods (e.g., Pearson "r,"…
Descriptors: Generalizability Theory, Measurement, Reliability, Correlation
Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016
Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests
Wicherts, Jelte M.; Millsap, Roger E. – American Psychologist, 2009
Sackett, Borneman, and Connelly recently discussed several criticisms that are often raised against the use of cognitive tests in selection. One criticism concerns the issue of measurement bias in cognitive ability tests with respect to specific groups in society. Sackett et al. (2008) stated that "absent additional information, one cannot…
Descriptors: Prediction, Cognitive Tests, Cognitive Ability, Statistical Bias
Sackett, Paul R.; Borneman, Matthew J.; Connelly, Brian S. – American Psychologist, 2009
We are pleased that our article prompted this series of four commentaries and that we have this opportunity to respond. We address each in turn. Duckworth and Kaufman and Agars discussed, respectively, two broad issues concerning the validity of selection systems, namely, the expansion of the predictor domain to include noncognitive predictors of…
Descriptors: High Stakes Tests, Reader Response, Error of Measurement, Test Bias
Setzer, J. Carl – GED Testing Service, 2009
The GED® English as a Second Language (GED ESL) Test was designed to serve as an adjunct to the GED test battery when an examinee takes either the Spanish- or French-language version of the tests. The GED ESL Test is a criterion-referenced, multiple-choice instrument that assesses the functional, English reading skills of adults whose first…
Descriptors: Language Tests, High School Equivalency Programs, Psychometrics, Reading Skills
Haladyna, Thomas M. – 1974
Classical test theory has been rejected for application to criterion-referenced (CR) tests by most psychometricians due to an expected lack of variance in scores and other difficulties. The present study was conceived to resolve the variance problem and explore the possibility that classical test theory is both appropriate and desirable for some…
Descriptors: Criterion Referenced Tests, Error of Measurement, Sampling, Test Construction

Kellow, J. Thomas; Willson, Victor L. – Practical Assessment, Research & Evaluation, 2001
Explores the consequence of failing to incorporate measurement error in the development of cut scores in criterion-referenced measures, using the example of Texas and the Texas Assessment of Academic Skills to illustrate the impact of measurement error on false negative decisions. Findings support those of W. Haney (2000). (SLD)
Descriptors: Criterion Referenced Tests, Cutting Scores, Decision Making, Error of Measurement
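The link between measurement error and false negative decisions described in this abstract can be illustrated with a minimal sketch. It assumes normally distributed measurement error with a known standard error of measurement (SEM); the function names and the example values are hypothetical and are not taken from the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def false_negative_probability(true_score: float, cut_score: float, sem: float) -> float:
    """Probability that the observed score falls below the cut score.

    Assumption (not from the source): observed = true + error, with
    error ~ Normal(0, sem**2). For an examinee whose true score is at
    or above the cut, this probability is the chance of a false negative.
    """
    if sem <= 0:
        raise ValueError("sem must be positive")
    return normal_cdf((cut_score - true_score) / sem)


# Hypothetical example: cut score 70, SEM 3, true score one SEM above the cut.
risk = false_negative_probability(true_score=73.0, cut_score=70.0, sem=3.0)
print(f"False negative probability: {risk:.3f}")
```

Under these assumptions the risk depends only on how many SEMs the true score lies above the cut, which is why a cut score set without regard to measurement error can fail examinees who are truly just above it.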

Kane, Michael T. – Journal of Educational Measurement, 1986
These analyses suggest that if a criterion-referenced test had a reliability (defined in terms of internal consistency) below 0.5, a simple a priori procedure would provide better estimates of students' universe scores than would individual observed scores. (Author/LMO)
Descriptors: Criterion Referenced Tests, Educational Research, Error of Measurement, Generalizability Theory
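The 0.5 threshold in this abstract can be reproduced with a small sketch under classical test theory. The assumption here, which the abstract does not state, is that the "simple a priori procedure" assigns every student the same constant, the population mean. Using an observed score as the estimate then has a mean squared error equal to the error variance, (1 - reliability) × observed variance, while using the constant has a mean squared error equal to the true-score variance, reliability × observed variance. The function names are hypothetical.

```python
def mse_observed_score(reliability: float, observed_variance: float = 1.0) -> float:
    """MSE when each student's observed score estimates the universe score.

    Under classical test theory this equals the error variance.
    """
    return (1.0 - reliability) * observed_variance


def mse_constant_mean(reliability: float, observed_variance: float = 1.0) -> float:
    """MSE when the population mean is used for every student (assumed a priori rule).

    This equals the true-score (universe-score) variance.
    """
    return reliability * observed_variance


def better_estimator(reliability: float) -> str:
    """Return which estimator has the smaller MSE: 'constant', 'observed', or 'tie'."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    constant = mse_constant_mean(reliability)
    observed = mse_observed_score(reliability)
    if constant < observed:
        return "constant"
    if observed < constant:
        return "observed"
    return "tie"


for rho in (0.3, 0.5, 0.8):
    print(rho, better_estimator(rho))
```

The two error terms cross exactly where reliability equals 0.5, which matches the threshold reported in the abstract under the stated assumption.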

Huynh, Huynh – Psychometrika, 1982
A Bayesian framework for making mastery/nonmastery decisions based on multivariate test data is described. Overall, mastery is granted if the posterior expected loss associated with such action is smaller than the one incurred by denying mastery. (Author/JKS)
Descriptors: Bayesian Statistics, Criterion Referenced Tests, Cutting Scores, Error of Measurement
Millman, Jason – 1972
Procedures for establishing standards and determining the number of items needed in criterion-referenced measures are reviewed. The discussion of setting a passing score is organized around five factors: performance of others, item content, educational consequences, psychological and financial costs, and measurement error. Classical test theory,…
Descriptors: Academic Achievement, Criterion Referenced Tests, Error of Measurement, Models

Norcini, John J.; And Others – Evaluation and the Health Professions, 1990
Aggregate scoring was applied to a recertifying examination for medical professionals to generate an answer key and allow comparison of peer examinees. Results for 1,927 candidates for recertification indicate considerable agreement between the traditional answer key and the aggregate answer key. (TJH)
Descriptors: Answer Keys, Criterion Referenced Tests, Error of Measurement, Generalizability Theory
Livingston, Samuel A. – 1976
A distinction is made between reliability of measurement and reliability of classification; the "criterion-referenced reliability coefficient" describes the former. Application of this coefficient to the probability distribution of possible scores for a single student yields a meaningful way to describe the reliability of a single score. (Author)
Descriptors: Classification, Criterion Referenced Tests, Error of Measurement, Measurement
Kane, Michael; Wilson, Jennifer – 1982
This paper evaluates the magnitude of the total error in estimates of the difference between an examinee's domain score and the cutoff score. An observed score based on a random sample of items from the domain, and an estimated cutoff score derived from a judgmental standard setting procedure are assumed. The work of Brennan and Lockwood (1980) is…
Descriptors: Criterion Referenced Tests, Cutting Scores, Error of Measurement, Mastery Tests
Reid, Jerry B.; Roberts, Dennis M. – 1978
Comparisons of corresponding values of phi and kappa coefficients were made for 270 instances of data generated by a Monte Carlo technique to simulate a test-retest situation. Data were generated for distributions with the same mean but three different levels of standard deviation, standard error of measurement and cutting score. Ten samples of…
Descriptors: Comparative Analysis, Correlation, Criterion Referenced Tests, Cutting Scores
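As a rough illustration of the two coefficients compared in this abstract, the sketch below computes phi and Cohen's kappa from a 2×2 test-retest mastery classification table. The function name, argument names, and example counts are hypothetical and are not drawn from the study's simulated data.

```python
import math


def phi_and_kappa(both_master: int, master_then_non: int,
                  non_then_master: int, both_non: int) -> tuple[float, float]:
    """Return (phi, kappa) for a 2x2 test-retest classification table.

    Rows are the first administration, columns the second:
        [[both_master,     master_then_non],
         [non_then_master, both_non       ]]
    """
    a, b, c, d = both_master, master_then_non, non_then_master, both_non
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denominator = math.sqrt(row1 * row2 * col1 * col2)
    if n == 0 or denominator == 0:
        raise ValueError("every row and column must contain at least one case")

    # Phi: Pearson correlation between the two dichotomous classifications.
    phi = (a * d - b * c) / denominator

    # Kappa: observed agreement corrected for agreement expected by chance.
    p_observed = (a + d) / n
    p_expected = (row1 * col1 + row2 * col2) / n ** 2
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return phi, kappa


# Hypothetical counts: 100 examinees classified on two administrations.
phi, kappa = phi_and_kappa(40, 10, 10, 40)
print(f"phi = {phi:.3f}, kappa = {kappa:.3f}")
```

When the marginal proportions of masters are the same on both administrations the two coefficients coincide; they diverge as the marginals become unequal.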