Showing all 13 results
Peer reviewed
Walter P. Vispoel; Hyeri Hong; Hyeryung Lee; Terrence D. Jorgensen – Applied Measurement in Education, 2023
We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software ("lavaan" in R), compare results to those obtained from numerous ANOVA-based packages, and apply those results in practical ways using data obtained from a large sample of respondents, who completed the Self-Perception…
Descriptors: Generalizability Theory, Design, Structural Equation Models, Error of Measurement
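A minimal sketch of the GT-as-SEM idea in lavaan, under assumptions of ours (a one-facet persons x items design with four hypothetical items item1..item4 in a data frame spp_data); this is not the authors' published syntax:

library(lavaan)

# One-facet (persons x items) G study expressed as a CFA: unit loadings and
# equal residual variances recover the person and residual variance components.
# Item names and the data frame are hypothetical.
model <- '
  person =~ 1*item1 + 1*item2 + 1*item3 + 1*item4   # loadings fixed to 1
  person ~~ vp*person                               # person variance component
  item1 ~~ ve*item1                                 # residual (pi,e) variance,
  item2 ~~ ve*item2                                 # constrained equal across items
  item3 ~~ ve*item3
  item4 ~~ ve*item4
  g_coef := vp / (vp + ve/4)                        # relative G coefficient for a 4-item composite
'
fit <- cfa(model, data = spp_data)
summary(fit)   # vp and ve should mirror the ANOVA-based variance components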
Peer reviewed
Fan, Xitao; Sun, Shaojing – Journal of Early Adolescence, 2014
In adolescence research, the treatment of measurement reliability is often fragmented, and it is not always clear how different reliability coefficients are related. We show that generalizability theory (G-theory) is a comprehensive framework of measurement reliability, encompassing all other reliability methods (e.g., Pearson "r,"…
Descriptors: Generalizability Theory, Measurement, Reliability, Correlation
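For readers who want the algebra behind the claim, the generalizability coefficient for a persons x items design can be written as follows (notation ours, not the authors'):

\[
  E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta},
  \qquad
  \sigma^2_\delta = \frac{\sigma^2_{pi,e}}{n_i}
\]

With essentially tau-equivalent items this coefficient coincides with coefficient alpha, and with two parallel forms it reduces to the Pearson correlation between forms, which is the sense in which G-theory subsumes the familiar reliability coefficients.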
Peer reviewed
Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016
Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests
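For context, the model at issue can be stated compactly (notation ours): the three-parameter logistic (3PL) model gives item j a lower asymptote c_j, the "guessing" parameter,

\[
  P(X_{ij}=1 \mid \theta_i) = c_j + (1-c_j)\,\frac{\exp\bigl(a_j(\theta_i-b_j)\bigr)}{1+\exp\bigl(a_j(\theta_i-b_j)\bigr)},
\]

and the Rasch model used by Beglar is the special case a_j = 1, c_j = 0.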
Peer reviewed
Wicherts, Jelte M.; Millsap, Roger E. – American Psychologist, 2009
Sackett, Borneman, and Connelly recently discussed several criticisms that are often raised against the use of cognitive tests in selection. One criticism concerns the issue of measurement bias in cognitive ability tests with respect to specific groups in society. Sackett et al. (2008) stated that "absent additional information, one cannot…
Descriptors: Prediction, Cognitive Tests, Cognitive Ability, Statistical Bias
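The formal notion of measurement bias at stake here can be stated as a measurement-invariance condition (our paraphrase in symbols, not a quotation from the article): a test X is unbiased with respect to groups g when

\[
  f(X \mid \eta, g) = f(X \mid \eta) \quad \text{for all } g,
\]

i.e., the conditional distribution of observed scores given the latent attribute \eta does not depend on group membership.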
Peer reviewed
Sackett, Paul R.; Borneman, Matthew J.; Connelly, Brian S. – American Psychologist, 2009
We are pleased that our article prompted this series of four commentaries and that we have this opportunity to respond. We address each in turn. Duckworth and Kaufman and Agars discussed, respectively, two broad issues concerning the validity of selection systems, namely, the expansion of the predictor domain to include noncognitive predictors of…
Descriptors: High Stakes Tests, Reader Response, Error of Measurement, Test Bias
Peer reviewed
Kellow, J. Thomas; Willson, Victor L. – Practical Assessment, Research & Evaluation, 2001
Explores the consequence of failing to incorporate measurement error in the development of cut scores in criterion-referenced measures, using the example of Texas and the Texas Assessment of Academic Skills to illustrate the impact of measurement error on false negative decisions. Findings support those of W. Haney (2000). (SLD)
Descriptors: Criterion Referenced Tests, Cutting Scores, Decision Making, Error of Measurement
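As an illustration of the mechanism (our sketch under a normal error model, not the authors' derivation): an examinee whose true score \tau lies above the cut score c still fails with probability

\[
  P(\text{false negative}) = \Phi\!\left(\frac{c - \tau}{\mathrm{SEM}}\right),
\]

so fixing c without regard to the SEM understates the false-negative rate for examinees whose true scores sit just above the standard.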
Peer reviewed
Kane, Michael T. – Journal of Educational Measurement, 1986
These analyses suggest that if a criterion-referenced test had a reliability (defined in terms of internal consistency) below 0.5, a simple a priori procedure would provide better estimates of students' universe scores than would individual observed scores. (Author/LMO)
Descriptors: Criterion Referenced Tests, Educational Research, Error of Measurement, Generalizability Theory
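A sketch of the underlying logic (our notation, not the author's derivation): comparing mean squared errors of two estimators of a student's universe score,

\[
  \mathrm{MSE}(\mu_X) = \sigma^2_\tau = \rho\,\sigma^2_X,
  \qquad
  \mathrm{MSE}(X_p) = \sigma^2_E = (1-\rho)\,\sigma^2_X,
\]

so when the reliability \rho drops below 0.5, the a priori group mean has smaller expected squared error than the individual observed score.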
Peer reviewed
Huynh, Huynh – Psychometrika, 1982
A Bayesian framework for making mastery/nonmastery decisions based on multivariate test data is described. Overall, mastery is granted if the posterior expected loss associated with such action is smaller than the one incurred by denying mastery. (Author/JKS)
Descriptors: Bayesian Statistics, Criterion Referenced Tests, Cutting Scores, Error of Measurement
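The decision rule summarized above, written out (notation ours): mastery is granted when

\[
  \mathbb{E}\bigl[L(\text{grant mastery}, \tau)\mid \mathbf{x}\bigr] < \mathbb{E}\bigl[L(\text{deny mastery}, \tau)\mid \mathbf{x}\bigr],
\]

with the expectation taken over the posterior distribution of the examinee's latent status \tau given the multivariate test data \mathbf{x}.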
Peer reviewed
Norcini, John J.; And Others – Evaluation and the Health Professions, 1990
Aggregate scoring was applied to a recertifying examination for medical professionals to generate an answer key and allow comparison of peer examinees. Results for 1,927 candidates for recertification indicate considerable agreement between the traditional answer key and the aggregate answer key. (TJH)
Descriptors: Answer Keys, Criterion Referenced Tests, Error of Measurement, Generalizability Theory
Peer reviewed
Cangelosi, James S. – Educational Measurement: Issues and Practice, 1984
Test development procedures and six methods for determining cut-off scores are briefly described. An alternate method, appropriate when the test developer also determines the cut-off score, is suggested. Unlike other methods, the standard is set during the test development stage. Its computations are intelligible to nonstatistically-oriented…
Descriptors: Criterion Referenced Tests, Cutting Scores, Elementary Secondary Education, Error of Measurement
Peer reviewed
Schaeffer, Gary A.; And Others – Evaluation Review, 1986
The reliability of criterion-referenced tests (CRTs) used in health program evaluation can be conceptualized in different ways. Formulas are presented for estimating appropriate standard error of measurement (SEM) for CRTs. The SEM can be used in computing confidence intervals for domain score estimates and for a cut-score. (Author/LMO)
Descriptors: Accountability, Criterion Referenced Tests, Cutting Scores, Error of Measurement
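One standard form such an SEM can take is the binomial error model (an illustrative assumption of ours; the article's own formulas may differ): for an observed proportion correct \hat{\pi}_p on an n-item test,

\[
  \mathrm{SEM}(\hat{\pi}_p) = \sqrt{\frac{\hat{\pi}_p(1-\hat{\pi}_p)}{n}},
  \qquad
  \hat{\pi}_p \pm z_{1-\alpha/2}\,\mathrm{SEM}(\hat{\pi}_p),
\]

with the same interval logic applicable around the cut score.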
Peer reviewed
Haladyna, Thomas M.; Roid, Gale H. – Journal of Educational Measurement, 1983
The present study showed that Rasch-based adaptive tests--when item domains were finite and specifiable--had greater precision in domain score estimation than test forms created by random sampling of items. Results were replicated across four data sources representing a variety of criterion-referenced, domain-based tests varying in length.…
Descriptors: Adaptive Testing, Criterion Referenced Tests, Error of Measurement, Estimation (Mathematics)
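How a Rasch ability estimate yields a domain score estimate when the domain is finite and specifiable (a standard construction we assume here, not a quotation from the article): with domain items j = 1..N of difficulty b_j,

\[
  \hat{\pi}_p = \frac{1}{N}\sum_{j=1}^{N}\frac{\exp(\hat{\theta}_p - b_j)}{1+\exp(\hat{\theta}_p - b_j)},
\]

where \hat{\theta}_p comes from the adaptive test; targeting items near \theta_p is what buys the extra precision over a same-length random sample of items.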
Peer reviewed
Berk, Ronald A. – Review of Educational Research, 1986
Thirty-eight methods are presented for either setting standards or adjusting them based on an analysis of classification error rates. A trilevel classification scheme is used to categorize the methods, and 10 criteria of technical adequacy and practicability are proposed to evaluate them. (Author/LMO)
Descriptors: Criterion Referenced Tests, Cutting Scores, Elementary Secondary Education, Error of Measurement