Publication Date
In 2025 | 2 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 6 |
Since 2006 (last 20 years) | 18 |
Descriptor
Error of Measurement | 21 |
Scores | 21 |
Testing | 21 |
Reliability | 7 |
Measurement | 6 |
Test Reliability | 5 |
Test Validity | 5 |
Academic Achievement | 4 |
Evaluation | 4 |
Test Interpretation | 4 |
Test Results | 4 |
Author
Kane, Michael | 2 |
Li, Min | 2 |
Solano-Flores, Guillermo | 2 |
Aucejo, Esteban | 1 |
Domingue, Benjamin W. | 1 |
Briggs, Derek C. | 1 |
Brockmann, Frank | 1 |
Chen, Hui-Mei | 1 |
Cronbach, Lee J. | 1 |
Doorey, Nancy A. | 1 |
Fletcher, Jack M. | 1 |
Publication Type
Journal Articles | 12 |
Reports - Research | 8 |
Reports - Descriptive | 7 |
Opinion Papers | 4 |
Reports - Evaluative | 2 |
Guides - General | 1 |
Numerical/Quantitative Data | 1 |
Speeches/Meeting Papers | 1 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 4 |
Postsecondary Education | 3 |
Elementary Education | 2 |
High Schools | 2 |
Secondary Education | 2 |
Elementary Secondary Education | 1 |
Grade 3 | 1 |
Grade 4 | 1 |
Grade 5 | 1 |
Intermediate Grades | 1 |
Audience
Administrators | 1 |
Counselors | 1 |
Policymakers | 1 |
Teachers | 1 |
Location
China (Beijing) | 1 |
North Carolina | 1 |
United States | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
ACT Assessment | 3 |
SAT (College Admission Test) | 1 |
Wechsler Adult Intelligence… | 1 |
Wechsler Intelligence Scale… | 1 |
Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…
Descriptors: Value Added Models, Tests, Testing, Scoring
Jeff Allen; Ty Cruce – ACT Education Corp., 2025
This report summarizes some of the evidence supporting interpretations of scores from the enhanced ACT, focusing on reliability, concurrent validity, predictive validity, and score comparability. The authors argue that the evidence presented in this report supports the interpretation of scores from the enhanced ACT as measures of high school…
Descriptors: College Entrance Examinations, Testing, Change, Scores
Aucejo, Esteban; Romano, Teresa; Taylor, Eric S. – Centre for Economic Performance, 2019
Performance evaluation may change employee effort and decisions in unintended ways, for example, in multitask jobs where the evaluation measure captures only a subset of (differentially weights) the job tasks. We show evidence of this multitask distortion in schools, with teachers allocating effort across students (tasks). Teachers are evaluated…
Descriptors: Teacher Evaluation, Student Evaluation, Mathematics Tests, Scores
Hsiao, Yu-Yu; Kwok, Oi-Man; Lai, Mark H. C. – Educational and Psychological Measurement, 2018
Path models with observed composites based on multiple items (e.g., mean or sum score of the items) are commonly used to test interaction effects. Under this practice, researchers generally assume that the observed composites are measured without errors. In this study, we reviewed and evaluated two alternative methods within the structural…
Descriptors: Error of Measurement, Testing, Scores, Models
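As context for the entry above: ignoring measurement error in an observed composite attenuates effect estimates, which is what motivates the alternative methods the study evaluates. A minimal sketch of the textbook attenuation relations (for a single-predictor regression with error only in X; not the specific structural-equation corrections reviewed in the paper), assuming a composite X = T + E with reliability ρ_XX':

```latex
% Attenuation when the predictor composite X is measured with error
% (classical test theory; error E uncorrelated with the true score T):
X = T + E, \qquad
\beta_{\text{obs}} = \rho_{XX'}\,\beta_{\text{true}}, \qquad
r_{XY} = r_{T_X T_Y}\sqrt{\rho_{XX'}\,\rho_{YY'}}
```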
Powers, Sonya; Li, Dongmei; Suh, Hongwook; Harris, Deborah J. – ACT, Inc., 2016
ACT reporting categories and ACT Readiness Ranges are new features added to the ACT score reports starting in fall 2016. For each reporting category, the number correct score, the maximum points possible, the percent correct, and the ACT Readiness Range, along with an indicator of whether the reporting category score falls within the Readiness…
Descriptors: Scores, Classification, College Entrance Examinations, Error of Measurement
Han, Chao – Language Assessment Quarterly, 2016
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
Descriptors: Foreign Countries, Scores, English, Chinese
Kane, Michael – Journal of Educational Measurement, 2011
Errors don't exist in our data, but they serve a vital function. Reality is complicated, but our models need to be simple in order to be manageable. We assume that attributes are invariant over some conditions of observation, and once we do that we need some way of accounting for the variability in observed scores over these conditions of…
Descriptors: Error of Measurement, Scores, Test Interpretation, Testing
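For readers unfamiliar with the error-of-measurement framing in the entry above, a minimal sketch of the standard classical-test-theory identities (textbook relations, not Kane's specific argument):

```latex
% Observed score = true score + error; reliability and the standard
% error of measurement (SEM) follow from the variance decomposition:
X = T + E, \qquad
\sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X}, \qquad
\mathrm{SEM} = \sigma_X\sqrt{1-\rho_{XX'}}
```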
Liaw, Lih-Jiun; Hsieh, Ching-Lin; Hsu, Miao-Ju; Chen, Hui-Mei; Lin, Jau-Hong; Lo, Sing-Kai – International Journal of Rehabilitation Research, 2012
The aim of this study is to determine the test-retest reproducibility of the seven-item Short-Form Berg Balance Scale (SFBBS) and the five-item Short-Form Postural Assessment Scale for Stroke Patients (SFPASS) in individuals with chronic stroke. Fifty-two chronic stroke patients from two rehabilitation departments were included in the study. Both…
Descriptors: Measurement, Measures (Individuals), Correlation, Patients
Solano-Flores, Guillermo; Li, Min – Educational Research and Evaluation, 2013
We discuss generalizability (G) theory and the fair and valid assessment of linguistic minorities, especially emergent bilinguals. G theory allows examination of the relationship between score variation and language variation (e.g., variation of proficiency across languages, language modes, and social contexts). Studies examining score variation…
Descriptors: Measurement, Testing, Language Proficiency, Test Construction
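As a reference point for the generalizability-theory discussion above, a minimal sketch of the simplest one-facet crossed design (persons by raters); the language and context facets the authors discuss would enter as additional variance components:

```latex
% One-facet crossed p x r design: observed-score variance decomposes into
% person, rater, and residual components; the generalizability coefficient
% for relative decisions averages error over n_r raters.
\sigma^2(X_{pr}) = \sigma^2_p + \sigma^2_r + \sigma^2_{pr,e}, \qquad
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr,e}/n_r}
```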
Sijtsma, Klaas – Psychometrika, 2009
This discussion paper argues that both the use of Cronbach's alpha as a reliability estimate and as a measure of internal consistency suffer from major problems. First, alpha always has a value, which cannot be equal to the test score's reliability given the inter-item covariance matrix and the usual assumptions about measurement error. Second, in…
Descriptors: Measurement, Error of Measurement, Scores, Computation
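For concreteness, a minimal sketch of how coefficient alpha is computed from an item-score matrix (the quantity whose use the paper critiques); the data below are simulated and purely illustrative:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative data: 200 simulated respondents answering 5 noisy items
rng = np.random.default_rng(0)
true_score = rng.normal(size=(200, 1))
items = true_score + rng.normal(scale=1.0, size=(200, 5))
print(round(cronbach_alpha(items), 3))
```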
Kane, Michael – Educational Testing Service, 2010
The 12th annual William H. Angoff Memorial Lecture was presented by Dr. Michael T. Kane, the Educational Testing Service's (ETS) Samuel J. Messick Chair in Test Validity and former Director of Research at the National Conference of Bar Examiners. Dr. Kane argues that it is important for policymakers to recognize the impact of errors of measurement…
Descriptors: Error of Measurement, Scores, Public Policy, Test Theory
Brockmann, Frank – Council of Chief State School Officers, 2011
State testing programs today are more extensive than ever, and their results are required to serve more purposes and high-stakes decisions than one might have imagined. Assessment results are used to hold schools, districts, and states accountable for student performance and to help guide a multitude of important decisions. This report describes…
Descriptors: Accuracy, Measurement, Testing, Expertise
Fletcher, Jack M.; Stuebing, Karla K.; Hughes, Lisa C. – Journal of Psychoeducational Assessment, 2010
IQ test scores should be corrected for high stakes decisions that employ these assessments, including capital offense cases. If scores are not corrected, then diagnostic standards must change with each generation. Arguments against corrections, based on standards of practice, information present and absent in test manuals, and related issues,…
Descriptors: Testing, Mental Retardation, Validity, Intelligence Quotient
Doorey, Nancy A. – Council of Chief State School Officers, 2011
The work reported in this paper reflects a collaborative effort of many individuals representing multiple organizations. It began during a session at the October 2008 meeting of TILSA when a representative of a member state asked the group if any of their programs had experienced unexpected fluctuations in the annual state assessment scores, and…
Descriptors: Testing, Sampling, Expertise, Testing Programs
Tienken, Christopher H. – Educational Forum, 2011
Test score validity takes center stage in the debate over the use of high school exit exams. Scant literature addresses the amount of conditional standard error of measurement (CSEM) present in individual student results on high school exit exams. The purpose of this study is to fill a void in the literature and add a national review of the CSEM,…
Descriptors: High Schools, Exit Examinations, State Departments of Education, Error of Measurement
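As background for the CSEM discussion above, two common ways a conditional standard error of measurement is estimated; which estimator a given state program reports is not specified here, so these are illustrative only:

```latex
% Lord's binomial-error estimate for a number-correct score x on a k-item
% test, and the IRT-based conditional SEM from the test information function:
\mathrm{CSEM}(x) = \sqrt{\frac{x\,(k-x)}{k-1}}, \qquad
\mathrm{CSEM}(\hat\theta) = \frac{1}{\sqrt{I(\hat\theta)}}
```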