Publication Date
In 2025 | 2 |
Since 2024 | 4 |
Since 2021 (last 5 years) | 5 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 10 |
Descriptor
Evaluation Methods | 82 |
Test Interpretation | 82 |
Test Validity | 82 |
Test Reliability | 35 |
Student Evaluation | 23 |
Test Construction | 22 |
Elementary Secondary Education | 20 |
Test Use | 19 |
Measurement Techniques | 18 |
Test Results | 17 |
Evaluation Criteria | 15 |
Author
Linn, Robert L. | 4 |
Fleming, Dan B. | 2 |
Allen, R. R. | 1 |
An, Lily Shiao | 1 |
Anderson, Colette | 1 |
Arreola, Raoul A. | 1 |
Arter, Judith A. | 1 |
Athelstan, Gary T. | 1 |
Bailey, Jennifer | 1 |
Baird, Jo-Anne | 1 |
Baker, Eva L. | 1 |
Education Level
Elementary Secondary Education | 5 |
Higher Education | 2 |
Junior High Schools | 1 |
Middle Schools | 1 |
Postsecondary Education | 1 |
Secondary Education | 1 |
Location
Australia | 3 |
United Kingdom | 2 |
United Kingdom (England) | 2 |
China | 1 |
Connecticut | 1 |
Greece | 1 |
Kentucky (Louisville) | 1 |
Michigan | 1 |
United Kingdom (Wales) | 1 |
United States | 1 |
Laws, Policies, & Programs
Elementary and Secondary… | 3 |
Elementary and Secondary… | 1 |
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024
Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…
Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability
Stephen M. Leach; Jason C. Immekus; Jeffrey C. Valentine; Prathiba Batley; Dena Dossett; Tamara Lewis; Thomas Reece – Assessment for Effective Intervention, 2025
Educators commonly use school climate survey scores to inform and evaluate interventions for equitably improving learning and reducing educational disparities. Unfortunately, validity evidence to support these (and other) score uses often falls short. In response, Whitehouse et al. proposed a collaborative, two-part validity testing framework for…
Descriptors: School Surveys, Measurement, Hierarchical Linear Modeling, Educational Environment
Eirini M. Mitropoulou; Leonidas A. Zampetakis; Ioannis Tsaousis – Evaluation Review, 2024
Unfolding item response theory (IRT) models are important alternatives to dominance IRT models in describing the response processes on self-report tests. Their usage is common in personality measures, since they indicate potential differentiations in test score interpretation. This paper aims to gain a better insight into the structure of trait…
Descriptors: Foreign Countries, Adults, Item Response Theory, Personality Traits
An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022
Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…
Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies
Geisinger, Kurt F. – International Journal of Testing, 2012
This article sets the stage for the description of a variety of approaches to test reviewing worldwide. It describes the importance of test reviewing as a protection of the public and of society and also the benefits of this activity for test users, who must choose measures to use in particular situations with particular clients at a particular…
Descriptors: Test Reviews, Evaluation Methods, Evaluation Criteria, Global Approach
Bramley, Tom; Gill, Tim – Research Papers in Education, 2010
The rank-ordering method for standard maintaining was designed for the purpose of mapping a known cut-score (e.g. a grade boundary mark) on one test to an equivalent point on the test score scale of another test, using holistic expert judgements about the quality of exemplars of examinees' work (scripts). It is a novel application of an old…
Descriptors: Scores, Psychometrics, Measurement Techniques, Foreign Countries
Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010
"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…
Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques
Baird, Jo-Anne – Measurement: Interdisciplinary Research and Perspectives, 2010
Newton's article (2010) makes three main contributions to the literature. First, it is transatlantic, bringing together literatures that have been dealing with similar problems, using sometimes different methods and certainly with distinctive educational, cultural perspectives. He points out that neither of these literatures has all of the…
Descriptors: Foreign Countries, Predictive Validity, Standards, Ethics
Bailey, Jennifer; Little, Chelsea; Rigney, Rex; Thaler, Anna; Weiderman, Ken; Yorkovich, Ben – Online Submission, 2010
This handbook is designed as a quick reference for first-year teachers who find themselves in an assessment driven environment with little experience to help make sense of the language, underlying philosophy, or organizational structure of the assessment system. The handbook begins with advice on developing and evaluating effective learning…
Descriptors: Student Evaluation, Portfolio Assessment, Elementary Secondary Education, Performance Based Assessment

Replogle, William H.; Eicke, F. J. – Journal of School Psychology, 1985
Evaluated an automated analysis system for the Wechsler Intelligence Scale Revised. Results indicated significantly higher ratings for the automated analysis on an overall item and on items addressing Verbal-Performance discrepancies, relative weaknesses, and relative lack of irresponsible interpretation. These results support cautious use of the…
Descriptors: Automation, Data Processing, Evaluation Methods, Test Interpretation
Dielman, T. E.; Wilson, Warner R. – Journal of Educational Measurement, 1970
Descriptors: Ability, Achievement, Aspiration, Evaluation Methods
Osburn, H. G.; Shoemaker, David M. – 1968
A computer program generating question series for achievement examinations was presented and the relative reliability of computer-generated and instructor-selected items was investigated. To provide validity for examinations generated by an original computer program, representative processes of construction and sampling were operationally defined.…
Descriptors: Achievement Tests, Evaluation Methods, Measurement Techniques, Test Construction
Popp, Jerome A. – 1975
In this paper it is argued that the problem of construct validation in the construction of instruments and indicators is an important problem for educational researchers and practitioners; moreover, it is claimed that the popular notion of operational definition is a misleading idea which has obscured the problem of construct validity in…
Descriptors: Evaluation Methods, Statistical Analysis, Statistical Significance, Test Construction
Fortna, Richard O. – 1981
Measurement terms used in Title I evaluation are contained in this glossary. Several types of measurement techniques are identified and defined. Other measurement terms which are defined include those relating to validity, reliability, statistical analysis, test interpretation, and program effectiveness. (DWH)
Descriptors: Educational Testing, Evaluation Methods, Glossaries, Program Evaluation