ERIC - Search Results

Showing 1 to 15 of 82 results

A Note on the Use of Categorical Subscores

Peer reviewed

Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025

Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…

Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment

Raters' Scoring Process in Assessment of Interpreting: An Empirical Study Based on Eye Tracking and Retrospective Verbalisation

Peer reviewed

Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024

Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…

Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability

Re-Examining Measurement Invariance of School Climate Surveys across Race/Ethnicity

Peer reviewed

Stephen M. Leach; Jason C. Immekus; Jeffrey C. Valentine; Prathiba Batley; Dena Dossett; Tamara Lewis; Thomas Reece – Assessment for Effective Intervention, 2025

Educators commonly use school climate survey scores to inform and evaluate interventions for equitably improving learning and reducing educational disparities. Unfortunately, validity evidence to support these (and other) score uses often falls short. In response, Whitehouse et al. proposed a collaborative, two-part validity testing framework for…

Descriptors: School Surveys, Measurement, Hierarchical Linear Modeling, Educational Environment

Calibrating Items Using an Unfolding Model of Item Response Theory: The Case of the Trait Personality Questionnaire 5 (TPQue5)

Peer reviewed

Eirini M. Mitropoulou; Leonidas A. Zampetakis; Ioannis Tsaousis – Evaluation Review, 2024

Unfolding item response theory (IRT) models are important alternatives to dominance IRT models in describing the response processes on self-report tests. Their usage is common in personality measures, since they indicate potential differentiations in test score interpretation. This paper aims to gain a better insight into the structure of trait…

Descriptors: Foreign Countries, Adults, Item Response Theory, Personality Traits

Disrupted Data: Using Longitudinal Assessment Systems to Monitor Test Score Quality

Peer reviewed

An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022

Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…

Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies

Worldwide Test Reviewing at the Beginning of the Twenty-First Century

Peer reviewed

Geisinger, Kurt F. – International Journal of Testing, 2012

This article sets the stage for the description of a variety of approaches to test reviewing worldwide. It describes the importance of test reviewing as a protection of the public and of society and also the benefits of this activity for test users, who must choose measures to use in particular situations with particular clients at a particular…

Descriptors: Test Reviews, Evaluation Methods, Evaluation Criteria, Global Approach

Evaluating the Rank-Ordering Method for Standard Maintaining

Peer reviewed

Bramley, Tom; Gill, Tim – Research Papers in Education, 2010

The rank-ordering method for standard maintaining was designed for the purpose of mapping a known cut-score (e.g. a grade boundary mark) on one test to an equivalent point on the test score scale of another test, using holistic expert judgements about the quality of exemplars of examinees' work (scripts). It is a novel application of an old…

Descriptors: Scores, Psychometrics, Measurement Techniques, Foreign Countries

Linking through Improved Design, Not Redefinition: Commentary on Newton

Peer reviewed

Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010

"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…

Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques

What Constitutes Legitimate Causal Linking?

Peer reviewed

Baird, Jo-Anne – Measurement: Interdisciplinary Research and Perspectives, 2010

Newton's article (2010) makes three main contributions to the literature. First, it is transatlantic, bringing together literatures that have been dealing with similar problems, using sometimes different methods and certainly with distinctive educational, cultural perspectives. He points out that neither of these literatures has all of the…

Descriptors: Foreign Countries, Predictive Validity, Standards, Ethics

Assessment 101: Assessment Made Easy for First-Year Teachers

Bailey, Jennifer; Little, Chelsea; Rigney, Rex; Thaler, Anna; Weiderman, Ken; Yorkovich, Ben – Online Submission, 2010

This handbook is designed as a quick reference for first-year teachers who find themselves in an assessment driven environment with little experience to help make sense of the language, underlying philosophy, or organizational structure of the assessment system. The handbook begins with advice on developing and evaluating effective learning…

Descriptors: Student Evaluation, Portfolio Assessment, Elementary Secondary Education, Performance Based Assessment

Automated Analysis of the WISC-R: A Validation Study.

Peer reviewed

Replogle, William H.; Eicke, F. J. – Journal of School Psychology, 1985

Evaluated an automated analysis system for the Wechsler Intelligence Scale for Children-Revised (WISC-R). Results indicated significantly higher ratings for the automated analysis on an overall item and on items addressing Verbal-Performance discrepancies, relative weaknesses, and relative lack of irresponsible interpretation. These results support cautious use of the…

Descriptors: Automation, Data Processing, Evaluation Methods, Test Interpretation

Convergent and Discriminant Validity of Three Measures of Ability, Aspiration-Level, Achievement, Adjustment and Dominance

Dielman, T. E.; Wilson, Warner R. – J Educ Meas, 1970

Descriptors: Ability, Achievement, Aspiration, Evaluation Methods

Pilot Project on Computer Generated Test Items.

Osburn, H. G.; Shoemaker, David M. – 1968

A computer program generating question series for achievement examinations was presented and the relative reliability of computer-generated and instructor-selected items was investigated. To provide validity for examinations generated by an original computer program, representative processes of construction and sampling were operationally defined.…

Descriptors: Achievement Tests, Evaluation Methods, Measurement Techniques, Test Construction

What is the Problem of Construct Validity?

Popp, Jerome A. – 1975

In this paper it is argued that the problem of construct validation in the construction of instruments and indicators is an important problem for educational researchers and practitioners; moreover, it is claimed that the popular notion of operational definition is a misleading idea which has obscured the problem of construct validity in…

Descriptors: Evaluation Methods, Statistical Analysis, Statistical Significance, Test Construction

A Glossary of Measurement Terms Used in Title I Evaluation.

Fortna, Richard O. – 1981

Measurement terms used in Title I evaluation are contained in this glossary. Several types of measurement techniques are identified and defined. Other measurement terms which are defined include those relating to validity, reliability, statistical analysis, test interpretation, and program effectiveness. (DWH)

Descriptors: Educational Testing, Evaluation Methods, Glossaries, Program Evaluation
