Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 55 |
| Since 2022 (last 5 years) | 261 |
| Since 2017 (last 10 years) | 508 |
| Since 2007 (last 20 years) | 1258 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Evaluation Methods | 2743 |
| Test Reliability | 1408 |
| Test Validity | 991 |
| Reliability | 964 |
| Student Evaluation | 567 |
| Validity | 515 |
| Interrater Reliability | 502 |
| Foreign Countries | 444 |
| Test Construction | 364 |
| Higher Education | 359 |
| Measurement Techniques | 305 |
Author
| Author | Records |
| --- | --- |
| Raykov, Tenko | 9 |
| Epstein, Michael H. | 7 |
| Jaeger, Richard M. | 7 |
| Matson, Johnny L. | 7 |
| Amrein-Beardsley, Audrey | 6 |
| Follman, John | 6 |
| Gill, Brian | 6 |
| Gresham, Frank M. | 6 |
| Thompson, Bruce | 6 |
| Fink, Arlene | 5 |
| Marcoulides, George A. | 5 |
Audience
| Audience | Records |
| --- | --- |
| Researchers | 137 |
| Practitioners | 99 |
| Teachers | 41 |
| Administrators | 32 |
| Policymakers | 17 |
| Students | 13 |
| Counselors | 5 |
| Support Staff | 3 |
| Community | 1 |
| Media Staff | 1 |
| Parents | 1 |
Location
| Location | Records |
| --- | --- |
| Australia | 45 |
| United Kingdom | 41 |
| Canada | 31 |
| United Kingdom (England) | 29 |
| China | 28 |
| United States | 28 |
| Turkey | 27 |
| California | 22 |
| Florida | 21 |
| Netherlands | 19 |
| Israel | 16 |
What Works Clearinghouse Rating
| Rating | Records |
| --- | --- |
| Does not meet standards | 1 |
Ilona Rinne – Assessment & Evaluation in Higher Education, 2024
It is widely acknowledged in research that common criteria and aligned standards do not result in consistent assessment of such a complex performance as the final undergraduate thesis. Assessment is determined by examiners' understanding of rubrics and their views on thesis quality. There is still a gap in the research literature about how…
Descriptors: Foreign Countries, Undergraduate Students, Teacher Education Programs, Evaluation Criteria
Scott F. Marion, Editor; James W. Pellegrino, Editor; Amy I. Berman, Editor – National Academy of Education, 2024
High-quality assessments are crucial to many aspects of the educational process. They can help policymakers monitor long-term educational trends, assist state educational agencies (SEAs) and local educational agencies (LEAs) in allocating resources and professional development opportunities, provide insights to teachers about how well students…
Descriptors: Educational Assessment, Educational Policy, Equal Education, Test Validity
Yang Yang – Shanlax International Journal of Education, 2024
This paper explores the reliability of using ChatGPT in evaluating EFL writing by assessing its intra- and inter-rater reliability. Eighty-two compositions were randomly sampled from the Written English Corpus of Chinese Learners. These compositions were rated by three experienced raters with regard to 'language', 'content', and 'organization'.…
Descriptors: English (Second Language), Second Language Instruction, Writing (Composition), Evaluation Methods
Nicole D. Martin; Stephanie N. Baker; Madeline Haynes; Jayce R. Warner – Computer Science Education, 2024
Background and Context: As computer science (CS) education expands and the need for well-prepared CS teachers grows, understanding what motivates teachers to teach CS can help address challenges to recruiting, preparing, and retaining teachers. Objective: The goal of this work was to develop and validate a scale that measures teachers' motivation…
Descriptors: Computer Science Education, Teacher Motivation, Measurement Techniques, Construct Validity
Kazuya Saito; Adam Tierney – Studies in Second Language Acquisition, 2024
This article proposes a conceptual and measurement framework for postpubertal, L2 speech learning aptitude that is centered around domain-general auditory processing (i.e., representing spectral and temporal characteristics of sounds). To this end, we examine the construct and reliability of a battery of auditory processing tests by presenting the…
Descriptors: Second Language Learning, Auditory Tests, Auditory Perception, Listening Comprehension Tests
Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources
Bang Quan Zheng; Peter M. Bentler – Structural Equation Modeling: A Multidisciplinary Journal, 2025
This paper aims to advocate for a balanced approach to model fit evaluation in structural equation modeling (SEM). The ongoing debate surrounding chi-square test statistics and fit indices has been characterized by ambiguity and controversy. Despite the acknowledged limitations of relying solely on the chi-square test, its careful application can…
Descriptors: Monte Carlo Methods, Structural Equation Models, Goodness of Fit, Robustness (Statistics)
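Several widely used fit indices are themselves functions of the model chi-square, which is one reason the chi-square and fit-index debates are intertwined. As an illustration only (not the authors' method), the minimal Python sketch below computes the standard point estimates of RMSEA and CFI from chi-square values. The function names and example numbers are hypothetical, and some software uses N rather than N − 1 in the RMSEA denominator.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from the model chi-square, degrees of freedom, and sample size."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    # Noncentrality per degree of freedom, truncated at zero.
    # Assumption: N - 1 in the denominator (some programs use N).
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_baseline: float, df_baseline: int) -> float:
    """Comparative fit index: model noncentrality relative to the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    # Denominator is max(model noncentrality, baseline noncentrality, 0).
    d_base = max(chi2_baseline - df_baseline, d_model)
    if d_base == 0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / d_base
```

For example, a hypothetical model with χ² = 50 on 20 df, N = 201, and a baseline χ² = 500 on 30 df gives RMSEA ≈ 0.087 and CFI ≈ 0.936. Because both indices rescale the same chi-square, they inherit its sensitivity to sample size and distributional assumptions.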
Stacey Havlik; Peter Wiens; Arash Ghafoori; Melissa Jacobowitz; Kelly-Jo Sheback; Hannah Hudson – Journal of Education for Students Placed at Risk, 2025
While many teachers are unaware that students in their classes are experiencing homelessness, others may not know how to support students who are identified as lacking consistent housing (Wright et al., 2019). Thus, there is a critical need to better assess, understand, and enhance teachers' knowledge and attitudes toward homelessness. Therefore,…
Descriptors: Preservice Teachers, Preservice Teacher Education, Homeless People, Student Characteristics
Marianne Berg Halvorsen; Arvid Nikolai Kildahl; Sabine Kaiser; Brynhildur Axelsdottir; Michael G. Aman; Sissel Berge Helverschou – Journal of Autism and Developmental Disorders, 2025
In recent years, there has been a proliferation of instruments for assessing mental health (MH) among autistic people. This study aimed to review the psychometric properties of broadband instruments used to assess MH problems among autistic people. In accordance with the PRISMA guidelines (PROSPERO: CRD42022316571) we searched the APA PsycINFO via…
Descriptors: Psychometrics, Mental Health, Clinical Diagnosis, Evaluation Methods
Daryl Close – Journal of Academic Ethics, 2025
For decades, student ratings of university faculty have been used by administrators in high stakes faculty employment decisions such as tenure, promotion, contract renewal and reappointment, and merit pay. However, virtually no attention has been paid to the ethical questions of using ratings in employment decisions. Instead, the ratings…
Descriptors: Student Evaluation of Teacher Performance, Ethics, College Students, College Faculty
Jennifer Sdunzik; Ann M. Bessenbacher; Wilella D. Burgess; Asia M. Mohamud; Abdirisak Dalmar – American Journal of Evaluation, 2025
The success of development projects and evaluations hinges on having access to research protocols and methodologies that consider the needs and characteristics of stakeholders, subjects, and context while remaining rigorous and culturally sound. These efforts are often complicated by a dearth of tools that have been tested for validity and…
Descriptors: Foreign Countries, Program Evaluation, International Programs, Data Collection
Lambert, Richard G.; Holcomb, T. Scott; Bottoms, Bryndle L. – Center for Educational Measurement and Evaluation, 2021
The validity of the Kappa coefficient of chance-corrected agreement has been questioned when the prevalence of specific rating scale categories is low and agreement between raters is high. The researchers proposed the Lambda Coefficient of Rater-Mediated Agreement as an alternative to Kappa to address these concerns. Lambda corrects for chance…
Descriptors: Interrater Reliability, Teacher Evaluation, Test Validity, Evaluation Methods
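The Kappa problem described in this entry can be made concrete. The minimal Python sketch below (not code from the report; the function name `cohens_kappa` is hypothetical) computes Cohen's kappa for two raters. It does not implement the proposed Lambda coefficient, whose formula is not given in the abstract.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters placed in the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)
```

With 100 hypothetical items where both raters say "no" on 90, and each rater says "yes" on 5 different items, raw agreement is 90%, yet chance agreement is 0.905 because "yes" is rare, so kappa comes out at about −0.05. This is the low-prevalence, high-agreement case the entry describes.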
Power, Jason Richard; Tanner, David – European Journal of Engineering Education, 2023
Self and peer assessments have been identified as effective strategies to develop a deeper understanding of complex concepts, enhance meta-cognitive capacity, and support learner self-efficacy. This study examines data related to peer and self-assessment exercises completed within a university engineering programme (n=61). Data related to…
Descriptors: Peer Evaluation, Self Evaluation (Individuals), Feedback (Response), Engineering Education
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of $\alpha$, $\lambda_2$, $\lambda_4$, $\lambda_2$, $\omega_T$, $\mathrm{GLB}_{\mathrm{MRFA}}$, and $\mathrm{GLB}_{\mathrm{Algebraic}}$ coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
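Two of the coefficients compared in this study have closed forms in terms of the item covariance matrix: Cronbach's $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$ and Guttman's $\lambda_2 = \lambda_1 + \frac{\sqrt{\frac{k}{k-1}\sum_{i \ne j} \sigma_{ij}^2}}{\sigma_X^2}$, with $\lambda_1 = 1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}$. The minimal Python sketch below computes both, assuming complete data with respondents as rows and items as columns. The function names are hypothetical and this is not the authors' simulation code; $\lambda_4$, $\omega_T$, and the GLB variants need split-half search, a factor model, or optimization and are omitted.

```python
import math


def covariance_matrix(scores):
    """Sample covariance matrix (n - 1 denominator); rows are respondents, columns are items."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in scores) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def alpha_and_lambda2(scores):
    """Return (Cronbach's alpha, Guttman's lambda2) for a respondents-by-items score matrix."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    cov = covariance_matrix(scores)
    k = len(cov)
    total_var = sum(sum(row) for row in cov)   # variance of the total (sum) score
    if total_var <= 0:
        raise ValueError("total score has no variance")
    item_var = sum(cov[i][i] for i in range(k))
    alpha = (k / (k - 1)) * (1 - item_var / total_var)
    # Sum of squared off-diagonal covariances, used by lambda2.
    off_sq = sum(cov[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    lambda1 = 1 - item_var / total_var
    lambda2 = lambda1 + math.sqrt(k / (k - 1) * off_sq) / total_var
    return alpha, lambda2
```

Since $\lambda_2$ is never smaller than $\alpha$, computing both on the same data gives a quick sense of how much $\alpha$ may understate reliability when items are not essentially tau-equivalent.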
Courtney M. Koletar – ProQuest LLC, 2024
For decades, evaluators have noted that it is difficult for stakeholders to accept negative evaluation results (Carter, 1971; Taut & Brauns, 2003). There is a need for additional research on evaluation to better understand when and why stakeholders reject negative or critical evaluation findings. Drawing on social identity theory (SIT), the…
Descriptors: Evaluation Methods, Interrater Reliability, Criticism, Positive Reinforcement

