ERIC - Search Results

Publication Date

In 2025	10
Since 2024	25
Since 2021 (last 5 years)	60
Since 2016 (last 10 years)	98
Since 2006 (last 20 years)	275

Descriptor

Evaluation Methods	501
Interrater Reliability	501
Student Evaluation	90
Test Reliability	90
Evaluators	78
Foreign Countries	78
Test Validity	70
Scoring	67
Correlation	59
Higher Education	56
Comparative Analysis	53
Measurement Techniques	51
Evaluation Criteria	50
Rating Scales	47
Validity	44
Scores	41
Teacher Evaluation	41
Performance Based Assessment	40
Measures (Individuals)	39
Psychometrics	39
Observation	38
Statistical Analysis	38
Writing Evaluation	35
Reliability	33
Scoring Rubrics	33

Education Level

Higher Education	91
Postsecondary Education	54
Elementary Education	26
Elementary Secondary Education	22
Secondary Education	17
Adult Education	14
Early Childhood Education	11
Middle Schools	9
High Schools	8
Preschool Education	7
Grade 4	6
Grade 6	6
Grade 8	4
Grade 7	3
Intermediate Grades	3
Junior High Schools	3
Primary Education	3
Grade 1	2
Grade 2	2
Grade 3	2
Grade 5	2
Two Year Colleges	2
Kindergarten	1

Audience

Researchers	39
Practitioners	13
Teachers	8
Administrators	3

Location

Australia	9
China	7
Netherlands	7
North Carolina	7
United Kingdom (England)	7
Canada	6
Florida	6
Israel	6
United Kingdom	6
California	5
Turkey	5
United States	5
Pennsylvania	4
Texas	4
Germany	3
India	3
Italy	3
Ohio	3
South Korea	3
Sweden	3
Tennessee	3
Asia	2
Brazil	2
Colombia	2
Connecticut	2

Laws, Policies, & Programs

No Child Left Behind Act 2001	2
Race to the Top	2
Americans with Disabilities…	1
Rehabilitation Act 1973…	1

Showing 1 to 15 of 501 results

Technical Adequacy-Reliability

Peer reviewed

Susan K. Johnsen – Gifted Child Today, 2025

The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…

Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares the performance of the generative AI (GenAI) large language model (LLM) ChatGPT with that of human teachers in grading university exams. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Using Automated Procedures to Score Educational Essays Written in Three Languages

Peer reviewed

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025

The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…

Descriptors: College Students, Slavic Languages, German, Italian

Evidence-Based Evaluation of Student and Marker Performances in Assessment and Examination

Peer reviewed

Ole J. Kemi – Advances in Physiology Education, 2025

Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…

Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards

Constructing a Roadmap to Measure the Quality of Business Assessments Aimed at Curriculum Management

Peer reviewed

Silva, Thanuci; Santos, Regiane dos; Mallet, Débora – Journal of Education for Business, 2023

Assuring the quality of education is a concern of learning institutions. To do so, it is necessary to have assertive learning management, with consistent data on students' outcomes. This research provides associate deans and researchers with a roadmap for gathering evidence to improve the quality of open-ended assessments. Based on statistical…

Descriptors: Student Evaluation, Evaluation Methods, Business Education, Higher Education

Design of a Simple Rubric to Peer-Evaluate the Teamwork Skills of Engineering Students

Peer reviewed

Swapneel Thite; Jayashri Ravishankar; Inmaculada Tomeo-Reyes; Araceli Martinez Ortiz – European Journal of Engineering Education, 2024

Effectively working in an engineering workplace requires strong teamwork skills, yet the existing literature within various disciplines reveals discrepancies in evaluating these skills. This complicates the design of a generic teamwork peer evaluation tool for engineering students. This study aims to address this gap by introducing the DRIVE…

Descriptors: Scoring Rubrics, Evaluation Methods, Peer Evaluation, Teamwork

The Value of Expanding Perspectives on Assessment

Peer reviewed

Janice Kinghorn; Katherine McGuire; Bethany L. Miller; Aaron Zimmerman – Assessment Update, 2024

In this article, the authors share their reflections on how different experiences and paradigms have broadened their understanding of the work of assessment in higher education. As they collaborated to create a panel for the 2024 International Conference on Assessing Quality in Higher Education, they recognized that they, as assessment…

Descriptors: Higher Education, Assessment Literacy, Evaluation Criteria, Evaluation Methods

Interdisciplinary Thinking among Seventh-Grade Students in Lower-Secondary Science Education

Peer reviewed
PDF on ERIC

Shasha Chen; Shaohui Chi; Zuhao Wang – Journal of Baltic Science Education, 2025

Interdisciplinary thinking is critical for equipping students to apply scientific knowledge and tackle societal challenges across various disciplines, which has been recognized as a key objective of twenty-first century science education. However, research on effective interdisciplinary assessment in secondary school science education is still…

Descriptors: Thinking Skills, Interdisciplinary Approach, Science Instruction, Grade 7

Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients

Peer reviewed
PDF on ERIC

Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022

The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…

Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory

Psychometric Properties of the Behavior Assessment System for Children Student Observation System (BASC-3 SOS) with Young Children in Special Education

Peer reviewed

Schmidt, Ellyn M.; Rothenberg, W. Andrew; Davidson, Bridget C.; Barnett, Miya; Jent, Jason; Cadenas, Heleny; Fernandez, Corina; Davis, Eileen – Journal of Behavioral Education, 2023

Measuring classroom behavior among young children is important to guide assessment and intervention decisions, yet there is limited literature on appropriate direct observation tools for this purpose. This article describes the psychometric properties of the Behavior Assessment System for Children, Student Observation System (BASC-3 SOS) with 135…

Descriptors: Young Children, Special Education, Child Behavior, Psychometrics

Examining the Psychometric Impact of Targeted and Random Double-Scoring in Mixed-Format Assessments

Peer reviewed

Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025

Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…

Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods

"Rater Training" Re-Imagined for Work-Based Assessment in Medical Education

Peer reviewed

Tavares, Walter; Kinnear, Benjamin; Schumacher, Daniel J.; Forte, Milena – Advances in Health Sciences Education, 2023

In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to "improve" rater performance and contributions during assessment events. Historically, rater training programs have focused…

Descriptors: Medical Education, Interrater Reliability, Evaluation Methods, Training

Practices in Instrument Use and Development in "Chemistry Education Research and Practice" 2010-2021

Peer reviewed

Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023

Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, "etc.") of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…

Descriptors: Chemistry, Periodicals, Journal Articles, Science Education

A Unified Approach to Estimating the Intraclass Correlation Coefficient and Its Bias: An Exploratory Study

Kelvin Terrell Pompey – ProQuest LLC, 2021

Many methods are used to measure interrater reliability for studies where each target receives ratings by a different set of judges. The purpose of this study is to explore the use of hierarchical modeling for estimating interrater reliability using the intraclass correlation coefficient. This study provides a description of how the ICC can be…

Descriptors: Interrater Reliability, Evaluation Methods, Test Reliability, Correlation

Agree to Disagree: Multiple Methods to Assess Rater Agreement during Student Teaching

Peer reviewed

Elayne P. Colón; Lori M. Dassa; Thomas M. Dana; Nathan P. Hanson – Action in Teacher Education, 2024

To meet accreditation expectations, teacher preparation programs must demonstrate their candidates are evaluated using summative assessment tools that yield sound, reliable, and valid data. These tools are primarily used by the clinical experience team -- university supervisors and mentor teachers. Institutional beliefs regarding best practices…

Descriptors: Student Teachers, Teacher Interns, Evaluation Methods, Interrater Reliability

Source

ProQuest LLC	18
Assessment & Evaluation in…	9
Educational and Psychological…	9
Journal of Autism and…	8
Advances in Health Sciences…	7
Applied Measurement in…	7
Journal of Speech and Hearing…	7
Personnel Psychology	6
Research in Developmental…	6
Journal of Educational…	5
Journal of Speech, Language,…	5
Online Submission	5
Assessment and Evaluation in…	4
Early Childhood Research…	4
Educational Assessment	4
Evaluation and the Health…	4
Language Testing	4
Multivariate Behavioral…	4
Psychology in the Schools	4
Studies in Higher Education	4
American Journal of…	3
American Journal on Mental…	3
Computers & Education	3
Educational Measurement:…	3
Gerontologist	3

Author

Jaeger, Richard M.	5
Cason, Carolyn L.	3
Matson, Johnny L.	3
Myford, Carol M.	3
Plake, Barbara S.	3
Wind, Stefanie A.	3
Baer, John	2
Bejar, Isaac I.	2
Bottoms, Bryndle L.	2
Brown, William H.	2
Bursac, Zoran	2
Busch, John Christian	2
Cason, Gerald J.	2
Cordes, Anne K.	2
Dowda, Marsha	2
Einfeld, S. L.	2
Evenhuis, Heleen M.	2
Friedman, Greg	2
Gearhart, Maryl	2
Godbout, Paul	2
Hambleton, Ronald K.	2
Herman, Joan L.	2
Hermans, Heidi	2
Holcomb, T. Scott	2

Publication Type

Journal Articles	380
Reports - Research	297
Reports - Evaluative	133
Speeches/Meeting Papers	64
Reports - Descriptive	39
Tests/Questionnaires	27
Dissertations/Theses -…	18
Information Analyses	18
Opinion Papers	11
Numerical/Quantitative Data	7
Books	2
Guides - Non-Classroom	2
Book/Product Reviews	1
Collected Works - General	1
Collected Works - Proceedings	1
Collected Works - Serials	1
Dissertations/Theses	1
ERIC Digests in Full Text	1
ERIC Publications	1
Guides - General	1
Non-Print Media	1
Reports -…	1

Assessments and Surveys

National Assessment of…	4
Advanced Placement…	3
Child Behavior Checklist	3
Teacher Performance…	3
Autism Diagnostic Observation…	2
Developmental Behavior…	2
Graduate Record Examinations	2
Hamilton Rating Scale for…	2
National Teacher Examinations	2
Praxis Series	2
Test of English as a Foreign…	2
Aberrant Behavior Checklist	1
Adjustment Scales for…	1
Bayley Scales of Infant…	1
Beck Anxiety Inventory	1
Behavior Assessment System…	1
Behavioral and Emotional…	1
Childrens Depression Inventory	1
Goal Attainment Scale	1
Group Assessment of Logical…	1
MacArthur Communicative…	1
Mullen Scales of Early…	1
NEO Personality Inventory	1
Raven Progressive Matrices	1
Reading Miscue Inventory	1