ERIC - Search Results

Publication Date

In 2025	2
Since 2024	10
Since 2021 (last 5 years)	42
Since 2016 (last 10 years)	101
Since 2006 (last 20 years)	168

Descriptor

Evaluators	291
Interrater Reliability	291
Evaluation Methods	78
Scoring	72
Foreign Countries	67
English (Second Language)	50
Second Language Learning	50
Scores	48
Comparative Analysis	47
Correlation	43
Rating Scales	42
Language Tests	41
Writing Evaluation	36
Student Evaluation	34
Evaluation Criteria	33
Scoring Rubrics	33
Performance Based Assessment	31
Higher Education	30
Oral Language	29
Statistical Analysis	26
Language Proficiency	25
Accuracy	22
Reliability	21
Test Items	21
Test Reliability	21
More ▼

Education Level

Higher Education	56
Postsecondary Education	51
Secondary Education	16
Elementary Education	12
Elementary Secondary Education	11
Middle Schools	7
Grade 6	5
Grade 7	5
High Schools	5
Junior High Schools	5
Adult Education	3
Early Childhood Education	3
Grade 4	3
Grade 8	3
Intermediate Grades	2
Kindergarten	2
Primary Education	2
Grade 1	1
Grade 11	1
Grade 2	1
Grade 3	1
Grade 5	1
Preschool Education	1
More ▼

Audience

Researchers	9
Practitioners	6
Teachers	2
Administrators	1
Policymakers	1

Location

California	6
China	5
Iran	5
Japan	5
United Kingdom	5
Australia	4
Hong Kong	4
Israel	4
Europe	3
Netherlands	3
Canada	2
Finland	2
India	2
Michigan	2
New Zealand	2
Singapore	2
South Africa	2
Sweden	2
Tennessee	2
Texas	2
Turkey	2
Vietnam	2
China (Beijing)	1
Cuba	1
European Union	1
More ▼

Laws, Policies, & Programs

No Child Left Behind Act 2001	3
Race to the Top	1

Assessments and Surveys

Test of English as a Foreign…	13
National Assessment of…	3
International English…	2
ACTFL Oral Proficiency…	1
Alabama High School…	1
Flesch Kincaid Grade Level…	1
General Educational…	1
Graduate Record Examinations	1
Modern Language Aptitude Test	1
Praxis Series	1
SAT (College Admission Test)	1
Stanford Achievement Tests	1
Test of English for…	1
Torrance Tests of Creative…	1
United States Medical…	1
edTPA (Teacher Performance…	1
More ▼

What Works Clearinghouse Rating

Showing 1 to 15 of 291 results Save | Export

Communal Factors in Rater Severity and Consistency over Time in High-Stakes Oral Assessment

Peer reviewed

Direct link

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…

Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory

Examining Inter-Rater Reliability of Evaluators Judging Teacher Performance: Proposing an Alternative to Cohen's Kappa. CEME Technical Report. CEMETR-2022-02

Download full text

Lambert, Richard G.; Holcomb, T. Scott; Bottoms, Bryndle – Center for Educational Measurement and Evaluation, 2022

The validity of the Kappa coefficient of chance-corrected agreement has been questioned when the prevalence of specific rating scale categories is low and agreement between raters is high. The researchers proposed the Lambda Coefficient of Rater-Mediated Agreement as an alternative to Kappa to address these concerns. Lambda corrects for chance…

Descriptors: Interrater Reliability, Evaluators, Rating Scales, Teacher Evaluation

Investigating the Effect of Classroom-Based Feedback on Speaking Assessment: A Multifaceted Rasch Analysis

Peer reviewed

Direct link

Bijani, Houman; Hashempour, Bahareh; Ibrahim, Khaled Ahmed Abdel-Al; Orabah, Salim Said Bani; Heydarnejad, Tahereh – Language Testing in Asia, 2022

Due to subjectivity in oral assessment, much concentration has been put on obtaining a satisfactory measure of consistency among raters. However, the process for obtaining more consistency might not result in valid decisions. One matter that is at the core of both reliability and validity in oral assessment is rater training. Recently,…

Descriptors: Oral Language, Language Tests, Feedback (Response), Bias

Same Grade for Different Reasons, Different Grades for the Same Reason?

Peer reviewed

Direct link

Ilona Rinne – Assessment & Evaluation in Higher Education, 2024

It is widely acknowledged in research that common criteria and aligned standards do not result in consistent assessment of such a complex performance as the final undergraduate thesis. Assessment is determined by examiners' understanding of rubrics and their views on thesis quality. There is still a gap in the research literature about how…

Descriptors: Foreign Countries, Undergraduate Students, Teacher Education Programs, Evaluation Criteria

Does Reviewing Experience Reduce Disagreement in Proposals Evaluation? Insights from Marie Sklodowska-Curie and COST Actions

Peer reviewed

Direct link

Seeber, Marco; Vlegels, Jef; Reimink, Elwin; Marusic, Ana; Pina, David G. – Research Evaluation, 2021

We have limited understanding of why reviewers tend to strongly disagree when scoring the same research proposal. Thus far, research that explored disagreement has focused on the characteristics of the proposal or the applicants, while ignoring the characteristics of the reviewers themselves. This article aims to address this gap by exploring…

Descriptors: Foreign Countries, Evaluators, Interrater Reliability, Research Proposals

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Direct link

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Assessing Raters: What Factors Predict Discernment in Novice Creativity Raters?

Peer reviewed

Direct link

Ceh, Simon Majed; Edelmann, Carina; Hofer, Gabriela; Benedek, Mathias – Journal of Creative Behavior, 2022

Creativity research crucially relies on creativity evaluations by external raters, but it is not clear what properties characterize good raters. In the present study, we investigated whether rater personality and rater creativity are related to discernment (i.e., the ability to distinguish creative from uncreative responses) when evaluating…

Descriptors: Novices, Evaluators, Creativity, Personality Traits

Raters' Scoring Process in Assessment of Interpreting: An Empirical Study Based on Eye Tracking and Retrospective Verbalisation

Peer reviewed

Direct link

Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024

Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…

Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability

Agreement between Visual Inspection and Objective Analysis Methods: A Replication and Extension

Peer reviewed

Direct link

Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022

Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…

Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators

Examining Rating Quality in Rater-Mediated Activities for Standard-Item Alignment Research

Direct link

Yvette Jackson – ProQuest LLC, 2023

Rater-mediated activities in educational research occur when an expert judge or rater utilizes an instrument to judge persons or items and generates scale scores. Scale scores are from a subjective judgment and must undergo a quality control measure called rating quality. Rating quality in this study is broadly defined as the extent to which…

Descriptors: Educational Research, Evaluators, Test Theory, Item Response Theory

Evaluating Quadratic Weighted Kappa as the Standard Performance Metric for Automated Essay Scoring

Peer reviewed
PDF on ERIC

Download full text

Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023

Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…

Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy

The Value of Expanding Perspectives on Assessment

Peer reviewed

Direct link

Janice Kinghorn; Katherine McGuire; Bethany L. Miller; Aaron Zimmerman – Assessment Update, 2024

In this article, the authors share their reflections on how different experiences and paradigms have broadened their understanding of the work of assessment in higher education. As they collaborated to create a panel for the 2024 International Conference on Assessing Quality in Higher Education, they recognized that they, as assessment…

Descriptors: Higher Education, Assessment Literacy, Evaluation Criteria, Evaluation Methods

Exploring an Alternative to Record Motor Competence Assessment: Interrater and Intrarater Audio-Video Reliability

Peer reviewed

Direct link

Cristina Menescardi; Aida Carballo-Fazanes; Núria Ortega-Benavent; Isaac Estevan – Journal of Motor Learning and Development, 2024

The Canadian Agility and Movement Skill Assessment (CAMSA) is a valid and reliable circuit-based test of motor competence which can be used to assess children's skills in a live or recorded performance and then coded. We aimed to analyze the intrarater reliability of the CAMSA scores (total, time, and skill score) and time measured, by comparing…

Descriptors: Interrater Reliability, Evaluators, Scoring, Psychomotor Skills

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed
PDF on ERIC

Download full text

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

The Whole Is More than the Sum of Its Parts -- Assessing Writing Using the Consensual Assessment Technique

Peer reviewed

Direct link

Zahn, Daniela; Canton, Ursula; Boyd, Victoria; Hamilton, Laura; Mamo, Josianne; McKay, Jane; Proudfoot, Linda; Telfer, Dickson; Williams, Kim; Wilson, Colin – Studies in Higher Education, 2021

Evaluating the impact of Academic Literacies teaching (Lea and Street [1998. "Student Writing in Higher Education: An Academic Literacies Approach." "Studies in Higher Education" 23 (2): 157-72. doi:10.1080/03075079812331380364]) is difficult, as it involves gauging whether writers: (1) gain better understanding of what…

Descriptors: Writing Evaluation, Evaluation Methods, Undergraduate Students, Foreign Countries

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | ... | 20

Language Testing	15
ProQuest LLC	10
Educational and Psychological…	8
ETS Research Report Series	7
Educational Measurement:…	7
Applied Measurement in…	6
Language Assessment Quarterly	6
Language Testing in Asia	4
Studies in Second Language…	4
Advances in Health Sciences…	3
Applied Psychological…	3
Assessment & Evaluation in…	3
English Language Teaching	3
Journal of Educational…	3
Journal of Speech, Language,…	3
Personnel Psychology	3
Advances in Language and…	2
Assessment Update	2
Assessment in Education:…	2
Autism: The International…	2
Education and Information…	2
Educational Assessment	2
Evaluation and the Health…	2
International Educational…	2
International Journal of Art…	2
More ▼

Wind, Stefanie A.	5
Jaeger, Richard M.	4
Lunz, Mary E.	4
Plake, Barbara S.	4
Coniam, David	3
Engelhard, George, Jr.	3
Raymond, Mark R.	3
Ahmadi, Alireza	2
Bejar, Isaac I.	2
Bijani, Houman	2
Goe, Laura	2
Hertz, Norman R.	2
Holdheide, Lynn	2
Houston, Walter M.	2
Johnson, Martin	2
Lowry, Phillip E.	2
Miller, Tricia	2
Myford, Carol M.	2
Saito, Kazuya	2
Sireci, Stephen G.	2
Stahl, John A.	2
Strong, Michael	2
Thompson, Irene	2
Weigle, Sara Cushing	2
More ▼

Journal Articles	200
Reports - Research	200
Reports - Evaluative	56
Speeches/Meeting Papers	55
Tests/Questionnaires	24
Reports - Descriptive	15
Information Analyses	11
Dissertations/Theses -…	10
Opinion Papers	4
Guides - Non-Classroom	3
Numerical/Quantitative Data	2
Books	1
Collected Works - Serials	1
ERIC Digests in Full Text	1
ERIC Publications	1
Guides - General	1
Non-Print Media	1
Reference Materials - General	1
Reports - General	1
More ▼