ERIC - Search Results

Showing 1 to 15 of 47 results

Agreement between Visual Inspection and Objective Analysis Methods: A Replication and Extension

Peer reviewed

Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022

Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…

Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool scores each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

Do You Mean What I Mean? Comparing Teacher Performance Self-Scores and Evaluator-Generated Scores

Peer reviewed

Hunter, Seth B. – Journal of Education Human Resources, 2023

Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…

Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability

Reliability of the Reflective Learning Framework for Assessing Higher-Order Thinking in Geography and Sustainability Courses

Peer reviewed

Whalen, Kate; Paez, Antonio – Journal of Geography, 2022

Experiential education partnered with guided reflection is thought to support students with higher-order thinking skills. In this study, 44 reflections from two university-level sustainability courses were compared. In both courses students were asked to write a reflection, but only one course used the Reflective Learning Framework (RLF). Tests of…

Descriptors: Geography Instruction, Thinking Skills, Experiential Learning, Sustainability

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

Comparison of Inter-Rater Reliability Techniques in Performance-Based Assessment

Peer reviewed

Arslan Mancar, Sinem; Gulleroglu, H. Deniz – International Journal of Assessment Tools in Education, 2022

The aim of this study is to analyse the importance of the number of raters and compare the results obtained by techniques based on Classical Test Theory (CTT) and Generalizability (G) Theory. The Kappa and Krippendorff alpha techniques based on CTT were used to determine the inter-rater reliability. In this descriptive research data consists of…

Descriptors: Comparative Analysis, Interrater Reliability, Advanced Placement, Scoring Rubrics

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies

Peer reviewed

Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023

Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…

Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Comparing Machine and Human Reviewers to Evaluate the Risk of Bias in Randomized Controlled Trials

Peer reviewed

Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020

Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…

Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials

Writing Scale Effects on Raters: An Exploratory Study

Peer reviewed

Jeong, Heejeong – Language Testing in Asia, 2019

In writing assessment, finding a valid, reliable, and efficient scale is critical. Appropriate scales increase rater reliability and can also save time and money. This exploratory study compared the effects of a binary scale and an analytic scale across teacher raters and expert raters. The purpose of the study is to find out how different scale…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Doctorates by Thesis and Publication in Clinical Medicine: An Analysis of Examiners' Reports

Peer reviewed

Ramlall, Suvira; Singaram, V. S.; Sommerville, T. E. – Perspectives in Education, 2019

National and institutional policies to escalate the production of doctorates have raised concerns about the quality of PhDs in South Africa. This study evaluates examiner reports of doctorates by thesis and publication in clinical medicine to ascertain the criteria that examiners used to define a successful doctoral thesis. A qualitative…

Descriptors: Doctoral Dissertations, Educational Policy, Medical Research, Foreign Countries

Rater Effects on L2 Oral Assessment: Focusing on Accent Familiarity of L2 Teachers

Peer reviewed

Park, Mi Sun – Language Assessment Quarterly, 2020

In the present study, I examined the effects of rater characteristics, in particular, raters' familiarity with a foreign accent, on the assessment of second language (L2) pronunciation. Forty-three native English-speaking teachers were divided into three groups according to their reported types of familiarity with Korean accents: heritage,…

Descriptors: Evaluators, Familiarity, Second Language Learning, English (Second Language)
