Publication Date
In 2025 | 2 |
Since 2024 | 5 |
Since 2021 (last 5 years) | 17 |
Since 2016 (last 10 years) | 38 |
Since 2006 (last 20 years) | 53 |
Descriptor
Comparative Analysis | 57 |
Evaluators | 57 |
Scores | 57 |
English (Second Language) | 25 |
Second Language Learning | 24 |
Foreign Countries | 20 |
Language Tests | 17 |
Essays | 12 |
Computer Software | 11 |
Second Language Instruction | 11 |
Writing Evaluation | 11 |
Author
Attali, Yigal | 2 |
Ahmadi, Alireza | 1 |
Alexander James Kwako | 1 |
Baldwin, Peter | 1 |
Beaudin, Barbara | 1 |
Bowler, Mark C. | 1 |
Brannen, Kathleen | 1 |
Breyer, F. Jay | 1 |
Canfield, Allison R. | 1 |
Cardoso, Walcir | 1 |
Chafouleas, Sandra M. | 1 |
Publication Type
Journal Articles | 47 |
Reports - Research | 45 |
Tests/Questionnaires | 7 |
Dissertations/Theses -… | 6 |
Reports - Evaluative | 6 |
Speeches/Meeting Papers | 3 |
Education Level
Higher Education | 18 |
Postsecondary Education | 14 |
Secondary Education | 4 |
Elementary Secondary Education | 2 |
High Schools | 2 |
Grade 12 | 1 |
Middle Schools | 1 |
Assessments and Surveys
International English… | 2 |
National Assessment of… | 1 |
Test of English as a Foreign… | 1 |
Test of English for… | 1 |
United States Medical… | 1 |
Timothy J. Wood; Vijay J. Daniels; Debra Pugh; Claire Touchie; Samantha Halman; Susan Humphrey-Murto – Advances in Health Sciences Education, 2024
First impressions can influence rater-based judgments but their contribution to rater bias is unclear. Research suggests raters can overcome first impressions in experimental exam contexts with explicit first impressions, but these findings may not generalize to a workplace context with implicit first impressions. The study had two aims. First, to…
Descriptors: Evaluators, Work Environment, Decision Making, Video Technology
Elizabeth L. Wetzler; Kenneth S. Cassidy; Margaret J. Jones; Chelsea R. Frazier; Nickalous A. Korbut; Chelsea M. Sims; Shari S. Bowen; Michael Wood – Teaching of Psychology, 2025
Background: Generative artificial intelligence (AI) represents a potentially powerful, time-saving tool for grading student essays. However, little is known about how AI-generated essay scores compare to human instructor scores. Objective: The purpose of this study was to compare the essay grading scores produced by AI with those of human…
Descriptors: Essays, Writing Evaluation, Scores, Evaluators
Christopher D. Daniel – ProQuest LLC, 2024
Districts spend thousands of dollars on computerized teacher screeners without knowing if they are identifying the most effective teacher. Hiring quality staff is one of the most important job functions of a principal, and many times a teacher screener score may eliminate an effective teacher. The current study examined the value of teacher…
Descriptors: Teacher Evaluation, Scores, Screening Tests, Teacher Effectiveness
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Jia, Qinjin; Young, Mitchell; Xiao, Yunkai; Cui, Jialin; Liu, Chengyuan; Rashid, Parvez; Gehringer, Edward – Journal of Educational Data Mining, 2022
Instant feedback plays a vital role in promoting academic achievement and student success. In practice, however, delivering timely feedback to students can be challenging for instructors for a variety of reasons (e.g., limited teaching resources). In many cases, feedback arrives too late for learners to act on the advice and reinforce their…
Descriptors: Student Projects, Learning Analytics, Intelligent Tutoring Systems, Feedback (Response)
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Nagle, Charlie L.; Trofimovich, Pavel; O'Brien, Mary Grantham; Kennedy, Sara – Modern Language Journal, 2022
Comprehensibility has emerged as a useful and intuitive means of globally evaluating second language (L2) speakers in many research and instructional contexts. In most cases, L2 speakers' comprehensibility is assessed by external listeners who do not engage in extensive communication with the speakers, even though the degree to which a speaker is…
Descriptors: Evaluators, Intelligibility, Pronunciation, Task Analysis
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Alexander James Kwako – ProQuest LLC, 2023
Automated assessment using Natural Language Processing (NLP) has the potential to make English speaking assessments more reliable, authentic, and accessible. Yet without careful examination, NLP may exacerbate social prejudices based on gender or native language (L1). Current NLP-based assessments are prone to such biases, yet research and…
Descriptors: Gender Bias, Natural Language Processing, Native Language, Computational Linguistics
Vasfiye Geçkin; Ebru Kiziltas; Çagatay Çinar – Journal of Educational Technology and Online Learning, 2023
The quality of writing in a second language (L2) is one of the indicators of the level of proficiency for many college students to be eligible for departmental studies. Although certain software programs, such as Intelligent Essay Assessor or IntelliMetric, have been introduced to evaluate second-language writing quality, an overall assessment of…
Descriptors: Writing Evaluation, Second Language Learning, Second Language Instruction, Language Proficiency
Yuko Hayashi; Yusuke Kondo; Yutaka Ishii – Innovation in Language Learning and Teaching, 2024
Purpose: This study builds a new system for automatically assessing learners' speech elicited from an oral discourse completion task (DCT), and evaluates the prediction capability of the system with a view to better understanding factors deemed influential in predicting speaking proficiency scores and the pedagogical implications of the system.…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Japanese
Seedhouse, Paul – ELT Journal, 2019
This article investigates the central role of topic in the IELTS Speaking Test (IST). Topic has developed a dual personality in this interactional setting: topic-as-script is the scripted statement of topic on the examiner's cards prior to the interaction, whereas topic-as-action is how topic is developed by the candidate during the course of the…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Personality Traits
Gingerich, Andrea; Schokking, Edward; Yeates, Peter – Advances in Health Sciences Education, 2018
Recent literature places more emphasis on assessment comments rather than relying solely on scores. Both are variable, however, emanating from assessment judgements. One established source of variability is "contrast effects": scores are shifted away from the depicted level of competence in a preceding encounter. The shift could arise…
Descriptors: Evaluation Methods, Scores, Intervention, Schemata (Cognition)