Showing 1 to 15 of 23 results
Akif Avcu – Malaysian Online Journal of Educational Technology, 2025
This scoping review traces the milestones of how Hierarchical Rater Models (HRMs) became operable for use in automated essay scoring (AES) to improve instructional evaluation. Although essay evaluation--a useful instrument for assessing higher-order cognitive abilities--has always depended on human raters, concerns regarding rater bias,…
Descriptors: Automation, Scoring, Models, Educational Assessment
Peer reviewed
Direct link
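For readers unfamiliar with HRMs: one standard formulation of the model's rating stage (following Patz et al., 2002, not taken from this review) treats rater r's observed category as a discretized normal centered on the response's ideal category, shifted by a rater bias parameter:

    P(Y_{ir} = k \mid \xi_i) \propto \exp\!\left( -\frac{[k - (\xi_i + \phi_r)]^2}{2\psi_r^2} \right)

where \phi_r captures rater r's severity or leniency and \psi_r the rater's variability; a smaller \psi_r indicates a more accurate rater.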
Boris Forthmann; Benjamin Goecke; Roger E. Beaty – Creativity Research Journal, 2025
Human ratings are ubiquitous in creativity research. Yet the process of rating responses to creativity tasks -- typically several hundred or thousands of responses per rater -- is often time-consuming and expensive. Planned missing data designs, where raters rate only a subset of the total number of responses, have recently been proposed as one…
Descriptors: Creativity, Research, Researchers, Research Methodology
Peer reviewed
Direct link
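To make the planned missing data design in the Forthmann, Goecke, and Beaty abstract concrete, here is a minimal sketch (ours, with hypothetical sizes, not the authors' design): each response is scored by only k of R raters, so each rater's workload shrinks to roughly N*k/R responses.

    import numpy as np

    # Hypothetical sizes: N responses, R raters, k ratings per response.
    N, R, k = 600, 6, 2
    rng = np.random.default_rng(42)

    # assignment[i, r] is True when rater r scores response i.
    assignment = np.zeros((N, R), dtype=bool)
    for i in range(N):
        raters = rng.choice(R, size=k, replace=False)  # k distinct raters
        assignment[i, raters] = True

    print(assignment.sum(axis=0))  # ratings per rater, roughly N*k/R = 200 each
    print(assignment.sum(axis=1))  # ratings per response, always k

Because the unrated cells are missing by random assignment rather than for substantive reasons, standard models can still recover rater and response parameters from the subset.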
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Peer reviewed
Direct link
Wang, Jue; Engelhard, George; Combs, Trenton – Journal of Experimental Education, 2023
Unfolding models are frequently used to develop scales for measuring attitudes. Recently, unfolding models have been applied to examine rater severity and accuracy within the context of rater-mediated assessments. One of the problems in applying unfolding models to rater-mediated assessments is that the substantive interpretations of the latent…
Descriptors: Writing Evaluation, Scoring, Accuracy, Computational Linguistics
Peer reviewed
Direct link
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
Peer reviewed
PDF on ERIC (full text)
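For context on the Song and Lee abstract, the GPCM's category probability is conventionally written as (a textbook formulation, not quoted from the article):

    P(X_{ij} = k \mid \theta_i)
      = \frac{\exp \sum_{v=0}^{k} a_j (\theta_i - b_{jv})}
             {\sum_{c=0}^{m_j} \exp \sum_{v=0}^{c} a_j (\theta_i - b_{jv})},
      \qquad b_{j0} \equiv 0,

where a_j is the discrimination of item j and the b_{jv} are its step difficulties. With double ratings, one common setup treats each rater's score as a separate item-level observation, so rater error enters the likelihood twice rather than once.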
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In existing research on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
Peer reviewed
Direct link
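The human-automated agreement benchmark that Doewes, Kurdhi, and Saxena refer to is most often operationalized as quadratically weighted kappa (the standard AES agreement statistic; the article's exact choice of measures may differ):

    \kappa_w = 1 - \frac{\sum_{i,j} w_{ij} O_{ij}}{\sum_{i,j} w_{ij} E_{ij}},
    \qquad w_{ij} = \frac{(i - j)^2}{(C - 1)^2},

where O_{ij} counts essays scored i by the human and j by the machine, E_{ij} is the count expected by chance from the two marginal score distributions, and C is the number of score points; \kappa_w = 1 indicates perfect agreement and 0 chance-level agreement.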
Nakayama, Minoru; Sciarrone, Filippo; Temperini, Marco; Uto, Masaki – International Journal of Distance Education Technologies, 2022
Massive open online courses (MOOCs) are effective and flexible resources to educate, train, and empower populations. Peer assessment (PA) provides a powerful pedagogical strategy to support educational activities and foster learners' success, even when a huge number of learners is involved. Item response theory (IRT) can model students'…
Descriptors: Item Response Theory, Peer Evaluation, MOOCs, Models
Peer reviewed
Direct link
Denis Dumas; James C. Kaufman – Educational Psychology Review, 2024
Who should evaluate the originality and task-appropriateness of a given idea has been a perennial debate among psychologists of creativity. Here, we argue that the most relevant evaluator of a given idea depends crucially on the level of expertise of the person who generated it. To build this argument, we draw on two complementary theoretical…
Descriptors: Decision Making, Creativity, Task Analysis, Psychologists
Renato Britto Ferreira – ProQuest LLC, 2024
Modern-day educators often share a sense that they lack voice and agency with school administration. A classic example is curriculum development, where third-party designers develop uncontextualized curricula, and teachers must then implement the design even if it is inefficient and ineffective. What happens if this identical situation occurs at the program…
Descriptors: Art Education, School Administration, Curriculum Design, Program Design
Peer reviewed
PDF on ERIC (full text)
Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023
Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…
Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests
Peer reviewed
Direct link
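As a concrete illustration of the supervised-learning baseline Zhang, Heffernan, and Lan describe -- training a classifier on a small set of human-scored responses -- here is a minimal sketch with hypothetical toy data (not the authors' system):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical short-answer responses with human-provided scores 0-2.
    responses = [
        "slope is rise over run",
        "i do not know",
        "slope = (y2 - y1) / (x2 - x1)",
    ]
    scores = [1, 0, 2]

    # Bag-of-words features plus a linear classifier stand in for the
    # fine-tuned language models used in recent work.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(responses, scores)

    print(model.predict(["slope is change in y over change in x"]))

The dependence on human-provided score labels is the catch: with only a handful of labeled responses per question, such classifiers can generalize poorly.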
Hung, Su-Pin; Huang, Hung-Yu – Journal of Educational and Behavioral Statistics, 2022
To address response style or bias in rating scales, forced-choice items are often used to request that respondents rank their attitudes or preferences among a limited set of options. The rating scales used by raters to render judgments on ratees' performance also contribute to rater bias or errors; consequently, forced-choice items have recently…
Descriptors: Evaluation Methods, Rating Scales, Item Analysis, Preferences
Peer reviewed
Direct link
Garman, Andrew N.; Erwin, Taylor S.; Garman, Tyler R.; Kim, Dae Hyun – Journal of Competency-Based Education, 2021
Background: Competency models provide useful frameworks for organizing learning and assessment programs, but their construction is both time intensive and subject to perceptual biases. Some aspects of model development may be particularly well-suited to automation, specifically natural language processing (NLP), which could also help make them…
Descriptors: Natural Language Processing, Automation, Guidelines, Leadership Effectiveness
Peer reviewed
Direct link
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or sole marker for many high-stakes educational assessments, in both native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Peer reviewed
PDF on ERIC (full text)
Bradley-Levine, Jill – International Journal of Teacher Leadership, 2022
This article shares the findings of a qualitative case study examining the experiences of teacher leaders as they engaged as teacher evaluators alongside school principals. Data collection included observations and interviews with four teacher leaders and three school principals to answer these research questions: (1) What TDEM structures have…
Descriptors: Teacher Leadership, Teacher Evaluation, Teacher Participation, Principals
Peer reviewed
PDF on ERIC (full text)
Lluch Molins, Laia; Cano García, Elena – Journal of New Approaches in Educational Research, 2023
One of the main generic competencies in Higher Education is "Learning to Learn". The key component of this competence is the capacity for self-regulated learning (SRL). For this competence to be developed, peer feedback seems useful because it fosters evaluative judgement. Following the principles of peer feedback processes, an online…
Descriptors: Learning Analytics, Learning Management Systems, Peer Evaluation, Higher Education