Publication Date
In 2025: 3
Since 2024: 4
Since 2021 (last 5 years): 18
Since 2016 (last 10 years): 43
Since 2006 (last 20 years): 94
Descriptor
Evaluators: 137
Models: 137
Evaluation Methods: 59
Program Evaluation: 39
Foreign Countries: 26
Evaluation: 16
Higher Education: 16
Decision Making: 15
Case Studies: 12
Item Response Theory: 12
Teaching Methods: 11
Author
Brown, Robert D.: 3
Engelhard, George, Jr.: 3
Wang, Jue: 3
Wind, Stefanie A.: 3
Coryn, Chris L. S.: 2
Cousins, J. Bradley: 2
Good, H. M.: 2
King, Jean A.: 2
Morris, Michael: 2
Scheirer, Mary Ann: 2
Scriven, Michael: 2
Audience
Practitioners: 1
Teachers: 1
Location
Texas: 3
Australia: 2
Canada: 2
New York: 2
United States: 2
Afghanistan: 1
Asia: 1
China: 1
Denmark: 1
Finland: 1
Germany: 1
Laws, Policies, & Programs
Americans with Disabilities…: 1
Education for All Handicapped…: 1
Elementary and Secondary…: 1
Higher Education Act Title IX: 1
No Child Left Behind Act 2001: 1
Rehabilitation Act 1973: 1
Assessments and Surveys
National Assessment of…: 1
Test of English as a Foreign…: 1
Trends in International…: 1
Akif Avcu – Malaysian Online Journal of Educational Technology, 2025
This scoping review presents the milestones through which Hierarchical Rater Models (HRMs) became usable in automated essay scoring (AES) to improve instructional evaluation. Although essay evaluations--a useful instrument for evaluating higher-order cognitive abilities--have always depended on human raters, concerns regarding rater bias,…
Descriptors: Automation, Scoring, Models, Educational Assessment
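The entry above names Hierarchical Rater Models without showing their structure. As a rough sketch only, assuming a Patz-style HRM rather than the specific model in the article, the rater layer treats each observed rating as a noisy, possibly biased copy of an "ideal" score category; every name and parameter value below is illustrative.

```python
"""Hedged sketch of the rater layer of a Hierarchical Rater Model (HRM).
Assumption (not from the cited article): each observed rating is a noisy,
possibly biased copy of an 'ideal' category, with rater-specific bias and
spread. All names and numbers are illustrative only."""
import numpy as np

def rater_category_probs(ideal_category, rater_bias, rater_sd, n_categories):
    """P(observed rating k | ideal category), modeled as a discretized normal
    centred at ideal_category + rater_bias with spread rater_sd."""
    categories = np.arange(n_categories)
    log_weights = -0.5 * ((categories - ideal_category - rater_bias) / rater_sd) ** 2
    probs = np.exp(log_weights)
    return probs / probs.sum()

# Example: a slightly severe rater (bias -0.5) judging an essay whose
# ideal score is category 3 on a 0-5 scale.
print(rater_category_probs(ideal_category=3, rater_bias=-0.5,
                           rater_sd=0.7, n_categories=6))
```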
Boris Forthmann; Benjamin Goecke; Roger E. Beaty – Creativity Research Journal, 2025
Human ratings are ubiquitous in creativity research. Yet, the process of rating responses to creativity tasks -- typically several hundred or thousands of responses, per rater -- is often time-consuming and expensive. Planned missing data designs, where raters only rate a subset of the total number of responses, have been recently proposed as one…
Descriptors: Creativity, Research, Researchers, Research Methodology
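A minimal sketch of what a planned missing data design can look like in practice, assuming a simple round-robin assignment; the rater counts and scheme are invented for illustration and are not the designs evaluated in the study above.

```python
"""Hedged illustration of a planned-missingness rating design: each
(hypothetical) rater scores only a subset of responses, while every
response still receives a fixed number of ratings."""
import numpy as np

def planned_missing_design(n_responses, n_raters, ratings_per_response):
    """Return a boolean assignment matrix: design[i, j] is True when
    rater j is asked to score response i."""
    design = np.zeros((n_responses, n_raters), dtype=bool)
    for i in range(n_responses):
        # Cycle through raters so the workload stays balanced.
        for k in range(ratings_per_response):
            design[i, (i + k) % n_raters] = True
    return design

design = planned_missing_design(n_responses=12, n_raters=4, ratings_per_response=2)
print(design.sum(axis=0))  # ratings assigned to each rater
print(design.sum(axis=1))  # ratings received by each response (all 2)
```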
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
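To make "score comparability" concrete, here is a hedged sketch of plain linear equating between two forms; it is a generic textbook method, not the equating approach proposed in the article above, and the scores are made up.

```python
"""Hedged, generic sketch of linear equating: form-Y scores are placed on
the form-X scale by matching means and standard deviations."""
import numpy as np

def linear_equate(scores_y, scores_x):
    """Map form-Y scores onto the form-X scale: x* = mu_x + (sd_x/sd_y)(y - mu_y)."""
    mu_x, sd_x = np.mean(scores_x), np.std(scores_x)
    mu_y, sd_y = np.mean(scores_y), np.std(scores_y)
    return mu_x + (sd_x / sd_y) * (np.asarray(scores_y) - mu_y)

# Illustrative rater-assigned essay scores on two forms (made-up numbers).
form_x = [2, 3, 3, 4, 5, 4, 3, 2]
form_y = [1, 2, 2, 3, 4, 3, 2, 1]
print(linear_equate(form_y, form_x))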
Wang, Jue; Engelhard, George; Combs, Trenton – Journal of Experimental Education, 2023
Unfolding models are frequently used to develop scales for measuring attitudes. Recently, unfolding models have been applied to examine rater severity and accuracy within the context of rater-mediated assessments. One of the problems in applying unfolding models to rater-mediated assessments is that the substantive interpretations of the latent…
Descriptors: Writing Evaluation, Scoring, Accuracy, Computational Linguistics
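For readers unfamiliar with unfolding models, the sketch below computes a single-peaked response function, using a hyperbolic cosine form chosen purely for illustration; the article's own model and parameterization may differ.

```python
"""Hedged sketch of a single-peaked (unfolding) response function, to
contrast with the monotone functions of cumulative IRT models."""
import numpy as np

def unfolding_prob(theta, delta, gamma=1.0):
    """P(agree) peaks when theta is close to the location delta."""
    return np.exp(gamma) / (np.exp(gamma) + 2.0 * np.cosh(theta - delta))

thetas = np.linspace(-4, 4, 9)
print(np.round(unfolding_prob(thetas, delta=0.0), 3))  # highest near theta = delta
```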
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
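A hedged sketch of the generalized partial credit model (GPCM) category probabilities mentioned above, together with the simple averaging of double ratings; the parameter values and scores are made up and are not the study's data or code.

```python
"""Hedged sketch: GPCM category probabilities plus averaging of double ratings."""
import numpy as np

def gpcm_probs(theta, a, b_steps):
    """GPCM: P(X=k) is proportional to exp(sum_{v<=k} a*(theta - b_v)),
    with the empty sum for k=0 equal to zero."""
    cum = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b_steps)))))
    probs = np.exp(cum - cum.max())          # subtract max for numerical stability
    return probs / probs.sum()

# One essay item with four score categories (three step parameters).
print(np.round(gpcm_probs(theta=0.5, a=1.2, b_steps=[-1.0, 0.0, 1.0]), 3))

# Double ratings: two raters' scores on the same responses, combined by averaging.
rater_1 = np.array([3, 2, 4, 1])
rater_2 = np.array([2, 2, 4, 2])
print((rater_1 + rater_2) / 2)
```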
Nakayama, Minoru; Sciarrone, Filippo; Temperini, Marco; Uto, Masaki – International Journal of Distance Education Technologies, 2022
Massive open on-line courses (MOOCs) are effective and flexible resources to educate, train, and empower populations. Peer assessment (PA) provides a powerful pedagogical strategy to support educational activities and foster learners' success, even when a huge number of learners is involved. Item response theory (IRT) can model students'…
Descriptors: Item Response Theory, Peer Evaluation, MOOCs, Models
Denis Dumas; James C. Kaufman – Educational Psychology Review, 2024
Who should evaluate the originality and task-appropriateness of a given idea has been a perennial debate among psychologists of creativity. Here, we argue that the most relevant evaluator of a given idea depends crucially on the level of expertise of the person who generated it. To build this argument, we draw on two complementary theoretical…
Descriptors: Decision Making, Creativity, Task Analysis, Psychologists
Wang, Jue; Engelhard, George, Jr. – Educational and Psychological Measurement, 2019
The purpose of this study is to explore the use of unfolding models for evaluating the quality of ratings obtained in rater-mediated assessments. Two different judgmental processes can be used to conceptualize ratings: impersonal judgments and personal preferences. Impersonal judgments are typically expected in rater-mediated assessments, and…
Descriptors: Evaluative Thinking, Preferences, Evaluators, Models
Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019
When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…
Descriptors: Measurement, Models, Evaluators, Simulation
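The rater fit statistics mentioned above are typically mean-square summaries of standardized residuals. The sketch below shows that arithmetic, assuming Rasch-style infit/outfit definitions; the expected scores and variances would normally come from a fitted model and are invented here.

```python
"""Hedged sketch of the unweighted (outfit) and information-weighted (infit)
mean-square fit statistics often reported for raters in Rasch-type models."""
import numpy as np

def rater_fit(observed, expected, variance):
    """Return (outfit MSQ, infit MSQ) for one rater's set of ratings."""
    observed, expected, variance = map(np.asarray, (observed, expected, variance))
    z2 = (observed - expected) ** 2 / variance        # squared standardized residuals
    outfit = z2.mean()                                # unweighted mean square
    infit = ((observed - expected) ** 2).sum() / variance.sum()  # information-weighted
    return outfit, infit

obs = [3, 4, 2, 5, 1]                  # ratings one rater assigned
exp = [3.2, 3.8, 2.5, 4.1, 1.9]        # model-expected ratings (hypothetical)
var = [0.8, 0.7, 0.9, 0.6, 0.8]        # model variances (hypothetical)
print(rater_fit(obs, exp, var))        # values far above 1.0 suggest misfit
```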
Hung, Su-Pin; Huang, Hung-Yu – Journal of Educational and Behavioral Statistics, 2022
To address response style or bias in rating scales, forced-choice items are often used to request that respondents rank their attitudes or preferences among a limited set of options. The rating scales used by raters to render judgments on ratees' performance also contribute to rater bias or errors; consequently, forced-choice items have recently…
Descriptors: Evaluation Methods, Rating Scales, Item Analysis, Preferences
Zamir, Sara – Quality Assurance in Education: An International Perspective, 2019
Purpose: As the school evaluator's role is multifaceted and the school evaluator is the school principal's subordinate, this paper aims to present the school evaluator's complex conduct to achieve a better understanding of his or her functioning. Design/methodology/approach: Theoretical paper. Findings: The two critical dimensions connected to the…
Descriptors: Institutional Evaluation, Accountability, Schools, Evaluators
Garman, Andrew N.; Erwin, Taylor S.; Garman, Tyler R.; Kim, Dae Hyun – Journal of Competency-Based Education, 2021
Background: Competency models provide useful frameworks for organizing learning and assessment programs, but their construction is both time intensive and subject to perceptual biases. Some aspects of model development may be particularly well-suited to automation, specifically natural language processing (NLP), which could also help make them…
Descriptors: Natural Language Processing, Automation, Guidelines, Leadership Effectiveness
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
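A hedged sketch of the feature-engineering route to automated essay scoring referenced above: a few hand-crafted surface features and an ordinary least-squares fit to human scores. The features, essays, and ratings are illustrative stand-ins, not those used in the study.

```python
"""Hedged sketch of feature-based automated essay scoring with made-up data."""
import numpy as np

def features(essay):
    words = essay.split()
    n_words = len(words)
    mean_word_len = sum(len(w) for w in words) / max(n_words, 1)
    type_token = len(set(w.lower() for w in words)) / max(n_words, 1)
    return [1.0, n_words, mean_word_len, type_token]   # leading 1.0 = intercept

essays = ["The quick brown fox jumps over the lazy dog",
          "Dogs run fast and dogs run far because dogs like running",
          "A thoughtful essay develops a clear argument with varied vocabulary"]
human_scores = np.array([2.0, 1.0, 3.0])               # made-up human ratings

X = np.array([features(e) for e in essays])
weights, *_ = np.linalg.lstsq(X, human_scores, rcond=None)
print(np.round(X @ weights, 2))                        # model's predicted scores
```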
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
Bradley-Levine, Jill – International Journal of Teacher Leadership, 2022
This article shares the findings of a qualitative case study examining the experiences of teacher leaders as they engaged as teacher evaluators alongside school principals. Data collection included observations and interviews with four teacher leaders and three school principals to answer these research questions: (1) What TDEM structures have…
Descriptors: Teacher Leadership, Teacher Evaluation, Teacher Participation, Principals