Lambert, Richard G.; Holcomb, T. Scott; Bottoms, Bryndle L. – Center for Educational Measurement and Evaluation, 2021
The validity of the Kappa coefficient of chance-corrected agreement has been questioned when the prevalence of specific rating scale categories is low and agreement between raters is high. The researchers proposed the Lambda Coefficient of Rater-Mediated Agreement as an alternative to Kappa to address these concerns. Lambda corrects for chance…
Descriptors: Interrater Reliability, Teacher Evaluation, Test Validity, Evaluation Methods
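The concern described in the abstract above can be illustrated with a minimal sketch of the standard Cohen's Kappa for two raters. This is not the authors' Lambda coefficient, whose formula is not given in this listing. The function name `cohen_kappa`, the helper `build_ratings`, and the rating counts are hypothetical and chosen only for illustration. Both data sets have the same 94% observed agreement, but one has balanced categories and the other has a rare category.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return float("nan")  # Kappa is undefined when chance agreement is perfect.
    return (p_o - p_e) / (1 - p_e)


def build_ratings(both_yes, both_no, a_only, b_only):
    """Expand hypothetical 2x2 agreement counts into two parallel rating lists."""
    rater_a = ["yes"] * both_yes + ["no"] * both_no + ["yes"] * a_only + ["no"] * b_only
    rater_b = ["yes"] * both_yes + ["no"] * both_no + ["no"] * a_only + ["yes"] * b_only
    return rater_a, rater_b


# Illustrative data: 100 items, 94 agreements in both cases.
balanced = build_ratings(both_yes=47, both_no=47, a_only=3, b_only=3)
skewed = build_ratings(both_yes=0, both_no=94, a_only=3, b_only=3)

kappa_balanced = cohen_kappa(*balanced)
kappa_skewed = cohen_kappa(*skewed)

print(f"balanced categories: kappa = {kappa_balanced:.3f}")
print(f"rare 'yes' category: kappa = {kappa_skewed:.3f}")
```

With balanced categories Kappa is high, whereas with the rare category it falls slightly below zero even though the raters agree on the same share of items. This is the low-prevalence, high-agreement situation the abstract refers to.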
Fromm, Davida; Katta, Saketh; Paccione, Mason; Hecht, Sophia; Greenhouse, Joel; MacWhinney, Brian; Schnur, Tatiana T. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: Analysis of connected speech in the field of adult neurogenic communication disorders is essential for research and clinical purposes, yet time and expertise are often cited as limiting factors. The purpose of this project was to create and evaluate an automated program to score and compute the measures from the Quantitative Production…
Descriptors: Speech, Automation, Statistical Analysis, Adults
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
McCarthy, Kathryn S.; Magliano, Joseph P.; Snyder, Jacob O.; Kenney, Elizabeth A.; Newton, Natalie N.; Perret, Cecile A.; Knezevic, Melanie; Allen, Laura K.; McNamara, Danielle S. – Grantee Submission, 2021
The objective in the current paper is to examine the processes of how our research team negotiated meaning using an iterative design approach as we established, developed, and refined a rubric to capture comprehension processes and strategies evident in students' verbal protocols. The overarching project comprises multiple data sets, multiple…
Descriptors: Scoring Rubrics, Interrater Reliability, Design, Learning Processes
Schack, Edna O.; Dueber, David; Thomas, Jonathan Norris; Fisher, Molly H.; Jong, Cindy – AERA Online Paper Repository, 2019
Scoring of teachers' noticing responses is typically burdened with rater bias and reliance upon interrater consensus. The authors sought to make the scoring process more objective, equitable, and generalizable. The development process began with a description of response characteristics for each professional noticing component disconnected from…
Descriptors: Models, Teacher Evaluation, Observation, Bias
Siqi Huang – North American Chapter of the International Group for the Psychology of Mathematics Education, 2023
The goal of this paper is twofold. First, the paper clarifies and elaborates on an important theoretical construct called orientation with respect to understanding in mathematics, which denotes the degree to which students exhibit an inclination towards and demonstrate an earnest concern for understanding in mathematical learning. Second, the…
Descriptors: Mathematics Instruction, Teaching Methods, Problem Solving, Reliability
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Yarbro, Jeffrey T.; Olney, Andrew M. – Grantee Submission, 2021
This paper explores the concept of dynamically generating definitions using a deep-learning model. We do this by creating a dataset that contains definition entries and contexts associated with each definition. We then fine-tune a GPT-2 based model on the dataset to allow the model to generate contextual definitions. We evaluate our model with…
Descriptors: Definitions, Learning Processes, Models, Context Effect
Klecker, Beverly M. – Online Submission, 2018
The Council for the Accreditation of Educator Preparation Programs (CAEP) required evidence of reliability and validity of measures used in a university's Educator Preparation Program (EPP). This paper describes processes that provided this evidence for the Teacher Performance Assessment (TPA). Literature examined included Messick (1989), Linn…
Descriptors: College Faculty, Teacher Evaluation, Performance Based Assessment, Test Validity
Bosch, Nigel; Crues, R. Wes; Shaik, Najmuddin; Paquette, Luc – Grantee Submission, 2020
Online courses often include discussion forums, which provide a rich source of data to better understand and improve students' learning experiences. However, forum messages frequently contain private information that prevents researchers from analyzing these data. We present a method for discovering and redacting private information including…
Descriptors: Privacy, Discussion Groups, Asynchronous Communication, Methods
Kovalkov, Anastasia; Paassen, Benjamin; Segal, Avi; Gal, Kobi; Pinkwart, Niels – International Educational Data Mining Society, 2021
Promoting creativity is considered an important goal of education, but creativity is notoriously hard to define and measure. In this paper, we make the journey from defining a formal measure of creativity to applying the measure in a practical domain. The measure relies on core theoretical concepts in creativity theory, namely fluency, flexibility, and…
Descriptors: Creativity, Theory Practice Relationship, Evaluators, Specialists
McGough, David J. – AERA Online Paper Repository, 2017
This paper describes the implementation of an inter-rater reliability measure for assessing portfolio scores in a teacher education program. The reliability coefficient for the portfolio scores from completers of a newly revised program was compared with the reliability coefficient of the scores from a second set of reviewers who discussed the…
Descriptors: Interrater Reliability, Teacher Education Programs, Program Evaluation, Portfolio Assessment
Allen, Laura K.; Crossley, Scott A.; McNamara, Danielle S. – Grantee Submission, 2015
We investigated linguistic factors that relate to misalignment between students' and teachers' ratings of essay quality. Students (n = 126) wrote essays and rated the quality of their work. Teachers then provided their own ratings of the essays. Results revealed that students who were less accurate in their self-assessments produced essays that…
Descriptors: Essays, Scores, Natural Language Processing, Interrater Reliability
Milanowski, Anthony T. – Online Submission, 2011
After decades of disinterest, evaluation of the performance of elementary and secondary teachers in the United States has become an important educational policy issue. As U.S. states and districts have tried to upgrade their evaluation processes, one of the models that has been increasingly used is the Framework for Teaching. This paper summarizes…
Descriptors: Evidence, Teacher Effectiveness, Teacher Evaluation, Observation
Al-Gawhary, Wedad; Kambouri, Maria – International Association for Development of the Information Society, 2012
The purpose of this case study was to measure the impact of using ICT in Individual Learning Programmes of students with learning disabilities. Twenty-five students and thirteen teachers took part in the research, which was based on classroom observations. The Kappa coefficient was employed as a measure to statistically quantify the students'…
Descriptors: Foreign Countries, Special Needs Students, Down Syndrome, Autism