Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Kinnear, George; Bennett, Max; Binnie, Rachel; Bolt, Róisín; Zheng, Yinglan – Teaching Mathematics and Its Applications, 2020
The MATH taxonomy classifies questions according to the mathematical skills required to answer them. It was created to aid the development of more balanced assessments in undergraduate mathematics and has since been used to compare different assessment regimes across school and university. To date, there has been no systematic investigation of the…
Descriptors: Taxonomy, Mathematics Instruction, Teaching Methods, Reliability
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Lanah Stafford; Erin Cousins; Linda Bol; Megan Mize – Research & Practice in Assessment, 2023
Integrative learning is an important outcome for graduates of higher education. Therefore, it should be well-defined and assessed reliably. The American Association of Colleges & Universities has developed a rubric to define and assess integrative learning, but it has low reliability. This pilot study examines whether this rubric's reliability…
Descriptors: Scoring Rubrics, Reliability, Evaluation Methods, Faculty Development
Kapsner-Smith, Mara R.; Opuszynski, Amanda; Stepp, Cara E.; Eadie, Tanya L. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: The reliability of auditory-perceptual judgments between listeners is a long-standing problem in the assessment of voice disorders. The purpose of this study was to determine whether a relatively novel experimental scaling method, called visual sort and rate (VSR), yielded stronger reliability than the more frequently used method of…
Descriptors: Voice Disorders, Interrater Reliability, Rating Scales, Severity (of Disability)
Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022
Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…
Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators
Jönsson, Anders; Balan, Andreia – Practical Assessment, Research & Evaluation, 2018
Research on teachers' grading has shown that there is great variability among teachers regarding both the process and product of grading, resulting in low comparability and issues of inequality when using grades for selection purposes. Despite this situation, not much is known about the merits or disadvantages of different models for grading. In…
Descriptors: Grading, Models, Reliability, Validity
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Hunter, Seth B. – Journal of Education Human Resources, 2023
Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability
Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019
Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley.…
Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials
Heather Raithel – ProQuest LLC, 2023
A mixed methods action research study was designed to answer three research questions based on inter-rater reliability (IRR) in compliance calls for transition at a state education agency, perceived confidence levels in making and discussing compliance calls, and perceived confidence in sharing transition resources. An innovation based on…
Descriptors: Public Agencies, Interrater Reliability, Compliance (Legal), Comparative Analysis
Whalen, Kate; Paez, Antonio – Journal of Geography, 2022
Experiential education partnered with guided reflection is thought to support students with higher-order thinking skills. In this study, 44 reflections from two university-level sustainability courses were compared. In both courses students were asked to write a reflection, but only one course used the Reflective Learning Framework (RLF). Tests of…
Descriptors: Geography Instruction, Thinking Skills, Experiential Learning, Sustainability
Mandy, William; Clarke, Kiri; McKenner, Michele; Strydom, Andre; Crabtree, Jason; Lai, Meng-Chuan; Allison, Carrie; Baron-Cohen, Simon; Skuse, David – Journal of Autism and Developmental Disorders, 2018
We developed a brief, informant-report interview for assessing autism spectrum conditions (ASC) in adults, called the Developmental, Dimensional and Diagnostic Interview-Adult Version (3Di-Adult); and completed a preliminary evaluation. Informant reports were collected for participants with ASC (n = 39), a non-clinical comparison group (n = 29)…
Descriptors: Autism, Pervasive Developmental Disorders, Adults, Diagnostic Tests
Tülübas, Tijen; Demirkol, Murat; Ozdemir, Tuncay Yavuz; Polat, Hakan; Karakose, Turgut; Yirci, Ramazan – Educational Process: International Journal, 2023
Background/purpose: ChatGPT, a recent form of AI-based language model, has garnered interest among people from diverse backgrounds with its immersive capabilities. Using ChatGPT to support or generate scientific research has also created an ongoing debate over its advantages versus risks. The present study aimed to conduct an AI-enabled research…
Descriptors: Artificial Intelligence, Emergency Programs, Distance Education, COVID-19