Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 10 |
| Since 2017 (last 10 years) | 31 |
| Since 2007 (last 20 years) | 99 |
Descriptor
| Interrater Reliability | 144 |
| Models | 144 |
| Evaluation Methods | 30 |
| Foreign Countries | 26 |
| Correlation | 21 |
| Measurement Techniques | 21 |
| Scores | 20 |
| Evaluators | 18 |
| Scoring | 17 |
| Comparative Analysis | 16 |
| Rating Scales | 16 |
Audience
| Researchers | 11 |
| Policymakers | 1 |
| Practitioners | 1 |
Location
| Germany | 4 |
| Netherlands | 4 |
| Florida | 3 |
| Sweden | 3 |
| Estonia | 2 |
| Indonesia | 2 |
| Israel | 2 |
| North Carolina | 2 |
| Norway | 2 |
| Oregon | 2 |
| Pennsylvania | 2 |
Assessments and Surveys
| Test of English as a Foreign… | 2 |
| Advanced Placement… | 1 |
| Graduate Record Examinations | 1 |
| Home Observation for… | 1 |
| Praxis Series | 1 |
What Works Clearinghouse Rating
| Does not meet standards | 1 |
Lottridge, Susan; Woolf, Sherri; Young, Mackenzie; Jafari, Amir; Ormerod, Chris – Journal of Computer Assisted Learning, 2023
Background: Deep learning methods, where models do not use explicit features and instead rely on implicit features estimated during model training, suffer from an explainability problem. In text classification, saliency maps that reflect the importance of words in prediction are one approach toward explainability. However, little is known about…
Descriptors: Documentation, Learning Strategies, Models, Prediction
Feldberg, Zachary R. – ProQuest LLC, 2023
Cognitive diagnostic models (CDMs) provide pedagogically relevant information in the form of a student profile of multiple binary categorizations of students into mastery or nonmastery statuses on latent traits called attributes. Federal educational accountability requires accountability measures to designate students into one of at least three…
Descriptors: Accountability, Standards, Cutting Scores, Models
Gilstrap, Donald L.; Whitver, Sara Maurice; Scalfani, Vincent F.; Bray, Nathaniel J. – Innovative Higher Education, 2023
This article explores how well bibliometrics and altmetrics reflect research impact in relation to Boyer's Model of the Scholarship. Indices used for both types of metrics are explored and discussed while including an analysis on primary methodological works performed on each in the literature to date. As confirmatory in nature, we chose as our…
Descriptors: Bibliometrics, Models, Scholarship, Research
Lamprianou, Iasonas – Sociological Methods & Research, 2023
This study investigates inter- and intracoder reliability, proposing a new approach based on social network analysis (SNA) and exponential random graph models (ERGM). During a recent exit poll, the responses of voters to two open-ended questions were recorded. A coding experiment was conducted where a group of coders coded a sample of text…
Descriptors: Interrater Reliability, Coding, Social Networks, Network Analysis
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Lulu Desia Mutiani Rahmayuni; Siti Sriyatib; Diah Kusumawaty – Journal of Biological Education Indonesia (Jurnal Pendidikan Biologi Indonesia), 2024
Business Model Canvas (BMC) is a business model that must be mastered by students in the Bioentrepreneurship course as an initial provision for entering the entrepreneurial world, while in compiling Business Model Canvas (BMC) systematic thinking skills are needed. This study aims to provide an assessment instrument to measure students' system…
Descriptors: Systems Approach, Thinking Skills, Models, Business Administration
Nnamdi Chika Ezike – ProQuest LLC, 2022
Fitting wrongly specified models to observed data may lead to invalid inferences about the model parameters of interest. The current study investigated the performance of the posterior predictive model checking (PPMC) approach in detecting model-data misfit of the hierarchical rater model (HRM). The HRM is a rater-mediated model that incorporates…
Descriptors: Prediction, Models, Interrater Reliability, Item Response Theory
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
Jönsson, Anders; Balan, Andreia – Practical Assessment, Research & Evaluation, 2018
Research on teachers' grading has shown that there is great variability among teachers regarding both the process and product of grading, resulting in low comparability and issues of inequality when using grades for selection purposes. Despite this situation, not much is known about the merits or disadvantages of different models for grading. In…
Descriptors: Grading, Models, Reliability, Validity
Purwadi; Saputra, Wahyu N. E.; Handaka, Irvan B.; Barida, Muya; Wahyudi, Amien; Widyastuti, Dian A.; Agungbudiprabowo; Rodhiya, Zaenab A. – Pegem Journal of Education and Instruction, 2022
This study aims to identify the acceptability and effectiveness of peace guidance based on the perspective of Markesot. This model seeks to reduce student aggressiveness. This study uses the research and development stages by adapting the Borg & Gall model. The participants of this study were 275 students who were taken randomly. The study…
Descriptors: Peace, Guidance, Models, Interrater Reliability
Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2018
The Rasch facets model was developed to account for facet data, such as student essays graded by raters, but it accounts for only one kind of rater effect (severity). In practice, raters may exhibit various tendencies such as using middle or extreme scores in their ratings, which is referred to as the rater centrality/extremity response style. To…
Descriptors: Scoring, Models, Interrater Reliability, Computation
Nieto, Ricardo; Casabianca, Jodi M. – Journal of Educational Measurement, 2019
Many large-scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests, or more specifically, simple structured tests, such as these rely on multiple multiple-choice and/or constructed responses sections of items to generate multiple…
Descriptors: Tests, Scoring, Responses, Test Items
Schack, Edna O.; Dueber, David; Thomas, Jonathan Norris; Fisher, Molly H.; Jong, Cindy – AERA Online Paper Repository, 2019
Scoring of teachers' noticing responses is typically burdened with rater bias and reliance upon interrater consensus. The authors sought to make the scoring process more objective, equitable, and generalizable. The development process began with a description of response characteristics for each professional noticing component disconnected from…
Descriptors: Models, Teacher Evaluation, Observation, Bias
Ziegler, Wolfram; Staiger, Anja; Schölderle, Theresa; Vogel, Mathias – Journal of Speech, Language, and Hearing Research, 2017
Purpose: Standardized clinical assessment of dysarthria is essential for management and research. We present a new, fully standardized dysarthria assessment, the Bogenhausen Dysarthria Scales (BoDyS). The measurement model of the BoDyS is based on auditory evaluations of connected speech using 9 scales (traits) assessed by 4 elicitation methods.…
Descriptors: Auditory Evaluation, Test Reliability, Test Validity, Rating Scales

