ERIC - Search Results

Showing 1 to 15 of 53 results

Do You Mean What I Mean? Comparing Teacher Performance Self-Scores and Evaluator-Generated Scores

Peer reviewed

Hunter, Seth B. – Journal of Education Human Resources, 2023

Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…

Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies

Peer reviewed

Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023

Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…

Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication

Comparing Machine and Human Reviewers to Evaluate the Risk of Bias in Randomized Controlled Trials

Peer reviewed

Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020

Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…

Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials

Metrics for Discrete Student Models: Chance Levels, Comparisons, and Use Cases

Peer reviewed
PDF on ERIC

Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018

Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…

Descriptors: Models, Comparative Analysis, Prediction, Probability

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed
PDF on ERIC

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

Comparative Judgement: Assess Student Production without Absolute Judgements

Peer reviewed
PDF on ERIC

Sumner, Josh – Research-publishing.net, 2021

Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…

Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making

Improving Reliability in Assessing Integrative Learning Using Rubrics: Does Group Norming Help?

Peer reviewed
PDF on ERIC

Stafford, Lanah; Cousins, Erin; Bol, Linda; Mize, Megan – Research & Practice in Assessment, 2023

Integrative learning is an important outcome for graduates of higher education. Therefore, it should be well-defined and assessed reliably. The Association of American Colleges & Universities has developed a rubric to define and assess integrative learning, but it has low reliability. This pilot study examines whether this rubric's reliability…

Descriptors: Scoring Rubrics, Reliability, Evaluation Methods, Faculty Development

Assessing Language in Unstructured Conversation in People with Aphasia: Methods, Psychometric Integrity, Normative Data, and Comparison to a Structured Narrative Task

Peer reviewed

Leaman, Marion C.; Edmonds, Lisa A. – Journal of Speech, Language, and Hearing Research, 2021

Purpose: This study evaluated interrater reliability (IRR) and test-retest stability (TRTS) of seven linguistic measures (percent correct information units, relevance, subject-verb-[object], complete utterance, grammaticality, referential cohesion, global coherence), and communicative success in unstructured conversation and in a story narrative…

Descriptors: Aphasia, Psychometrics, Correlation, Speech Language Pathology

Effect of Quality Characteristics of Peer Raters on Rating Errors in Peer Assessment

Peer reviewed

Guo, Xiuyan; Lei, Pui-Wa – International Journal of Testing, 2020

Little research has been done on the effects of peer raters' quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters' qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment…

Descriptors: Peer Evaluation, Error Patterns, Correlation, Knowledge Level

Comparison of Automatic and Expert Teachers' Rating of Computerized English Listening-Speaking Test

Peer reviewed
PDF on ERIC

Linlin, Cao – English Language Teaching, 2020

Through Many-Facet Rasch analysis, this study explores the rating differences between 1 computer automatic rater and 5 expert teacher raters on scoring 119 students in a computerized English listening-speaking test. Results indicate that both automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…

Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning

Practicalities of Using a Modified Version of the Cochrane Collaboration Risk of Bias Tool for Randomised and Non-Randomised Study Designs Applied in a Health Technology Assessment Setting

Peer reviewed

Robertson, Clare; Ramsay, Craig; Gurung, Tara; Mowatt, Graham; Pickard, Robert; Sharma, Pawana – Research Synthesis Methods, 2014

We describe our experience of using a modified version of the Cochrane risk of bias (RoB) tool for randomised and non-randomised comparative studies. Objectives: (1) To assess time to complete RoB assessment; (2) To assess inter-rater agreement; and (3) To explore the association between RoB and treatment effect size. Methods: Cochrane risk of…

Descriptors: Risk, Randomized Controlled Trials, Research Design, Comparative Analysis

Constructed-Response as an Alternative to Interviews in Conceptual Change Studies: Students' Explanations of Force

Peer reviewed
PDF on ERIC

Schleigh, Sharon Price; Clark, Douglas B.; Menekse, Muhsin – International Journal of Education in Mathematics, Science and Technology, 2015

Although interview formats support rich data collection in conceptual change studies, interview formats limit sample sizes. This study explores the possibility of using constructed-response formats as an alternative or supplement for collecting similarly rich data across larger pools of subjects in conceptual change studies. While research in…

Descriptors: Interviews, Sample Size, Change, Concept Formation

Using Calibrated Exemplars in the Teacher-Assessment of Writing: An Empirical Study

Peer reviewed

Heldsinger, Sandra A.; Humphry, Stephen M. – Educational Research, 2013

Background: Many in education argue for the importance of incorporating teacher judgements in the assessment and reporting of student performance. Advocates of such an approach are cognisant, though, that obtaining a satisfactory level of consistency in teacher judgements poses a challenge. Purpose: This study investigates the extent to which the…

Descriptors: Evaluation Methods, Student Evaluation, Teacher Attitudes, Comparative Analysis

"Good Writing" in Increasingly Internationalized U.S. Universities: How Instructors Evaluate Different Written Varieties of English

Collier, Lizabeth C. – ProQuest LLC, 2014

This study investigates how university instructors from various disciplines at a large, comprehensive university in the United States evaluate different varieties of English from countries considered "outer circle" (OC) countries, formerly colonized countries where English has been transplanted and is now used unofficially and officially…

Descriptors: Universities, Global Approach, College English, Writing Evaluation
