Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 14 |
Since 2006 (last 20 years) | 22 |
Descriptor
Evaluators | 26 |
Interrater Reliability | 26 |
Statistical Analysis | 26 |
Correlation | 11 |
Second Language Learning | 10 |
English (Second Language) | 9 |
Foreign Countries | 8 |
Language Tests | 7 |
Comparative Analysis | 6 |
Oral Language | 6 |
Scores | 6 |
More ▼ |
Source
Author
Publication Type
Journal Articles | 25 |
Reports - Research | 18 |
Reports - Evaluative | 7 |
Tests/Questionnaires | 5 |
Dissertations/Theses -… | 1 |
Information Analyses | 1 |
Education Level
Higher Education | 7 |
Postsecondary Education | 6 |
Secondary Education | 2 |
Grade 1 | 1 |
Grade 11 | 1 |
Grade 6 | 1 |
Grade 7 | 1 |
Audience
Location
Hong Kong | 2 |
United Kingdom | 2 |
Finland | 1 |
Iran | 1 |
Iran (Tehran) | 1 |
Japan | 1 |
Ohio | 1 |
Laws, Policies, & Programs
Assessments and Surveys
Test of English as a Foreign… | 3 |
Flesch Kincaid Grade Level… | 1 |
What Works Clearinghouse Rating
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018
This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…
Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students
Al-Harthi, Aisha Salim Ali; Campbell, Chris; Karimi, Arafeh – Computers in the Schools, 2018
This study aimed to develop, validate, and trial a rubric for evaluating the cloud-based learning designs (CBLD) that were developed by teachers using virtual learning environments. The rubric was developed using the technological pedagogical content knowledge (TPACK) framework, with rubric development including content and expert validation of…
Descriptors: Computer Assisted Instruction, Scoring Rubrics, Interrater Reliability, Content Validity
Davis, Larry – Language Testing, 2016
Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…
Descriptors: Evaluators, Oral Language, Scores, Language Tests
Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017
The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…
Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse
Lehan, Tara; Hussey, Heather; Mika, Eva – Journal of University Teaching and Learning Practice, 2016
Throughout the dissertation process, the chair and committee members provide feedback regarding quality to help the doctoral candidate to produce the highest-quality document and become an independent scholar. Nevertheless, results of previous research suggest that overall dissertation quality generally is poor. Because much of the feedback about…
Descriptors: Graduate Students, Doctoral Dissertations, Student Evaluation, Feedback (Response)
Li, Hui – English Language Teaching, 2016
The aim of the study was to investigate how raters come to their decisions when judging spoken vocabulary. Segmental rating was introduced to quantify raters' decision-making process. It is hoped that this simulated study brings fresh insight to future methodological considerations with spoken data. Twenty trainee raters assessed five Chinese…
Descriptors: Foreign Countries, Evaluators, Interrater Reliability, Decision Making
Skalicky, Stephen; Berger, Cynthia M.; Crossley, Scott A.; McNamara, Danielle S. – Advances in Language and Literary Studies, 2016
A corpus of 313 freshman college essays was analyzed in order to better understand the forms and functions of humor in academic writing. Human ratings of humor and wordplay were statistically aggregated using Factor Analysis to provide an overall "Humor" component score for each essay in the corpus. In addition, the essays were also…
Descriptors: Discourse Analysis, Academic Discourse, Humor, Writing (Composition)
Ahmadi, Alireza; Sadeghi, Elham – Language Assessment Quarterly, 2016
In the present study we investigated the effect of test format on oral performance in terms of test scores and discourse features (accuracy, fluency, and complexity). Moreover, we explored how the scores obtained on different test formats relate to such features. To this end, 23 Iranian EFL learners participated in three test formats of monologue,…
Descriptors: Oral Language, Comparative Analysis, Language Fluency, Accuracy
Jarvis, Scott – Language Testing, 2017
The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…
Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers
Virtanen, T. E.; Pakarinen, E.; Lerkkanen, M.-K.; Poikkeus, A.-M.; Siekkinen, M.; Nurmi, J.-E. – Journal of Early Adolescence, 2018
This study examined the reliability and validity of the Classroom Assessment Scoring System-Secondary (CLASS-S) in Finnish classrooms. Trained observers coded classroom interactions based on video recordings of 46 Grade 6 classrooms (450 cycles). Concurrent associations were investigated with respect to teacher self-ratings (e.g., efficacy beliefs…
Descriptors: Factor Analysis, Classroom Observation Techniques, Foreign Countries, Factor Structure
Hijikata-Someya, Yuko; Ono, Masumi; Yamanishi, Hiroyuki – English Language Teaching, 2015
Although the importance of summary writing is well documented in prior studies, few have investigated the evaluation of written summaries. Due to the complex nature of L2 summary writing, which requires one to read the original material and summarize its content in the L2, raters often emphasize different features when judging the quality of L2…
Descriptors: Foreign Countries, English (Second Language), Second Language Instruction, Second Language Learning
Previous Page | Next Page ยป
Pages: 1 | 2