Jiyeo Yun – English Teaching, 2023
Studies of automatic scoring systems in writing assessment have also evaluated the relationship between human and machine scores to gauge the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Linlin, Cao – English Language Teaching, 2020
Using Many-Facet Rasch analysis, this study explores the rating differences between 1 automatic computer rater and 5 expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic rater and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Wang, Yuqi; Ren, Wei – Language Learning Journal, 2022
L2 pragmatics studies have explored the effects of different factors on different aspects of learners' pragmatic performance, but often not simultaneously. In addition, syntactic complexity is rarely examined in L2 pragmatics. This cross-sectional study aimed to conduct a multidimensional analysis to explore the effects of proficiency and study-abroad…
Descriptors: Pragmatics, Second Language Learning, Second Language Instruction, English (Second Language)
Kang, Okim; Rubin, Don; Kermad, Alyssa – Language Testing, 2019
Because judgments of non-native speech are closely tied to social biases, oral proficiency ratings are susceptible to error arising from rater background and social attitudes. In the present study we seek first to estimate the variance attributable to rater background and attitudinal variables on novice raters' assessments of L2…
Descriptors: Evaluators, Second Language Learning, Language Tests, English (Second Language)
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Marking Essays on Screen: An Investigation into the Reliability of Marking Extended Subjective Texts
Johnson, Martin; Nadas, Rita; Bell, John F. – British Journal of Educational Technology, 2010
There is a growing body of research literature that considers how the mode of assessment, either computer-based or paper-based, might affect candidates' performances. Despite this, there is a fairly narrow literature that shifts the focus of attention to those making assessment judgements and which considers issues of assessor consistency when…
Descriptors: English Literature, Examiners, Evaluation Research, Evaluators
Alt, Mary; Meyers, Christina; Figueroa, Cecilia – Journal of Speech, Language, and Hearing Research, 2013
Purpose: The purpose of this study was to determine whether children exposed to 2 languages would benefit from the phonotactic probability cues of a single language in the same way as monolingual peers and to determine whether crosslinguistic influence would be present in a fast-mapping task. Method: Two groups of typically developing children…
Descriptors: Regression (Statistics), Spanish, Cues, Task Analysis
Mogey, Nora; Paterson, Jessie; Burk, John; Purcell, Michael – ALT-J: Research in Learning Technology, 2010
Students at the University of Edinburgh do almost all their work on computers, but at the end of the semester they are examined by handwritten essays. Intuitively it would be appealing to allow students the choice of handwriting or typing, but this raises a concern that perhaps this might not be "fair"--that the choice a student makes,…
Descriptors: Handwriting, Essay Tests, Interrater Reliability, Grading
Coniam, David – Educational Research and Evaluation, 2009
This paper describes a study comparing paper-based marking (PBM) and onscreen marking (OSM) in Hong Kong utilising English language essay scripts drawn from the live 2007 Hong Kong Certificate of Education Examination (HKCEE) Year 11 English Language Writing Paper. In the study, 30 raters from the 2007 HKCEE Writing Paper marked on paper 100…
Descriptors: Student Attitudes, Foreign Countries, Essays, Comparative Analysis
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or lack of fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
McGhee, Debbie E.; Lowell, Nana – New Directions for Teaching and Learning, 2003
This study compares mean ratings, inter-rater reliabilities, and the factor structure of items for online and paper student-rating forms from the University of Washington's Instructional Assessment System. (Contains 3 figures and 2 tables.)
Descriptors: Psychometrics, Factor Structure, Student Evaluation of Teacher Performance, Test Items
Lee, H. K. – Assessing Writing, 2004
This study aimed to comprehensively investigate the impact of a word-processor on an ESL writing assessment, covering comparison of inter-rater reliability, the quality of written products, the writing process across different testing occasions using different writing media, and students' perception of a computer-delivered test. Writing samples of…
Descriptors: Writing Evaluation, Student Attitudes, Writing Tests, Testing
Yang, Yongwei; Buckendahl, Chad W.; Juszkiewicz, Piotr J.; Bhola, Dennison S. – Journal of Applied Testing Technology, 2005
With the continual progress of computer technologies, computer automated scoring (CAS) has become a popular tool for evaluating writing assessments. Research on applications of these methodologies to new types of performance assessments is still emerging. While research has generally shown high agreement of CAS system-generated scores with those…
Descriptors: Scoring, Validity, Interrater Reliability, Comparative Analysis
International Association for Development of the Information Society, 2012
The IADIS CELDA 2012 Conference aimed to address the main issues concerned with evolving learning processes and supporting pedagogies and applications in the digital age. Advances in both cognitive psychology and computing have affected the educational arena. The convergence of these two disciplines is increasing at a…
Descriptors: Academic Achievement, Academic Persistence, Academic Support Services, Access to Computers