Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 4 |
Since 2006 (last 20 years) | 14 |
Descriptor
Evaluators | 16 |
Scoring | 14 |
English (Second Language) | 11 |
Language Tests | 11 |
Second Language Learning | 11 |
Computer Assisted Testing | 9 |
Interrater Reliability | 7 |
Essays | 6 |
Scores | 6 |
Correlation | 5 |
Prompting | 5 |
More ▼ |
Source
ETS Research Report Series | 16 |
Author
Attali, Yigal | 2 |
Arslan, Burcu | 1 |
Bejar, Isaac I. | 1 |
Bell, Courtney A. | 1 |
Blanchard, Daniel | 1 |
Breyer, F. Jay | 1 |
Bridgeman, Brent | 1 |
Brown, Annie | 1 |
Cahill, Aoife | 1 |
Casabianca, Jodi M. | 1 |
Chen, Lei | 1 |
More ▼ |
Publication Type
Journal Articles | 16 |
Reports - Research | 15 |
Tests/Questionnaires | 4 |
Reports - General | 1 |
Education Level
Higher Education | 2 |
Postsecondary Education | 2 |
High Schools | 1 |
Secondary Education | 1 |
Audience
Laws, Policies, & Programs
Assessments and Surveys
Test of English as a Foreign… | 10 |
Graduate Record Examinations | 1 |
What Works Clearinghouse Rating
Attali, Yigal – ETS Research Report Series, 2020
Principles of skill acquisition dictate that raters should be provided with frequent feedback about their ratings. However, in current operational practice, raters rarely receive immediate feedback about their scores owing to the prohibitive effort required to generate such feedback. An approach for generating and administering feedback responses…
Descriptors: Feedback (Response), Evaluators, Accuracy, Scores
Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018
This report investigates whether the time between scoring sessions has an influence on operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essay responses for the "GRE"® General Test. Binomial linear mixed-effect models are presented that evaluate how the effect of various…
Descriptors: Intervals, Scoring, Accuracy, Essay Tests
Qi, Yi; Bell, Courtney A.; Jones, Nathan D.; Lewis, Jennifer M.; Witherspoon, Margaret W.; Redash, Amanda – ETS Research Report Series, 2018
Teacher observations are being used for high-stakes purposes in states across the country, and administrators often serve as raters in teacher evaluation systems. This paper examines how the cognitive aspects of administrators' use of an observation instrument, a modified version of Charlotte Danielson's Framework for Teaching, interact with the…
Descriptors: Teacher Evaluation, Classroom Observation Techniques, Observation, Evaluation Methods
Rupp, André A.; Casabianca, Jodi M.; Krüger, Maleika; Keller, Stefan; Köller, Olaf – ETS Research Report Series, 2019
In this research report, we describe the design and empirical findings for a large-scale study of essay writing ability with approximately 2,500 high school students in Germany and Switzerland on the basis of 2 tasks with 2 associated prompts, each from a standardized writing assessment whose scoring involved both human and automated components.…
Descriptors: Automation, Foreign Countries, English (Second Language), Language Tests
Joe, Jilliam; Kitchen, Christopher; Chen, Lei; Feng, Gary – ETS Research Report Series, 2015
The purpose of this paper is to summarize the evaluation of human-scoring quality for an assessment of public speaking skills. Videotaped performances given by 17 speakers on 4 tasks were scored by expert and nonexpert raters who had extensive experience scoring performance-based and constructed-response assessments. The Public Speaking Competence…
Descriptors: Public Speaking, Communication Skills, Scoring, Scoring Rubrics
Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015
The "e-rater"® automated essay scoring system is used operationally in the scoring of "TOEFL iBT"® independent and integrated tasks. In this study we explored the psychometric added value of reporting four trait scores for each of these two tasks, beyond the total e-rater score.The four trait scores are word choice, grammatical…
Descriptors: Writing Tests, Scores, Language Tests, English (Second Language)
Blanchard, Daniel; Tetreault, Joel; Higgins, Derrick; Cahill, Aoife; Chodorow, Martin – ETS Research Report Series, 2013
This report presents work on the development of a new corpus of non-native English writing. It will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring. In this report, the corpus is described in detail.
Descriptors: Language Tests, Second Language Learning, English (Second Language), Writing Tests
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Scoring models for the "e-rater"® system were built and evaluated for the "TOEFL"® exam's independent and integrated writing prompts. Prompt-specific and generic scoring models were built, and evaluation statistics, such as weighted kappas, Pearson correlations, standardized differences in mean scores, and correlations with…
Descriptors: Scoring, Prompting, Evaluators, Computer Software
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Winke, Paula; Gass, Susan; Myford, Carol – ETS Research Report Series, 2011
This study investigated whether raters' second language (L2) background and the first language (L1) of test takers taking the TOEFL iBT® Speaking test were related through scoring. After an initial 4-hour training period, a group of 107 raters (mostly of learners of Chinese, Korean, and Spanish), listened to a selection of 432 speech samples that…
Descriptors: Second Language Learning, Evaluators, Speech Tests, English (Second Language)
Jamieson, Joan; Poonpon, Kornwipa – ETS Research Report Series, 2013
Research and development of a new type of scoring rubric for the integrated speaking tasks of "TOEFL iBT"® are described. These "analytic rating guides" could be helpful if tasks modeled after those in TOEFL iBT were used for formative assessment, a purpose which is different from TOEFL iBT's primary use for admission…
Descriptors: Oral Language, Language Proficiency, Scaling, Scores
Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – ETS Research Report Series, 2008
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and "e-rater"® essay feature variables in the context of the TOEFL® computer-based test (CBT) writing assessment. Data analyzed in the study were analytic and holistic…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Scoring
Wylie, E. Caroline; Szpara, Michelle Y. – ETS Research Report Series, 2004
This study is an in-depth investigation of the National Board for Professional Teaching Standards (NBPTS) bias-reduction training, from the perspective of assessors. The research examined how successful the bias training was in guiding assessors to recognize their biases and to identify actions to be used to reduce the impact of bias on their…
Descriptors: Standards, Bias, Training, Evaluators
Zechner, Klaus; Bejar, Isaac I.; Hemat, Ramin – ETS Research Report Series, 2007
The increasing availability and performance of computer-based testing has prompted more research on the automatic assessment of language and speaking proficiency. In this investigation, we evaluated the feasibility of using an off-the-shelf speech-recognition system for scoring speaking prompts from the LanguEdge field test of 2002. We first…
Descriptors: Role, Computer Assisted Testing, Language Proficiency, Oral Language
Xi, Xiaoming; Mollaun, Pam – ETS Research Report Series, 2006
This study explores the utility of analytic scoring for the TOEFL® Academic Speaking Test (TAST) in providing useful and reliable diagnostic information in three aspects of candidates' performance: delivery, language use, and topic development. G studies were used to investigate the dependability of the analytic scores, the distinctness of the…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Oral Language
Previous Page | Next Page »
Pages: 1 | 2