Publication Date
In 2025 | 0
Since 2024 | 0
Since 2021 (last 5 years) | 0
Since 2016 (last 10 years) | 2
Since 2006 (last 20 years) | 11
Descriptor
Language Tests | 13
Statistical Analysis | 13
English (Second Language) | 12
Second Language Learning | 11
Scoring | 9
Correlation | 6
Computer Assisted Testing | 5
Evaluators | 5
Scores | 5
Scoring Rubrics | 5
Essays | 4
Source
ETS Research Report Series | 6
English Language Teaching | 1
JALT CALL Journal | 1
Journal of Educational… | 1
Language Assessment Quarterly | 1
Language Testing | 1
TESL Canada Journal | 1
Working Papers in TESOL &… | 1
Author
Kantor, Robert | 2
Ashwell, Tim | 1
Baba, Kyoko | 1
Bratkovich, Meghan Odsliv | 1
Bridgeman, Brent | 1
Cumming, Alister | 1
Davey, Tim | 1
Davis, Larry | 1
Des Brisay, Margaret | 1
Elam, Jesse R. | 1
Eouanzoui, Keanre | 1
Publication Type
Journal Articles | 13
Reports - Research | 11
Tests/Questionnaires | 5
Reports - Evaluative | 2
Education Level
Higher Education | 4
Postsecondary Education | 4
Adult Education | 2
Location
California (Los Angeles) | 1
Canada | 1
Georgia | 1
Indiana | 1
Iowa | 1
Japan | 1
Michigan | 1
Minnesota | 1
New York | 1
New York (New York) | 1
Taiwan | 1
Assessments and Surveys
Test of English as a Foreign… | 13
Davis, Larry – Language Testing, 2016
Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…
Descriptors: Evaluators, Oral Language, Scores, Language Tests
Weigle, Sara Cushing – ETS Research Report Series, 2011
Automated scoring has the potential to dramatically reduce the time and costs associated with the assessment of complex skills such as writing, but its use must be validated against a variety of criteria for it to be accepted by test users and stakeholders. This study addresses two validity-related issues regarding the use of e-rater® with the…
Descriptors: Scoring, English (Second Language), Second Language Instruction, Automation
Ashwell, Tim; Elam, Jesse R. – JALT CALL Journal, 2017
The ultimate aim of our research project was to use the Google Web Speech API to automate scoring of elicited imitation (EI) tests. However, in order to achieve this goal, we had to take a number of preparatory steps. We needed to assess how accurate this speech recognition tool is in recognizing native speakers' production of the test items; we…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Language Tests
Yang, Hui-Chun – Language Assessment Quarterly, 2014
This study explores the construct of a summarization test task by means of single-group and multigroup structural equation modeling (SEM). It examines the interrelationships between strategy use and performance, drawing on data from 298 Taiwanese undergraduates' summary essays and their self-reported strategy use. Single-group SEM analyses…
Descriptors: Foreign Countries, Structural Equation Models, Writing Skills, Language Tests
Bratkovich, Meghan Odsliv – Working Papers in TESOL & Applied Linguistics, 2014
This study investigated the nature of self-assessment and blind peer- and teacher-assessment in L2 writing. The type of feedback students gave to themselves and peers, the type of feedback used in the revision process, and the source of the feedback used were all analyzed. Additionally, student perceptions of self- and peer-assessment, feedback,…
Descriptors: Student Evaluation, Evaluation Methods, Self Evaluation (Individuals), Peer Evaluation
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Scoring models for the "e-rater"® system were built and evaluated for the "TOEFL"® exam's independent and integrated writing prompts. Prompt-specific and generic scoring models were built, and evaluation statistics, such as weighted kappas, Pearson correlations, standardized differences in mean scores, and correlations with…
Descriptors: Scoring, Prompting, Evaluators, Computer Software
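The evaluation statistics named in the abstract above (quadratic-weighted kappa, Pearson correlation, standardized mean difference between machine and human scores) are standard agreement measures for automated scoring. A minimal illustrative sketch of how they can be computed, assuming two parallel lists of integer essay scores on the same 0-5 scale (the sample scores are invented, not ETS data):

```python
# Illustrative only: agreement statistics commonly used to evaluate
# automated essay-scoring models against human ratings.
import math

human = [3, 4, 2, 5, 3, 4, 1, 3, 4, 2]    # hypothetical human scores
machine = [3, 4, 3, 5, 2, 4, 2, 3, 4, 2]  # hypothetical machine scores

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def quadratic_weighted_kappa(x, y, k=6):
    """Cohen's kappa with quadratic disagreement weights, k score levels."""
    n = len(x)
    obs = [[0] * k for _ in range(k)]       # observed joint score counts
    for a, b in zip(x, y):
        obs[a][b] += 1
    hist_x = [x.count(i) for i in range(k)]  # marginal score counts
    hist_y = [y.count(i) for i in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic penalty
            num += w * obs[i][j] / n          # observed weighted disagreement
            den += w * hist_x[i] * hist_y[j] / n ** 2  # expected by chance
    return 1 - num / den

def standardized_mean_difference(x, y):
    """Machine-minus-human mean difference in pooled-SD units."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    return (my - mx) / math.sqrt((vx + vy) / 2)

print(round(pearson(human, machine), 3))
print(round(quadratic_weighted_kappa(human, machine), 3))
print(round(standardized_mean_difference(human, machine), 3))
```

Perfect agreement gives a kappa and correlation of 1.0 and a standardized difference of 0; operational thresholds for acceptable machine-human agreement vary by program.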
Winke, Paula; Gass, Susan; Myford, Carol – ETS Research Report Series, 2011
This study investigated whether raters' second language (L2) background and the first language (L1) of test takers taking the TOEFL iBT® Speaking test were related through scoring. After an initial 4-hour training period, a group of 107 raters (mostly learners of Chinese, Korean, and Spanish) listened to a selection of 432 speech samples that…
Descriptors: Second Language Learning, Evaluators, Speech Tests, English (Second Language)
Jamieson, Joan; Poonpon, Kornwipa – ETS Research Report Series, 2013
Research and development of a new type of scoring rubric for the integrated speaking tasks of "TOEFL iBT"® are described. These "analytic rating guides" could be helpful if tasks modeled after those in TOEFL iBT were used for formative assessment, a purpose which is different from TOEFL iBT's primary use for admission…
Descriptors: Oral Language, Language Proficiency, Scaling, Scores
Jernigan, Justin – English Language Teaching, 2012
Swain's Output Hypothesis proposes a facilitative effect for output on the acquisition of second language morphosyntax. In the context of classroom instruction, a number of studies and reviews suggest that explicit instruction in pragmatic elements promotes development. Other studies have offered less conclusive evidence of the effectiveness of…
Descriptors: English (Second Language), Second Language Instruction, Second Language Learning, Instructional Effectiveness
Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – ETS Research Report Series, 2008
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and "e-rater"® essay feature variables in the context of the TOEFL® computer-based test (CBT) writing assessment. Data analyzed in the study were analytic and holistic…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Scoring

Wainer, Howard; Wang, Xiaohui – Journal of Educational Measurement, 2000
Modified the three-parameter model to include an additional random effect for items nested within the same testlet. Fitted the new model to 86 testlets from the Test of English as a Foreign Language (TOEFL) and compared standard parameters (discrimination, difficulty, and guessing) with those obtained through traditional modeling. Discusses the…
Descriptors: English (Second Language), Language Tests, Scoring, Statistical Analysis
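The modification described above can be sketched in the general form of testlet response theory; the exact parameterization used in the paper may differ. The standard three-parameter logistic model gives the probability that person $i$ answers item $j$ correctly as

```latex
P(y_{ij}=1 \mid \theta_i) \;=\; c_j + \frac{1-c_j}{1+\exp\{-a_j(\theta_i - b_j)\}}
```

with discrimination $a_j$, difficulty $b_j$, and guessing $c_j$. The testlet extension adds a person-specific random effect $\gamma_{i\,d(j)}$ for the testlet $d(j)$ containing item $j$, absorbing the extra dependence among items that share a common passage:

```latex
P(y_{ij}=1 \mid \theta_i) \;=\; c_j + \frac{1-c_j}{1+\exp\{-a_j(\theta_i - b_j - \gamma_{i\,d(j)})\}}
```

When the variance of $\gamma$ is zero the model reduces to the standard 3PL, so the two fits can be compared directly.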
Cumming, Alister; Kantor, Robert; Baba, Kyoko; Eouanzoui, Keanre; Erdosy, Usman; James, Mark – ETS Research Report Series, 2006
We assessed whether and how the discourse written for prototype integrated tasks (involving writing in response to print or audio source texts) field tested for the new TOEFL® differs from the discourse written for independent essays (i.e., the TOEFL essay). We selected 216 compositions written for 6 tasks by 36 examinees in a field…
Descriptors: Discourse Analysis, Essays, Scores, Language Proficiency
Des Brisay, Margaret – TESL Canada Journal, 1994
Data from the Canadian Test of English for Scholars and Trainees (CanTEST) are compared to data from the Test of English as a Foreign Language (TOEFL) to establish CanTEST as a valid admissions tool for English-as-a-Second Language college applicants. Data are taken from four groups of examinees who took both tests. (eight references) (LR)
Descriptors: Admission Criteria, Comparative Analysis, Comparative Testing, Correlation