Davis, Larry; Papageorgiou, Spiros – Assessment in Education: Principles, Policy & Practice, 2021
Human raters and machine scoring systems potentially have complementary strengths in evaluating language ability; specifically, it has been suggested that automated systems might be used to make consistent measurements of specific linguistic phenomena, whilst humans evaluate more global aspects of performance. We report on an empirical study that…
Descriptors: Scoring, English for Academic Purposes, Oral English, Speech Tests
Tavarez Da Costa, Pedro; Reyes Arias, Fransheska – Online Submission, 2021
The present work seeks to establish a comparison between two different and distant evaluation tools applied to the Dominican student population in order to measure the efficiency of our educational system in recent years. One of them measured the quality of Dominican education in three areas (the PISA Test), whereas the other tested the…
Descriptors: Foreign Countries, Standardized Tests, Student Evaluation, International Assessment
Toroujeni, Seyyed Morteza Hashemi – Education and Information Technologies, 2022
Score interchangeability of Computerized Fixed-Length Linear Testing (henceforth CFLT) and Paper-and-Pencil-Based Testing (henceforth PPBT) has become a controversial issue over the last decade, as technology has meaningfully restructured methods of educational assessment. Given this controversy, various testing guidelines published on…
Descriptors: Computer Assisted Testing, Reading Tests, Reading Comprehension, Scoring
Adding Value to Second-Language Listening and Reading Subscores: Using a Score Augmentation Approach
Papageorgiou, Spiros; Choi, Ikkyu – International Journal of Testing, 2018
This study examined whether reporting subscores for groups of items within a test section assessing a second-language modality (specifically reading or listening comprehension) added value from a measurement perspective to the information already provided by the section scores. We analyzed the responses of 116,489 test takers to reading and…
Descriptors: Second Language Learning, Second Language Instruction, English (Second Language), Language Tests
Choi, Ikkyu; Papageorgiou, Spiros – Language Testing, 2020
Stakeholders of language tests are often interested in subscores. However, reporting a subscore is not always justified; a subscore should provide reliable and distinct information to be worth reporting. When a subscore is used for decisions across multiple levels (e.g., individual test takers and schools), it needs to be justified for its…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Scores
Sawaki, Yasuyo; Sinharay, Sandip – Language Testing, 2018
The present study examined the reliability of the reading, listening, speaking, and writing section scores for the TOEFL iBT® test and their interrelationship in order to collect empirical evidence to support, respectively, the "generalization" inference and the "explanation" inference in the TOEFL iBT validity argument…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Computer Assisted Testing
Hanifehzadeh, Sepeedeh; Farahzad, Farzaneh – International Journal of Language Testing, 2016
The present study was designed basically to develop a psycho-motor mechanism scale based on the theory of translation competence proposed by PACTE (2003), and then to assess the validity and reliability of the constructed scale. In this quantitative research, after designing the scale, two translation tasks were given to 90 M.A. students majoring…
Descriptors: Translation, Language Tests, Test Construction, Test Reliability
Baghaei, Purya; Ravand, Hamdollah – Psicologica: International Journal of Methodology and Experimental Psychology, 2016
In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…
Descriptors: Cloze Procedure, Reading, Reading Comprehension, Reading Skills
Pishkar, Kian; Moinzadeh, Ahmad; Dabaghi, Azizallah – English Language Teaching, 2017
Speaking a language involves more than simply knowing the linguistic components of the message, and developing language skills requires more than grammatical comprehension and vocabulary memorization. In teaching-learning processes, the drama method may have some positive effects on ELL students' speaking fluency and accuracy. This study attempts to…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Undergraduate Students
Iberri-Shea, Gina – Cogent Education, 2017
Prominent spoken language assessments such as the Oral Proficiency Interview and the Test of Spoken English have been primarily concerned with speaking ability as it relates to conversation. This paper looks at an additional aspect of spoken language ability, namely public speaking. This study used an adapted form of a public speaking rating scale…
Descriptors: Public Speaking, Rating Scales, Adoption (Ideas), English Instruction
Bridgeman, Brent; Cho, Yeonsuk; DiPietro, Stephen – Language Testing, 2016
Data from 787 international undergraduate students at an urban university in the United States were used to demonstrate the importance of separating a sample into meaningful subgroups in order to demonstrate the ability of an English language assessment to predict the first-year grade point average (GPA). For example, when all students were pooled…
Descriptors: Grade Prediction, English Curriculum, Language Tests, Undergraduate Students
Sawaki, Yasuyo; Sinharay, Sandip – ETS Research Report Series, 2013
This study investigates the value of reporting the reading, listening, speaking, and writing section scores for the "TOEFL iBT"® test, focusing on 4 related aspects of the psychometric quality of the TOEFL iBT section scores: reliability of the section scores, dimensionality of the test, presence of distinct score profiles, and the…
Descriptors: Scores, Computer Assisted Testing, Factor Analysis, Correlation
Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – ETS Research Report Series, 2008
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and "e-rater"® essay feature variables in the context of the TOEFL® computer-based test (CBT) writing assessment. Data analyzed in the study were analytic and holistic…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Scoring
Stricker, Lawrence J.; Rock, Donald A. – ETS Research Report Series, 2008
This study assessed the invariance in the factor structure of the "Test of English as a Foreign Language"™ Internet-based test (TOEFL® iBT) across subgroups of test takers who differed in native language and exposure to the English language. The subgroups were defined by (a) Indo-European and Non-Indo-European language family, (b)…
Descriptors: Factor Structure, English (Second Language), Language Tests, Computer Assisted Testing
Brown, James Dean; Ross, Jacqueline A. – 1993
This study investigates the Test of English as a Foreign Language (TOEFL), in particular the relative contributions to score dependability (analogous to classical theory reliability) of various numbers of items and subtests as well as the decision dependability at different cut points. Research questions that apply to the overall TOEFL battery and…
Descriptors: English (Second Language), Language Tests, Statistical Analysis, Test Reliability