| Publication Date | Records |
|---|---|
| In 2026 | 0 |
| Since 2025 | 178 |
| Since 2022 (last 5 years) | 1058 |
| Since 2017 (last 10 years) | 2880 |
| Since 2007 (last 20 years) | 6165 |
| Audience | Records |
|---|---|
| Teachers | 480 |
| Practitioners | 358 |
| Researchers | 152 |
| Administrators | 122 |
| Policymakers | 51 |
| Students | 44 |
| Parents | 32 |
| Counselors | 25 |
| Community | 15 |
| Media Staff | 5 |
| Support Staff | 3 |
| Location | Records |
|---|---|
| Australia | 183 |
| Turkey | 156 |
| California | 133 |
| Canada | 123 |
| New York | 118 |
| United States | 112 |
| Florida | 107 |
| China | 103 |
| Texas | 72 |
| United Kingdom | 72 |
| Japan | 70 |
| What Works Clearinghouse Rating | Records |
|---|---|
| Meets WWC Standards without Reservations | 5 |
| Meets WWC Standards with or without Reservations | 11 |
| Does not meet standards | 8 |
Schulte, Niklas; Holling, Heinz; Bürkner, Paul-Christian – Educational and Psychological Measurement, 2021
Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales. However, the derived trait scores are often unreliable and ipsative, making interindividual comparisons in high-stakes situations impossible. Several studies suggest that these problems vanish if the number of measured traits is high.…
Descriptors: Questionnaires, Measurement Techniques, Test Format, Scoring
Dimitrov, Dimiter M.; Atanasov, Dimitar V. – Educational and Psychological Measurement, 2021
This study presents a latent (item response theory–like) framework of a recently developed classical approach to test scoring, equating, and item analysis, referred to as the "D"-scoring method. Specifically, (a) person and item parameters are estimated under an item response function model on the "D"-scale (from 0 to 1) using…
Descriptors: Scoring, Equated Scores, Item Analysis, Item Response Theory
Wang, Jue; Engelhard, George, Jr. – Educational and Psychological Measurement, 2019
The purpose of this study is to explore the use of unfolding models for evaluating the quality of ratings obtained in rater-mediated assessments. Two different judgmental processes can be used to conceptualize ratings: impersonal judgments and personal preferences. Impersonal judgments are typically expected in rater-mediated assessments, and…
Descriptors: Evaluative Thinking, Preferences, Evaluators, Models
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Indrie Setya Lestari; Gayatri Nurnaningrum; Indri Andriani Astuti; Hengki Anggra Hermawan; Tiana Dara Lugina; Lulu Laela Amalia – International Journal of Language Education, 2025
This study investigates how PBL enhances students' critical thinking abilities as they compose narrative texts based on indigenous narratives in an English as a Foreign Language (EFL) classroom. The research conducted among 35 ninth-grade students (divided into seven groups) in a public junior high school in Bandung, Indonesia, using qualitative…
Descriptors: Scoring Rubrics, English (Second Language), Second Language Learning, Second Language Instruction
Adrea Truckenmiller; Katherine Valentine; Cherish Sarmiento; Lauren Hennenfent; Lindy Johnson; Pamella Moura; Julia Bachmann; Ellie Friedman; Samantha Bourgeois – Intervention in School and Clinic, 2025
In this article, recent developments in the assessment of writing, especially informational writing, are connected with research-based instruction that is likely to have an impact on students with disabilities and the wide range of student writing development in Grades 3 to 8. The objectives are for educators to assess, interpret, and set goals…
Descriptors: Writing Instruction, Individualized Instruction, Writing (Composition), Students with Disabilities
Ecem Kopuz; Galip Kartal – PASAA: Journal of Language Teaching and Learning in Thailand, 2025
Developments in artificial intelligence (AI) have significantly transformed second language (L2) learning and assessment, and the role of AI technologies in L2 assessment has been investigated in recent research. This study presents a bibliosystematic analysis of AI-assisted L2 assessment. Using both systematic analysis and bibliometric…
Descriptors: Artificial Intelligence, Computer Software, Technology Integration, Feedback (Response)
Jiayi Deng – Large-scale Assessments in Education, 2025
Background: Test score comparability in international large-scale assessments (LSAs) is essential to ensuring test fairness. To compare test scores effectively on an international scale, score linking is widely used to convert raw scores from different linguistic versions of test forms onto a common score scale. An example is the multigroup…
Descriptors: Guessing (Tests), Item Response Theory, Error Patterns, Arabic
Indiana Department of Education, 2025
The 2025-2026 Indiana Assessments Policy Manual communicates established guidelines regarding appropriate test administration in Indiana for key stakeholders including educators and Test Coordinators. This document contains policy guidance and appendices that delineate specific aspects of test implementation, including test security protocol,…
Descriptors: Measurement, Achievement Tests, Educational Testing, Reading Tests
Laila El-Hamamsy; María Zapata-Cáceres; Estefanía Martín-Barroso; Francesco Mondada; Jessica Dehler Zufferey; Barbara Bruno; Marcos Román-González – Technology, Knowledge and Learning, 2025
The introduction of computing education into curricula worldwide requires multi-year assessments to evaluate the long-term impact on learning. However, no single Computational Thinking (CT) assessment spans primary school, and no group of CT assessments provides a means of transitioning between instruments. This study therefore investigated…
Descriptors: Cognitive Tests, Computation, Thinking Skills, Test Validity
Sara T. Cushing – ETS Research Report Series, 2025
This report provides an in-depth comparison of TOEFL iBT® and the Duolingo English Test (DET) in terms of the degree to which both tests assess academic language proficiency in listening, reading, writing, and speaking. The analysis is based on publicly available documentation on both tests, including sample test questions available on the test…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Academic Language
Anne-Coleman Webre; Darrell Allen – TESOL in Context, 2025
Providing useful feedback on student writing is a challenging task, requiring an understanding of the specific language expectations in assignments teachers give students. Studies have shown that teachers are more likely to give corrective feedback on surface-level errors than attend to meaning-making linguistic resources. The question is how to…
Descriptors: Preservice Teachers, Preservice Teacher Education, Writing Evaluation, Feedback (Response)
Kang, Hyeon-Ah; Han, Suhwa; Kim, Doyoung; Kao, Shu-Chuan – Educational and Psychological Measurement, 2022
The development of technology-enhanced innovative items calls for practical models that can describe polytomous testlet items. In this study, we evaluate four measurement models that can characterize polytomous items administered in testlets: (a) generalized partial credit model (GPCM), (b) testlet-as-a-polytomous-item model (TPIM), (c)…
Descriptors: Goodness of Fit, Item Response Theory, Test Items, Scoring
Puhan, Gautam; Kim, Sooyeon – Journal of Educational Measurement, 2022
As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be…
Descriptors: Scores, Scoring, Comparative Analysis, Testing
