NotesFAQContact Us
Collection
Advanced
Search Tips
Showing 1 to 15 of 136,279 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Ikkyu Choi; Jiyun Zu – Language Testing, 2025
Today's language models can produce syntactically accurate and semantically coherent texts. This capability presents new opportunities for generating content for language assessments, which have traditionally required intensive expert resources. However, these models are also known to generate biased texts, leading to representational harms.…
Descriptors: Artificial Intelligence, Language Tests, Test Bias, Test Construction
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
Peer reviewed Peer reviewed
Direct linkDirect link
Eunsoo Cho; Mina Son; Sarah Reiley; Eun Ha Kim – Language, Speech, and Hearing Services in Schools, 2025
Purpose: The purpose of this study was to develop and evaluate the initial reliability and validity evidence of the dynamic assessment (DA) of early reading and language as a second-stage screener in kindergarten, the first year of formal schooling. The DA comprises three subtests that capture students' ability to learn letter sounds and blending…
Descriptors: Alternative Assessment, Test Construction, Test Validity, Kindergarten
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Meryem Konu Kadirhanogullari; Esra Özay Köse – Science Insights Education Frontiers, 2025
This study aims to develop a valid and reliable achievement test in accordance with the content framework of the 9th-grade Biology Course Curriculum published within the scope of the Turkish Century Maarif Model on the subject of "Organic Matter". The screening method was used for this purpose. The sample of the study consists of 258…
Descriptors: Science Tests, Test Construction, Grade 9, Biology
Robert J. Marzano; Bridget Cahill; Jeni Gotto; Brian J. Kosena; Michael Lynch; Lucy Pearson – Solution Tree, 2025
In "Test-Specific Thinking," the authors provide recommended practices, methods, and means for educators to implement structural schemas into teaching, helping students better prepare for tests and formulate stronger responses to certain question frames. Armed with a better understanding of how tests are designed, teachers will increase…
Descriptors: English Instruction, Language Arts, Mathematics Tests, Test Construction
Peer reviewed Peer reviewed
Direct linkDirect link
Ying Xu; Xiaodong Li; Jin Chen – Language Testing, 2025
This article provides a detailed review of the Computer-based English Listening Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…
Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Venessa F. Manna; Shuhong Li; Spiros Papageorgiou; Lixiong Gu – ETS Research Report Series, 2025
This technical manual describes the purpose and intended uses of the TOEFL iBT test, its target test-taker population, and relevant language use domains. The test design and scoring procedures are presented first, followed by a research agenda intended to support the interpretation and use of test scores. Given the updates to the test starting…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Test Construction
Peer reviewed Peer reviewed
Direct linkDirect link
Rekha; Shakeela K. – Journal on School Educational Technology, 2025
The main objective of the present study was to construct and standardize an achievement test in science for the secondary school science students in grade 8. An achievement test having 120 test items was prepared by the facilitator based on the four main learning objectives of teaching science that are knowledge, understanding, application, and…
Descriptors: Test Construction, Standardized Tests, Secondary School Students, Science Achievement
Peer reviewed Peer reviewed
Direct linkDirect link
Kentaro Fukushima; Nao Uchida; Kensuke Okada – Journal of Educational and Behavioral Statistics, 2025
Diagnostic tests are typically administered in a multiple-choice (MC) format due to their advantages of objectivity and time efficiency. The MC-deterministic input, noisy "and" gate (DINA) family of models, a representative class of cognitive diagnostic models for MC items, efficiently and parsimoniously estimates the mastery profiles of…
Descriptors: Diagnostic Tests, Cognitive Measurement, Multiple Choice Tests, Educational Assessment
Peer reviewed Peer reviewed
Direct linkDirect link
Christopher J. Anthony; Stephen N. Elliott – School Mental Health, 2025
Stress is a complex construct that is related to resilience and general health starting in childhood. Despite its importance for student health and well-being, there are few measures of stress designed for school-based applications. In this study, we developed and initially validated a Stress Indicators Scale using five samples of teachers,…
Descriptors: Test Construction, Stress Variables, Test Validity, Test Items
Peer reviewed Peer reviewed
Direct linkDirect link
Marta Siedlecka; Piotr Litwin; Paulina Szyszka; Boryslaw Paulewicz – European Journal of Psychology of Education, 2025
Students change their responses during tests, and these revisions are often correct. Some studies have suggested that decisions regarding revisions are informed by metacognitive monitoring. We investigated whether assessing and reporting response confidence increases the accuracy of revisions and the final test score, and whether confidence in a…
Descriptors: Student Evaluation, Decision Making, Responses, Achievement Tests
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Anatri Desstya; Ika Candra Sayekti; Muhammad Abduh; Sukartono – Journal of Turkish Science Education, 2025
This study aimed to develop a standardised instrument for diagnosing science misconceptions in primary school children. Following a developmental research approach using the 4-D model (Define, Design, Develop, Disseminate), 100 four-tier multiple choice items were constructed. Content validity was established through expert evaluation by six…
Descriptors: Test Construction, Science Tests, Science Instruction, Diagnostic Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Anne Traynor; Sara C. Christopherson – Applied Measurement in Education, 2024
Combining methods from earlier content validity and more contemporary content alignment studies may allow a more complete evaluation of the meaning of test scores than if either set of methods is used on its own. This article distinguishes item relevance indices in the content validity literature from test representativeness indices in the…
Descriptors: Test Validity, Test Items, Achievement Tests, Test Construction
Peer reviewed Peer reviewed
Direct linkDirect link
Jeanne Sinclair – Critical Inquiry in Language Studies, 2025
In this paper, the White listening subject takes the form of a standardized high-stakes reading test, the State of Texas Assessment of Academic Readiness (STAAR). Although the test does not actually listen, it 'hears' and evaluates children's responses to its questions. I present the results of the 2017 Grade 8 reading exams, from the March, May,…
Descriptors: High Stakes Tests, Standardized Tests, Reading Tests, Achievement Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Xueliang Chen; Vahid Aryadoust; Wenxin Zhang – Language Testing, 2025
The growing diversity among test takers in second or foreign language (L2) assessments makes the importance of fairness front and center. This systematic review aimed to examine how fairness in L2 assessments was evaluated through differential item functioning (DIF) analysis. A total of 83 articles from 27 journals were included in a systematic…
Descriptors: Second Language Learning, Language Tests, Test Items, Item Analysis
Previous Page | Next Page »
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10  |  11  |  ...  |  9086