NotesFAQContact Us
Collection
Advanced
Search Tips
What Works Clearinghouse Rating
Showing 1 to 15 of 83 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Gustafsson, Martin; Barakat, Bilal Fouad – Comparative Education Review, 2023
International assessments inform education policy debates, yet little is known about their floor effects: To what extent do they fail to differentiate between the lowest performers, and what are the implications of this? TIMSS, SACMEQ, and LLECE data are analyzed to answer this question. In TIMSS, floor effects have been reduced through the…
Descriptors: Achievement Tests, Elementary Secondary Education, International Assessment, Foreign Countries
Peer reviewed Peer reviewed
Direct linkDirect link
Lawrence T. DeCarlo – Educational and Psychological Measurement, 2024
A psychological framework for different types of items commonly used with mixed-format exams is proposed. A choice model based on signal detection theory (SDT) is used for multiple-choice (MC) items, whereas an item response theory (IRT) model is used for open-ended (OE) items. The SDT and IRT models are shown to share a common conceptualization…
Descriptors: Test Format, Multiple Choice Tests, Item Response Theory, Models
Peer reviewed Peer reviewed
Direct linkDirect link
Kosh, Audra E.; Simpson, Mary Ann; Bickel, Lisa; Kellogg, Mark; Sanford-Moore, Ellie – Educational Measurement: Issues and Practice, 2019
Automatic item generation (AIG)--a means of leveraging technology to create large quantities of items--requires a minimum number of items to offset the sizable upfront investment (i.e., model development and technology deployment) in order to achieve cost savings. In this cost-benefit analysis, we estimated the cost of each step of AIG and manual…
Descriptors: Cost Effectiveness, Automation, Test Items, Mathematics Tests
Peer reviewed Peer reviewed
Sami Baral; Li Lucy; Ryan Knight; Alice Ng; Luca Soldaini; Neil T. Heffernan; Kyle Lo – Grantee Submission, 2024
In real-world settings, vision language models (VLMs) should robustly handle naturalistic, noisy visual content as well as domain-specific language and concepts. For example, K-12 educators using digital learning platforms may need to examine and provide feedback across many images of students' math work. To assess the potential of VLMs to support…
Descriptors: Visual Learning, Visual Perception, Natural Language Processing, Freehand Drawing
Peer reviewed Peer reviewed
Direct linkDirect link
Gantt, Allison L.; Paoletti, Teo; Corven, Julien – International Journal of Science and Mathematics Education, 2023
Covariational reasoning (or the coordination of two dynamically changing quantities) is central to secondary STEM subjects, but research has yet to fully explore its applicability to elementary and middle-grade levels within various STEM fields. To address this need, we selected a globally referenced STEM assessment--the Trends in International…
Descriptors: Incidence, Abstract Reasoning, Mathematics Education, Science Education
Peer reviewed Peer reviewed
Direct linkDirect link
Kim, Nana; Bolt, Daniel M. – Educational and Psychological Measurement, 2021
This paper presents a mixture item response tree (IRTree) model for extreme response style. Unlike traditional applications of single IRTree models, a mixture approach provides a way of representing the mixture of respondents following different underlying response processes (between individuals), as well as the uncertainty present at the…
Descriptors: Item Response Theory, Response Style (Tests), Models, Test Items
National Academies Press, 2022
The National Assessment of Educational Progress (NAEP) -- often called "The Nation's Report Card" -- is the largest nationally representative and continuing assessment of what students in public and private schools in the United States know and can do in various subjects and has provided policy makers and the public with invaluable…
Descriptors: Costs, Futures (of Society), National Competency Tests, Educational Trends
Peer reviewed Peer reviewed
Direct linkDirect link
Cutumisu, Maria; Adams, Cathy; Lu, Chang – Journal of Science Education and Technology, 2019
Computational thinking (CT) is regarded as an essential twenty-first century competency and it is already embedded in K-12 curricula across the globe. However, research on assessing CT has lagged, with few assessments being implemented and validated. Moreover, there is a lack of systematic grouping of CT assessments. This scoping review examines…
Descriptors: Computation, Thinking Skills, 21st Century Skills, Elementary Secondary Education
Martin, Michael O., Ed.; von Davier, Matthias, Ed.; Mullis, Ina V. S., Ed. – International Association for the Evaluation of Educational Achievement, 2020
The chapters in this online volume comprise the TIMSS & PIRLS International Study Center's technical report of the methods and procedures used to develop, implement, and report the results of TIMSS 2019. There were various technical challenges because TIMSS 2019 was the initial phase of the transition to eTIMSS, with approximately half the…
Descriptors: Foreign Countries, Elementary Secondary Education, Achievement Tests, International Assessment
Peer reviewed Peer reviewed
Direct linkDirect link
Marbach, Joshua – Journal of Psychoeducational Assessment, 2017
The Mathematics Fluency and Calculation Tests (MFaCTs) are a series of measures designed to assess for arithmetic calculation skills and calculation fluency in children ages 6 through 18. There are five main purposes of the MFaCTs: (1) identifying students who are behind in basic math fact automaticity; (2) evaluating possible delays in arithmetic…
Descriptors: Mathematics Tests, Computation, Mathematics Skills, Arithmetic
Peer reviewed Peer reviewed
Direct linkDirect link
Banks, Kathleen – Educational Measurement: Issues and Practice, 2013
The purpose of this article was to present a synthesis of the peer-reviewed differential bundle functioning (DBF) research that has been conducted to date. A total of 16 studies were synthesized according to the following characteristics: tests used and learner groups, organizing principles used for developing bundles, DBF detection methods used,…
Descriptors: Test Bias, Research, Tests, Student Characteristics
McQuillan, Mark; Phelps, Richard P.; Stotsky, Sandra – Pioneer Institute for Public Policy Research, 2015
In July 2010, the Massachusetts Board of Elementary and Secondary Education (BESE) voted to adopt Common Core's standards in English language arts (ELA) and mathematics in place of the state's own standards in these two subjects. The vote was based largely on recommendations by Commissioner of Education Mitchell Chester and then Secretary of…
Descriptors: Reading Tests, Writing Tests, Achievement Tests, Common Core State Standards
Peer reviewed Peer reviewed
Direct linkDirect link
El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art – Assessment in Education: Principles, Policy & Practice, 2016
We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…
Descriptors: International Assessment, Difficulty Level, Test Items, Language Variation
Peer reviewed Peer reviewed
Direct linkDirect link
Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012
The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…
Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling
Peer reviewed Peer reviewed
Direct linkDirect link
Black, Beth; Suto, Irenka; Bramley, Tom – Assessment in Education: Principles, Policy & Practice, 2011
In this paper we develop an evidence-based framework for considering many of the factors affecting marker agreement in GCSEs and A levels. A logical analysis of the demands of the marking task suggests a core grouping comprising: (i) question features; (ii) mark scheme features; and (iii) examinee response features. The framework synthesises…
Descriptors: Interrater Reliability, Grading, Scoring, High Stakes Tests
Previous Page | Next Page ยป
Pages: 1  |  2  |  3  |  4  |  5  |  6