| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 215 |
| Since 2022 (last 5 years) | 1084 |
| Since 2017 (last 10 years) | 2594 |
| Since 2007 (last 20 years) | 4955 |

| Audience | Records |
| --- | --- |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
| Location | Records |
| --- | --- |
| Turkey | 226 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 66 |
| What Works Clearinghouse Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |

Daibao Guo; Katherine Landau Wright; Lianne Josbacher; Eun Hye Son – Elementary School Journal, 2025
Limited research has explored the use of visual displays (ViDis) in science tests, making it challenging to know how these tests align with classroom instruction and what skills students need to be successful on these tests. Therefore, the current study aims to describe the use of ViDis in upper elementary grade standardized science tests. We…
Descriptors: Standardized Tests, Science Tests, Elementary Education, Science Education
Yumou Wei; Paulo Carvalho; John Stamper – International Educational Data Mining Society, 2025
Educators evaluate student knowledge using knowledge component (KC) models that map assessment questions to KCs. Still, designing KC models for large question banks remains an insurmountable challenge for instructors who need to analyze each question by hand. The growing use of Generative AI in education is expected only to aggravate this chronic…
Descriptors: Artificial Intelligence, Cluster Grouping, Student Evaluation, Test Items
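For readers unfamiliar with KC models, a common concrete representation is a binary Q-matrix mapping each question to the knowledge components it exercises. The sketch below is illustrative only; the questions, KC names, and mappings are invented, not taken from the paper.

```python
import numpy as np

# Hypothetical question bank and knowledge components (KCs);
# every name and mapping here is invented for illustration.
questions = ["solve 2x + 3 = 7", "factor x^2 - 9", "solve x^2 - 9 = 0"]
kcs = ["linear-equations", "factoring", "quadratic-equations"]

# A KC model can be encoded as a binary Q-matrix:
# q_matrix[i, k] == 1 means question i requires KC k.
q_matrix = np.array([
    [1, 0, 0],  # linear equation only
    [0, 1, 0],  # factoring only
    [0, 1, 1],  # factoring feeds into solving the quadratic
])

for i, q in enumerate(questions):
    tagged = [kc for k, kc in enumerate(kcs) if q_matrix[i, k]]
    print(f"{q!r} -> {tagged}")
```

Hand-tagging such a matrix for a bank of thousands of questions is exactly the bottleneck the abstract describes.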
Sarah Alahmadi; Christine E. DeMars – Journal of Educational Measurement, 2025
Inadequate test-taking effort poses a significant challenge, particularly when low-stakes test results inform high-stakes policy and psychometric decisions. We examined how rapid guessing (RG), a common form of low test-taking effort, biases item parameter estimates, particularly the discrimination and difficulty parameters. Previous research…
Descriptors: Guessing (Tests), Computation, Statistical Bias, Test Items
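The discrimination and difficulty parameters at issue are those of the standard two-parameter logistic (2PL) IRT model; the truncated abstract does not name the authors' exact model, so take this as background rather than their specification:

$$
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\{-a_i(\theta_j - b_i)\}},
$$

where $\theta_j$ is examinee ability, $a_i$ the item's discrimination, and $b_i$ its difficulty. Rapid guesses are essentially unrelated to $\theta_j$, so leaving them in the calibration data distorts the estimates of $a_i$ and $b_i$.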
Marcoulides, Katerina M. – Measurement: Interdisciplinary Research and Perspectives, 2023
Integrative data analysis has recently been shown to be an effective tool for researchers interested in synthesizing datasets from multiple studies in order to draw statistical or substantive conclusions. The actual process of integrating the different datasets depends on the availability of some common measures or items reflecting the same…
Descriptors: Data Analysis, Synthesis, Test Items, Simulation
Said Al Faraby; Adiwijaya Adiwijaya; Ade Romadhony – International Journal of Artificial Intelligence in Education, 2024
Questioning plays a vital role in education, directing knowledge construction and assessing students' understanding. However, creating high-level questions requires significant creativity and effort. Automatic question generation is expected to facilitate the generation of not only fluent and relevant but also educationally valuable questions.…
Descriptors: Test Items, Automation, Computer Software, Input Output Analysis
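As a toy illustration of the gap between "fluent" and "educationally valuable": plain template filling, sketched below, produces grammatical questions but cannot by itself target higher-level thinking. The templates and topics are invented and are not the paper's method.

```python
import random

# Invented question frames; real systems, including the one studied
# in the paper, go well beyond template filling.
TEMPLATES = [
    "What is {concept}?",
    "How does {concept} differ from {other}?",
    "Give a real-world example of {concept}.",
]

def generate_questions(concepts, n=3, seed=0):
    """Fill random templates with random concept pairs (illustrative only)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        concept, other = rng.sample(concepts, 2)
        out.append(rng.choice(TEMPLATES).format(concept=concept, other=other))
    return out

print(generate_questions(["photosynthesis", "cellular respiration", "fermentation"]))
```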
Séverin Lions; María Paz Blanco; Pablo Dartnell; Carlos Monsalve; Gabriel Ortega; Julie Lemarié – Applied Measurement in Education, 2024
Multiple-choice items are universally used in formal education. Since they should assess learning, not test-wiseness or guesswork, they must be constructed following the highest possible standards. Hundreds of item-writing guides have provided guidelines to help test developers adopt appropriate strategies to define the distribution and sequence…
Descriptors: Test Construction, Multiple Choice Tests, Guidelines, Test Items
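One issue such guides commonly address is the sequence of response options. A minimal safeguard against predictable answer-key patterns is to randomize option order per item, as in this illustrative sketch (not a technique taken from the article):

```python
import random

def shuffle_options(stem, options, answer, seed):
    """Return the stem, options in randomized order, and the new key index.

    Illustrative only: per-item shuffling is one simple way to avoid
    positional patterns in the answer key.
    """
    rng = random.Random(seed)
    shuffled = options[:]
    rng.shuffle(shuffled)
    return stem, shuffled, shuffled.index(answer)

stem, opts, key = shuffle_options(
    "Which gas do plants absorb during photosynthesis?",
    ["Oxygen", "Carbon dioxide", "Nitrogen", "Hydrogen"],
    "Carbon dioxide",
    seed=42,
)
print(stem, opts, "correct index:", key)
```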
Jianbin Fu; TsungHan Ho; Xuan Tan – Practical Assessment, Research & Evaluation, 2025
Item parameter estimation using an item response theory (IRT) model with fixed ability estimates is useful for equating with small samples on anchor items. The current study explores the impact of three ability estimation methods (weighted likelihood estimation [WLE], maximum a posteriori [MAP], and posterior ability distribution estimation [PST])…
Descriptors: Item Response Theory, Test Items, Computation, Equated Scores
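For context on two of the estimators named: maximum a posteriori (MAP) maximizes the likelihood times a prior, while weighted likelihood estimation (WLE; Warm, 1989) weights the likelihood to reduce the bias of maximum likelihood. Under a normal prior $\phi$ (the usual default, assumed here),

$$
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \big[ \ln L(\mathbf{x} \mid \theta) + \ln \phi(\theta) \big],
$$

where $L$ is the likelihood of the response pattern $\mathbf{x}$. The pull toward the prior mean is what distinguishes MAP's small-sample behavior from WLE's.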
Robert J. Marzano; Bridget Cahill; Jeni Gotto; Brian J. Kosena; Michael Lynch; Lucy Pearson – Solution Tree, 2025
In "Test-Specific Thinking," the authors provide recommended practices, methods, and means for educators to implement structural schemas into teaching, helping students better prepare for tests and formulate stronger responses to certain question frames. Armed with a better understanding of how tests are designed, teachers will increase…
Descriptors: English Instruction, Language Arts, Mathematics Tests, Test Construction
Sanford R. Student – Journal of Educational Measurement, 2025
Vertical scales are intended to establish a common metric for scores on test forms targeting different levels of development in a specified domain. They are often constructed using common item, nonequivalent group designs that implicitly rely on the linking items being effectively free from differential item functioning (DIF) or the DIF being…
Descriptors: Scaling, Factor Analysis, Test Bias, Test Items
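For readers less familiar with common-item linking, one classical way the linking items determine the transformation between forms is the mean-sigma method: from the common items' difficulty estimates on forms X and Y,

$$
A = \frac{\sigma(b^{Y})}{\sigma(b^{X})}, \qquad B = \mu(b^{Y}) - A\,\mu(b^{X}), \qquad \theta^{Y} = A\,\theta^{X} + B .
$$

DIF shifts a linking item's $b$ estimate on one form and thereby contaminates $A$ and $B$, which is why the assumption of DIF-free linking items matters. (This is background; the paper's factor-analytic treatment may differ.)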
Engelhard, George – Educational and Psychological Measurement, 2023
The purpose of this study is to introduce a functional approach for modeling unfolding response data. Functional data analysis (FDA) has been used for examining cumulative item response data, but a functional approach has not been systematically used with unfolding response processes. A brief overview of FDA is presented and illustrated within the…
Descriptors: Data Analysis, Models, Responses, Test Items
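Unfolding (ideal-point) response processes are single-peaked rather than monotone: endorsement is highest when the person's location is near the item's. A minimal single-peaked form, shown for illustration and not necessarily the model used in the study, is

$$
P(X = 1 \mid \theta) = \frac{\exp\{\gamma - (\theta - \delta)^2\}}{1 + \exp\{\gamma - (\theta - \delta)^2\}},
$$

which peaks at $\theta = \delta$ and falls off in both directions, unlike the monotone curves of cumulative models.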
Rodgers, Emily; D'Agostino, Jerome V.; Berenbon, Rebecca; Johnson, Tracy; Winkler, Christa – Journal of Early Childhood Literacy, 2023
Running Records are thought to be an excellent formative assessment tool because they generate results that educators can use to make their teaching more responsive. Despite the technical nature of scoring Running Records and the kinds of important decisions that are attached to their analysis, few studies have investigated assessor accuracy. We…
Descriptors: Formative Evaluation, Scoring, Accuracy, Difficulty Level
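A standard way to quantify assessor accuracy in scoring studies of this kind is chance-corrected agreement with a criterion scorer, e.g. Cohen's kappa; whether the authors used it is not visible in the truncated abstract:

$$
\kappa = \frac{p_o - p_e}{1 - p_e},
$$

where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance.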
Becker, Benjamin; Weirich, Sebastian; Goldhammer, Frank; Debeer, Dries – Journal of Educational Measurement, 2023
When designing or modifying a test, an important challenge is controlling its speededness. To achieve this, van der Linden (2011a, 2011b) proposed using a lognormal response time model, more specifically the two-parameter lognormal model, and automated test assembly (ATA) via mixed integer linear programming. However, this approach has a severe…
Descriptors: Test Construction, Automation, Models, Test Items
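The two-parameter lognormal response time model referenced here assumes the log response time of person $j$ on item $i$ is normally distributed:

$$
\ln T_{ij} \sim N\big(\beta_i - \tau_j,\; \alpha_i^{-2}\big),
$$

with $\beta_i$ the item's time intensity, $\alpha_i$ its time discrimination, and $\tau_j$ the person's speed. Because a form's expected total time is a sum over the selected items, time constraints become linear in the binary item-selection variables, which is what makes the mixed integer linear programming formulation of ATA natural.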
Yun-Kyung Kim; Li Cai – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2025
This paper introduces an application of cross-classified item response theory (IRT) modeling to an assessment utilizing the embedded standard setting (ESS) method (Lewis & Cook). The cross-classified IRT model is used to treat both item and person effects as random, where the item effects are regressed on the target performance levels (target…
Descriptors: Standard Setting (Scoring), Item Response Theory, Test Items, Difficulty Level
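A schematic version of the model described, with both person and item effects random and item difficulty regressed on the item's target performance level $L_i$ (the notation is mine, not necessarily the paper's), is

$$
\mathrm{logit}\, P(X_{pi} = 1) = \theta_p - b_i, \qquad \theta_p \sim N(0, \sigma_\theta^2), \qquad b_i \sim N(\lambda_0 + \lambda_1 L_i,\; \sigma_b^2).
$$

Treating $b_i$ as random around a level-specific mean is what connects the item side of the model to the embedded standard-setting targets.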
Ye Ma; Deborah J. Harris – Educational Measurement: Issues and Practice, 2025
Item position effect (IPE) refers to situations where an item performs differently when administered in different positions on a test. Most previous research has investigated IPE under linear testing; IPE under adaptive testing remains largely unexamined. In addition, the existence of IPE might violate Item…
Descriptors: Computer Assisted Testing, Adaptive Testing, Item Response Theory, Test Items
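One simple way to formalize IPE, purely for illustration (the study's own model may differ), is to let an item's effective difficulty drift with its administered position $k$:

$$
P(X = 1 \mid \theta, k) = \frac{1}{1 + \exp\{-a(\theta - b - \gamma k)\}},
$$

where $\gamma \neq 0$ means the same item behaves differently later in the test. In adaptive testing, position varies across examinees by design, so a nonzero $\gamma$ undermines the parameter invariance that item selection relies on.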
Gerd Kortemeyer; Marina Babayeva; Giulia Polverini; Ralf Widenhorn; Bor Gregorcic – Physical Review Physics Education Research, 2025
We investigate the multilingual and multimodal performance of a large language model-based artificial intelligence (AI) system, GPT-4o, using a diverse set of physics concept inventories spanning multiple languages and subject categories. The inventories, sourced from the PhysPort website, cover classical physics topics such as mechanics,…
Descriptors: Artificial Intelligence, Physics, Science Tests, Scientific Concepts
