ERIC - Search Results

Showing all 7 results

Leveraging LLM Respondents for Item Evaluation: A Psychometric Analysis

Peer reviewed

Yunting Liu; Shreya Bhandari; Zachary A. Pardos – British Journal of Educational Technology, 2025

Effective educational measurement relies heavily on the curation of well-designed item pools. However, item calibration is time consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of items. In this study, we explore the potential of six different large language models (LLMs; GPT-3.5, GPT-4,…

Descriptors: Artificial Intelligence, Test Items, Psychometrics, Educational Assessment

Evaluating Uncertainty: The Impact of the Sampling and Assessment Design on Statistical Inference in the Context of ILSA

Peer reviewed

Diego Cortes; Dirk Hastedt; Sabine Meinck – Large-scale Assessments in Education, 2025

This paper informs users of data collected in international large-scale assessments (ILSA), by presenting arguments underlining the importance of considering two design features employed in these studies. We examine a common misconception stating that the uncertainty arising from the assessment design is negligible compared with that arising from the…

Descriptors: Sampling, Research Design, Educational Assessment, Statistical Inference

What Works for Whom: Subgroup Effects, Compositional Effects, and so Much Treatment Heterogeneity

Peer reviewed

Betsy Wolf – Society for Research on Educational Effectiveness, 2024

Introduction: The What Works Clearinghouse (WWC) reviews rigorous research on educational interventions with a goal of identifying "what works" and making that information accessible to educators and policymakers. The WWC has historically prioritized internal validity over external validity in rating the quality of research. One critique…

Descriptors: Educational Assessment, Educational Research, Validity, Research Utilization

Model Adequacy Checking for Applying Harmonic Regression to Assessment Quality Control. Research Report. ETS RR-21-13

Peer reviewed

Qian, Jiahe; Li, Shuhong – ETS Research Report Series, 2021

In recent years, harmonic regression models have been applied to implement quality control for educational assessment data consisting of multiple administrations and displaying seasonality. As with other types of regression models, it is imperative that model adequacy checking and model fit be appropriately conducted. However, there has been no…

Descriptors: Models, Regression (Statistics), Language Tests, Quality Control

Applications of Small Area Estimation to Generalization with Subclassification by Propensity Scores

Peer reviewed

Chan, Wendy – Journal of Educational and Behavioral Statistics, 2018

Policymakers have grown increasingly interested in how experimental results may generalize to a larger population. However, recently developed propensity score-based methods are limited by small sample sizes, where the experimental study is generalized to a population that is at least 20 times larger. This is particularly problematic for methods…

Descriptors: Computation, Generalization, Probability, Sample Size

Partially Identified Treatment Effects for Generalizability

Peer reviewed

Chan, Wendy – Journal of Research on Educational Effectiveness, 2017

Recent methods to improve generalizations from nonrandom samples typically invoke assumptions such as the strong ignorability of sample selection, which is challenging to meet in practice. Although researchers acknowledge the difficulty in meeting this assumption, point estimates are still provided and used without considering alternative…

Descriptors: Generalization, Inferences, Probability, Educational Research

Reliability and Validity of International Large-Scale Assessment: Understanding IEA's Comparative Studies of Student Achievement. IEA Research for Education. Volume 10

Wagemaker, Hans, Ed. – International Association for the Evaluation of Educational Achievement, 2020

Although International Association for the Evaluation of Educational Achievement-pioneered international large-scale assessment (ILSA) of education is now a well-established science, non-practitioners and many users often substantially misunderstand how large-scale assessments are conducted, what questions and challenges they are designed to…

Descriptors: International Assessment, Achievement Tests, Educational Assessment, Comparative Analysis