NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 7 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Yunting Liu; Shreya Bhandari; Zachary A. Pardos – British Journal of Educational Technology, 2025
Effective educational measurement relies heavily on the curation of well-designed item pools. However, item calibration is time consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of items. In this study, we explore the potential of six different large language models (LLMs; GPT-3.5, GPT-4,…
Descriptors: Artificial Intelligence, Test Items, Psychometrics, Educational Assessment
Peer reviewed Peer reviewed
Direct linkDirect link
Diego Cortes; Dirk Hastedt; Sabine Meinck – Large-scale Assessments in Education, 2025
This paper informs users of data collected in international large-scale assessments (ILSA), by presenting argumentsunderlining the importance of considering two design features employed in these studies. We examine a commonmisconception stating that the uncertainty arising from the assessment design is negligible compared with that arisingfrom the…
Descriptors: Sampling, Research Design, Educational Assessment, Statistical Inference
Peer reviewed Peer reviewed
Direct linkDirect link
Betsy Wolf – Society for Research on Educational Effectiveness, 2024
Introduction: The What Works Clearinghouse (WWC) reviews rigorous research on educational interventions with a goal of identifying "what works" and making that information accessible to educators and policymakers. The WWC has historically prioritized internal validity over external validity in rating the quality of research. One critique…
Descriptors: Educational Assessment, Educational Research, Validity, Research Utilization
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Qian, Jiahe; Li, Shuhong – ETS Research Report Series, 2021
In recent years, harmonic regression models have been applied to implement quality control for educational assessment data consisting of multiple administrations and displaying seasonality. As with other types of regression models, it is imperative that model adequacy checking and model fit be appropriately conducted. However, there has been no…
Descriptors: Models, Regression (Statistics), Language Tests, Quality Control
Peer reviewed Peer reviewed
Direct linkDirect link
Chan, Wendy – Journal of Educational and Behavioral Statistics, 2018
Policymakers have grown increasingly interested in how experimental results may generalize to a larger population. However, recently developed propensity score-based methods are limited by small sample sizes, where the experimental study is generalized to a population that is at least 20 times larger. This is particularly problematic for methods…
Descriptors: Computation, Generalization, Probability, Sample Size
Peer reviewed Peer reviewed
Direct linkDirect link
Chan, Wendy – Journal of Research on Educational Effectiveness, 2017
Recent methods to improve generalizations from nonrandom samples typically invoke assumptions such as the strong ignorability of sample selection, which is challenging to meet in practice. Although researchers acknowledge the difficulty in meeting this assumption, point estimates are still provided and used without considering alternative…
Descriptors: Generalization, Inferences, Probability, Educational Research
Wagemaker, Hans, Ed. – International Association for the Evaluation of Educational Achievement, 2020
Although International Association for the Evaluation of Educational Achievement-pioneered international large-scale assessment (ILSA) of education is now a well-established science, non-practitioners and many users often substantially misunderstand how large-scale assessments are conducted, what questions and challenges they are designed to…
Descriptors: International Assessment, Achievement Tests, Educational Assessment, Comparative Analysis