NotesFAQContact Us
Collection
Advanced
Search Tips
What Works Clearinghouse Rating
Showing 1 to 15 of 139 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
James Soland – Journal of Research on Educational Effectiveness, 2024
When randomized control trials are not possible, quasi-experimental methods often represent the gold standard. One quasi-experimental method is difference-in-difference (DiD), which compares changes in outcomes before and after treatment across groups to estimate a causal effect. DiD researchers often use fairly exhaustive robustness checks to…
Descriptors: Item Response Theory, Testing, Test Validity, Intervention
Peer reviewed Peer reviewed
Direct linkDirect link
Uto, Masaki; Aomi, Itsuki; Tsutsumi, Emiko; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2023
In automated essay scoring (AES), essays are automatically graded without human raters. Many AES models based on various manually designed features or various architectures of deep neural networks (DNNs) have been proposed over the past few decades. Each AES model has unique advantages and characteristics. Therefore, rather than using a single-AES…
Descriptors: Prediction, Scores, Computer Assisted Testing, Scoring
Peer reviewed Peer reviewed
Direct linkDirect link
Shin, Jinnie; Gierl, Mark J. – Journal of Applied Testing Technology, 2022
Automated Essay Scoring (AES) technologies provide innovative solutions to score the written essays with a much shorter time span and at a fraction of the current cost. Traditionally, AES emphasized the importance of capturing the "coherence" of writing because abundant evidence indicated the connection between coherence and the overall…
Descriptors: Computer Assisted Testing, Scoring, Essays, Automation
Peer reviewed Peer reviewed
Direct linkDirect link
Markus T. Jansen; Ralf Schulze – Educational and Psychological Measurement, 2024
Thurstonian forced-choice modeling is considered to be a powerful new tool to estimate item and person parameters while simultaneously testing the model fit. This assessment approach is associated with the aim of reducing faking and other response tendencies that plague traditional self-report trait assessments. As a result of major recent…
Descriptors: Factor Analysis, Models, Item Analysis, Evaluation Methods
Peer reviewed Peer reviewed
Direct linkDirect link
Eunsook Kim; Nathaniel von der Embse – Journal of Experimental Education, 2024
Using data from multiple informants has long been considered best practice in education. However, multiple informants often disagree on similar constructs, complicating decision-making. Polynomial regression and response-surface analysis (PRA) is often used to test the congruence effect between multiple informants on an outcome. However, PRA…
Descriptors: Congruence (Psychology), Information Sources, Best Practices, Regression (Statistics)
Peer reviewed Peer reviewed
Direct linkDirect link
Sami Baral; Eamon Worden; Wen-Chiang Lim; Zhuang Luo; Christopher Santorelli; Ashish Gurung; Neil Heffernan – Grantee Submission, 2024
The effectiveness of feedback in enhancing learning outcomes is well documented within Educational Data Mining (EDM). Various prior research have explored methodologies to enhance the effectiveness of feedback to students in various ways. Recent developments in Large Language Models (LLMs) have extended their utility in enhancing automated…
Descriptors: Automation, Scoring, Computer Assisted Testing, Natural Language Processing
Peer reviewed Peer reviewed
Direct linkDirect link
Thomas, Gary – British Educational Research Journal, 2021
Natural scientists are relaxed about the multiple forms experiment takes in their various fields. Yet in education we have for many years constrained our notion of experiment. This methodological circumscription has been self-imposed on the grounds that experiment of a particular, well-defined form offers the clearest evidence of a link between…
Descriptors: Educational Experiments, Models, Intervention, Context Effect
Peer reviewed Peer reviewed
Direct linkDirect link
Christopher D. Wilson; Kevin C. Haudek; Jonathan F. Osborne; Zoë E. Buck Bracey; Tina Cheuk; Brian M. Donovan; Molly A. M. Stuhlsatz; Marisol M. Santiago; Xiaoming Zhai – Journal of Research in Science Teaching, 2024
Argumentation is fundamental to science education, both as a prominent feature of scientific reasoning and as an effective mode of learning--a perspective reflected in contemporary frameworks and standards. The successful implementation of argumentation in school science, however, requires a paradigm shift in science assessment from the…
Descriptors: Middle School Students, Competence, Science Process Skills, Persuasive Discourse
Peer reviewed Peer reviewed
Direct linkDirect link
Uto, Masaki; Okano, Masashi – IEEE Transactions on Learning Technologies, 2021
In automated essay scoring (AES), scores are automatically assigned to essays as an alternative to grading by humans. Traditional AES typically relies on handcrafted features, whereas recent studies have proposed AES models based on deep neural networks to obviate the need for feature engineering. Those AES models generally require training on a…
Descriptors: Essays, Scoring, Writing Evaluation, Item Response Theory
Peer reviewed Peer reviewed
Direct linkDirect link
Raykov, Tenko; Marcoulides, George A.; Huber, Chuck – Measurement: Interdisciplinary Research and Perspectives, 2020
It is demonstrated that the popular three-parameter logistic model can lead to markedly inaccurate individual ability level estimates for mixture populations. A theoretically and empirically important setting is initially considered where (a) in one of two subpopulations (latent classes) the two-parameter logistic model holds for each item in a…
Descriptors: Item Response Theory, Models, Measurement Techniques, Item Analysis
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Levin, Nathan A. – Journal of Educational Data Mining, 2021
The Big Data for Education Spoke of the NSF Northeast Big Data Innovation Hub and ETS co-sponsored an educational data mining competition in which contestants were asked to predict efficient time use on the NAEP 8th grade mathematics computer-based assessment, based on the log file of a student's actions on a prior portion of the assessment. In…
Descriptors: Learning Analytics, Data Collection, Competition, Prediction
Michael Gilraine; Jeffrey Penney – Annenberg Institute for School Reform at Brown University, 2021
An administrative rule allowed students who failed an exam to retake it shortly after, triggering strong `teach to the test' incentives to raise these students' test scores for the retake. We develop a model that accounts for truncation and find that these students score 0.14 standard deviations higher on the retest. Using a regression…
Descriptors: Tests, Models, Scores, Test Coaching
Ayodele, Alicia Nicole – ProQuest LLC, 2017
Within polytomous items, differential item functioning (DIF) can take on various forms due to the number of response categories. The lack of invariance at this level is referred to as differential step functioning (DSF). The most common DSF methods in the literature are the adjacent category log odds ratio (AC-LOR) estimator and cumulative…
Descriptors: Statistical Analysis, Test Bias, Test Items, Scores
Peer reviewed Peer reviewed
Direct linkDirect link
Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M. – Journal of Psychoeducational Assessment, 2018
We investigated the classification accuracy of learning disability (LD) identification methods premised on the identification of an intraindividual pattern of processing strengths and weaknesses (PSW) method using multiple indicators for all latent constructs. Known LD status was derived from latent scores; values at the observed level identified…
Descriptors: Accuracy, Learning Disabilities, Classification, Identification
Peer reviewed Peer reviewed
Direct linkDirect link
Hsiao, Yu-Yu; Kwok, Oi-Man; Lai, Mark H. C. – Educational and Psychological Measurement, 2018
Path models with observed composites based on multiple items (e.g., mean or sum score of the items) are commonly used to test interaction effects. Under this practice, researchers generally assume that the observed composites are measured without errors. In this study, we reviewed and evaluated two alternative methods within the structural…
Descriptors: Error of Measurement, Testing, Scores, Models
Previous Page | Next Page »
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10