Showing 1 to 15 of 119 results
Peer reviewed
Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025
Response styles pose great threats to psychological measurements. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and total-score level (ratios of extreme and middle responses to vignettes). Four models…
Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes
Peer reviewed
Dadi Ramesh; Suresh Kumar Sanampudi – European Journal of Education, 2024
Automatic essay scoring (AES) is an essential educational application in natural language processing. This automated process will alleviate the burden by increasing the reliability and consistency of the assessment. With the advances in text embedding libraries and neural network models, AES systems achieved good results in terms of accuracy.…
Descriptors: Scoring, Essays, Writing Evaluation, Memory
Peer reviewed
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer reviewed
Caroline F. Rowland; Amy Bidgood; Gary Jones; Andrew Jessop; Paula Stinson; Julian M. Pine; Samantha Durrant; Michelle S. Peter – Language Learning, 2025
A strong predictor of children's language is performance on non-word repetition (NWR) tasks. However, the basis of this relationship remains unknown. Some suggest that NWR tasks measure phonological working memory, which then affects language growth. Others argue that children's knowledge of language/language experience affects NWR performance. A…
Descriptors: Vocabulary Development, Comparative Analysis, Computational Linguistics, Language Skills
Peer reviewed
Jönsson, Anders; Balan, Andreia – Practical Assessment, Research & Evaluation, 2018
Research on teachers' grading has shown that there is great variability among teachers regarding both the process and product of grading, resulting in low comparability and issues of inequality when using grades for selection purposes. Despite this situation, not much is known about the merits or disadvantages of different models for grading. In…
Descriptors: Grading, Models, Reliability, Validity
Peer reviewed
Yun Long; Haifeng Luo; Yu Zhang – npj Science of Learning, 2024
This study explores the use of Large Language Models (LLMs), specifically GPT-4, in analysing classroom dialogue--a key task for teaching diagnosis and quality improvement. Traditional qualitative methods are both knowledge- and labour-intensive. This research investigates the potential of LLMs to streamline and enhance this process. Using…
Descriptors: Classroom Communication, Computational Linguistics, Chinese, Mathematics Instruction
Peer reviewed
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Peer reviewed
Verhavert, San; Bouwer, Renske; Donche, Vincent; De Maeyer, Sven – Assessment in Education: Principles, Policy & Practice, 2019
Comparative Judgement (CJ) aims to improve the quality of performance-based assessments by letting multiple assessors judge pairs of performances. CJ is generally associated with high levels of reliability, but there is also a large variation in reliability between assessments. This study investigates which assessment characteristics influence the…
Descriptors: Meta Analysis, Reliability, Comparative Analysis, Value Judgment
Peer reviewed
Pedder, Hugo; Boucher, Martin; Dias, Sofia; Bennetts, Margherita; Welton, Nicky J. – Research Synthesis Methods, 2020
Time-course model-based network meta-analysis (MBNMA) has been proposed as a framework to combine treatment comparisons from a network of randomized controlled trials reporting outcomes at multiple time-points. This can explain heterogeneity/inconsistency that arises by pooling studies with different follow-up times and allow inclusion of studies…
Descriptors: Simulation, Randomized Controlled Trials, Meta Analysis, Comparative Analysis
Peer reviewed
Madison, Matthew J. – Educational Measurement: Issues and Practice, 2019
Recent advances have enabled diagnostic classification models (DCMs) to accommodate longitudinal data. These longitudinal DCMs were developed to study how examinees change, or transition, between different attribute mastery statuses over time. This study examines using longitudinal DCMs as an approach to assessing growth and serves three purposes:…
Descriptors: Longitudinal Studies, Item Response Theory, Psychometrics, Criterion Referenced Tests
Peer reviewed
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
Peer reviewed
Moeller, Julia; Viljaranta, Jaana; Kracke, Bärbel; Dietrich, Julia – Frontline Learning Research, 2020
This article proposes a study design developed to disentangle the objective characteristics of a learning situation from individuals' subjective perceptions of that situation. The term objective characteristics refers to the agreement across students, whereas subjective perceptions refers to inter-individual heterogeneity. We describe a novel…
Descriptors: Student Attitudes, College Students, Lecture Method, Student Interests
Peer reviewed
van Valkenhoef, Gert; Dias, Sofia; Ades, A. E.; Welton, Nicky J. – Research Synthesis Methods, 2016
Network meta-analysis enables the simultaneous synthesis of a network of clinical trials comparing any number of treatments. Potential inconsistencies between estimates of relative treatment effects are an important concern, and several methods to detect inconsistency have been proposed. This paper is concerned with the node-splitting approach,…
Descriptors: Networks, Meta Analysis, Automation, Models
Botarleanu, Robert-Mihai; Dascalu, Mihai; Watanabe, Micah; Crossley, Scott Andrew; McNamara, Danielle S. – Grantee Submission, 2022
Age of acquisition (AoA) is a measure of word complexity which refers to the age at which a word is typically learned. AoA measures have shown strong correlations with reading comprehension, lexical decision times, and writing quality. AoA scores based on both adult and child data have limitations that allow for error in measurement, and increase…
Descriptors: Age Differences, Vocabulary Development, Correlation, Reading Comprehension
Peer reviewed
Storme, Martin; Myszkowski, Nils; Baron, Simon; Bernard, David – Journal of Intelligence, 2019
Assessing job applicants' general mental ability online poses psychometric challenges due to the necessity of having brief but accurate tests. Recent research (Myszkowski & Storme, 2018) suggests that recovering distractor information through Nested Logit Models (NLM; Suh & Bolt, 2010) increases the reliability of ability estimates in…
Descriptors: Intelligence Tests, Item Response Theory, Comparative Analysis, Test Reliability