Showing all 13 results
Peer reviewed
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
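For reference, the generalized partial credit model (GPCM) named in the entry above is usually written as follows; this is the standard formulation, not drawn from the article itself. Here P_ik(θ) is the probability that an examinee with proficiency θ earns score k on item i, a_i is the item discrimination, b_iv are the step parameters, and m_i is the maximum score:

```latex
P_{ik}(\theta) =
  \frac{\exp\!\left(\sum_{v=0}^{k} a_i(\theta - b_{iv})\right)}
       {\sum_{c=0}^{m_i} \exp\!\left(\sum_{v=0}^{c} a_i(\theta - b_{iv})\right)},
  \qquad \text{with the convention } \sum_{v=0}^{0} a_i(\theta - b_{iv}) \equiv 0 .
```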
Peer reviewed
PDF on ERIC
Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023
Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…
Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests
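The supervised-learning approach described in the entry above — training a classifier on a small number of human-scored responses — can be illustrated in miniature with a bag-of-words nearest-centroid scorer. This is a hypothetical sketch in pure Python, not the authors' method; the function names and training data are illustrative only:

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector for a short free-text response."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(scored_responses):
    """Aggregate one bag-of-words centroid per human-assigned score."""
    centroids = {}
    for text, score in scored_responses:
        centroids.setdefault(score, Counter()).update(bow(text))
    return centroids

def predict(centroids, text):
    """Assign the score whose centroid is most similar to the response."""
    v = bow(text)
    return max(centroids, key=lambda s: cosine(centroids[s], v))

# Illustrative human-scored short answers (score 2 = correct, 0 = no credit).
training = [("the slope is rise over run", 2),
            ("slope equals rise divided by run", 2),
            ("i do not know", 0)]
model = train(training)
print(predict(model, "rise over run"))  # -> 2
```

Real systems replace the centroid with a trained classifier or fine-tuned language model, but the train-on-few, score-many pattern is the same.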
Peer reviewed
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or sole marker for many high-stakes educational assessments, in both native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep neural network algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Peer reviewed
PDF on ERIC
Finch, Holmes – Practical Assessment, Research & Evaluation, 2022
Researchers in many disciplines work with ranking data. This data type is unique in that it is often deterministic in nature (the ranks of "k"-1 of the items determine the rank of item "k"), and the difference in a pair of rank scores separated by "k" units is equivalent regardless of the actual values of the two ranks in…
Descriptors: Data Analysis, Statistical Inference, Models, College Faculty
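The deterministic property noted in the entry above — that the ranks of k-1 items fix the rank of the remaining item — can be sketched in a few lines of Python (function name and example are illustrative, not from the article):

```python
def missing_rank(known_ranks, k):
    """Given the ranks of k-1 items out of k (a permutation of 1..k with
    one value withheld), return the rank forced onto the remaining item."""
    return (set(range(1, k + 1)) - set(known_ranks)).pop()

# Example: with 5 items, four observed ranks force the fifth.
print(missing_rank([2, 5, 1, 3], k=5))  # -> 4
```

This dependence among ranks is why standard statistical models that assume independent observations can be a poor fit for ranking data.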
Peer reviewed
PDF on ERIC
Bosch, Nigel – Journal of Educational Data Mining, 2021
Automatic machine learning (AutoML) methods automate the time-consuming feature engineering process so that researchers can produce accurate student models more quickly and easily. In this paper, we compare two AutoML feature engineering methods in the context of the National Assessment of Educational Progress (NAEP) data mining competition. The…
Descriptors: Accuracy, Learning Analytics, Models, National Competency Tests
Peer reviewed
von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale – Educational and Psychological Measurement, 2023
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We compare the classification accuracy of convolutional and feed-forward approaches. Our…
Descriptors: Scoring, Networks, Artificial Intelligence, Elementary Secondary Education
Susan Rowe – ProQuest LLC, 2023
This dissertation explored whether unnecessary linguistic complexity (LC) in mathematics and biology assessment items changes the direction and significance of differential item functioning (DIF) between the subgroups of emergent bilinguals (EBs) and English-proficient students (EPs). Due to inconsistencies in measuring LC in items, Study One adapted a…
Descriptors: Difficulty Level, English for Academic Purposes, Second Language Learning, Second Language Instruction
Peer reviewed
Szarkowska, Agnieszka; Krejtz, Krzysztof; Dutka, Lukasz; Pilipczuk, Olga – Interpreter and Translator Trainer, 2018
In this study, we examined whether interpreters and interpreting trainees are better predisposed to respeaking than people with no interpreting skills. We tested 57 participants (22 interpreters, 23 translators, and 12 controls) as they respoke 5-minute videos varying along two parameters: speech rate (fast/slow) and number of speakers (one/many). Having…
Descriptors: Translation, Comparative Analysis, Professional Personnel, Video Technology
Peer reviewed
Scheirer, Mary Ann; Mark, Melvin M.; Brooks, Ariana; Grob, George F.; Chapel, Thomas J.; Geisz, Mary; McKaughan, Molly; Leviton, Laura – American Journal of Evaluation, 2012
Linking evaluation methods to the several phases of a program's life cycle can provide evaluation planners and funders with guidance about what types of evaluation are most appropriate over the trajectory of social and educational programs and other interventions. If methods are matched to the needs of program phases, evaluation can and should…
Descriptors: Evidence, Evaluation Methods, Program Development, Life Cycle Costing
Peer reviewed
Huber, Amy Mattingly; Leigh, Katharine E.; Tremblay, Kenneth R., Jr. – College Student Journal, 2012
The creative process is a multifaceted and dynamic path of thinking required to execute a project in design-based disciplines. The goal of this research was to test a model outlining the creative design process by investigating student experiences in a design project assignment. The study used an exploratory design to collect data from student…
Descriptors: Interior Design, Creativity, Creative Thinking, Evaluators
Peer reviewed
PDF on ERIC
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
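Human-machine score agreement of the kind evaluated in the entry above is commonly summarized with quadratically weighted kappa, which penalizes disagreements by the squared score distance. The following is a generic sketch of that statistic, not necessarily the exact metric used in the report:

```python
from collections import Counter

def quadratic_weighted_kappa(human, machine, min_s, max_s):
    """Quadratically weighted kappa between two integer score vectors
    on the scale min_s..max_s. 1 = perfect agreement, 0 = chance level."""
    n_cat = max_s - min_s + 1
    n = len(human)
    obs = Counter(zip(human, machine))   # observed joint score counts
    h_marg = Counter(human)              # human marginal counts
    m_marg = Counter(machine)            # machine marginal counts
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected under independence
    for i in range(min_s, max_s + 1):
        for j in range(min_s, max_s + 1):
            w = ((i - j) ** 2) / ((n_cat - 1) ** 2)
            num += w * obs.get((i, j), 0) / n
            den += w * (h_marg.get(i, 0) / n) * (m_marg.get(j, 0) / n)
    return 1.0 - num / den

# Perfect agreement yields kappa = 1.
print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # -> 1.0
```

Statistically independent human and machine scores yield a kappa near 0, which is why the metric is preferred over raw percent agreement for rater comparisons.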
Peer reviewed
Coryn, Chris L. S.; Hattie, John A.; Scriven, Michael; Hartmann, David J. – American Journal of Evaluation, 2007
This research describes, classifies, and comparatively evaluates national models and mechanisms used to evaluate research and allocate research funding in 16 countries. Although these models and mechanisms vary widely in terms of how research is evaluated and financed, nearly all share the common characteristic of relating funding to some measure…
Descriptors: Ethics, Evaluation Methods, Comparative Analysis, Resource Allocation
Kenyon, Dorry; Stansfield, Charles W. – 1993
This paper examines whether individuals who train themselves to score a performance assessment will rate acceptably when compared to known standards. Research on the efficacy of rater self-training materials developed by the Center for Applied Linguistics for the Texas Oral Proficiency Test (TOPT) is examined. The rater self-training materials are described…
Descriptors: Bilingual Education, Comparative Analysis, Evaluators, Individual Characteristics