Showing all 13 results
Peer reviewed
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
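For reference, the generalized partial credit model (GPCM) named in the entry above is usually written as follows; this is the standard formulation, not drawn from the article itself. Here P_ik(θ) is the probability that an examinee with proficiency θ earns score k on item i, a_i is the item discrimination, b_iv are the step parameters, and m_i is the maximum score:

```latex
P_{ik}(\theta) =
  \frac{\exp\!\left(\sum_{v=0}^{k} a_i(\theta - b_{iv})\right)}
       {\sum_{c=0}^{m_i} \exp\!\left(\sum_{v=0}^{c} a_i(\theta - b_{iv})\right)},
  \qquad \text{with the convention } \sum_{v=0}^{0} a_i(\theta - b_{iv}) \equiv 0 .
```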
Peer reviewed
PDF on ERIC
Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023
Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…
Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests
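The supervised-learning approach described in the entry above — training a classifier on a small number of human-scored responses — can be illustrated in miniature with a bag-of-words nearest-centroid scorer. This is a hypothetical sketch in pure Python, not the authors' method; the function names and training data are illustrative only:

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector for a short free-text response."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(scored_responses):
    """Aggregate one bag-of-words centroid per human-assigned score."""
    centroids = {}
    for text, score in scored_responses:
        centroids.setdefault(score, Counter()).update(bow(text))
    return centroids

def predict(centroids, text):
    """Assign the score whose centroid is most similar to the response."""
    v = bow(text)
    return max(centroids, key=lambda s: cosine(centroids[s], v))

# Illustrative human-scored short answers (score 2 = correct, 0 = no credit).
training = [("the slope is rise over run", 2),
            ("slope equals rise divided by run", 2),
            ("i do not know", 0)]
model = train(training)
print(predict(model, "rise over run"))  # -> 2
```

Real systems replace the centroid with a trained classifier or fine-tuned language model, but the train-on-few, score-many pattern is the same.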
Peer reviewed
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or sole marker for many high-stakes educational assessments, in both native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep neural network algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Peer reviewed
PDF on ERIC
Finch, Holmes – Practical Assessment, Research & Evaluation, 2022
Researchers in many disciplines work with ranking data. This data type is unique in that it is often deterministic in nature (the ranks of "k"-1 of the items determine the rank of item "k"), and the difference in a pair of rank scores separated by "k" units is equivalent regardless of the actual values of the two ranks in…
Descriptors: Data Analysis, Statistical Inference, Models, College Faculty
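The deterministic property noted in the entry above — that the ranks of k-1 items fix the rank of the remaining item — can be sketched in a few lines of Python (function name and example are illustrative, not from the article):

```python
def missing_rank(known_ranks, k):
    """Given the ranks of k-1 items out of k (a permutation of 1..k with
    one value withheld), return the rank forced onto the remaining item."""
    return (set(range(1, k + 1)) - set(known_ranks)).pop()

# Example: with 5 items, four observed ranks force the fifth.
print(missing_rank([2, 5, 1, 3], k=5))  # -> 4
```

This dependence among ranks is why standard statistical models that assume independent observations can be a poor fit for ranking data.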
Peer reviewed
PDF on ERIC
Bosch, Nigel – Journal of Educational Data Mining, 2021
Automatic machine learning (AutoML) methods automate the time-consuming feature engineering process so that researchers can produce accurate student models more quickly and easily. In this paper, we compare two AutoML feature engineering methods in the context of the National Assessment of Educational Progress (NAEP) data mining competition. The…
Descriptors: Accuracy, Learning Analytics, Models, National Competency Tests
Peer reviewed
von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale – Educational and Psychological Measurement, 2023
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We compare the classification accuracy of convolutional and feed-forward approaches. Our…
Descriptors: Scoring, Networks, Artificial Intelligence, Elementary Secondary Education
Susan Rowe – ProQuest LLC, 2023
This dissertation explored whether unnecessary linguistic complexity (LC) in mathematics and biology assessment items changes the direction and significance of differential item functioning (DIF) between the subgroups of emergent bilinguals (EBs) and English-proficient students (EPs). Due to inconsistencies in measuring LC in items, Study One adapted a…
Descriptors: Difficulty Level, English for Academic Purposes, Second Language Learning, Second Language Instruction
Peer reviewed
Szarkowska, Agnieszka; Krejtz, Krzysztof; Dutka, Lukasz; Pilipczuk, Olga – Interpreter and Translator Trainer, 2018
In this study, we examined whether interpreters and interpreting trainees are better predisposed to respeaking than people with no interpreting skills. We tested 57 participants (22 interpreters, 23 translators, and 12 controls) as they respoke 5-minute videos varying along two parameters: speech rate (fast/slow) and number of speakers (one/many). Having…
Descriptors: Translation, Comparative Analysis, Professional Personnel, Video Technology
Peer reviewed
Scheirer, Mary Ann; Mark, Melvin M.; Brooks, Ariana; Grob, George F.; Chapel, Thomas J.; Geisz, Mary; McKaughan, Molly; Leviton, Laura – American Journal of Evaluation, 2012
Linking evaluation methods to the several phases of a program's life cycle can provide evaluation planners and funders with guidance about what types of evaluation are most appropriate over the trajectory of social and educational programs and other interventions. If methods are matched to the needs of program phases, evaluation can and should…
Descriptors: Evidence, Evaluation Methods, Program Development, Life Cycle Costing
Peer reviewed
Huber, Amy Mattingly; Leigh, Katharine E.; Tremblay, Kenneth R., Jr. – College Student Journal, 2012
The creative process is a multifaceted and dynamic path of thinking required to execute a project in design-based disciplines. The goal of this research was to test a model outlining the creative design process by investigating student experiences in a design project assignment. The study used an exploratory design to collect data from student…
Descriptors: Interior Design, Creativity, Creative Thinking, Evaluators
Peer reviewed
PDF on ERIC
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
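Human-machine score agreement of the kind evaluated in the entry above is commonly summarized with quadratically weighted kappa, which penalizes disagreements by the squared score distance. The following is a generic sketch of that statistic, not necessarily the exact metric used in the report:

```python
from collections import Counter

def quadratic_weighted_kappa(human, machine, min_s, max_s):
    """Quadratically weighted kappa between two integer score vectors
    on the scale min_s..max_s. 1 = perfect agreement, 0 = chance level."""
    n_cat = max_s - min_s + 1
    n = len(human)
    obs = Counter(zip(human, machine))   # observed joint score counts
    h_marg = Counter(human)              # human marginal counts
    m_marg = Counter(machine)            # machine marginal counts
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected under independence
    for i in range(min_s, max_s + 1):
        for j in range(min_s, max_s + 1):
            w = ((i - j) ** 2) / ((n_cat - 1) ** 2)
            num += w * obs.get((i, j), 0) / n
            den += w * (h_marg.get(i, 0) / n) * (m_marg.get(j, 0) / n)
    return 1.0 - num / den

# Perfect agreement yields kappa = 1.
print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # -> 1.0
```

Statistically independent human and machine scores yield a kappa near 0, which is why the metric is preferred over raw percent agreement for rater comparisons.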
Peer reviewed
Coryn, Chris L. S.; Hattie, John A.; Scriven, Michael; Hartmann, David J. – American Journal of Evaluation, 2007
This research describes, classifies, and comparatively evaluates national models and mechanisms used to evaluate research and allocate research funding in 16 countries. Although these models and mechanisms vary widely in terms of how research is evaluated and financed, nearly all share the common characteristic of relating funding to some measure…
Descriptors: Ethics, Evaluation Methods, Comparative Analysis, Resource Allocation
Kenyon, Dorry; Stansfield, Charles W. – 1993
This paper examines whether individuals who train themselves to score a performance assessment will rate acceptably when compared to known standards. Research on the efficacy of rater self-training materials developed by the Center for Applied Linguistics for the Texas Oral Proficiency Test (TOPT) is examined. The rater self-training materials are described…
Descriptors: Bilingual Education, Comparative Analysis, Evaluators, Individual Characteristics