ERIC - Search Results

Publication Date

In 2026	0
Since 2025	2
Since 2022 (last 5 years)	5
Since 2017 (last 10 years)	8
Since 2007 (last 20 years)	10

Descriptor

Computer Assisted Testing	11
Decision Making	11
Scoring	11
Item Analysis	4
Student Evaluation	4
Evaluation Methods	3
Test Items	3
Test Validity	3
Validity	3
Accuracy	2
Automation	2
Cognitive Processes	2
Computer Software	2
Correlation	2
Ethics	2
Evaluators	2
Foreign Countries	2
Formative Evaluation	2
Grading	2
Learning Analytics	2
Multiple Choice Tests	2
Reliability	2
Scores	2
Second Language Learning	2
Task Analysis	2

Source

Applied Measurement in…	1
Communique	1
Creativity Research Journal	1
Innovations in Education and…	1
International Electronic…	1
International Online Journal…	1
Journal of Cognition and…	1
Journal of Educational…	1
Language Testing	1
ProQuest LLC	1

Publication Type

Journal Articles	9
Reports - Research	8
Reports - Descriptive	2
Dissertations/Theses -…	1
Speeches/Meeting Papers	1

Education Level

Higher Education	2
Postsecondary Education	2
Elementary Education	1
Grade 8	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Location

China	1
Illinois	1
North Carolina (Greensboro)	1
United Kingdom	1
United States	1

Laws, Policies, & Programs

Family Educational Rights and…	1
Health Insurance Portability…	1
Individuals with Disabilities…	1

Assessments and Surveys

National Assessment of…	1
Test of English as a Foreign…	1

Showing all 11 results

Evaluating the Consistency and Reliability of Attribution Methods in Automated Short Answer Grading (ASAG) Systems: Toward an Explainable Scoring System

Peer reviewed

Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025

In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…

Descriptors: Automation, Grading, Computer Assisted Testing, Scoring

Assessing the Ethical Capabilities of Chat GPT in Healthcare: A Study on Its Proficiency in Situational Judgement Test

Peer reviewed

Kunal Sareen – Innovations in Education and Teaching International, 2024

This study examines the proficiency of Chat GPT, an AI language model, in answering questions on the Situational Judgement Test (SJT), a widely used assessment tool for evaluating the fundamental competencies of medical graduates in the UK. A total of 252 SJT questions from the "Oxford Assess and Progress: Situational Judgement" Test…

Descriptors: Ethics, Decision Making, Artificial Intelligence, Computer Software

Decoding Student Insights: Analyzing Response Change in NAEP Mathematics Constructed Response Items

Peer reviewed
PDF on ERIC

Congning Ni; Bhashithe Abeysinghe; Juanita Hicks – International Electronic Journal of Elementary Education, 2025

The National Assessment of Educational Progress (NAEP), often referred to as The Nation's Report Card, offers a window into the state of the U.S. K-12 education system. Since 2017, NAEP has transitioned to digital assessments, opening new research opportunities that were previously impossible. Process data tracks students' interactions with the…

Descriptors: Reaction Time, Multiple Choice Tests, Behavior Change, National Competency Tests

Semantic Distance and the Alternate Uses Task: Recommendations for Reliable Automated Assessment of Originality

Peer reviewed

Beaty, Roger E.; Johnson, Dan R.; Zeitlen, Daniel C.; Forthmann, Boris – Creativity Research Journal, 2022

Semantic distance is increasingly used for automated scoring of originality on divergent thinking tasks, such as the Alternate Uses Task (AUT). Despite some psychometric support for semantic distance -- including positive correlations with human creativity ratings -- additional work is needed to optimize its reliability and validity, including…

Descriptors: Semantics, Scoring, Creative Thinking, Creativity

A Comparative Judgment Approach to Assessing Chinese Sign Language Interpreting

Peer reviewed

Han, Chao; Xiao, Xiaoyan – Language Testing, 2022

The quality of sign language interpreting (SLI) is a gripping construct among practitioners, educators and researchers, calling for reliable and valid assessment. There has been a diverse array of methods in the extant literature to measure SLI quality, ranging from traditional error analysis to recent rubric scoring. In this study, we want to…

Descriptors: Comparative Analysis, Sign Language, Deaf Interpreting, Evaluators

Virtual Cognitive Assessment: Legal and Ethical Considerations

Carlson, Tiffany; Crepeau-Hobson, Franci – Communique, 2021

When the coronavirus pandemic was declared a public health crisis in March 2020, school psychologists were forced into situations where face-to-face interaction with their students was discouraged and in some cases, prohibited. Consequently, the traditional practice of school psychology abruptly ended. Individualized Education Plans (IEP) and…

Descriptors: Cognitive Tests, Ethics, Decision Making, Models

A Review of Digital Formative Assessment Tools: Features and Future Directions

Peer reviewed
PDF on ERIC

Çekiç, Ahmet; Bakla, Arif – International Online Journal of Education and Teaching, 2021

The Internet and the software stores for mobile devices come with a huge number of digital tools for any task, and those intended for digital formative assessment (DFA) have burgeoned exponentially in the last decade. These tools vary in terms of their functionality, pedagogical quality, cost, operating systems and so forth. Teachers and learners…

Descriptors: Formative Evaluation, Futures (of Society), Computer Assisted Testing, Guidance

Designing, Evaluating, and Deploying Automated Scoring Systems with Validity in Mind: Methodological Design Decisions

Peer reviewed

Rupp, André A. – Applied Measurement in Education, 2018

This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…

Descriptors: Design, Automation, Scoring, Test Scoring Machines

Not so Fast: Reassessing Gender Essentialism in Young Adults

Peer reviewed

Eidson, R. Cole; Coley, John D. – Journal of Cognition and Development, 2014

We examined young adults' essentialist reasoning about gender categories. Previous developmental results suggest that until age 9 or 10, children show marked essentialist reasoning about gender, but this disappears by early adulthood. In contrast, results from social cognition suggest that essentialist thinking about social categories persists…

Descriptors: Undergraduate Students, Gender Differences, Social Cognition, Task Analysis

Rater Expertise in a Second Language Speaking Assessment: The Influence of Training and Experience

Davis, Lawrence Edward – ProQuest LLC, 2012

Speaking performance tests typically employ raters to produce scores; accordingly, variability in raters' scoring decisions has important consequences for test reliability and validity. One such source of variability is the rater's level of expertise in scoring. Therefore, it is important to understand how raters' performance is influenced by…

Descriptors: Evaluators, Expertise, Scores, Second Language Learning

Confidence in Pass/Fail Decisions for Computer Adaptive and Paper and Pencil Examinations.

Bergstrom, Betty A.; Lunz, Mary E. – 1991

The level of confidence in pass/fail decisions obtained with computer adaptive tests (CATs) was compared to decisions based on paper-and-pencil tests. Subjects included 645 medical technology students from 238 educational programs across the country. The tests used in this study constituted part of the subjects' review for the certification…

Descriptors: Adaptive Testing, Certification, Comparative Testing, Computer Assisted Testing

Author

Bakla, Arif	1
Beaty, Roger E.	1
Bergstrom, Betty A.	1
Bhashithe Abeysinghe	1
Carlson, Tiffany	1
Coley, John D.	1
Congning Ni	1
Crepeau-Hobson, Franci	1
Davis, Lawrence Edward	1
Eidson, R. Cole	1
Forthmann, Boris	1
Han, Chao	1
Jinnie Shin	1
Johnson, Dan R.	1
Juanita Hicks	1
Kunal Sareen	1
Lunz, Mary E.	1
Rupp, André A.	1
Wallace N. Pinto Jr.	1
Xiao, Xiaoyan	1
Zeitlen, Daniel C.	1
Çekiç, Ahmet	1