NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
What Works Clearinghouse Rating
Showing all 11 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Peer reviewed Peer reviewed
Direct linkDirect link
Kunal Sareen – Innovations in Education and Teaching International, 2024
This study examines the proficiency of Chat GPT, an AI language model, in answering questions on the Situational Judgement Test (SJT), a widely used assessment tool for evaluating the fundamental competencies of medical graduates in the UK. A total of 252 SJT questions from the "Oxford Assess and Progress: Situational Judgement" Test…
Descriptors: Ethics, Decision Making, Artificial Intelligence, Computer Software
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Congning Ni; Bhashithe Abeysinghe; Juanita Hicks – International Electronic Journal of Elementary Education, 2025
The National Assessment of Educational Progress (NAEP), often referred to as The Nation's Report Card, offers a window into the state of U.S. K-12 education system. Since 2017, NAEP has transitioned to digital assessments, opening new research opportunities that were previously impossible. Process data tracks students' interactions with the…
Descriptors: Reaction Time, Multiple Choice Tests, Behavior Change, National Competency Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Beaty, Roger E.; Johnson, Dan R.; Zeitlen, Daniel C.; Forthmann, Boris – Creativity Research Journal, 2022
Semantic distance is increasingly used for automated scoring of originality on divergent thinking tasks, such as the Alternate Uses Task (AUT). Despite some psychometric support for semantic distance -- including positive correlations with human creativity ratings -- additional work is needed to optimize its reliability and validity, including…
Descriptors: Semantics, Scoring, Creative Thinking, Creativity
Peer reviewed Peer reviewed
Direct linkDirect link
Han, Chao; Xiao, Xiaoyan – Language Testing, 2022
The quality of sign language interpreting (SLI) is a gripping construct among practitioners, educators and researchers, calling for reliable and valid assessment. There has been a diverse array of methods in the extant literature to measure SLI quality, ranging from traditional error analysis to recent rubric scoring. In this study, we want to…
Descriptors: Comparative Analysis, Sign Language, Deaf Interpreting, Evaluators
Carlson, Tiffany; Crepeau-Hobson, Franci – Communique, 2021
When the coronavirus pandemic was declared a public health crisis in March 2020, school psychologists were forced into situations where face-to-face interaction with their students was discouraged and in some cases, prohibited. Consequently, the traditional practice of school psychology abruptly ended. Individualized Education Plans (IEP) and…
Descriptors: Cognitive Tests, Ethics, Decision Making, Models
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Çekiç, Ahmet; Bakla, Arif – International Online Journal of Education and Teaching, 2021
The Internet and the software stores for mobile devices come with a huge number of digital tools for any task, and those intended for digital formative assessment (DFA) have burgeoned exponentially in the last decade. These tools vary in terms of their functionality, pedagogical quality, cost, operating systems and so forth. Teachers and learners…
Descriptors: Formative Evaluation, Futures (of Society), Computer Assisted Testing, Guidance
Peer reviewed Peer reviewed
Direct linkDirect link
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Peer reviewed Peer reviewed
Direct linkDirect link
Eidson, R. Cole; Coley, John D. – Journal of Cognition and Development, 2014
We examined young adults' essentialist reasoning about gender categories. Previous developmental results suggest that until age 9 or 10, children show marked essentialist reasoning about gender, but this disappears by early adulthood. In contrast, results from social cognition suggest that essentialist thinking about social categories persists…
Descriptors: Undergraduate Students, Gender Differences, Social Cognition, Task Analysis
Davis, Lawrence Edward – ProQuest LLC, 2012
Speaking performance tests typically employ raters to produce scores; accordingly, variability in raters' scoring decisions has important consequences for test reliability and validity. One such source of variability is the rater's level of expertise in scoring. Therefore, it is important to understand how raters' performance is influenced by…
Descriptors: Evaluators, Expertise, Scores, Second Language Learning
Bergstrom, Betty A.; Lunz, Mary E. – 1991
The level of confidence in pass/fail decisions obtained with computer adaptive tests (CATs) was compared to decisions based on paper-and-pencil tests. Subjects included 645 medical technology students from 238 educational programs across the country. The tests used in this study constituted part of the subjects' review for the certification…
Descriptors: Adaptive Testing, Certification, Comparative Testing, Computer Assisted Testing