Jonas Flodén – British Educational Research Journal, 2025
This study compares the performance of the generative AI (GenAI) large language model (LLM) ChatGPT with that of human teachers in grading university exams. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Hacer Karamese – ProQuest LLC, 2022
Multistage adaptive testing (MST) has become popular in the testing industry because research has shown that it combines the advantages of both linear tests and item-level computer adaptive testing (CAT). Previous research efforts have primarily focused on MST design issues such as panel design, module length, test length, distribution of test…
Descriptors: Adaptive Testing, Scoring, Computer Assisted Testing, Design
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Andersen, Øistein E.; Yuan, Zheng; Watson, Rebecca; Cheung, Kevin Yet Fong – International Educational Data Mining Society, 2021
Automated essay scoring (AES), where natural language processing is applied to score written text, can underpin educational resources in blended and distance learning. AES performance has typically been reported in terms of correlation coefficients or agreement statistics calculated between a system and an expert human examiner. We describe the…
Descriptors: Evaluation Methods, Scoring, Essays, Computer Assisted Testing
Lynch, Sarah – Practical Assessment, Research & Evaluation, 2022
In today's digital age, tests are increasingly being delivered on computers. Many of these computer-based tests (CBTs) have been adapted from paper-based tests (PBTs). However, this change in mode of test administration has the potential to introduce construct-irrelevant variance, affecting the validity of score interpretations. Because of this,…
Descriptors: Computer Assisted Testing, Tests, Scores, Scoring
Madsen, Adrian; McKagan, Sarah B.; Sayre, Eleanor C. – Physics Teacher, 2020
Physics faculty care about their students learning physics content. In addition, they usually hope that their students will learn some deeper lessons about thinking critically and scientifically. They hope that as a result of taking a physics class, students will come to appreciate physics as a coherent and logical method of understanding the…
Descriptors: Science Instruction, Physics, Student Surveys, Student Attitudes
Han, Chao – Language Testing, 2022
Over the past decade, testing and assessing spoken-language interpreting has garnered an increasing amount of attention from stakeholders in interpreter education, professional certification, and interpreting research. This is because in these fields assessment results provide a critical evidential basis for high-stakes decisions, such as the…
Descriptors: Translation, Language Tests, Testing, Evaluation Methods
Rafner, Janet; Biskjaer, Michael Mose; Zana, Blanka; Langsford, Steven; Bergenholtz, Carsten; Rahimi, Seyedahmad; Carugati, Andrea; Noy, Lior; Sherson, Jacob – Creativity Research Journal, 2022
Creativity assessments should be valid, reliable, and scalable to support various stakeholders (e.g., policy-makers, educators, corporations, and the general public) in their decision-making processes. Established initiatives toward scalable creativity assessments have relied on well-studied standardized tests. Although robust in many ways, most…
Descriptors: Creativity, Evaluation Methods, Video Games, Computer Assisted Testing
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Cathy Cavanaugh; Bryn Humphrey; Paige Pullen – International Journal on E-Learning, 2024
To address needs in one US state to provide a professional development micro-credential for tens of thousands of educators, we automated an assignment scoring workflow in an online course by developing and refining an AI model to scan submitted assignments and score them against a rubric. This article outlines the AI model development process and…
Descriptors: Artificial Intelligence, Automation, Scoring, Microcredentials
Eran Hadas; Arnon Hershkovitz – Journal of Learning Analytics, 2025
Creativity is an imperative skill for today's learners, one that has important contributions to issues of inclusion and equity in education. Therefore, assessing creativity is of major importance in educational contexts. However, scoring creativity based on traditional tools suffers from subjectivity and is heavily time- and labour-consuming. This…
Descriptors: Creativity, Evaluation Methods, Computer Assisted Testing, Artificial Intelligence
Bradley J. Ungurait – ProQuest LLC, 2021
Advancements in technology and computer-based testing have allowed for greater flexibility in assessing examinee knowledge on large-scale, high-stakes assessments. Through computer-based delivery, cognitive ability and skills can be assessed effectively and cost-efficiently, and domains can be measured that are difficult or even impossible to measure with…
Descriptors: Computer Assisted Testing, Evaluation Methods, Scoring, Student Evaluation
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technology advancement. Artificial Intelligence (AI) enabled assessments represent one such area of advancement--one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Chen, Dandan; Hebert, Michael; Wilson, Joshua – American Educational Research Journal, 2022
We used multivariate generalizability theory to examine the reliability of hand-scoring and automated essay scoring (AES) and to identify how these scoring methods could be used in conjunction to optimize writing assessment. Students (n = 113) included subsamples of struggling writers and non-struggling writers in Grades 3-5 drawn from a larger…
Descriptors: Reliability, Scoring, Essays, Automation
Mattern, Krista; Radunzel, Justine – ACT, Inc., 2019
When applicants take the ACT® more than once, how do colleges and universities reconcile and make sense of the multiple scores? In terms of validity, fairness, and impact on subgroup differences, are certain score-use policies better than others? The focus of this issue brief is to summarize evidence on the validity and fairness of various…
Descriptors: Scoring, College Entrance Examinations, Test Validity, Evaluation Methods