Kunal Sareen – Innovations in Education and Teaching International, 2024
This study examines the proficiency of ChatGPT, an AI language model, in answering questions on the Situational Judgement Test (SJT), a widely used assessment tool for evaluating the fundamental competencies of medical graduates in the UK. A total of 252 SJT questions from the "Oxford Assess and Progress: Situational Judgement" Test…
Descriptors: Ethics, Decision Making, Artificial Intelligence, Computer Software
Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023
Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…
Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests
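As a rough illustration of the supervised approach this abstract describes (training a scorer on a small number of human-scored responses), here is a minimal sketch: a nearest-centroid classifier over bag-of-words vectors. All data and names are hypothetical; real systems typically fine-tune language models rather than use word counts.

```python
from collections import Counter
import math

def bow(text):
    """Lowercase bag-of-words vector as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(scored_responses):
    """Pool the bag-of-words vectors of responses sharing a human score."""
    centroids = {}
    for text, score in scored_responses:
        centroids.setdefault(score, Counter()).update(bow(text))
    return centroids

def predict(centroids, text):
    """Assign the score whose centroid is most similar to the response."""
    v = bow(text)
    return max(centroids, key=lambda s: cosine(centroids[s], v))

train = [
    ("the slope is rise over run", 1),
    ("slope equals change in y over change in x", 1),
    ("i do not know", 0),
    ("no idea", 0),
]
model = train_centroids(train)
print(predict(model, "slope is change in y divided by change in x"))  # → 1
```

The sketch scales the same way the abstract suggests: once trained on a handful of scored responses, the model scores any number of new ones automatically.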
Congning Ni; Bhashithe Abeysinghe; Juanita Hicks – International Electronic Journal of Elementary Education, 2025
The National Assessment of Educational Progress (NAEP), often referred to as The Nation's Report Card, offers a window into the state of the U.S. K-12 education system. Since 2017, NAEP has transitioned to digital assessments, opening new research opportunities that were previously impossible. Process data tracks students' interactions with the…
Descriptors: Reaction Time, Multiple Choice Tests, Behavior Change, National Competency Tests
Beaty, Roger E.; Johnson, Dan R.; Zeitlen, Daniel C.; Forthmann, Boris – Creativity Research Journal, 2022
Semantic distance is increasingly used for automated scoring of originality on divergent thinking tasks, such as the Alternate Uses Task (AUT). Despite some psychometric support for semantic distance -- including positive correlations with human creativity ratings -- additional work is needed to optimize its reliability and validity, including…
Descriptors: Semantics, Scoring, Creative Thinking, Creativity
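The semantic-distance scoring this abstract describes can be sketched very simply: originality is proxied by one minus the cosine similarity between word embeddings of the prompt object and the proposed use. The toy vectors below are hypothetical; real systems use trained embedding spaces (e.g., LSA or GloVe).

```python
import math

# Toy 3-dimensional word vectors (hypothetical, for illustration only;
# actual semantic-distance scoring uses trained embedding spaces)
VECTORS = {
    "brick": [0.90, 0.10, 0.00],
    "wall":  [0.85, 0.15, 0.05],
    "art":   [0.05, 0.20, 0.95],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_distance(prompt_word, response_word):
    """Originality proxy: 1 - cosine similarity; larger = more distant."""
    return 1.0 - cosine(VECTORS[prompt_word], VECTORS[response_word])

# On the Alternate Uses Task, an unusual use of a brick ("art") should
# score as more semantically distant than a common one ("wall")
print(semantic_distance("brick", "art") > semantic_distance("brick", "wall"))
```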
National Academies Press, 2022
The National Assessment of Educational Progress (NAEP) -- often called "The Nation's Report Card" -- is the largest nationally representative and continuing assessment of what students in public and private schools in the United States know and can do in various subjects and has provided policy makers and the public with invaluable…
Descriptors: Costs, Futures (of Society), National Competency Tests, Educational Trends
Çekiç, Ahmet; Bakla, Arif – International Online Journal of Education and Teaching, 2021
The Internet and the software stores for mobile devices come with a huge number of digital tools for any task, and those intended for digital formative assessment (DFA) have burgeoned exponentially in the last decade. These tools vary in terms of their functionality, pedagogical quality, cost, operating systems and so forth. Teachers and learners…
Descriptors: Formative Evaluation, Futures (of Society), Computer Assisted Testing, Guidance
Aybek, Eren Can; Demirtasli, R. Nukhet – International Journal of Research in Education and Science, 2017
This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. It also aims to introduce simulation and live CAT software to interested researchers. The computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
Descriptors: Computer Assisted Testing, Adaptive Testing, Item Response Theory, Test Items
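A core step of the CAT algorithm this abstract introduces is item selection: administer the unanswered item that is most informative at the examinee's current ability estimate. A minimal sketch for the dichotomous 2PL model is below (the item bank and parameters are hypothetical; the article itself covers polytomous models such as the nominal response model).

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * p * (1 - p)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_item(theta, item_bank, administered):
    """Pick the unadministered item with maximum information at theta."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: information(theta, *item_bank[i]))

bank = [(1.0, -1.5), (1.2, 0.0), (0.8, 1.5)]  # hypothetical (a, b) per item
print(select_item(0.0, bank, set()))  # → 1 (difficulty nearest current theta)
```

After each response the ability estimate is updated and the selection repeats, which is what makes the test adaptive.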
Mullis, Ina V. S., Ed.; Martin, Michael O., Ed.; von Davier, Matthias, Ed. – International Association for the Evaluation of Educational Achievement, 2021
TIMSS (Trends in International Mathematics and Science Study) is a long-standing international assessment of mathematics and science at the fourth and eighth grades that has been collecting trend data every four years since 1995. About 70 countries use TIMSS trend data for monitoring the effectiveness of their education systems in a global…
Descriptors: Achievement Tests, International Assessment, Science Achievement, Mathematics Achievement
Brinkhuis, Matthieu J. S.; Savi, Alexander O.; Hofman, Abe D.; Coomans, Frederik; van der Maas, Han L. J.; Maris, Gunter – Journal of Learning Analytics, 2018
With the advent of computers in education, and the ample availability of online learning and practice environments, enormous amounts of data on learning become available. The purpose of this paper is to present a decade of experience with analyzing and improving an online practice environment for math, which has thus far recorded over a billion…
Descriptors: Data Analysis, Mathematics Instruction, Accuracy, Reaction Time
Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011
This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…
Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis
Sukkarieh, Jane Z.; von Davier, Matthias; Yamamoto, Kentaro – ETS Research Report Series, 2012
This document describes a solution to a problem in the automatic content scoring of the multilingual character-by-character highlighting item type. The solution is language independent and represents a significant enhancement: it not only facilitates automatic scoring but also plays an important role in clustering students' responses;…
Descriptors: Scoring, Multilingualism, Test Items, Role
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Shahnazari-Dorcheh, Mohammadtaghi; Roshan, Saeed – English Language Teaching, 2012
Due to the lack of a span test for use in language-specific and cross-language studies, this study provides L1 and L2 researchers with a reliable, language-independent span test (a math span test) for the measurement of working memory capacity. It also describes the development, validation, and scoring method of this test. This test included 70…
Descriptors: Language Research, Native Language, Second Language Learning, Scoring
Attali, Yigal; Bridgeman, Brent; Trapani, Catherine – Journal of Technology, Learning, and Assessment, 2010
A generic approach in automated essay scoring produces scores that have the same meaning across all prompts, existing or new, of a writing assessment. This is accomplished by using a single set of linguistic indicators (or features), a consistent way of combining and weighting these features into essay scores, and a focus on features that are not…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Test Scoring Machines
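The generic approach this abstract describes amounts to one fixed, prompt-independent weighted combination of linguistic features. A minimal sketch follows; the feature names and weights are hypothetical and are not e-rater's actual feature set.

```python
def essay_score(features, weights, bias=0.0):
    """Generic scoring sketch: a single fixed weighted combination of
    linguistic features, applied identically across all prompts."""
    return bias + sum(weights[name] * value for name, value in features.items())

# Hypothetical feature weights, shared by every prompt of the assessment
weights = {"grammar": 1.5, "organization": 2.0, "vocabulary": 1.0}

# Hypothetical normalized feature values for one essay
features = {"grammar": 0.8, "organization": 0.6, "vocabulary": 0.9}
print(round(essay_score(features, weights), 2))  # → 3.3
```

Because the weights never change across prompts, a given score carries the same meaning for new prompts as for existing ones, which is the point of the generic approach.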
Chang, Shao-Hua; Lin, Pei-Chun; Lin, Zih-Chuan – Educational Technology & Society, 2007
This study investigates differences in the partial scoring performance of examinees in elimination testing and conventional dichotomous scoring of multiple-choice tests implemented on a computer-based system. Elimination testing that uses the same set of multiple-choice items rewards examinees with partial knowledge over those who are simply…
Descriptors: Multiple Choice Tests, Computer Assisted Testing, Scoring, Item Analysis
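The elimination-testing format this abstract studies can be sketched with one common scoring rule (the study's exact rule may differ): the examinee crosses out options believed wrong, earning a point per correctly eliminated distractor and a large penalty for eliminating the keyed answer. Option labels here are hypothetical.

```python
def elimination_score(eliminated, key, n_options=4):
    """One common elimination-scoring rule: +1 for each distractor
    correctly eliminated, minus (n_options - 1) if the keyed answer
    is eliminated. Rewards partial knowledge, punishes misinformation."""
    score = sum(1 for opt in eliminated if opt != key)
    if key in eliminated:
        score -= (n_options - 1)
    return score

# Full knowledge: eliminate all three distractors of a 4-option item
print(elimination_score({"A", "B", "D"}, key="C"))  # → 3
# Partial knowledge: eliminate two distractors
print(elimination_score({"A", "B"}, key="C"))       # → 2
# Misinformed: eliminate the keyed answer itself
print(elimination_score({"C"}, key="C"))            # → -3
```

Unlike dichotomous scoring, this rule distinguishes an examinee who can rule out two distractors from one who is purely guessing.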