Publication Date
In 2025: 4
Since 2024: 8
Since 2021 (last 5 years): 27
Since 2016 (last 10 years): 45
Since 2006 (last 20 years): 63
Author
White, Edward M.: 6
Anderson, Paul S.: 4
Plake, Barbara S.: 3
Wind, Stefanie A.: 3
Guo, Wenjing: 2
Kim, Doyoung: 2
Kim, Sooyeon: 2
Moses, Tim: 2
Mott, Michael S.: 2
Stergiopoulos, Charalampos: 2
Triantis, Dimos: 2
Publication Type
Reports - Research: 120
Journal Articles: 71
Speeches/Meeting Papers: 21
Tests/Questionnaires: 7
Reports - Descriptive: 6
Numerical/Quantitative Data: 4
Reports - Evaluative: 2
Reference Materials -…: 1
Audience
Practitioners: 1
Teachers: 1
Location
California: 6
Turkey: 3
United States: 3
Asia: 2
Iran: 2
Australia: 1
Austria: 1
Bhutan: 1
Cambodia: 1
Canada: 1
China: 1
Stefanie A. Wind; Yuan Ge – Measurement: Interdisciplinary Research and Perspectives, 2024
Mixed-format assessments made up of multiple-choice (MC) items and constructed response (CR) items that are scored using rater judgments include unique psychometric considerations. When these item types are combined to estimate examinee achievement, information about the psychometric quality of each component can depend on that of the other. For…
Descriptors: Interrater Reliability, Test Bias, Multiple Choice Tests, Responses
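One standard way to express how the reliability of such a composite depends on its MC and CR components is stratified coefficient alpha, shown here as a general illustration (the notation is assumed, not taken from the article):

\[ \alpha_{\text{strat}} = 1 - \frac{\sum_i \sigma_{X_i}^2 (1 - \alpha_i)}{\sigma_X^2}, \]

where \(\sigma_{X_i}^2\) and \(\alpha_i\) are the variance and reliability of component \(i\) (here, the MC or CR section) and \(\sigma_X^2\) is the variance of the total score; an unreliable CR section inflates its term in the numerator and drags down the composite.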
Brian E. Clauser; Victoria Yaneva; Peter Baldwin; Le An Ha; Janet Mee – Applied Measurement in Education, 2024
Multiple-choice questions have become ubiquitous in educational measurement because the format allows for efficient and accurate scoring. Nonetheless, there remains continued interest in constructed-response formats. This interest has driven efforts to develop computer-based scoring procedures that can accurately and efficiently score these items.…
Descriptors: Computer Uses in Education, Artificial Intelligence, Scoring, Responses
Selcuk Acar; Peter Organisciak; Denis Dumas – Journal of Creative Behavior, 2025
In this three-study investigation, we applied various approaches to score drawings created in response to both Form A and Form B of the Torrance Tests of Creative Thinking-Figural (broadly TTCT-F) as well as the Multi-Trial Creative Ideation task (MTCI). We focused on TTCT-F in Study 1, and utilizing a random forest classifier, we achieved 79% and…
Descriptors: Scoring, Computer Assisted Testing, Models, Correlation
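A rough sketch of the classifier setup named in the abstract, assuming each drawing has already been converted to a numeric feature vector; the data-loading helper and feature choices are hypothetical, not from the study:

    # Hypothetical sketch: scoring drawings with a random forest.
    # load_drawing_features() is an assumed helper returning per-drawing
    # feature vectors (e.g., ink density, stroke counts) and human scores.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    X, y = load_drawing_features()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(X_train, y_train)
    # Accuracy here means agreement with the human-assigned score categories.
    print(accuracy_score(y_test, clf.predict(X_test)))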
Herwin, Herwin; Pristiwaluyo, Triyanto; Ruslan, Ruslan; Dahalan, Shakila Che – Cypriot Journal of Educational Sciences, 2022
In practice, multiple-choice tests are often applied without considering the scoring technique or the number of answer options. This study aims to describe the effect of scoring technique and number of options on the reliability of multiple-choice objective tests in elementary school social studies. The study is quantitative research with…
Descriptors: Scoring, Multiple Choice Tests, Test Reliability, Elementary School Students
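The scoring techniques typically compared in this literature include number-right scoring and formula scoring (correction for guessing); as a general illustration of why the number of options matters (this particular rule is the standard one, not necessarily the study's):

\[ S = R - \frac{W}{k - 1}, \]

where \(R\) is the number of right answers, \(W\) the number of wrong answers, and \(k\) the number of options per item. With \(k = 4\), a student with \(R = 30\) and \(W = 8\) scores \(S = 30 - 8/3 \approx 27.3\); with \(k = 2\), the same response pattern gives \(S = 22\), so the penalty for guessing depends directly on \(k\).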
Rebecka Weegar; Peter Idestam-Almquist – International Journal of Artificial Intelligence in Education, 2024
Machine learning methods can be used to reduce the manual workload in exam grading, making it possible for teachers to spend more time on other tasks. However, when it comes to grading exams, fully eliminating manual work is not yet possible even with very accurate automated grading, as any grading mistakes could have significant consequences for…
Descriptors: Grading, Computer Assisted Testing, Introductory Courses, Computer Science Education
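A common way to keep graders in the loop while still cutting workload is confidence-based routing: accept the model's grade only when its confidence clears a threshold and send everything else to a teacher. A minimal sketch, assuming a scikit-learn-style model; the threshold and data layout are assumptions, not the authors' system:

    # Hypothetical sketch: route low-confidence predictions to manual grading.
    def route_exams(model, exams, threshold=0.9):
        auto, manual = [], []
        for exam in exams:
            probs = model.predict_proba([exam.features])[0]
            if probs.max() >= threshold:
                auto.append((exam, probs.argmax()))   # accept the model's grade
            else:
                manual.append(exam)                   # a teacher grades this one
        return auto, manual

Raising the threshold trades a smaller automated share for fewer consequential grading mistakes, which is the concern the abstract raises.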
Meaghan McKenna; Hope Gerde; Nicolette Grasley-Boy – Reading and Writing: An Interdisciplinary Journal, 2025
This article describes the development and administration of the "Kindergarten-Second Grade (K-2) Writing Data-Based Decision Making (DBDM) Survey." The "K-2 Writing DBDM Survey" was developed to learn more about current DBDM practices specific to early writing. A total of 376 educational professionals (175 general education…
Descriptors: Writing Evaluation, Writing Instruction, Preschool Teachers, Kindergarten
Harrison, Scott; Kroehne, Ulf; Goldhammer, Frank; Lüdtke, Oliver; Robitzsch, Alexander – Large-scale Assessments in Education, 2023
Background: Mode effects, the variations in item and scale properties attributed to the mode of test administration (paper vs. computer), have stimulated research around test equivalence and trend estimation in PISA. The PISA assessment framework provides the backbone to the interpretation of the results of the PISA test scores. However, an…
Descriptors: Scoring, Test Items, Difficulty Level, Foreign Countries
Dongmei Li; Shalini Kapoor; Ann Arthur; Chi-Yu Huang; YoungWoo Cho; Chen Qiu; Hongling Wang – ACT Education Corp., 2025
Starting in April 2025, ACT will introduce enhanced forms of the ACT® test for national online testing, with a full rollout to all paper and online test takers in national, state and district, and international test administrations by Spring 2026. ACT introduced major updates by changing the test lengths and testing times, providing more time per…
Descriptors: College Entrance Examinations, Testing, Change, Scoring
Guo, Wenjing; Wind, Stefanie A. – Journal of Educational Measurement, 2021
The use of mixed-format tests made up of multiple-choice (MC) items and constructed response (CR) items is popular in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district- and state-level assessments in the United States. Rater effects, or raters' scoring tendencies that result in…
Descriptors: Test Format, Multiple Choice Tests, Scoring, Test Items
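Rater effects such as severity are commonly quantified with the many-facet Rasch model; a hedged sketch of its usual form (the notation is assumed here, not quoted from the article):

\[ \log\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \lambda_j - \tau_k, \]

where \(\theta_n\) is examinee achievement, \(\delta_i\) item difficulty, \(\lambda_j\) rater severity, and \(\tau_k\) the threshold for score category \(k\); a persistently large \(\lambda_j\) flags a severe rater even when simple agreement statistics look acceptable.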
Schulte, Niklas; Holling, Heinz; Bürkner, Paul-Christian – Educational and Psychological Measurement, 2021
Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales. However, the derived trait scores are often unreliable and ipsative, making interindividual comparisons in high-stakes situations impossible. Several studies suggest that these problems vanish if the number of measured traits is high.…
Descriptors: Questionnaires, Measurement Techniques, Test Format, Scoring
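The ipsativity problem the abstract refers to can be stated in one line: under fully ipsative scoring, every respondent's trait scores sum to the same constant,

\[ \sum_{t=1}^{T} s_{pt} = c \quad \text{for every person } p, \]

so a high score on one trait mechanically forces lower scores elsewhere, and between-person comparisons on any single trait become uninterpretable, which is what rules such scores out for high-stakes selection.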
Kang, Hyeon-Ah; Han, Suhwa; Kim, Doyoung; Kao, Shu-Chuan – Educational and Psychological Measurement, 2022
The development of technology-enhanced innovative items calls for practical models that can describe polytomous testlet items. In this study, we evaluate four measurement models that can characterize polytomous items administered in testlets: (a) generalized partial credit model (GPCM), (b) testlet-as-a-polytomous-item model (TPIM), (c)…
Descriptors: Goodness of Fit, Item Response Theory, Test Items, Scoring
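For reference, the generalized partial credit model (GPCM) listed as model (a) gives the probability of a score of \(k\) on item \(j\) as

\[ P(X_{ij} = k \mid \theta_i) = \frac{\exp\left[\sum_{v=1}^{k} a_j(\theta_i - b_{jv})\right]}{\sum_{c=0}^{m_j} \exp\left[\sum_{v=1}^{c} a_j(\theta_i - b_{jv})\right]}, \quad k = 0, 1, \ldots, m_j, \]

with discrimination \(a_j\), step parameters \(b_{jv}\), and the convention that the empty sum (\(k = 0\) or \(c = 0\)) equals zero; the testlet models compared in the study extend this basic form with testlet-specific effects.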
Wind, Stefanie A.; Guo, Wenjing – Educational Assessment, 2021
Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and…
Descriptors: Scoring, Responses, Test Items, Test Format
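As a minimal illustration of the agreement checks the abstract mentions (the data layout is assumed): exact agreement is simply the proportion of double-scored responses on which two raters award the same score, and adjacent agreement relaxes this to scores within one point.

    # Hypothetical sketch: exact and adjacent agreement for two raters
    # scoring the same set of responses.
    def agreement(scores_a, scores_b):
        pairs = list(zip(scores_a, scores_b))
        exact = sum(a == b for a, b in pairs) / len(pairs)
        adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
        return exact, adjacent

    print(agreement([3, 2, 4, 1], [3, 3, 4, 2]))  # -> (0.5, 1.0)

The article's point is that such indices can look healthy while systematic effects like severity persist, which is why they are not sufficient on their own.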
Betts, Joe; Muntean, William; Kim, Doyoung; Kao, Shu-chuan – Educational and Psychological Measurement, 2022
The multiple response structure can underlie several different technology-enhanced item types. With the increased use of computer-based testing, multiple response items are becoming more common. This response type holds the potential for being scored polytomously for partial credit. However, there are several possible methods for computing raw…
Descriptors: Scoring, Test Items, Test Format, Raw Scores
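Two common raw-score rules for multiple-response items can be sketched as follows; the rule names and functions are illustrative, not the paper's taxonomy:

    # Hypothetical sketch: two scoring rules for a multiple-response item.
    def all_or_nothing(selected, keyed):
        # Full credit only for selecting exactly the keyed options.
        return int(set(selected) == set(keyed))

    def plus_minus(selected, keyed):
        # +1 per correct selection, -1 per incorrect, floored at zero.
        correct = len(set(selected) & set(keyed))
        incorrect = len(set(selected) - set(keyed))
        return max(correct - incorrect, 0)

    print(all_or_nothing(["A", "C"], ["A", "C", "D"]))  # 0
    print(plus_minus(["A", "C", "B"], ["A", "C", "D"])) # 2 - 1 = 1

Different rules of this kind yield different raw-score distributions, which is why the choice among them matters for the polytomous models fitted downstream.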
Shahid A. Choudhry; Timothy J. Muckle; Christopher J. Gill; Rajat Chadha; Magnus Urosev; Matt Ferris; John C. Preston – Practical Assessment, Research & Evaluation, 2024
The National Board of Certification and Recertification for Nurse Anesthetists (NBCRNA) conducted a one-year research study comparing performance on the traditional continued professional certification assessment, administered at a test center or online with remote proctoring, to a longitudinal assessment that required answering quarterly…
Descriptors: Nurses, Certification, Licensing Examinations (Professions), Computer Assisted Testing
Uysal, Ibrahim; Dogan, Nuri – International Journal of Assessment Tools in Education, 2021
Scoring constructed-response items can be highly difficult, time-consuming, and costly in practice. Improvements in computer technology have enabled automated scoring of constructed-response items. However, the application of automated scoring without an investigation of test equating can lead to serious problems. The goal of this study was to…
Descriptors: Computer Assisted Testing, Scoring, Item Response Theory, Test Format
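The equating step at issue can be illustrated with the standard linear (mean-sigma) equating function, which maps a score \(x\) on form X to the scale of form Y:

\[ l_Y(x) = \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) + \mu_Y, \]

where \(\mu\) and \(\sigma\) are the form means and standard deviations; if automated scoring shifts \(\mu_X\) or \(\sigma_X\) relative to human scoring, the equated scores shift with them, which is the kind of distortion the study investigates.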