ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	2
Since 2006 (last 20 years)	4

Descriptor

Computer Assisted Testing	5
Test Validity	5
Item Response Theory	3
Test Construction	3
Test Reliability	3
Automation	2
Efficiency	2
Scoring	2
Test Scoring Machines	2
Achievement Tests	1
Adaptive Testing	1
Best Practices	1
Correlation	1
Data Collection	1
Data Interpretation	1
Decision Making	1
Design	1
Elementary Secondary Education	1
Essay Tests	1
Essays	1
Factor Analysis	1
Foreign Countries	1
Generalizability Theory	1
Guessing (Tests)	1
High Stakes Tests	1

Source

Applied Measurement in…

Author

Ben-Simon, Anat	1
Coffman, Don D.	1
Cohen, Yoav	1
Henly, George A.	1
Levi, Effi	1
Rupp, André A.	1
Vispoel, Walter P.	1
Wan, Lei	1
Wise, Steven L.	1

Publication Type

Journal Articles	5
Reports - Evaluative	2
Reports - Research	2
Reports - Descriptive	1

Education Level

Elementary Education	1
Elementary Secondary Education	1
Grade 5	1
Grade 8	1
High Schools	1
Middle Schools	1
Secondary Education	1

Location

Israel

Showing all 5 results

Validating Human and Automated Scoring of Essays against "True" Scores

Peer reviewed

Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…

Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing

Designing, Evaluating, and Deploying Automated Scoring Systems with Validity in Mind: Methodological Design Decisions

Peer reviewed

Rupp, André A. – Applied Measurement in Education, 2018

This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…

Descriptors: Design, Automation, Scoring, Test Scoring Machines

Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

Peer reviewed

Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012

Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…

Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement

An Investigation of the Differential Effort Received by Items on a Low-Stakes Computer-Based Test

Peer reviewed

Wise, Steven L. – Applied Measurement in Education, 2006

In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found…

Descriptors: Computer Assisted Testing, Motivation, Test Validity, Item Response Theory

Computerized-Adaptive and Self-Adapted Music-Listening Tests: Psychometric Features and Motivational Benefits

Peer reviewed

Vispoel, Walter P.; Coffman, Don D. – Applied Measurement in Education, 1994

Computerized-adaptive (CAT) and self-adapted (SAT) music listening tests were compared for efficiency, reliability, validity, and motivational benefits with 53 junior high school students. Results demonstrate trade-offs, with greater potential motivational benefits for SAT and greater efficiency for CAT. SAT elicited more favorable responses from…

Descriptors: Adaptive Testing, Computer Assisted Testing, Efficiency, Item Response Theory