ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	1
Since 2006 (last 20 years)	4

Source

Applied Measurement in…

Author

Lee, Yoonsun	1
Musch, Jochen	1
Papenberg, Martin	1
Phillips, Gary W.	1
Taylor, Catherine S.	1
Wise, Steven L.	1

Publication Type

Journal Articles	4
Reports - Research	3
Reports - Evaluative	1

Education Level

Grade 10	1
Grade 4	1
Grade 7	1

Audience

Location

Germany

Laws, Policies, & Programs

Assessments and Surveys

What Works Clearinghouse Rating

Showing all 4 results Save | Export

Of Small Beauties and Large Beasts: The Quality of Distractors on Multiple-Choice Tests Is More Important than Their Quantity

Peer reviewed

Direct link

Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017

In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…

Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Direct link

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

Stability of Rasch Scales over Time

Peer reviewed

Direct link

Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010

Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…

Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis

An Investigation of the Differential Effort Received by Items on a Low-Stakes Computer-Based Test

Peer reviewed

Direct link

Wise, Steven L. – Applied Measurement in Education, 2006

In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found…

Descriptors: Computer Assisted Testing, Motivation, Test Validity, Item Response Theory

Item Analysis	4
Test Reliability	4
Test Validity	4
Item Response Theory	3
Equated Scores	2
Comparative Testing	1
Computer Assisted Testing	1
Correlation	1
Difficulty Level	1
Error of Measurement	1
Evaluation Methods	1
Evaluation Problems	1
Experimenter Characteristics	1
Foreign Countries	1
Group Testing	1
Guessing (Tests)	1
Mathematics Tests	1
Measures (Individuals)	1
Motivation	1
Multiple Choice Tests	1
Predictor Variables	1
Probability	1
Reaction Time	1
Reading Tests	1
Replication (Evaluation)	1
More ▼