Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 3
Since 2006 (last 20 years): 6
Descriptor
Test Items: 11
Test Reliability: 11
Multiple Choice Tests: 6
Difficulty Level: 4
Scores: 3
Scoring: 3
Test Format: 3
Achievement Tests: 2
Computer Assisted Testing: 2
Foreign Countries: 2
Item Response Theory: 2
Source
Applied Measurement in Education: 11
Publication Type
Journal Articles: 11
Reports - Research: 7
Reports - Evaluative: 4
Education Level
Elementary Secondary Education: 2
Grade 8: 2
Middle Schools: 2
Secondary Education: 2
Elementary Education: 1
Grade 5: 1
High Schools: 1
Higher Education: 1
Junior High Schools: 1
Postsecondary Education: 1
Laws, Policies, & Programs
No Child Left Behind Act 2001: 1
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-21 reliability and its extension, coefficient alpha, give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 gives the reliability estimate for summed scores on dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
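For reference, the standard formulas behind these two coefficients (stated here for context; they are not quoted from the article): with k dichotomous items, mean total score \bar{X}, total-score variance s_X^2, and item variances s_i^2,

KR-21: \rho = \frac{k}{k-1}\left[1 - \frac{\bar{X}(k - \bar{X})}{k\, s_X^2}\right]

coefficient alpha: \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_X^2}\right)

KR-21 needs only the mean and variance of the summed scores, which is what makes the randomly-parallel-forms assumption do the work that item-level variances do for alpha.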
Slepkov, Aaron D.; Godfrey, Alan T. K. – Applied Measurement in Education, 2019
The answer-until-correct (AUC) method of multiple-choice (MC) testing involves test respondents making selections until the keyed answer is identified. Despite attendant benefits that include improved learning, broad student adoption, and facile administration of partial credit, the use of AUC methods for classroom testing has been extremely…
Descriptors: Multiple Choice Tests, Test Items, Test Reliability, Scores
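A minimal sketch of how AUC partial credit can be administered, assuming a simple linear credit rule (the rule and function name are illustrative, not necessarily the scheme the authors evaluate):

def auc_item_score(num_options: int, attempts_used: int) -> float:
    # Hypothetical linear rule for an answer-until-correct item:
    # full credit on a first-try success, zero credit when every
    # option had to be tried (requires num_options >= 2).
    if not 1 <= attempts_used <= num_options:
        raise ValueError("attempts_used must lie between 1 and num_options")
    return (num_options - attempts_used) / (num_options - 1)

# A 4-option item yields 1.0, 2/3, 1/3, 0.0 for attempts 1 through 4:
scores = [auc_item_score(4, a) for a in range(1, 5)]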
Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability
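A back-of-envelope illustration of why distractor quality can trump quantity (our example, not the authors' analysis): an added distractor only suppresses blind guessing if examinees find it plausible enough to consider.

def blind_guess_rate(num_plausible_options: int) -> float:
    # Chance of guessing correctly among the options an examinee
    # actually entertains; implausible distractors are simply ignored.
    return 1.0 / num_plausible_options

# A 5-option item whose two weakest distractors fool nobody
# functions like a 3-option item:
blind_guess_rate(3)  # 0.33..., not the nominal 0.2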
Edwards, Michael C.; Flora, David B.; Thissen, David – Applied Measurement in Education, 2012
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Test Items
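In schematic form, the MFS idea reads as a two-stage routing rule; the following toy sketch is our illustration (the names and the routing formula are assumptions, not the uMFS algorithm itself):

def route_mfs(starter_responses, forms):
    # Toy multi-form-structure routing: the number-correct score on the
    # fixed starter block picks one of the pre-assembled second-stage
    # forms, ordered from easiest to hardest.
    number_correct = sum(starter_responses)
    index = min(len(forms) - 1,
                number_correct * len(forms) // (len(starter_responses) + 1))
    return forms[index]

# Five starter items, three second-stage forms:
form = route_mfs([1, 1, 0, 1, 0], ["easy_form", "medium_form", "hard_form"])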
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Kettler, Ryan J.; Rodriguez, Michael C.; Bolt, Daniel M.; Elliott, Stephen N.; Beddow, Peter A.; Kurz, Alexander – Applied Measurement in Education, 2011
Federal policy on alternate assessment based on modified academic achievement standards (AA-MAS) inspired this research. Specifically, an experimental study was conducted to determine whether tests composed of modified items would have the same level of reliability as tests composed of original items, and whether these modified items helped reduce…
Descriptors: Multiple Choice Tests, Test Items, Alternative Assessment, Test Reliability
Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004
Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true-score estimate of ability. However, in some testing situations, such as test translation and…
Descriptors: True Scores, Simulation, Test Bias, Student Evaluation
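In outline (our summary of the standard SIBTEST statistic, not text from the abstract): examinees are matched on valid-subtest score levels k, and the uniform DIF effect size is

\hat{\beta} = \sum_k \hat{p}_k \left( \bar{Y}^{*}_{Rk} - \bar{Y}^{*}_{Fk} \right)

where \hat{p}_k is the proportion of focal-group examinees at level k and \bar{Y}^{*}_{Rk}, \bar{Y}^{*}_{Fk} are regression-corrected mean studied-item scores for the reference and focal groups; \hat{\beta} divided by its standard error is referred to a standard normal distribution.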

Qualls, Audrey L. – Applied Measurement in Education, 1995
Classically parallel, tau-equivalently parallel, and congenerically parallel models, representing various degrees of part-test parallelism, are discussed, along with their appropriateness for tests composed of multiple item formats. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)
Descriptors: Achievement Tests, Estimation (Mathematics), Measurement Techniques, Test Format
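For context, one standard composite-reliability coefficient under the congeneric model (whether this is the exact estimate Qualls presents cannot be confirmed from the abstract): with part-test loadings \lambda_i and error variances \theta_i,

\omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \theta_i}

which reduces to coefficient alpha when the parts are tau-equivalent (equal \lambda_i).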

Feldt, Leonard S. – Applied Measurement in Education, 1993
The recommendation that the reliability of multiple-choice tests will be enhanced if the distribution of item difficulties is concentrated at approximately 0.50 is reinforced and extended in this article by viewing the 0/1 item scoring as a dichotomization of an underlying normally distributed ability score. (SLD)
Descriptors: Ability, Difficulty Level, Guessing (Tests), Mathematical Models
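The core of the argument can be stated compactly (our paraphrase): if a 0/1 item score is a dichotomization of a normal ability variable at the point cutting off proportion p, the observed point-biserial item-total correlation relates to the underlying biserial correlation by

r_{pb} = r_{bis}\, \frac{\phi(z_p)}{\sqrt{p(1-p)}}

where \phi is the standard normal density and z_p the deviate at the cut. The factor \phi(z_p)/\sqrt{p(1-p)} is largest at p = 0.50 (about 0.80) and shrinks toward the extremes, which is why concentrating difficulties near 0.50 maximizes item-total correlations and, with them, reliability.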
Sykes, Robert C.; Hou, Liling – Applied Measurement in Education, 2003
Weighting responses to constructed-response (CR) items has been proposed as a way to increase the contribution these items make to the test score when there is insufficient testing time to administer additional CR items. The effects of various ways of weighting the items of an IRT-based mixed-format writing examination were investigated.…
Descriptors: Item Response Theory, Weighted Scores, Responses, Scores
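As a simple illustration of response weighting (a toy number-correct composite, not the IRT-based weighting the study investigates):

def weighted_composite(mc_scores, cr_scores, cr_weight=2.0):
    # Toy composite in which each constructed-response point counts
    # cr_weight times as much as a multiple-choice point.
    return sum(mc_scores) + cr_weight * sum(cr_scores)

# Four MC items and two CR items scored 0-3:
total = weighted_composite([1, 0, 1, 1], [2, 3], cr_weight=2.0)  # 13.0

Raising cr_weight increases the CR items' share of the composite-score variance without adding testing time, which is the trade-off the study examines.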

Loyd, Brenda H. – Applied Measurement in Education, 1990
Four mathematics test-item types that may perform differently when calculators are used were assessed using data from 160 high school students attending a summer enrichment program. The effects of testing with and without calculators on testing time, test reliability, item difficulty, and item discrimination were also assessed. (TJH)
Descriptors: Calculators, Difficulty Level, High School Students, High Schools