Traynor, Anne; Christopherson, Sara C. – Applied Measurement in Education, 2024
Combining methods from earlier content validity and more contemporary content alignment studies may allow a more complete evaluation of the meaning of test scores than if either set of methods is used on its own. This article distinguishes item relevance indices in the content validity literature from test representativeness indices in the…
Descriptors: Test Validity, Test Items, Achievement Tests, Test Construction
Karadavut, Tugba – Applied Measurement in Education, 2021
Mixture IRT models address heterogeneity in a population by extracting latent classes and allowing item parameters to vary between latent classes. Once the latent classes are extracted, they must be examined further so that they can be characterized. Several approaches have been adopted in the literature for this purpose. These approaches examine either the…
Descriptors: Item Response Theory, Models, Test Items, Maximum Likelihood Statistics
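The mixture Rasch idea described in the abstract can be illustrated with a minimal sketch. Everything below (difficulties, class weights, the response pattern) is hypothetical and not taken from the study: two latent classes share the logistic Rasch response function but differ in item difficulties, and a respondent's class membership can be inferred via Bayes' rule when ability is treated as known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two latent classes with different item difficulties.
n_items = 5
difficulties = {
    0: np.array([-1.0, -0.5, 0.0, 0.5, 1.0]),    # class 0
    1: np.array([ 1.0,  0.5, 0.0, -0.5, -1.0]),  # class 1 (reversed ordering)
}
class_weights = np.array([0.6, 0.4])

def rasch_prob(theta, b):
    """P(correct) under the Rasch model: logistic of ability minus difficulty."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def simulate_respondent(theta):
    """Draw a latent class, then item responses from that class's Rasch model."""
    c = rng.choice(2, p=class_weights)
    p = rasch_prob(theta, difficulties[c])
    return c, (rng.random(n_items) < p).astype(int)

def class_posterior(x, theta):
    """Posterior class probabilities for response pattern x, given known theta.

    In a real mixture IRT analysis theta is integrated out; treating it as
    known keeps the sketch short.
    """
    likes = np.array([
        np.prod(np.where(x == 1,
                         rasch_prob(theta, difficulties[c]),
                         1 - rasch_prob(theta, difficulties[c])))
        for c in (0, 1)
    ])
    joint = class_weights * likes
    return joint / joint.sum()

# Easy items (for class 0) answered correctly, hard ones missed:
x = np.array([1, 1, 1, 0, 0])
post = class_posterior(x, theta=0.0)
```

This pattern is far more likely under class 0's difficulty ordering, so the posterior concentrates on class 0.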
Krupa, Erin Elizabeth; Carney, Michele; Bostic, Jonathan – Applied Measurement in Education, 2019
This article provides a brief introduction to the set of four articles in the special issue. To provide a foundation for the issue, key terms are defined, a brief historical overview of validity is provided, and several different validation approaches used in the issue are described. Finally, the contribution of the articles to…
Descriptors: Test Items, Program Validation, Test Validity, Mathematics Education
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
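The small-sample question this abstract raises can be illustrated with a rough simulation. The sample sizes, true difficulties, and the crude PROX-style shortcut below are illustrative assumptions, not the study's method: simulate Rasch responses, then compare item-difficulty recovery at n = 25 versus a large sample.

```python
import numpy as np

rng = np.random.default_rng(42)

true_b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])  # hypothetical item difficulties

def simulate(n_persons, b):
    """Simulate dichotomous Rasch responses for n_persons with N(0, 1) abilities."""
    theta = rng.standard_normal(n_persons)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def prox_difficulty(responses):
    """Crude PROX-style estimate: centered log-odds of item incorrectness.

    This approximates Rasch difficulties (up to scale) when abilities are
    roughly normal; real equating studies use full ML estimation.
    """
    p = responses.mean(axis=0).clip(0.01, 0.99)
    b = np.log((1 - p) / p)
    return b - b.mean()  # center: Rasch difficulties are identified up to a shift

b_small = prox_difficulty(simulate(25, true_b))    # noisy
b_large = prox_difficulty(simulate(2000, true_b))  # close to the true ordering
```

Even this crude estimator recovers the difficulty ordering well at large n, while the n = 25 estimates are visibly noisier, which is the tension the article examines.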
El Masri, Yasmine H.; Andrich, David – Applied Measurement in Education, 2020
In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item…
Descriptors: Models, Goodness of Fit, Test Validity, Achievement Tests
Jacobson, Erik; Svetina, Dubravka – Applied Measurement in Education, 2019
Contingent argument-based approaches to validity require a unique argument for each use, in contrast to more prescriptive approaches that identify the common kinds of validity evidence researchers should consider for every use. In this article, we evaluate our use of an approach that is both prescriptive "and" argument-based to develop a…
Descriptors: Test Validity, Test Items, Test Construction, Test Interpretation
Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability
Solano-Flores, Guillermo – Applied Measurement in Education, 2014
This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…
Descriptors: English Language Learners, Test Items, Probability, Test Bias
Hansen, Mary A.; Lyon, Steven R.; Heh, Peter; Zigmond, Naomi – Applied Measurement in Education, 2013
Large-scale assessment programs, including alternate assessments based on alternate achievement standards (AA-AAS), must provide evidence of technical quality and validity. This study provides information about the technical quality of one AA-AAS by evaluating the standard setting for the science component. The assessment was designed to have…
Descriptors: Alternative Assessment, Science Tests, Standard Setting, Test Validity
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Hendrickson, Amy; Huff, Kristen; Luecht, Richard – Applied Measurement in Education, 2010
Evidence-centered assessment design (ECD) explicates a transparent evidentiary argument to warrant the inferences we make from student test performance. This article describes how the vehicles for gathering student evidence--task models and test specifications--are developed. Task models, which are the basis for item development, flow directly…
Descriptors: Evidence, Test Construction, Measurement, Classification
Rogers, W. Todd; Lin, Jie; Rinaldi, Christia M. – Applied Measurement in Education, 2011
The evidence gathered in the present study supports the use of the simultaneous development of test items for different languages. The simultaneous approach used in the present study involved writing an item in one language (e.g., French) and, before moving to the development of a second item, translating the item into the second language (e.g.,…
Descriptors: Test Items, Item Analysis, Achievement Tests, French
Walker, Cindy M.; Zhang, Bo; Surber, John – Applied Measurement in Education, 2008
Many teachers and curriculum specialists claim that the reading demand of many mathematics items is so great that students do not perform well on mathematics tests, even though they have a good understanding of mathematics. The purpose of this research was to test this claim empirically. This analysis was accomplished by considering examinees that…
Descriptors: Test Items, Construct Validity, Test Validity, Mathematics Tests

Downing, Steven M.; And Others – Applied Measurement in Education, 1995
The criterion-related validity evidence and other psychometric characteristics of multiple-choice and multiple true-false (MTF) items in medical specialty certification examinations were compared using results from 21,346 candidates. Advantages of MTF items and implications for test construction are discussed. (SLD)
Descriptors: Cognitive Ability, Licensing Examinations (Professions), Medical Education, Objective Tests

Crocker, Linda M.; And Others – Applied Measurement in Education, 1989
Techniques for quantifying the degree of fit between test items and curricula are classified according to the purposes of assessing: overall fit, fit of individual items to content domain, and the impact of test specifications on performance. Procedures for calculating each index and their properties are included. (SLD)
Descriptors: Achievement Tests, Content Validity, Curriculum, Elementary Secondary Education
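The kinds of item-curriculum fit indices the Crocker et al. abstract mentions can be sketched in a few lines. The objective labels and data here are invented for illustration, and the article's actual indices are more refined, but the two basic quantities are: how many items map onto the curriculum, and how much of the curriculum the items cover.

```python
# Invented example data: a curriculum domain and each item's classified objective.
curriculum = {"fractions", "decimals", "ratios", "percents"}
item_objectives = ["fractions", "fractions", "decimals", "ratios", "geometry"]

# Overall fit: the proportion of items that map onto any curriculum objective.
item_relevance = sum(obj in curriculum for obj in item_objectives) / len(item_objectives)

# Representativeness: the proportion of curriculum objectives covered by at least one item.
domain_coverage = len(curriculum & set(item_objectives)) / len(curriculum)
```

Here 4 of 5 items are relevant (0.8) but only 3 of 4 objectives are covered (0.75), showing why both index types are needed.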