Traynor, Anne; Christopherson, Sara C. – Applied Measurement in Education, 2024
Combining methods from earlier content validity and more contemporary content alignment studies may allow a more complete evaluation of the meaning of test scores than if either set of methods is used on its own. This article distinguishes item relevance indices in the content validity literature from test representativeness indices in the…
Descriptors: Test Validity, Test Items, Achievement Tests, Test Construction
Karadavut, Tugba – Applied Measurement in Education, 2021
Mixture IRT models address heterogeneity in a population by extracting latent classes and allowing item parameters to vary between latent classes. Once the latent classes are extracted, they must be examined further so that they can be characterized. Several approaches have been adopted in the literature for this purpose. These approaches examine either the…
Descriptors: Item Response Theory, Models, Test Items, Maximum Likelihood Statistics
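The mixture Rasch idea described in the abstract can be illustrated with a minimal sketch. Everything below (difficulties, class weights, the response pattern) is hypothetical and not taken from the study: two latent classes share the logistic Rasch response function but differ in item difficulties, and a respondent's class membership can be inferred via Bayes' rule when ability is treated as known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two latent classes with different item difficulties.
n_items = 5
difficulties = {
    0: np.array([-1.0, -0.5, 0.0, 0.5, 1.0]),    # class 0
    1: np.array([ 1.0,  0.5, 0.0, -0.5, -1.0]),  # class 1 (reversed ordering)
}
class_weights = np.array([0.6, 0.4])

def rasch_prob(theta, b):
    """P(correct) under the Rasch model: logistic of ability minus difficulty."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def simulate_respondent(theta):
    """Draw a latent class, then item responses from that class's Rasch model."""
    c = rng.choice(2, p=class_weights)
    p = rasch_prob(theta, difficulties[c])
    return c, (rng.random(n_items) < p).astype(int)

def class_posterior(x, theta):
    """Posterior class probabilities for response pattern x, given known theta.

    In a real mixture IRT analysis theta is integrated out; treating it as
    known keeps the sketch short.
    """
    likes = np.array([
        np.prod(np.where(x == 1,
                         rasch_prob(theta, difficulties[c]),
                         1 - rasch_prob(theta, difficulties[c])))
        for c in (0, 1)
    ])
    joint = class_weights * likes
    return joint / joint.sum()

# Easy items (for class 0) answered correctly, hard ones missed:
x = np.array([1, 1, 1, 0, 0])
post = class_posterior(x, theta=0.0)
```

This pattern is far more likely under class 0's difficulty ordering, so the posterior concentrates on class 0.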
Krupa, Erin Elizabeth; Carney, Michele; Bostic, Jonathan – Applied Measurement in Education, 2019
This article provides a brief introduction to the set of four articles in the special issue. To provide a foundation for the issue, key terms are defined, a brief historical overview of validity is provided, and several different validation approaches used in the issue are described. Finally, the contribution of the articles to…
Descriptors: Test Items, Program Validation, Test Validity, Mathematics Education
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
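The small-sample question this abstract raises can be illustrated with a rough simulation. The sample sizes, true difficulties, and the crude PROX-style shortcut below are illustrative assumptions, not the study's method: simulate Rasch responses, then compare item-difficulty recovery at n = 25 versus a large sample.

```python
import numpy as np

rng = np.random.default_rng(42)

true_b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])  # hypothetical item difficulties

def simulate(n_persons, b):
    """Simulate dichotomous Rasch responses for n_persons with N(0, 1) abilities."""
    theta = rng.standard_normal(n_persons)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def prox_difficulty(responses):
    """Crude PROX-style estimate: centered log-odds of item incorrectness.

    This approximates Rasch difficulties (up to scale) when abilities are
    roughly normal; real equating studies use full ML estimation.
    """
    p = responses.mean(axis=0).clip(0.01, 0.99)
    b = np.log((1 - p) / p)
    return b - b.mean()  # center: Rasch difficulties are identified up to a shift

b_small = prox_difficulty(simulate(25, true_b))    # noisy
b_large = prox_difficulty(simulate(2000, true_b))  # close to the true ordering
```

Even this crude estimator recovers the difficulty ordering well at large n, while the n = 25 estimates are visibly noisier, which is the tension the article examines.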
El Masri, Yasmine H.; Andrich, David – Applied Measurement in Education, 2020
In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item…
Descriptors: Models, Goodness of Fit, Test Validity, Achievement Tests
Jacobson, Erik; Svetina, Dubravka – Applied Measurement in Education, 2019
Contingent argument-based approaches to validity require a unique argument for each use, in contrast to more prescriptive approaches that identify the common kinds of validity evidence researchers should consider for every use. In this article, we evaluate our use of an approach that is both prescriptive "and" argument-based to develop a…
Descriptors: Test Validity, Test Items, Test Construction, Test Interpretation
Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability
Solano-Flores, Guillermo – Applied Measurement in Education, 2014
This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…
Descriptors: English Language Learners, Test Items, Probability, Test Bias
Hansen, Mary A.; Lyon, Steven R.; Heh, Peter; Zigmond, Naomi – Applied Measurement in Education, 2013
Large-scale assessment programs, including alternate assessments based on alternate achievement standards (AA-AAS), must provide evidence of technical quality and validity. This study provides information about the technical quality of one AA-AAS by evaluating the standard setting for the science component. The assessment was designed to have…
Descriptors: Alternative Assessment, Science Tests, Standard Setting, Test Validity
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Hendrickson, Amy; Huff, Kristen; Luecht, Richard – Applied Measurement in Education, 2010
Evidence-centered assessment design (ECD) explicates a transparent evidentiary argument to warrant the inferences we make from student test performance. This article describes how the vehicles for gathering student evidence--task models and test specifications--are developed. Task models, which are the basis for item development, flow directly…
Descriptors: Evidence, Test Construction, Measurement, Classification
Rogers, W. Todd; Lin, Jie; Rinaldi, Christia M. – Applied Measurement in Education, 2011
The evidence gathered in the present study supports the use of the simultaneous development of test items for different languages. The simultaneous approach used in the present study involved writing an item in one language (e.g., French) and, before moving to the development of a second item, translating the item into the second language (e.g.,…
Descriptors: Test Items, Item Analysis, Achievement Tests, French
Walker, Cindy M.; Zhang, Bo; Surber, John – Applied Measurement in Education, 2008
Many teachers and curriculum specialists claim that the reading demand of many mathematics items is so great that students do not perform well on mathematics tests, even though they have a good understanding of mathematics. The purpose of this research was to test this claim empirically. This analysis was accomplished by considering examinees that…
Descriptors: Test Items, Construct Validity, Test Validity, Mathematics Tests

Downing, Steven M.; And Others – Applied Measurement in Education, 1995
The criterion-related validity evidence and other psychometric characteristics of multiple-choice and multiple true-false (MTF) items in medical specialty certification examinations were compared using results from 21,346 candidates. Advantages of MTF items and implications for test construction are discussed. (SLD)
Descriptors: Cognitive Ability, Licensing Examinations (Professions), Medical Education, Objective Tests

Crocker, Linda M.; And Others – Applied Measurement in Education, 1989
Techniques for quantifying the degree of fit between test items and curricula are classified according to the purposes of assessing: overall fit, fit of individual items to content domain, and the impact of test specifications on performance. Procedures for calculating each index and their properties are included. (SLD)
Descriptors: Achievement Tests, Content Validity, Curriculum, Elementary Secondary Education
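The kinds of item-curriculum fit indices the Crocker et al. abstract mentions can be sketched in a few lines. The objective labels and data here are invented for illustration, and the article's actual indices are more refined, but the two basic quantities are: how many items map onto the curriculum, and how much of the curriculum the items cover.

```python
# Invented example data: a curriculum domain and each item's classified objective.
curriculum = {"fractions", "decimals", "ratios", "percents"}
item_objectives = ["fractions", "fractions", "decimals", "ratios", "geometry"]

# Overall fit: the proportion of items that map onto any curriculum objective.
item_relevance = sum(obj in curriculum for obj in item_objectives) / len(item_objectives)

# Representativeness: the proportion of curriculum objectives covered by at least one item.
domain_coverage = len(curriculum & set(item_objectives)) / len(curriculum)
```

Here 4 of 5 items are relevant (0.8) but only 3 of 4 objectives are covered (0.75), showing why both index types are needed.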