ERIC - Search Results

Showing 1 to 15 of 59 results

Using Content Relevance and Representativeness Indices in Instrument Revision

Peer reviewed

Traynor, Anne; Christopherson, Sara C. – Applied Measurement in Education, 2024

Combining methods from earlier content validity and more contemporary content alignment studies may allow a more complete evaluation of the meaning of test scores than if either set of methods is used on its own. This article distinguishes item relevance indices in the content validity literature from test representativeness indices in the…

Descriptors: Test Validity, Test Items, Achievement Tests, Test Construction

A Method for Displaying Incremental Validity with Expectancy Charts

Peer reviewed

Lee, Samuel David; Walmsley, Philip T.; Sackett, Paul R.; Kuncel, Nathan – Applied Measurement in Education, 2021

Providing assessment validity information to decision makers in a clear and useful format is an ongoing challenge for the educational and psychological measurement community. We identify issues with a previous approach to a graphical presentation, noting that it is mislabeled as presenting incremental validity, when in fact it displays the effects…

Descriptors: Test Validity, Predictor Variables, Charts

Rethinking Think-Alouds: The Often-Problematic Collection of Response Process Data

Peer reviewed

Leighton, Jacqueline P. – Applied Measurement in Education, 2021

The objective of this paper is to comment on the think-aloud methods presented in the three papers included in this special issue. The commentary offered stems from the author's own psychological investigations of unobservable information processes and the conditions under which the most defensible claims can be advanced. The structure of this…

Descriptors: Protocol Analysis, Data Collection, Test Construction, Test Validity

Characterizing the Latent Classes in a Mixture IRT Model Using DIF

Peer reviewed

Karadavut, Tugba – Applied Measurement in Education, 2021

Mixture IRT models address the heterogeneity in a population by extracting latent classes and allowing item parameters to vary between latent classes. Once the latent classes are extracted, they need to be further examined to be characterized. Some approaches have been adopted in the literature for this purpose. These approaches examine either the…

Descriptors: Item Response Theory, Models, Test Items, Maximum Likelihood Statistics

Argument-Based Validation in Practice: Examples from Mathematics Education

Peer reviewed

Krupa, Erin Elizabeth; Carney, Michele; Bostic, Jonathan – Applied Measurement in Education, 2019

This article provides a brief introduction to the set of four articles in the special issue. To provide a foundation for the issue, key terms are defined, a brief historical overview of validity is provided, and several different validation approaches used in the issue are described. Finally, the contribution of the articles to…

Descriptors: Test Items, Program Validation, Test Validity, Mathematics Education

Gathering Response Process Data for a Problem-Solving Measure through Whole-Class Think Alouds

Peer reviewed

Bostic, Jonathan David; Sondergeld, Toni A.; Matney, Gabriel; Stone, Gregory; Hicks, Tiara – Applied Measurement in Education, 2021

Response process validity evidence provides a window into a respondent's cognitive processing. The purpose of this study is to describe a new data collection tool called a whole-class think aloud (WCTA). This work is performed as part of test development for a series of problem-solving measures to be used in elementary and middle grades. Data from…

Descriptors: Data Collection, Protocol Analysis, Problem Solving, Cognitive Processes

Impact of Item Parameter Drift on Rasch Scale Stability in Small Samples over Multiple Administrations

Peer reviewed

Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020

Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…

Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling

The Trade-Off between Model Fit, Invariance, and Validity: The Case of PISA Science Assessments

Peer reviewed

El Masri, Yasmine H.; Andrich, David – Applied Measurement in Education, 2020

In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item…

Descriptors: Models, Goodness of Fit, Test Validity, Achievement Tests

Prescribing Structure for Validation Arguments: Elemental, Structural, and Ecological Validity

Peer reviewed

Jacobson, Erik; Svetina, Dubravka – Applied Measurement in Education, 2019

Contingent argument-based approaches to validity require a unique argument for each use, in contrast to more prescriptive approaches that identify the common kinds of validity evidence researchers should consider for every use. In this article, we evaluate our use of an approach that is both prescriptive and argument-based to develop a…

Descriptors: Test Validity, Test Items, Test Construction, Test Interpretation

A Validation Argument from Soup to Nuts: Assessing Progress on Learning Trajectories for Middle-School Mathematics

Peer reviewed

Confrey, Jere; Toutkoushian, Emily; Shah, Meetal – Applied Measurement in Education, 2019

Fully articulating validation arguments in the context of classroom assessment requires connecting evidence from multiple sources and addressing multiple types of validity in a coherent chain of reasoning. This type of validation argument is particularly complex for assessments that function in close proximity to instruction, address the fine…

Descriptors: Test Validity, Item Response Theory, Middle School Students, Mathematics Instruction

Of Small Beauties and Large Beasts: The Quality of Distractors on Multiple-Choice Tests Is More Important than Their Quantity

Peer reviewed

Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017

In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…

Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability

Appraising the Scoring Performance of Automated Essay Scoring Systems--Some Additional Considerations: Which Essays? Which Human Raters? Which Scores?

Peer reviewed

Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…

Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators

Validating Human and Automated Scoring of Essays against "True" Scores

Peer reviewed

Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…

Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing

In Search of Validity Evidence in Support of the Interpretation and Use of Assessments of Complex Constructs: Discussion of Research on Assessing 21st Century Skills

Peer reviewed

Ercikan, Kadriye; Oliveri, María Elena – Applied Measurement in Education, 2016

Assessing complex constructs such as those discussed under the umbrella of 21st century constructs highlights the need for a principled assessment design and validation approach. In our discussion, we made a case for three considerations: (a) taking construct complexity into account across various stages of assessment development such as the…

Descriptors: Evaluation Methods, Test Construction, Design, Scaling

Probabilistic Approaches to Examining Linguistic Features of Test Items and Their Effect on the Performance of English Language Learners

Peer reviewed

Solano-Flores, Guillermo – Applied Measurement in Education, 2014

This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…

Descriptors: English Language Learners, Test Items, Probability, Test Bias
