Publication Date
In 2025: 0
Since 2024: 4
Since 2021 (last 5 years): 11
Since 2016 (last 10 years): 36
Since 2006 (last 20 years): 81
Descriptor
Evaluation Methods: 42
Item Response Theory: 33
Test Items: 26
Comparative Analysis: 20
Monte Carlo Methods: 19
Methods: 17
Simulation: 13
Computation: 11
Error of Measurement: 11
Mathematics Tests: 10
Models: 10
Source
Applied Measurement in…: 81
Author
Wells, Craig S.: 5
Yin, Yue: 4
Ayala, Carlos C.: 3
Brandon, Paul R.: 3
Finch, Holmes: 3
Furtak, Erin Marie: 3
Lee, Won-Chan: 3
Ruiz-Primo, Maria Araceli: 3
Shavelson, Richard J.: 3
Sireci, Stephen G.: 3
Bolt, Daniel M.: 2
Publication Type
Journal Articles: 81
Reports - Research: 55
Reports - Evaluative: 17
Reports - Descriptive: 9
Information Analyses: 1
Tests/Questionnaires: 1
Education Level
Secondary Education: 9
High Schools: 8
Elementary Education: 7
Middle Schools: 6
Elementary Secondary Education: 5
Higher Education: 5
Grade 8: 4
Junior High Schools: 4
Grade 3: 3
Grade 5: 3
Grade 7: 3
Audience
Researchers: 3
Location
Arizona: 2
Florida: 2
North Carolina: 2
Tennessee: 2
United States: 2
Australia: 1
Colorado: 1
Germany: 1
Kentucky: 1
Maine: 1
Massachusetts: 1
Laws, Policies, & Programs
No Child Left Behind Act 2001: 2
Stefanie A. Wind; Benjamin Lugu – Applied Measurement in Education, 2024
Researchers who use measurement models for evaluation purposes often select models with stringent requirements, such as Rasch models, which are parametric. Mokken Scale Analysis (MSA) offers a theory-driven nonparametric modeling approach that may be more appropriate for some measurement applications. Researchers have discussed using MSA as a…
Descriptors: Item Response Theory, Data Analysis, Simulation, Nonparametric Statistics
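The core tool of Mokken Scale Analysis mentioned above is the scalability coefficient (Loevinger's H), which compares observed Guttman errors to those expected under independence. The sketch below is a minimal illustration for dichotomous item scores, not a reproduction of the article's simulations; the data and function name are hypothetical.

```python
def scalability_H(data):
    """Loevinger's H for dichotomous item scores.

    data -- list of person rows, each a list of 0/1 item scores.
    Assumes no item has a proportion-correct of exactly 0 or 1.
    """
    n = len(data)
    k = len(data[0])
    # proportion correct per item
    p = [sum(row[j] for row in data) / n for j in range(k)]
    obs = exp = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            # a Guttman error: passing the harder item, failing the easier one
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            obs += sum(1 for row in data if row[hard] == 1 and row[easy] == 0)
            exp += n * p[hard] * (1 - p[easy])
    return 1 - obs / exp

# a perfect Guttman scale yields H = 1.0
guttman = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
print(scalability_H(guttman))
```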
Wise, Steven; Kuhfeld, Megan – Applied Measurement in Education, 2021
Effort-moderated (E-M) scoring is intended to estimate how well a disengaged test taker would have performed had they been fully engaged. It accomplishes this adjustment by excluding disengaged responses from scoring and estimating performance from the remaining responses. The scoring method, however, assumes that the remaining responses are not…
Descriptors: Scoring, Achievement Tests, Identification, Validity
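The adjustment described above can be sketched directly: disengaged (rapid-guess) responses are excluded and the score is estimated from what remains. The 3-second threshold, data, and proportion-correct scoring below are illustrative assumptions, not details from the article.

```python
def effort_moderated_score(responses, response_times, threshold=3.0):
    """Proportion-correct score over engaged responses only.

    responses      -- list of 0/1 item scores
    response_times -- list of response times in seconds
    threshold      -- times below this are treated as rapid guesses (assumed)
    """
    engaged = [r for r, t in zip(responses, response_times) if t >= threshold]
    if not engaged:
        return None  # nothing left to score
    return sum(engaged) / len(engaged)

# items 2 and 4 look like rapid guesses and are excluded from scoring
score = effort_moderated_score([1, 0, 1, 1, 0, 1],
                               [12.0, 1.5, 9.8, 2.0, 15.2, 8.4])
```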
Chalmers, R. Philip; Zheng, Guoguo – Applied Measurement in Education, 2023
This article presents generalizations of SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that supports contrast matrices for joint tests of DIF is presented. To…
Descriptors: Test Bias, Test Items, Item Response Theory, Error of Measurement
O'Dwyer, Eowyn P.; Sparks, Jesse R.; Nabors Oláh, Leslie – Applied Measurement in Education, 2023
A critical aspect of the development of culturally relevant classroom assessments is the design of tasks that affirm students' racial and ethnic identities and community cultural practices. This paper describes the process we followed to build a shared understanding of what culturally relevant assessments are, to pursue ways of bringing more…
Descriptors: Evaluation Methods, Culturally Relevant Education, Test Construction, Educational Research
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
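The information-weighted characteristic curve methods studied above are not reproduced here; as context, the sketch below shows the much simpler mean/sigma linking that such methods improve upon, placing new-form item difficulties on the old form's scale. Data are illustrative.

```python
from statistics import mean, stdev

def mean_sigma_link(b_new, b_old):
    """Return (A, B) so that A * b_new + B lies on the old form's scale.

    b_new, b_old -- difficulty estimates of the common (anchor) items
    on the new and old forms, in matching order.
    """
    A = stdev(b_old) / stdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B

A, B = mean_sigma_link([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
linked = [A * b + B for b in [0.0, 1.0, 2.0]]  # new-form b's after linking
```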
Han, Yuting; Wilson, Mark – Applied Measurement in Education, 2022
A technology-based problem-solving test can automatically capture all the actions of students when they complete tasks and save them as process data. Response sequences are the external manifestations of the students' latent intellectual activities, and they contain rich information about students' abilities and different problem-solving…
Descriptors: Technology Uses in Education, Problem Solving, 21st Century Skills, Evaluation Methods
Traditional vs Intersectional DIF Analysis: Considerations and a Comparison Using State Testing Data
Tony Albano; Brian F. French; Thao Thu Vo – Applied Measurement in Education, 2024
Recent research has demonstrated an intersectional approach to the study of differential item functioning (DIF). This approach expands DIF to account for the interactions between what have traditionally been treated as separate grouping variables. In this paper, we compare traditional and intersectional DIF analyses using data from a state testing…
Descriptors: Test Items, Item Analysis, Data Use, Standardized Tests
Wyse, Adam E. – Applied Measurement in Education, 2020
This article compares cut scores from two variations of the Hofstee and Beuk methods, which determine cut scores by resolving inconsistencies in panelists' judgments about cut scores and pass rates, with the Angoff method. The first variation uses responses to the Hofstee and Beuk percentage correct and pass rate questions to calculate cut scores.…
Descriptors: Cutting Scores, Evaluation Methods, Standard Setting (Scoring), Equations (Mathematics)
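The Hofstee compromise described above can be sketched as follows: panelists supply minimum and maximum acceptable cut scores (k_min, k_max) and fail rates (f_min, f_max), and the cut score is taken where the line through (k_min, f_max) and (k_max, f_min) meets the empirical fail-rate curve. This is a minimal discrete version with illustrative data, not the article's variations.

```python
def fail_rate(scores, cut):
    """Proportion of examinees scoring below the candidate cut."""
    return sum(s < cut for s in scores) / len(scores)

def hofstee_cut(scores, k_min, k_max, f_min, f_max):
    """Integer cut score where the panel's compromise line crosses
    the empirical fail-rate curve (closest-approach approximation)."""
    best_cut, best_gap = None, float("inf")
    for cut in range(k_min, k_max + 1):
        # fail rate the panel would tolerate at this cut (straight line)
        line = f_max + (f_min - f_max) * (cut - k_min) / (k_max - k_min)
        gap = abs(fail_rate(scores, cut) - line)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut

# illustrative score distribution and panel judgments
cut = hofstee_cut(list(range(40, 100, 2)), 60, 80, 0.1, 0.4)
```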
Perez, Alexandra Lane; Evans, Carla – Applied Measurement in Education, 2023
New Hampshire's Performance Assessment of Competency Education (PACE) innovative assessment system uses student scores from classroom performance assessments as well as other classroom tests for school accountability purposes. One concern is that not having annual state testing may incentivize schools and teachers away from teaching the breadth of…
Descriptors: Grade 8, Competency Based Education, Evaluation Methods, Educational Innovation
Liu, Chunyan; Jurich, Daniel; Morrison, Carol; Grabovsky, Irina – Applied Measurement in Education, 2021
The existence of outliers in the anchor items can be detrimental to the estimation of examinee ability and undermine the validity of score interpretation across forms. In practice, however, anchor item performance can become distorted for various reasons. This study compares the performance of modified "INFIT" and "OUTFIT"…
Descriptors: Equated Scores, Test Items, Item Response Theory, Difficulty Level
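The modified INFIT/OUTFIT statistics compared above are not reproduced here. As a simpler illustration of the same screening idea, the sketch below flags anchor items whose difficulty drift between forms is an outlier under a robust z statistic (median/IQR based), a screening check sometimes used for this purpose; the cutoff and data are illustrative.

```python
from statistics import median

def robust_z_flags(b_old, b_new, cutoff=1.645):
    """Flag anchor items whose difficulty shift looks like an outlier.

    b_old, b_new -- anchor item difficulties on the old and new forms.
    Returns a list of booleans (True = candidate outlier).
    """
    diffs = [n - o for o, n in zip(b_old, b_new)]
    med = median(diffs)
    d = sorted(diffs)
    q1 = median(d[: len(d) // 2])          # lower-half median
    q3 = median(d[(len(d) + 1) // 2 :])    # upper-half median
    scale = 0.74 * (q3 - q1) or 1e-9       # guard against zero spread
    return [abs((x - med) / scale) > cutoff for x in diffs]

# the fourth anchor drifts by 1.0 logit while the rest drift by ~0.1
flags = robust_z_flags([0.0, 0.5, -0.2, 1.0, 0.3],
                       [0.1, 0.6, -0.1, 2.0, 0.4])
```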
Wise, Steven L. – Applied Measurement in Education, 2019
The identification of rapid guessing is important for promoting the validity of achievement test scores, particularly on low-stakes tests. Effective identification requires reliable thresholds that are aligned with test taker behavior. Although several common threshold methods are based on rapid guessing response…
Descriptors: Guessing (Tests), Identification, Reaction Time, Reliability
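One family of threshold methods in this literature sets a normative threshold per item, e.g. a fixed fraction of the item's typical response time. The sketch below illustrates that idea; the 10% fraction and data are assumptions for illustration, not the specific methods Wise (2019) evaluates.

```python
from statistics import median

def rapid_guess_flags(times_by_item, fraction=0.10):
    """Flag responses faster than `fraction` of the item's median time.

    times_by_item -- dict mapping item id -> list of response times (s)
    Returns dict mapping item id -> list of booleans (True = rapid guess).
    """
    flags = {}
    for item, times in times_by_item.items():
        threshold = fraction * median(times)
        flags[item] = [t < threshold for t in times]
    return flags

# the 2-second response falls below 10% of the item's median time
flags = rapid_guess_flags({"q1": [30.0, 40.0, 2.0, 50.0, 35.0]})
```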
Krupa, Erin Elizabeth; Carney, Michele; Bostic, Jonathan – Applied Measurement in Education, 2019
This article provides a brief introduction to the set of four articles in the special issue. To provide a foundation for the issue, key terms are defined, a brief historical overview of validity is provided, and a description of several different validation approaches used in the issue are explained. Finally, the contribution of the articles to…
Descriptors: Test Items, Program Validation, Test Validity, Mathematics Education
Myers, Aaron J.; Ames, Allison J.; Leventhal, Brian C.; Holzman, Madison A. – Applied Measurement in Education, 2020
When rating performance assessments, raters may ascribe different scores for the same performance when rubric application does not align with the intended application of the scoring criteria. Given performance assessment score interpretation assumes raters apply rubrics as rubric developers intended, misalignment between raters' scoring processes…
Descriptors: Scoring Rubrics, Validity, Item Response Theory, Interrater Reliability
Peabody, Michael R. – Applied Measurement in Education, 2020
The purpose of the current article is to introduce the equating and evaluation methods used in this special issue. Although a comprehensive review of all existing models and methodologies would be impractical given the format, a brief introduction to some of the more popular models will be provided. A brief discussion of the conditions required…
Descriptors: Evaluation Methods, Equated Scores, Sample Size, Item Response Theory
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores