Publication Date
In 2025: 0
Since 2024: 2
Since 2021 (last 5 years): 5
Since 2016 (last 10 years): 9
Since 2006 (last 20 years): 17
Descriptor
Error of Measurement: 19
Test Items: 19
Item Response Theory: 12
Difficulty Level: 6
Simulation: 5
Comparative Analysis: 4
Monte Carlo Methods: 4
Sample Size: 4
Scores: 4
Test Construction: 4
Test Length: 4
Source
Applied Measurement in Education: 19
Author
Abulela, Mohammed A. A.: 1
Antal, Judit: 1
Bergstrom, Betty A.: 1
Chalmers, R. Philip: 1
Chen, Yu-Jen: 1
Cheng, Chien-Fen: 1
Engelhard, George, Jr.: 1
Finch, Holmes: 1
French, Brian F.: 1
Haertel, Edward H.: 1
Hou, Liling: 1
Publication Type
Journal Articles: 19
Reports - Research: 18
Information Analyses: 1
Reports - Evaluative: 1
Speeches/Meeting Papers: 1
Education Level
Elementary Secondary Education: 3
Grade 3: 2
Secondary Education: 2
Early Childhood Education: 1
Elementary Education: 1
Grade 1: 1
Grade 2: 1
Grade 4: 1
Grade 5: 1
Grade 6: 1
Grade 7: 1
Assessments and Surveys
National Assessment of…: 1
Program for International…: 1
Trends in International…: 1
Chalmers, R. Philip; Zheng, Guoguo – Applied Measurement in Education, 2023
This article presents generalizations of the SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF is presented. To…
Descriptors: Test Bias, Test Items, Item Response Theory, Error of Measurement
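For context, a minimal Python sketch of the core two-group quantity that the article generalizes: the focal-weighted mean difference in studied-item scores across matching-score strata. Function and variable names are illustrative, and the regression correction and standard error used by operational SIBTEST are omitted.

```python
import numpy as np

def sibtest_beta_hat(ref_item, foc_item, ref_match, foc_match):
    """Uncorrected two-group SIBTEST effect: weighted mean difference in
    studied-item scores between reference and focal examinees matched on
    the remaining (valid) subtest score. Operational SIBTEST adds a
    regression correction and divides by a standard error."""
    levels = np.intersect1d(np.unique(ref_match), np.unique(foc_match))
    weights = np.array([(foc_match == k).sum() for k in levels], dtype=float)
    weights /= weights.sum()                      # focal-group stratum weights
    diffs = np.array([ref_item[ref_match == k].mean() -
                      foc_item[foc_match == k].mean() for k in levels])
    return float(np.dot(weights, diffs))
```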
Wang, Shaojie; Lee, Won-Chan; Zhang, Minqiang; Yuan, Lixin – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
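The characteristic curve methods this article builds on minimize a criterion of the following kind; shown here is the classic unweighted Stocking-Lord criterion for 2PL anchor items, which the article's information-weighted variants reweight. This is a sketch under those assumptions, with illustrative names, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def tcc(theta, a, b):
    """Test characteristic curve for 2PL anchor items (numpy arrays)."""
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    return p.sum(axis=1)

def stocking_lord(a_new, b_new, a_old, b_old):
    """Find A, B so that theta_old = A*theta_new + B aligns the anchor TCCs."""
    grid = np.linspace(-4, 4, 81)
    def loss(x):
        A, B = x
        return np.mean((tcc(grid, a_old, b_old) -
                        tcc(grid, a_new / A, A * b_new + B)) ** 2)
    return minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead").x
```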
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
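Two of the scoring methods the abstract names can be sketched briefly: number-correct scoring and IRT theta scoring, the latter here via EAP under an assumed 2PL model with a standard normal prior. Hybrid scoring is not shown; names and model choices are illustrative.

```python
import numpy as np

def number_correct(responses):
    """Number-correct scoring: sum of 0/1 item responses per examinee."""
    return responses.sum(axis=1)

def eap_theta(responses, a, b, grid=np.linspace(-4, 4, 121)):
    """IRT theta scoring via EAP under a 2PL model with a N(0, 1) prior."""
    p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b[None, :])))     # grid x items
    lik = np.prod(np.where(responses[:, None, :] == 1,
                           p[None], 1.0 - p[None]), axis=2)          # persons x grid
    post = lik * np.exp(-0.5 * grid ** 2)                            # unnormalized posterior
    return (post * grid).sum(axis=1) / post.sum(axis=1)
```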
Traynor, Anne; Li, Tingxuan; Zhou, Shuqi – Applied Measurement in Education, 2020
During the development of large-scale school achievement tests, panels of independent subject-matter experts use systematic judgmental methods to rate the correspondence between a given test's items and performance objective statements. The individual experts' ratings may then be used to compute summary indices to quantify the match between a…
Descriptors: Alignment (Education), Achievement Tests, Curriculum, Error of Measurement
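Summary indices of this kind vary by program. One widely cited example (not necessarily the index examined in this article) is Porter's alignment index, which compares the proportion of test items and the proportion of objectives falling in each content cell:

```python
import numpy as np

def porter_alignment_index(test_props, standards_props):
    """Porter's alignment index: 1 minus half the total absolute
    discrepancy between two proportion distributions over content cells.
    Equals 1 for identical distributions and 0 for disjoint ones."""
    x = np.asarray(test_props, dtype=float)
    y = np.asarray(standards_props, dtype=float)
    return 1.0 - 0.5 * np.abs(x / x.sum() - y / y.sum()).sum()
```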
Gilbert, Joshua B.; Kim, James S.; Miratrix, Luke W. – Applied Measurement in Education, 2024
Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…
Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
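One reason small-sample Rasch equating can be workable is that, under the Rasch model, common-item equating reduces to estimating a single scale shift. A minimal sketch of the mean-mean version, with illustrative names:

```python
import numpy as np

def rasch_mean_shift(b_anchor_new, b_anchor_old):
    """Under the Rasch model the new and old scales differ only by a
    shift, so common-item equating reduces to one constant: the mean
    difference in anchor-item difficulties."""
    return float(np.mean(np.asarray(b_anchor_old) - np.asarray(b_anchor_new)))

# Place all new-form difficulties on the old scale:
# b_new_on_old = b_new + rasch_mean_shift(b_new[anchor], b_old[anchor])
```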
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
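One common way to operationalize rapid guessing (not necessarily the scheme used in this article) is a normative response-time threshold: a response is flagged as a rapid guess when its time falls below a fixed fraction of that item's mean response time. A sketch under that assumption:

```python
import numpy as np

def flag_rapid_guesses(rt, frac=0.10):
    """Normative-threshold flags: a response counts as a rapid guess when
    its time is below `frac` of the item's mean response time.
    `rt` is an examinees x items matrix of response times in seconds."""
    thresholds = frac * rt.mean(axis=0)       # one threshold per item
    return rt < thresholds                    # boolean flag matrix

def response_time_effort(rt, frac=0.10):
    """Proportion of an examinee's responses that are NOT rapid guesses."""
    return 1.0 - flag_rapid_guesses(rt, frac).mean(axis=1)
```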
Finch, Holmes; French, Brian F. – Applied Measurement in Education, 2019
The usefulness of item response theory (IRT) models depends, in large part, on the accuracy of item and person parameter estimates. For the standard three-parameter logistic (3PL) model, for example, these parameters include the item parameters of difficulty, discrimination, and pseudo-chance, as well as the person ability parameter. Several factors impact…
Descriptors: Item Response Theory, Accuracy, Test Items, Difficulty Level
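For reference, the three item parameters named in this abstract enter the standard 3PL item response function as

$$P_j(\theta) = c_j + \frac{1 - c_j}{1 + \exp\{-a_j(\theta - b_j)\}},$$

where $a_j$ is discrimination, $b_j$ is difficulty, $c_j$ is the pseudo-chance (lower asymptote), and $\theta$ is person ability.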
Lee, HyeSun – Applied Measurement in Education, 2018
The current simulation study examined the effects of Item Parameter Drift (IPD) occurring in a short scale on parameter estimates in multilevel models where scores from a scale were employed as a time-varying predictor to account for outcome scores. Five factors, including three decisions about IPD, were considered for simulation conditions. It…
Descriptors: Test Items, Hierarchical Linear Modeling, Predictor Variables, Scores
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time-consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine whether a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
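A minimal sketch of the Angoff cut-score computation and of projecting a cut from an item subset, the quantity whose generalizability the article evaluates. Names are illustrative, and the G-theory machinery (variance components, generalizability coefficients) is not reproduced here.

```python
import numpy as np

def angoff_cut(ratings):
    """Full-test Angoff cut score: mean probability judgment per item
    (averaged over raters), summed over items. `ratings` is raters x items."""
    return float(ratings.mean(axis=0).sum())

def subset_cut(ratings, subset, n_items):
    """Cut score projected from an item subset: mean per-item rating on
    the subset, scaled to the full test length."""
    return float(ratings[:, subset].mean() * n_items)
```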
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014
In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory
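The "miniature form" guideline the abstract describes can be made concrete with a simple check that anchor difficulties match the total test in mean and spread. The tolerances below are arbitrary illustrations, not published standards.

```python
import numpy as np

def is_mini_form(b_anchor, b_total, tol_mean=0.1, tol_sd_ratio=0.2):
    """Check the conventional guideline that the anchor block mirror the
    total test: similar mean difficulty and similar spread."""
    b_anchor, b_total = np.asarray(b_anchor), np.asarray(b_total)
    mean_ok = abs(b_anchor.mean() - b_total.mean()) <= tol_mean
    sd_ok = abs(b_anchor.std() / b_total.std() - 1.0) <= tol_sd_ratio
    return bool(mean_ok and sd_ok)
```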
Michaelides, Michalis P.; Haertel, Edward H. – Applied Measurement in Education, 2014
The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…
Descriptors: Equated Scores, Test Items, Sampling, Statistical Inference
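The sampling variability the abstract describes is often quantified by resampling. A sketch of a bootstrap standard error of equating, where `equate` is a hypothetical user-supplied function that returns an equating constant from two response matrices:

```python
import numpy as np

def bootstrap_see(equate, new_resp, old_resp, n_boot=500, seed=0):
    """Bootstrap SEE: resample examinees in both groups, re-run the
    equating each time, and take the SD of the replicated results."""
    rng = np.random.default_rng(seed)
    reps = []
    for _ in range(n_boot):
        nd = new_resp[rng.integers(0, len(new_resp), len(new_resp))]
        od = old_resp[rng.integers(0, len(old_resp), len(old_resp))]
        reps.append(equate(nd, od))
    return float(np.std(reps, ddof=1))
```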
Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English-speaking and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…
Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers
Randall, Jennifer; Engelhard, George, Jr. – Applied Measurement in Education, 2010
The psychometric properties and multigroup measurement invariance of scores across subgroups, items, and persons on the "Reading for Meaning" items from the Georgia Criterion Referenced Competency Test (CRCT) were assessed in a sample of 778 seventh-grade students. Specifically, we sought to determine the extent to which score-based…
Descriptors: Testing Accommodations, Test Items, Learning Disabilities, Factor Analysis
Rogers, W. Todd; Lin, Jie; Rinaldi, Christia M. – Applied Measurement in Education, 2011
The evidence gathered in the present study supports the use of the simultaneous development of test items for different languages. The simultaneous approach used in the present study involved writing an item in one language (e.g., French) and, before moving to the development of a second item, translating the item into the second language (e.g.,…
Descriptors: Test Items, Item Analysis, Achievement Tests, French