Showing 1 to 15 of 99 results
Peer reviewed
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
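Background note (not taken from the article): characteristic curve linking methods such as Stocking-Lord choose linking coefficients A and B that minimize a squared distance between the two forms' test characteristic curves; the information-weighted variants studied here reportedly replace the equal weighting of that criterion with weights based on estimation precision. A standard statement of the unweighted criterion:

```latex
% Stocking-Lord criterion (standard, unweighted form), where T_X is the base
% form's test characteristic curve and T_{Y -> X} is the new form's curve
% after its item parameters are transformed by the linking coefficients A, B.
F(A, B) = \sum_{j} \Bigl[ T_X(\theta_j) - T_{Y \to X}(\theta_j) \Bigr]^2,
\qquad T(\theta) = \sum_{i} P_i(\theta)
```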
Peer reviewed
Finch, Holmes – Applied Measurement in Education, 2022
Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous…
Descriptors: Comparative Analysis, Item Response Theory, Item Analysis, Simulation
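As general background on DIF screening (the article itself concerns differential step functioning in polytomous items), a minimal Mantel-Haenszel check for one dichotomous item might look like the sketch below; the simulated data and all names are hypothetical.

```python
# Minimal Mantel-Haenszel DIF sketch (illustrative only, not the article's
# method): common odds ratio for one 0/1 item between a reference group
# (group == 0) and a focal group (group == 1), stratified on rest score
# as a proxy for the latent trait.
import numpy as np

def mantel_haenszel_or(item, strata, group):
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = np.sum((group[m] == 0) & (item[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (item[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (item[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (item[m] == 0))  # focal, incorrect
        n = a + b + c + d
        if n:
            num += a * d / n
            den += b * c / n
    return num / den if den else float("nan")

# Simulate DIF-free Rasch data: the odds ratio should be close to 1.
rng = np.random.default_rng(0)
n_persons, n_items = 2000, 20
group = rng.integers(0, 2, n_persons)
theta = rng.normal(size=n_persons)
b = rng.normal(size=n_items)
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
resp = (rng.random((n_persons, n_items)) < p).astype(int)
rest = resp[:, 1:].sum(axis=1)  # matching variable: rest score for item 1
print(f"MH odds ratio for item 1: {mantel_haenszel_or(resp[:, 0], rest, group):.2f}")
```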
Peer reviewed
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
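For reference (standard background, not a finding of the study), the GPCM models the probability of score category k on item i with discrimination a_i and step difficulties b_iv:

```latex
% Generalized partial credit model (Muraki), with the usual convention that
% the empty sum for k = 0 equals zero; m_i is the maximum score on item i.
P(X_i = k \mid \theta) =
  \frac{\exp \sum_{v=1}^{k} a_i (\theta - b_{iv})}
       {\sum_{c=0}^{m_i} \exp \sum_{v=1}^{c} a_i (\theta - b_{iv})},
\qquad k = 0, 1, \ldots, m_i
```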
Peer reviewed
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods, including number-correct scoring, IRT theta scoring, and hybrid scoring, in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
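Background note: under an IRT model, number-correct and theta scoring are connected through the test characteristic curve; hybrid methods combine elements of both.

```latex
% Number-correct true score tau(theta). Theta scoring estimates theta from
% the full response pattern; number-correct-based scale scores instead pass
% the summed score X through the inverse of this curve.
\tau(\theta) = \sum_{i=1}^{n} P_i(\theta), \qquad \hat{\theta}_{NC} = \tau^{-1}(X)
```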
Peer reviewed
Dahlke, Jeffrey A.; Sackett, Paul R.; Kuncel, Nathan R. – Applied Measurement in Education, 2023
We examine longitudinal data from 120,384 students who took a version of the PSAT/SAT in the 9th, 10th, 11th, and 12th grades. We investigate score changes over time and show that socioeconomic status (SES) is related to the degree of score improvement. We note that the 9th and 10th grade PSAT are low-stakes tests, while the operational SAT is a…
Descriptors: Scores, College Entrance Examinations, Socioeconomic Status, Test Preparation
Peer reviewed
Kim, Kyung Yong; Lee, Won-Chan – Applied Measurement in Education, 2017
This article provides a detailed description of three factors (specification of the ability distribution, numerical integration, and frame of reference for the item parameter estimates) that might affect the item parameter estimation of the three-parameter logistic model, and compares five item calibration methods, which are combinations of the…
Descriptors: Test Items, Item Response Theory, Comparative Analysis, Methods
Peer reviewed
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
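As background on how rapid guesses are typically flagged (a normative-threshold rule is sketched here for illustration; it is not necessarily the procedure used in this study), responses faster than a small fraction of an item's typical response time are treated as rapid guessing. All data and names below are hypothetical.

```python
# Illustrative rapid-guessing (RG) flag using a normative threshold:
# a response counts as a rapid guess if its response time falls below
# 10% of the item's median time (with a 1-second floor).
import numpy as np

def flag_rapid_guessing(rt, threshold_frac=0.10, floor_sec=1.0):
    """rt: (examinees x items) response-time matrix in seconds.
    Returns a boolean matrix marking flagged responses."""
    thresholds = np.maximum(floor_sec, threshold_frac * np.median(rt, axis=0))
    return rt < thresholds

rng = np.random.default_rng(1)
rt = rng.lognormal(mean=3.0, sigma=0.6, size=(500, 20))  # simulated times
flags = flag_rapid_guessing(rt)
print(f"Flagged {flags.mean():.1%} of responses as rapid guesses.")
```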
Peer reviewed
Finch, Holmes; French, Brian F. – Applied Measurement in Education, 2019
The usefulness of item response theory (IRT) models depends, in large part, on the accuracy of item and person parameter estimates. For the standard three-parameter logistic model, for example, these parameters include the item parameters of difficulty, discrimination, and pseudo-chance, as well as the person ability parameter. Several factors impact…
Descriptors: Item Response Theory, Accuracy, Test Items, Difficulty Level
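For reference, the standard three-parameter logistic response function, with the parameters named in this abstract, is:

```latex
% 3PL model: a_i = discrimination, b_i = difficulty, c_i = pseudo-chance
% (lower asymptote), theta = person ability.
P(X_i = 1 \mid \theta) = c_i + (1 - c_i)
  \frac{1}{1 + \exp[-a_i(\theta - b_i)]}
```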
Peer reviewed
Sachse, Karoline A.; Haag, Nicole – Applied Measurement in Education, 2017
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
Descriptors: Error of Measurement, Test Bias, International Assessment, Computation
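Background note (standard operational practice in PISA and TIMSS, not a result of this study): uncertainty from sampling and from the latent-trait imputation is combined across M plausible values with Rubin's rules, and it is this combined standard error that DIF and parameter drift can bias.

```latex
% Total variance for a statistic estimated from M plausible values:
% U-bar = average sampling variance across imputations,
% B = between-imputation variance.
V = \bar{U} + \left(1 + \frac{1}{M}\right) B
```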
Peer reviewed
Yi, Yeon-Sook – Applied Measurement in Education, 2017
This study compares five cognitive diagnostic models in search of the optimal one(s) for English as a Second Language grammar test data. Using a unified modeling framework that can represent specific models with proper constraints, the article first fits the full model (the log-linear cognitive diagnostic model, LCDM) and investigates which model…
Descriptors: English (Second Language), Grammar, Language Tests, Cognitive Measurement
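Background on the unified framework mentioned here: the LCDM writes the logit of a correct response as a linear function of the mastered attributes, and constraining its coefficients recovers specific diagnostic models (e.g., DINA-type or compensatory variants). For an item measuring two attributes:

```latex
% LCDM item response function: lambda_{i,0} is an intercept, the lambda_{i,1}
% terms are attribute main effects, and lambda_{i,2} is their interaction.
\operatorname{logit} P(X_i = 1 \mid \boldsymbol{\alpha}) =
  \lambda_{i,0} + \lambda_{i,1(1)} \alpha_1 + \lambda_{i,1(2)} \alpha_2
  + \lambda_{i,2(1,2)} \alpha_1 \alpha_2
```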
Peer reviewed
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
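As background on rater-agreement evaluation (illustrative only; the article addresses comparing multiple raters across multiple items, which this sketch does not reproduce), quadratic weighted kappa is a common item-by-item agreement statistic for an automated rater against a human rater. The scores below are hypothetical.

```python
# Quadratic weighted kappa between two raters' integer scores 0..n_cats-1.
# 1.0 = perfect agreement; 0.0 = chance-level agreement.
import numpy as np

def quadratic_weighted_kappa(r1, r2, n_cats):
    observed = np.zeros((n_cats, n_cats))
    for i, j in zip(r1, r2):
        observed[i, j] += 1
    observed /= observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    idx = np.arange(n_cats)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_cats - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

human = np.array([0, 1, 2, 2, 3, 1, 0, 2])
machine = np.array([0, 1, 2, 3, 3, 1, 1, 2])
print(f"QWK = {quadratic_weighted_kappa(human, machine, 4):.3f}")
```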
Peer reviewed
Koran, Jennifer; Kopriva, Rebecca J. – Applied Measurement in Education, 2017
Providing appropriate test accommodations to most English language learners (ELLs) is important to facilitate meaningful inferences about learning. This study compared teacher large-scale test accommodation recommendations to those from a literature- and practitioner-grounded accommodation selection taxonomy. The taxonomy links student-specific…
Descriptors: English Language Learners, Testing Accommodations, Comparative Analysis, Taxonomy
Peer reviewed
Finch, W. Holmes – Applied Measurement in Education, 2016
Differential item functioning (DIF) assessment is a crucial component in test construction, serving as the primary way in which instrument developers ensure that measures perform in the same way for multiple groups within the population. When such is not the case, scores may not accurately reflect the trait of interest for all individuals in the…
Descriptors: Test Bias, Monte Carlo Methods, Comparative Analysis, Population Groups
Peer reviewed
Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua – Applied Measurement in Education, 2017
Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent…
Descriptors: Item Response Theory, Accuracy, Educational Assessment, Test Items
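Background note (a standard linking approach, not necessarily the one evaluated here): placing a new form on the base scale uses a linear transformation of the theta metric, with the mean/sigma method obtaining the linking constants from the two forms' difficulty estimates:

```latex
% Linear scale transformation theta* = A*theta + B; under mean/sigma, the
% constants come from the means and SDs of the difficulty estimates, and
% item parameters transform accordingly.
A = \frac{\sigma(\hat{b}_X)}{\sigma(\hat{b}_Y)}, \qquad
B = \mu(\hat{b}_X) - A\,\mu(\hat{b}_Y), \qquad
a^* = a / A, \qquad b^* = A b + B
```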
Peer reviewed
Oliveri, Maria Elena; Ercikan, Kadriye; Lyons-Thomas, Juliette; Holtzman, Steven – Applied Measurement in Education, 2016
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted using manifest methods, with observed characteristics (e.g., gender and race/ethnicity) used to group examinees. Homogeneity of item responses is assumed, denoting that…
Descriptors: Test Bias, Language Minorities, Effect Size, Foreign Countries