Showing 1 to 15 of 154 results
Peer reviewed
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2024
Rapid guessing (RG) is a form of non-effortful responding that is characterized by short response latencies. This construct-irrelevant behavior has been shown in previous research to bias inferences concerning measurement properties and scores. To mitigate these deleterious effects, a number of response time threshold scoring procedures have been…
Descriptors: Reaction Time, Scores, Item Response Theory, Guessing (Tests)
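To illustrate the threshold-scoring idea described in the entry above, here is a minimal Python sketch: responses faster than a fixed cutoff are flagged as rapid guesses and treated as missing when rescoring. The 3-second cutoff and the simulated data are assumptions for illustration only; the article evaluates more refined threshold procedures.

```python
import numpy as np

rng = np.random.default_rng(1)
rt = rng.lognormal(mean=1.5, sigma=0.6, size=(5, 4))  # response times (s), persons x items
resp = rng.integers(0, 2, size=(5, 4))                # scored 0/1 responses

THRESHOLD = 3.0                       # hypothetical fixed cutoff in seconds
rapid = rt < THRESHOLD                # flag rapid-guessing responses

scored = np.where(rapid, np.nan, resp.astype(float))  # treat flagged cells as missing
print(np.nanmean(scored, axis=1))     # per-person proportion correct, effortful responses only
print(1 - rapid.mean(axis=1))         # per-person proportion of effortful responses
```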
Peer reviewed
Weese, James D.; Turner, Ronna C.; Ames, Allison; Crawford, Brandon; Liang, Xinya – Educational and Psychological Measurement, 2022
A simulation study was conducted to investigate the heuristics of the SIBTEST procedure and how it compares with ETS classification guidelines used with the Mantel-Haenszel procedure. Prior heuristics have been used for nearly 25 years, but they are based on a simulation study that was restricted due to computer limitations and that modeled item…
Descriptors: Test Bias, Heuristics, Classification, Statistical Analysis
Peer reviewed
Dimitrov, Dimiter M.; Atanasov, Dimitar V. – Educational and Psychological Measurement, 2022
This study offers an approach to testing for differential item functioning (DIF) in a recently developed measurement framework, referred to as "D"-scoring method (DSM). Under the proposed approach, called "P-Z" method of testing for DIF, the item response functions of two groups (reference and focal) are compared by…
Descriptors: Test Bias, Methods, Test Items, Scoring
Peer reviewed
Finch, W. Holmes – Educational and Psychological Measurement, 2023
Psychometricians have devoted much research and attention to categorical item responses, leading to the development and widespread use of item response theory for the estimation of model parameters and identification of items that do not perform in the same way for examinees from different population subgroups (e.g., differential item functioning…
Descriptors: Test Bias, Item Response Theory, Computation, Methods
Peer reviewed
Martijn Schoenmakers; Jesper Tijmstra; Jeroen Vermunt; Maria Bolsinova – Educational and Psychological Measurement, 2024
Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results. For this reason, various item response theory (IRT) models have been proposed to model ERS and correct for it. Comparisons of these…
Descriptors: Item Response Theory, Response Style (Tests), Models, Likert Scales
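As a crude counterpart to the model-based corrections this entry describes, the share of responses falling in the extreme Likert categories is a simple person-level ERS indicator. A minimal sketch with invented 5-point data follows; the IRT models compared in the article instead treat ERS as a latent trait.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.integers(1, 6, size=(6, 10))   # hypothetical 5-point responses, persons x items

ers = np.isin(X, (1, 5)).mean(axis=1)  # proportion of extreme (1 or 5) responses per person
print(ers)
```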
Peer reviewed
Hung-Yu Huang – Educational and Psychological Measurement, 2025
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent…
Descriptors: Response Style (Tests), Psychological Characteristics, Item Response Theory, Test Reliability
Peer reviewed
Weese, James D.; Turner, Ronna C.; Liang, Xinya; Ames, Allison; Crawford, Brandon – Educational and Psychological Measurement, 2023
A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and…
Descriptors: Effect Size, Classification, Guidelines, Statistical Analysis
Peer reviewed
Man, Kaiwen; Harring, Jeffrey R. – Educational and Psychological Measurement, 2023
Preknowledge cheating jeopardizes the validity of inferences based on test results. Many methods have been developed to detect preknowledge cheating by jointly analyzing item responses and response times. Gaze fixations, an essential eye-tracker measure, can be utilized to help detect aberrant testing behavior with improved accuracy beyond using…
Descriptors: Cheating, Reaction Time, Test Items, Responses
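A rough flavor of response-time-based screening (without the gaze-fixation channel the article adds) can be given in a few lines: standardize log response times within item, then flag examinees with many unusually fast correct answers. All cutoffs and data below are invented for illustration and are not the article's joint model.

```python
import numpy as np

rng = np.random.default_rng(11)
n_persons, n_items = 200, 30
log_rt = rng.normal(loc=3.0, scale=0.4, size=(n_persons, n_items))  # hypothetical log-seconds
correct = rng.random((n_persons, n_items)) < 0.6                    # hypothetical 0/1 scores

z = (log_rt - log_rt.mean(axis=0)) / log_rt.std(axis=0)  # within-item standardization
fast_correct = (z < -2.0) & correct                      # unusually fast *and* correct
flagged = fast_correct.sum(axis=1) >= 5                  # crude per-person flag
print(flagged.sum(), "examinee(s) flagged")
```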
Peer reviewed
Lee, Sooyong; Han, Suhwa; Choi, Seung W. – Educational and Psychological Measurement, 2022
Response data containing an excessive number of zeros are referred to as zero-inflated data. When differential item functioning (DIF) detection is of interest, zero-inflation can attenuate DIF effects in the total sample and lead to underdetection of DIF items. The current study presents a DIF detection procedure for response data with excess…
Descriptors: Test Bias, Monte Carlo Methods, Simulation, Models
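For readers unfamiliar with zero-inflation, the sketch below simulates zero-inflated Poisson counts and recovers the mixing and rate parameters by maximum likelihood. It illustrates only the data structure, not the DIF detection procedure the article develops; all parameter values are invented.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

rng = np.random.default_rng(7)
pi_true, lam_true, n = 0.4, 2.0, 2000
structural_zero = rng.random(n) < pi_true
y = np.where(structural_zero, 0, rng.poisson(lam_true, n))

def zip_nll(theta, y):
    pi = 1 / (1 + np.exp(-theta[0]))   # zero-inflation probability (logit scale)
    lam = np.exp(theta[1])             # Poisson rate (log scale)
    p0 = pi + (1 - pi) * np.exp(-lam)  # P(Y = 0): structural or sampling zero
    pk = (1 - pi) * poisson.pmf(y, lam)
    return -np.where(y == 0, np.log(p0), np.log(pk)).sum()

fit = minimize(zip_nll, x0=[0.0, 0.0], args=(y,), method="Nelder-Mead")
print(1 / (1 + np.exp(-fit.x[0])), np.exp(fit.x[1]))  # estimates near 0.4 and 2.0
```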
Peer reviewed
Wind, Stefanie A.; Ge, Yuan – Educational and Psychological Measurement, 2021
Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs present challenges for…
Descriptors: Evaluators, Scoring, Data Collection, Design
Peer reviewed
Brennan, Robert L.; Kim, Stella Y.; Lee, Won-Chan – Educational and Psychological Measurement, 2022
This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and…
Descriptors: Multivariate Analysis, Generalizability Theory, Multiple Choice Tests, Test Construction
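As background for this entry, a univariate single-facet (persons × items) G-study can be computed from two-way ANOVA mean squares; the article's multivariate extension generalizes this to a fixed facet whose levels carry different random-effects designs. A sketch on simulated data (all variance values invented):

```python
import numpy as np

rng = np.random.default_rng(9)
n_p, n_i = 100, 8
X = (rng.normal(size=(n_p, 1)) +         # person effects
     0.5 * rng.normal(size=(1, n_i)) +   # item effects
     0.8 * rng.normal(size=(n_p, n_i)))  # residual (pi,e)

grand = X.mean()
ms_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_res = ((X - X.mean(axis=1, keepdims=True)
             - X.mean(axis=0, keepdims=True) + grand) ** 2).sum() / ((n_p - 1) * (n_i - 1))

var_p = (ms_p - ms_res) / n_i            # person variance component
g_coef = var_p / (var_p + ms_res / n_i)  # relative G coefficient for n_i items
print(round(var_p, 3), round(g_coef, 3))
```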
Peer reviewed
Liu, Ivy; Suesse, Thomas; Harvey, Samuel; Gu, Peter Yongqi; Fernández, Daniel; Randal, John – Educational and Psychological Measurement, 2023
The Mantel-Haenszel estimator is one of the most popular techniques for measuring differential item functioning (DIF). A generalization of this estimator is applied in the DIF context to compare items, taking into account the covariance of odds ratio estimators between dependent items. Unlike item response theory, the method does not rely…
Descriptors: Test Bias, Computation, Statistical Analysis, Achievement Tests
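The classical estimator combines an item's 2×2 tables (group × correct/incorrect) across matched total-score strata; the common odds ratio is often rescaled to the ETS delta metric. A minimal sketch with invented counts, using a simplified version of the ETS A/B/C rule that ignores its statistical-significance conditions:

```python
import numpy as np

def mh_ddif(ref_correct, ref_total, foc_correct, foc_total):
    """Mantel-Haenszel common odds ratio across score strata,
    rescaled to MH D-DIF = -2.35 * ln(alpha_MH)."""
    A = np.asarray(ref_correct, float)    # reference group, correct
    B = np.asarray(ref_total, float) - A  # reference group, incorrect
    C = np.asarray(foc_correct, float)    # focal group, correct
    D = np.asarray(foc_total, float) - C  # focal group, incorrect
    N = A + B + C + D                     # stratum sizes
    alpha = (A * D / N).sum() / (B * C / N).sum()
    return -2.35 * np.log(alpha)

ddif = mh_ddif([40, 55, 70], [60, 70, 80],   # hypothetical counts per stratum
               [30, 45, 60], [60, 70, 80])
label = "A" if abs(ddif) < 1.0 else ("C" if abs(ddif) > 1.5 else "B")
print(f"MH D-DIF = {ddif:.2f} -> ETS class {label}")
```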
Peer reviewed
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Peer reviewed
D'Urso, E. Damiano; Tijmstra, Jesper; Vermunt, Jeroen K.; De Roover, Kim – Educational and Psychological Measurement, 2023
Assessing the measurement model (MM) of self-report scales is crucial to obtain valid measurements of individuals' latent psychological constructs. This entails evaluating the number of measured constructs and determining which construct is measured by which item. Exploratory factor analysis (EFA) is the most-used method to evaluate these…
Descriptors: Factor Analysis, Measurement Techniques, Self Evaluation (Individuals), Psychological Patterns
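For illustration, an EFA on simulated two-construct data, using scikit-learn's FactorAnalysis with varimax rotation; the data-generating loadings below are invented, and recovering them shows which item measures which construct.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)
n = 500
factors = rng.normal(size=(n, 2))                     # two latent constructs
loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                     [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
X = factors @ loadings.T + 0.5 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
print(np.round(fa.components_.T, 2))  # rows = items; large entries mark each item's construct
```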
Peer reviewed
Jin, Kuan-Yu; Eckes, Thomas – Educational and Psychological Measurement, 2022
Performance assessments heavily rely on human ratings. These ratings are typically subject to various forms of error and bias, threatening the assessment outcomes' validity and fairness. Differential rater functioning (DRF) is a special kind of threat to fairness manifesting itself in unwanted interactions between raters and performance- or…
Descriptors: Performance Based Assessment, Rating Scales, Test Bias, Student Evaluation