ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	4
Since 2006 (last 20 years)	10

Descriptor

Simulation	12
Test Bias	12
Test Reliability	12
Test Items	8
Scores	4
Item Analysis	3
Item Response Theory	3
Models	3
Test Validity	3
Accuracy	2
Classification	2
Computation	2
Computer Assisted Testing	2
Error of Measurement	2
Evaluation Methods	2
Factor Analysis	2
Foreign Countries	2
Goodness of Fit	2
Junior High School Students	2
Measures (Individuals)	2
Monte Carlo Methods	2
Statistical Analysis	2
Statistical Bias	2
Test Theory	2
True Scores	2

Source

Applied Measurement in…	2
Journal of Educational and…	2
Applied Psychological…	1
Center for Education Data &…	1
EURASIA Journal of…	1
Educational and Psychological…	1
Journal of Educational Issues	1
Measurement:…	1
Psychological Methods	1

Publication Type

Journal Articles	10
Reports - Research	6
Reports - Descriptive	4
Reports - Evaluative	2

Education Level

Junior High Schools	2
Middle Schools	2
Secondary Education	2
Elementary Secondary Education	1
Grade 9	1
High Schools	1
Higher Education	1

Audience

Researchers

Location

Indonesia	1
Taiwan	1

Assessments and Surveys

Graduate Record Examinations	1
Wechsler Adult Intelligence…	1

Showing all 12 results

Accuracy and Sensitivity of Coefficient Alpha and Its Alternatives with Unidimensional and Contaminated Scales

Peer reviewed

Xiao, Leifeng; Hau, Kit-Tai – Applied Measurement in Education, 2023

We compared coefficient alpha with five alternatives (omega total, omega RT, omega h, GLB, and coefficient H) in two simulation studies. Results showed that, for unidimensional scales, (a) all indices except omega h performed similarly well in most conditions; (b) alpha is still good; (c) GLB and coefficient H overestimated reliability with small…

Descriptors: Test Theory, Test Reliability, Factor Analysis, Test Length
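The abstract compares alpha with omega total and other indices. As a minimal sketch (not the authors' simulation code), the snippet below computes coefficient alpha from a covariance matrix and omega total from a one-factor model, assuming known standardized loadings and unique variances; the loading values are hypothetical. It illustrates the textbook result that alpha equals omega total when all loadings are equal and falls below it when they differ.

```python
# Coefficient alpha vs. omega total under a one-factor model (illustrative sketch).
# Loadings and unique variances are hypothetical, not taken from the study.


def implied_cov(loadings, uniquenesses):
    """Model-implied covariance matrix: Sigma = lambda lambda' + diag(theta)."""
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] + (uniquenesses[i] if i == j else 0.0) for j in range(k)]
        for i in range(k)
    ]


def alpha_from_cov(cov):
    """Coefficient alpha: k/(k-1) * (1 - trace(Sigma) / sum of all entries of Sigma)."""
    k = len(cov)
    trace = sum(cov[i][i] for i in range(k))
    total = sum(sum(row) for row in cov)
    return k / (k - 1) * (1 - trace / total)


def omega_total(loadings, uniquenesses):
    """Omega total: (sum of loadings)^2 / ((sum of loadings)^2 + sum of unique variances)."""
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))


if __name__ == "__main__":
    for lam in ([0.7, 0.7, 0.7, 0.7], [0.9, 0.7, 0.5, 0.3]):
        theta = [1 - l * l for l in lam]  # standardized items: unique variance = 1 - loading^2
        a = alpha_from_cov(implied_cov(lam, theta))
        w = omega_total(lam, theta)
        print(f"loadings={lam}: alpha={a:.3f}, omega={w:.3f}")
```

With equal loadings both indices come out at about 0.794; with the unequal set, alpha (about 0.677) sits below omega total (about 0.709).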

Estimating Difference-Score Reliability in Pretest-Posttest Settings

Peer reviewed

Gu, Zhengguo; Emons, Wilco H. M.; Sijtsma, Klaas – Journal of Educational and Behavioral Statistics, 2021

Clinical, medical, and health psychologists use difference scores obtained from pretest-posttest designs employing the same test to assess intraindividual change possibly caused by an intervention addressing, for example, anxiety, depression, eating disorder, or addiction. Reliability of difference scores is important for interpreting observed…

Descriptors: Test Reliability, Scores, Pretests Posttests, Computation
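For orientation, the classical test theory formula for the reliability of a difference score D = Y − X (posttest minus pretest) is sketched below. It is the textbook expression, not necessarily the estimator Gu, Emons, and Sijtsma propose, and it assumes measurement errors are uncorrelated across the two occasions. The input values are hypothetical.

```python
import math


def difference_score_reliability(var_x, var_y, rel_x, rel_y, r_xy):
    """Classical reliability of D = Y - X.

    rho_D = (var_x*rel_x + var_y*rel_y - 2*r_xy*sd_x*sd_y)
            / (var_x + var_y - 2*r_xy*sd_x*sd_y)
    """
    cov_xy = r_xy * math.sqrt(var_x) * math.sqrt(var_y)
    # True-score variance of D; assumes errors are uncorrelated across occasions
    true_var_d = var_x * rel_x + var_y * rel_y - 2 * cov_xy
    obs_var_d = var_x + var_y - 2 * cov_xy  # observed variance of D
    return true_var_d / obs_var_d


if __name__ == "__main__":
    # Hypothetical: equal variances, reliabilities of .80, pretest-posttest r = .60
    print(difference_score_reliability(1.0, 1.0, 0.8, 0.8, 0.6))
```

This prints approximately 0.5: two scores that are each .80 reliable yield a difference score with reliability of only .50 when they correlate .60, which is why difference-score reliability needs to be assessed separately.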

R Packages for Item Response Theory Analysis: Descriptions and Features

Peer reviewed

Choi, Youn-Jeng; Asilkalkan, Abdullah – Measurement: Interdisciplinary Research and Perspectives, 2019

About 45 R packages to analyze data using item response theory (IRT) have been developed over the last decade. This article introduces these 45 R packages with their descriptions and features. It also describes possible advanced IRT models using R packages, as well as dichotomous and polytomous IRT models, and R packages that contain applications…

Descriptors: Item Response Theory, Data Analysis, Computer Software, Test Bias

DIF Analysis with Multilevel Data: A Simulation Study Using the Latent Variable Approach

Peer reviewed
PDF on ERIC

Jin, Ying; Eason, Hershel – Journal of Educational Issues, 2016

The effects of mean ability difference (MAD) and short tests on the performance of various DIF methods have been studied extensively in previous simulation studies. Their effects, however, have not been studied under multilevel data structure. MAD was frequently observed in large-scale cross-country comparison studies where the primary sampling…

Descriptors: Test Bias, Simulation, Hierarchical Linear Modeling, Comparative Analysis

Screening Test Items for Differential Item Functioning

Peer reviewed

Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2014

A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…

Descriptors: Test Items, Test Bias, Simulation, Hypothesis Testing
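The abstract's key ingredients are an explicitly acceptable level of DIF and a loss function for the two kinds of misclassification. A generic expected-loss rule built from those ingredients is sketched below; it is an illustration under stated assumptions, not Longford's published procedure. It assumes the item's DIF estimate is approximately normal with a known standard error and treats N(estimate, SE²) as the uncertainty about the true DIF. The parameter names `acceptable`, `loss_miss`, and `loss_false_flag` are hypothetical.

```python
from statistics import NormalDist


def flag_item(dif_estimate, se, acceptable, loss_miss=1.0, loss_false_flag=1.0):
    """Flag an item when the expected loss of passing it exceeds that of flagging it.

    Assumption: the true DIF is treated as N(dif_estimate, se**2).
    p = P(|DIF| > acceptable); passing the item costs p * loss_miss,
    flagging it costs (1 - p) * loss_false_flag.
    """
    dist = NormalDist(mu=dif_estimate, sigma=se)
    p_excess = 1.0 - (dist.cdf(acceptable) - dist.cdf(-acceptable))
    return p_excess * loss_miss > (1.0 - p_excess) * loss_false_flag


if __name__ == "__main__":
    # Same borderline item, two different (hypothetical) loss functions
    print(flag_item(0.5, 0.1, acceptable=0.5, loss_miss=3.0))        # missing DIF is costly
    print(flag_item(0.5, 0.1, acceptable=0.5, loss_false_flag=3.0))  # false flags are costly
```

The borderline item is flagged in the first call and passed in the second, so the declared loss function, and not only the DIF estimate, drives the classification.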

Multidimensional Computerized Adaptive Testing for Indonesia Junior High School Biology

Peer reviewed

Kuo, Bor-Chen; Daud, Muslem; Yang, Chih-Wei – EURASIA Journal of Mathematics, Science & Technology Education, 2015

This paper describes a curriculum-based multidimensional computerized adaptive test that was developed for Indonesian junior high school Biology. Following the different Biology dimensions of the Indonesian curriculum, 300 items were constructed and then administered to 2,238 students. A multidimensional random coefficients multinomial logit model was…

Descriptors: Secondary School Science, Science Education, Science Tests, Computer Assisted Testing

Higher Order Testlet Response Models for Hierarchical Latent Traits and Testlet-Based Items

Peer reviewed

Huang, Hung-Yu; Wang, Wen-Chung – Educational and Psychological Measurement, 2013

Both testlet design and hierarchical latent traits are fairly common in educational and psychological measurements. This study aimed to develop a new class of higher order testlet response models that consider both local item dependence within testlets and a hierarchy of latent traits. Due to high dimensionality, the authors adopted the Bayesian…

Descriptors: Item Response Theory, Models, Bayesian Statistics, Computation

Assessing the "Rothstein Falsification Test": Does It Really Show Teacher Value-Added Models Are Biased? CEDR Working Paper No. 2012 1.3

Goldhaber, Dan; Chaplin, Duncan – Center for Education Data & Research, 2012

In a provocative and influential paper, Jesse Rothstein (2010) finds that standard value-added models (VAMs) suggest implausible future teacher effects on past student achievement, a finding that obviously cannot be viewed as causal. This is the basis of a falsification test (the Rothstein falsification test) that appears to indicate bias in VAM…

Descriptors: School Effectiveness, Teacher Effectiveness, Achievement Gains, Statistical Bias

Multinomial and Compound Multinomial Error Models for Tests with Complex Item Scoring

Peer reviewed

Lee, Won-Chan – Applied Psychological Measurement, 2007

This article introduces a multinomial error model, which models an examinee's test scores obtained over repeated measurements of an assessment that consists of polytomously scored items. A compound multinomial error model is also introduced for situations in which items are stratified according to content categories and/or prespecified numbers of…

Descriptors: Simulation, Error of Measurement, Scoring, Test Items
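The core idea of a multinomial error model can be sketched as follows, assuming an examinee's n item scores are independent draws from fixed probabilities over score categories 0..m. The compound, content-stratified version and the article's estimation details are not reproduced here. Under that assumption the counts per category are multinomial, and the conditional error variance of the total score is n times the variance of a single item score. The category probabilities are hypothetical.

```python
import random
import statistics


def conditional_error_variance(probs, n_items):
    """Var(total) = n * Var(single item score) when item scores are i.i.d. over categories 0..m."""
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return n_items * (second - mean ** 2)


def simulate_totals(probs, n_items, reps, seed=0):
    """Total scores over repeated (hypothetical) administrations for one examinee."""
    rng = random.Random(seed)
    categories = list(range(len(probs)))
    return [sum(rng.choices(categories, weights=probs, k=n_items)) for _ in range(reps)]


if __name__ == "__main__":
    probs = [0.2, 0.3, 0.5]  # P(score 0), P(score 1), P(score 2) for one examinee
    analytic = conditional_error_variance(probs, n_items=10)
    simulated = statistics.variance(simulate_totals(probs, n_items=10, reps=20000))
    print(f"analytic={analytic:.3f}, simulated={simulated:.3f}, CSEM={analytic ** 0.5:.3f}")
```

The analytic conditional error variance here is 6.1 (a conditional standard error of measurement of about 2.47), and the simulated variance should land close to it.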

Measurement Invariance versus Selection Invariance: Is Fair Selection Possible?

Peer reviewed

Borsboom, Denny; Romeijn, Jan-Willem; Wicherts, Jelte M. – Psychological Methods, 2008

This article shows that measurement invariance (defined in terms of an invariant measurement model in different groups) is generally inconsistent with selection invariance (defined in terms of equal sensitivity and specificity across groups). In particular, when a unidimensional measurement instrument is used and group differences are present in…

Descriptors: Test Items, Minority Groups, Measurement, Scores
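A small Monte Carlo sketch makes the abstract's claim concrete. Both groups share the same measurement model X = η + e (identical loading and error variance, so measurement invariance holds), but their latent means differ. An examinee counts as truly qualified when η exceeds a latent cutoff and is selected when X exceeds an observed cutoff. The means, error SD, and cutoffs are hypothetical choices, not values from the article.

```python
import random


def sensitivity_specificity(latent_mean, n, error_sd=0.6, latent_cut=0.5, observed_cut=0.5, seed=1):
    """Estimate sensitivity and specificity of selecting on X = eta + e for one group."""
    rng = random.Random(seed)
    tp = fn = tn = fp = 0
    for _ in range(n):
        eta = rng.gauss(latent_mean, 1.0)      # latent trait; only its mean differs by group
        x = eta + rng.gauss(0.0, error_sd)     # identical measurement model in every group
        qualified, selected = eta > latent_cut, x > observed_cut
        if qualified:
            tp += selected
            fn += not selected
        else:
            fp += selected
            tn += not selected
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    for label, mu in (("group A", 0.0), ("group B", -1.0)):
        sens, spec = sensitivity_specificity(mu, n=100_000)
        print(f"{label}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With these settings the lower-mean group B shows lower sensitivity and higher specificity than group A, even though the measurement model is identical in both groups; selection invariance fails while measurement invariance holds.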

Performance of SIBTEST When the Percentage of DIF Items Is Large

Peer reviewed

Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004

Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, like test translation and…

Descriptors: True Scores, Simulation, Test Bias, Student Evaluation
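The matching idea behind SIBTEST's β statistic can be sketched as a weighted difference in studied-item means between reference and focal examinees who have the same matching-subtest score. This simplified version omits SIBTEST's regression correction of the matching scores toward true scores, so it is closer to a standardization index than to SIBTEST proper. Weighting by the focal group's score distribution is an assumption (implementations differ), and the data are hypothetical.

```python
from collections import defaultdict


def uncorrected_beta(ref, focal):
    """Matched difference in studied-item means (reference minus focal).

    ref, focal: lists of (matching_score, studied_item_score) pairs.
    Positive values favour the reference group. Unlike SIBTEST proper,
    no regression correction is applied to the matching scores.
    """
    ref_items, focal_items = defaultdict(list), defaultdict(list)
    for score, item in ref:
        ref_items[score].append(item)
    for score, item in focal:
        focal_items[score].append(item)

    # Use only matching-score levels observed in both groups
    common = [k for k in focal_items if k in ref_items]
    if not common:
        raise ValueError("no matching-score level is shared by both groups")
    n_focal = sum(len(focal_items[k]) for k in common)

    beta = 0.0
    for k in common:
        weight = len(focal_items[k]) / n_focal  # focal-group proportion at level k
        mean_ref = sum(ref_items[k]) / len(ref_items[k])
        mean_focal = sum(focal_items[k]) / len(focal_items[k])
        beta += weight * (mean_ref - mean_focal)
    return beta


if __name__ == "__main__":
    # Hypothetical data: at each matching score the focal group does worse on the item
    ref = [(1, 1), (1, 1), (2, 1), (2, 1)]
    focal = [(1, 0), (1, 1), (2, 1), (2, 0)]
    print(uncorrected_beta(ref, focal))
```

In the example the focal group trails by 0.5 on the studied item at both matching scores, so the statistic is 0.5. When many items on the matching subtest themselves contain DIF, as in the situations the article studies, the matching score is contaminated and such statistics can be distorted.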

An Empirical Investigation of Six Methods for Examining Test Item Bias. Final Report.

Merz, William R.; Grossen, Neal E. – 1978

Six approaches to assessing test item bias were examined: transformed item difficulty, point-biserial correlations, chi-square, factor analysis, one-parameter item characteristic curve, and three-parameter item characteristic curve. Data sets for analysis were generated by a Monte Carlo technique based on the three-parameter model; thus, four…

Descriptors: Difficulty Level, Evaluation Methods, Factor Analysis, Item Analysis
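Of the six approaches, transformed item difficulty is the simplest to show. In Angoff's delta-plot form, each group's proportion correct p is converted to Δ = 13 + 4z, where z = Φ⁻¹(1 − p); the item deltas for the two groups are plotted against each other, a major-axis line is fitted, and items far from that line are candidates for bias. The sketch below follows that textbook recipe, not necessarily the report's exact variant, and the delta values in the example are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def delta(p):
    """Angoff delta for proportion correct p: 13 + 4 * z, with z = Phi^-1(1 - p)."""
    return 13 + 4 * NormalDist().inv_cdf(1 - p)


def delta_plot_distances(dx, dy):
    """Signed perpendicular distance of each item from the major-axis line through (dx, dy)."""
    mx, my = fmean(dx), fmean(dy)
    n = len(dx)
    sxx = sum((x - mx) ** 2 for x in dx) / n
    syy = sum((y - my) ** 2 for y in dy) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dx, dy)) / n
    # Major (principal) axis slope; assumes sxy != 0
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = my - b * mx
    return [(b * x - y + a) / math.sqrt(b * b + 1) for x, y in zip(dx, dy)]


if __name__ == "__main__":
    ref = [10, 11, 12, 13, 14, 15, 16]   # hypothetical reference-group deltas
    foc = [d + 0.5 for d in ref]         # focal group uniformly a little harder
    foc[3] += 2.5                        # item 3 is unexpectedly hard for the focal group
    print([round(d, 2) for d in delta_plot_distances(ref, foc)])
```

A uniform difficulty shift between groups leaves every item on the fitted line; only item 3, which departs from the common pattern, ends up far from it.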

Author

Asilkalkan, Abdullah	1
Borsboom, Denny	1
Boughton, Keith A.	1
Chaplin, Duncan	1
Choi, Youn-Jeng	1
Daud, Muslem	1
Eason, Hershel	1
Emons, Wilco H. M.	1
Gierl, Mark J.	1
Goldhaber, Dan	1
Gotzmann, Andrea	1
Grossen, Neal E.	1
Gu, Zhengguo	1
Hau, Kit-Tai	1
Huang, Hung-Yu	1
Jin, Ying	1
Kuo, Bor-Chen	1
Lee, Won-Chan	1
Longford, Nicholas T.	1
Merz, William R.	1
Romeijn, Jan-Willem	1
Sijtsma, Klaas	1
Wang, Wen-Chung	1
Wicherts, Jelte M.	1
Xiao, Leifeng	1