Showing all 10 results
Peer reviewed
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Peer reviewed
Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024
For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…
Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory
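The MG-DIF criterion described above — respondents with the same trait level but from different groups should have equal response probabilities on an item — can be illustrated with a minimal sketch under a two-parameter logistic (2PL) IRT model. All parameter values below are hypothetical and chosen only to show how a group-specific difficulty shift produces DIF:

```python
import math

def p_correct(theta, a, b):
    """2PL IRT probability of a correct response for trait level theta,
    discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for the same item calibrated in two groups.
theta = 0.5                                  # same trait level in both groups
p_reference = p_correct(theta, a=1.2, b=0.0)  # reference group
p_focal = p_correct(theta, a=1.2, b=0.4)      # focal group: item is harder

# A nonzero gap at equal theta is the signature of DIF.
gap = p_reference - p_focal
print(round(p_reference, 3), round(p_focal, 3), round(gap, 3))
# → 0.646 0.53 0.116
```

If the item were DIF-free, the two group-specific parameter sets would coincide and the gap would be zero at every trait level.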
Peer reviewed
Shear, Benjamin R. – Journal of Educational Measurement, 2018
When contextual features of test-taking environments differentially affect item responding for different test takers and these features vary across test administrations, they may cause differential item functioning (DIF) that varies across test administrations. Because many common DIF detection methods ignore potential DIF variance, this article…
Descriptors: Test Bias, Regression (Statistics), Hierarchical Linear Modeling
Peer reviewed
Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019
Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…
Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation
Peer reviewed
Albano, Anthony D.; Cai, Liuhan; Lease, Erin M.; McConnell, Scott R. – Journal of Educational Measurement, 2019
Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in…
Descriptors: Test Items, Computer Assisted Testing, Item Analysis, Difficulty Level
Peer reviewed
Brückner, Sebastian; Pellegrino, James W. – Journal of Educational Measurement, 2016
The Standards for Educational and Psychological Testing indicate that validation of assessments should include analyses of participants' response processes. However, such analyses typically are conducted only to supplement quantitative field studies with qualitative data, and seldom are such data connected to quantitative data on student or item…
Descriptors: Hierarchical Linear Modeling, Test Validity, Statistical Analysis, College Students
Peer reviewed
Naumann, Alexander; Hochweber, Jan; Hartig, Johannes – Journal of Educational Measurement, 2014
Students' performance in assessments is commonly attributed to more or less effective teaching. This implies that students' responses are significantly affected by instruction. However, the assumption that outcome measures indeed are instructionally sensitive is scarcely investigated empirically. In the present study, we propose a…
Descriptors: Test Bias, Longitudinal Studies, Hierarchical Linear Modeling, Test Items
Peer reviewed
Schmidt, Susanne; Zlatkin-Troitschanskaia, Olga; Fox, Jean-Paul – Journal of Educational Measurement, 2016
Longitudinal research in higher education faces several challenges. Appropriate methods of analyzing competence growth of students are needed to deal with those challenges and thereby obtain valid results. In this article, a pretest-posttest-posttest multivariate multilevel IRT model for repeated measures is introduced which is designed to address…
Descriptors: Foreign Countries, Pretests Posttests, Hierarchical Linear Modeling, Item Response Theory
Peer reviewed
Albano, Anthony D. – Journal of Educational Measurement, 2013
In many testing programs it is assumed that the context or position in which an item is administered does not have a differential effect on examinee responses to the item. Violations of this assumption may bias item response theory estimates of item and person parameters. This study examines the potentially biasing effects of item position. A…
Descriptors: Test Items, Item Response Theory, Test Format, Questioning Techniques
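The position effect described in the abstract — the same item behaving differently depending on where it appears in a form — can be sketched as a Rasch model whose effective difficulty drifts with serial position (e.g., a fatigue effect). The linear drift term and its magnitude below are hypothetical:

```python
import math

def p_correct(theta, b_base, position, drift=0.02):
    """Rasch probability where effective item difficulty increases linearly
    with serial position (a hypothetical fatigue effect)."""
    b = b_base + drift * position
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Same item, same examinee, administered early vs. late in the form.
early = p_correct(theta=0.0, b_base=0.0, position=1)
late = p_correct(theta=0.0, b_base=0.0, position=40)
print(round(early, 3), round(late, 3))
# → 0.495 0.31
```

Ignoring such drift when calibrating pooled response data is what can bias the item and person parameter estimates the study examines.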
Peer reviewed
Jiao, Hong; Wang, Shudong; He, Wei – Journal of Educational Measurement, 2013
This study demonstrated the equivalence between the Rasch testlet model and the three-level one-parameter testlet model and explored the Markov Chain Monte Carlo (MCMC) method for model parameter estimation in WINBUGS. The estimation accuracy from the MCMC method was compared with those from the marginalized maximum likelihood estimation (MMLE)…
Descriptors: Computation, Item Response Theory, Models, Monte Carlo Methods
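The Rasch testlet model mentioned in the abstract extends the standard Rasch model with a person-specific effect for the testlet containing each item, capturing local dependence among items that share a stimulus. A minimal sketch of the response function (hypothetical values, no parameter estimation) shows how the testlet effect shifts response probabilities:

```python
import math

def rasch_testlet_p(theta, b, gamma):
    """Rasch testlet model: probability of a correct response, where gamma
    is the person-specific effect for the testlet containing the item.
    Setting gamma = 0 recovers the standard Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b + gamma)))

# Hypothetical values: same person and item, without and with a testlet effect.
p_plain = rasch_testlet_p(theta=0.0, b=-0.2, gamma=0.0)
p_testlet = rasch_testlet_p(theta=0.0, b=-0.2, gamma=0.3)
print(round(p_plain, 3), round(p_testlet, 3))
# → 0.55 0.622
```

Treating gamma as a third-level random effect is what makes this formulation equivalent to the three-level one-parameter testlet model the study compares it against.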