ERIC - Search Results

Showing 1 to 15 of 40 results

Bayesian Model Selection Methods for Multilevel IRT Models: A Comparison of Five DIC-Based Indices

Peer reviewed

Zhang, Xue; Tao, Jian; Wang, Chun; Shi, Ning-Zhong – Journal of Educational Measurement, 2019

Model selection is important in any statistical analysis, and the primary goal is to find the preferred (or most parsimonious) model, based on certain criteria, from a set of candidate models given data. Several recent publications have employed the deviance information criterion (DIC) to do model selection among different forms of multilevel item…

Descriptors: Bayesian Statistics, Item Response Theory, Measurement, Models

Standard Errors of IRT Parameter Scale Transformation Coefficients: Comparison of Bootstrap Method, Delta Method, and Multiple Imputation Method

Peer reviewed

Zhang, Zhonghua; Zhao, Mingren – Journal of Educational Measurement, 2019

The present study evaluated the multiple imputation method, a procedure that is similar to the one suggested by Li and Lissitz (2004), and compared the performance of this method with that of the bootstrap method and the delta method in obtaining the standard errors for the estimates of the parameter scale transformation coefficients in item…

Descriptors: Item Response Theory, Error Patterns, Item Analysis, Simulation

A Top-Down Approach to Designing the Computerized Adaptive Multistage Test

Peer reviewed

Luo, Xiao; Kim, Doyoung – Journal of Educational Measurement, 2018

The top-down approach to designing a multistage test is relatively understudied in the literature and underused in research and practice. This study introduced a route-based top-down design approach that directly sets design parameters at the test level and utilizes the advanced automated test assembly algorithm seeking global optimality. The…

Descriptors: Computer Assisted Testing, Test Construction, Decision Making, Simulation

Scale Alignment in Between-Item Multidimensional Rasch Models

Peer reviewed

Feuerstahler, Leah; Wilson, Mark – Journal of Educational Measurement, 2019

Scores estimated from multidimensional item response theory (IRT) models are not necessarily comparable across dimensions. In this article, the concept of aligned dimensions is formalized in the context of Rasch models, and two methods are described--delta dimensional alignment (DDA) and logistic regression alignment (LRA)--to transform estimated…

Descriptors: Item Response Theory, Models, Scores, Comparative Analysis

Equating with Miditests Using IRT

Peer reviewed

Fitzpatrick, Joseph; Skorupski, William P. – Journal of Educational Measurement, 2016

The equating performance of two internal anchor test structures--miditests and minitests--is studied for four IRT equating methods using simulated data. Originally proposed by Sinharay and Holland, miditests are anchors that have the same mean difficulty as the overall test but less variance in item difficulties. Four popular IRT equating methods…

Descriptors: Difficulty Level, Test Items, Comparative Analysis, Test Construction

Lord's Wald Test for Detecting DIF in Multidimensional IRT Models: A Comparison of Two Estimation Approaches

Peer reviewed

Lee, Soo; Suh, Youngsuk – Journal of Educational Measurement, 2018

Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…

Descriptors: Item Response Theory, Sample Size, Models, Error of Measurement

Attribute-Level and Pattern-Level Classification Consistency and Accuracy Indices for Cognitive Diagnostic Assessment

Peer reviewed

Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015

Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…

Descriptors: Classification, Reliability, Accuracy, Cognitive Tests

Optimal Bandwidth Selection in Observed-Score Kernel Equating

Peer reviewed

Häggström, Jenny; Wiberg, Marie – Journal of Educational Measurement, 2014

The selection of bandwidth in kernel equating is important because it has a direct impact on the equated test scores. The aim of this article is to examine the use of double smoothing when selecting bandwidths in kernel equating and to compare double smoothing with the commonly used penalty method. This comparison was made using both an equivalent…

Descriptors: Equated Scores, Data Analysis, Comparative Analysis, Simulation

Adjoined Piecewise Linear Approximations (APLAs) for Equating: Accuracy Evaluations of a Postsmoothing Equating Method

Peer reviewed

Moses, Tim – Journal of Educational Measurement, 2013

The purpose of this study was to evaluate the use of adjoined and piecewise linear approximations (APLAs) of raw equipercentile equating functions as a postsmoothing equating method. APLAs are less familiar than other postsmoothing equating methods (i.e., cubic splines), but their use has been described in historical equating practices of…

Descriptors: Equated Scores, Accuracy, Simulation, Comparative Analysis

Semiparametric Item Response Functions in the Context of Guessing

Peer reviewed

Falk, Carl F.; Cai, Li – Journal of Educational Measurement, 2016

We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…

Descriptors: Item Response Theory, Guessing (Tests), Mathematics Tests, Simulation

Assessing Individual-Level Impact of Interruptions during Online Testing

Peer reviewed

Sinharay, Sandip; Wan, Ping; Choi, Seung W.; Kim, Dong-In – Journal of Educational Measurement, 2015

With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as…

Descriptors: Computer Assisted Testing, Testing Problems, Scores, Statistical Analysis

Detection of Invalid Test Scores: The Usefulness of Simple Nonparametric Statistics

Peer reviewed

Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014

In recent guidelines for fair educational testing it is advised to check the validity of individual test scores through the use of person-fit statistics. For practitioners it is unclear on the basis of the existing literature which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…

Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis

Longitudinal Multistage Testing

Peer reviewed

Pohl, Steffi – Journal of Educational Measurement, 2013

This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large-scale studies. In lMST designs, test forms of different difficulty levels are used, whereas the values on a pretest determine the routing to these test forms. Since lMST allows for…

Descriptors: Adaptive Testing, Longitudinal Studies, Difficulty Level, Comparative Analysis

Monitoring Items in Real Time to Enhance CAT Security

Peer reviewed

Zhang, Jinming; Li, Jie – Journal of Educational Measurement, 2016

An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed…

Descriptors: Computer Assisted Testing, Test Items, Difficulty Level, Item Response Theory

Differential Item Functioning Assessment in Cognitive Diagnostic Modeling: Application of the Wald Test to Investigate DIF in the DINA Model

Peer reviewed

Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna – Journal of Educational Measurement, 2014

Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…

Descriptors: Test Bias, Models, Simulation, Error Patterns
