Showing 1 to 15 of 26 results
Peer reviewed
Meyers, Jason L.; Miller, G. Edward; Way, Walter D. – Applied Measurement in Education, 2009
In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change,…
Descriptors: Test Items, Test Content, Testing Programs, Simulation
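The abstract does not carry the model details, so the following is only a minimal sketch of the general idea, not the authors' method: it simulates a drift in Rasch item difficulty (RID) that is linear in item position change and recovers the drift rate by least squares. All parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented setup: 200 items, each moved between field test and live test.
n_items = 200
position_change = rng.integers(-30, 31, size=n_items)  # live minus field-test slot
true_slope = 0.004                                     # logits per position moved
rid_change = true_slope * position_change + rng.normal(0, 0.05, n_items)

# Recover the drift rate with ordinary least squares.
slope, intercept = np.polyfit(position_change, rid_change, deg=1)
print(f"estimated drift: {slope:.4f} logits per position")
```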
Peer reviewed
Tong, Ye; Kolen, Michael J. – Applied Measurement in Education, 2007
A number of vertical scaling methodologies were examined in this article. Scaling variations included data collection design, scaling method, item response theory (IRT) scoring procedure, and proficiency estimation method. Vertical scales were developed for Grade 3 through Grade 8 for 4 content areas and 9 simulated datasets. A total of 11 scaling…
Descriptors: Achievement Tests, Scaling, Methods, Item Response Theory
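As context for the abstract above, here is a hedged sketch of one common building block of IRT vertical scaling: mean/sigma linking through anchor items shared by adjacent grades. The anchor difficulties and ability values are invented, and the study itself compares far more design variations than this.

```python
import numpy as np

# Hypothetical mean/sigma linking of two grades' Rasch difficulties
# through a set of common (anchor) items; parameter values are invented.
b_grade3 = np.array([-0.8, -0.2, 0.1, 0.6, 1.1])    # anchor items, grade 3 scale
b_grade4 = np.array([-1.5, -0.9, -0.6, -0.1, 0.4])  # same items, grade 4 scale

# Find A, B such that b_grade3 ~ A * b_grade4 + B, then place grade 4
# ability estimates on the grade 3 scale with the same transformation.
A = b_grade3.std() / b_grade4.std()
B = b_grade3.mean() - A * b_grade4.mean()

theta_grade4 = np.array([-0.5, 0.0, 0.7])
theta_on_grade3_scale = A * theta_grade4 + B
print(theta_on_grade3_scale)
```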
Peer reviewed
Bandalos, Deborah L.; Enders, Craig K. – Applied Measurement in Education, 1996
Computer simulation indicated that reliability increased with the degree of similarity between underlying and observed distributions when the observed categorical distribution was deliberately constructed to match the shape of the underlying distribution of the trait being measured. Reliability also increased with correlation among variables and…
Descriptors: Computer Simulation, Correlation, Likert Scales, Reliability
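A toy version of the simulation logic described above, under assumed values: a normal latent trait generates continuous responses, which are then categorized either with cutpoints that mirror the latent shape or with cutpoints that force a skewed observed distribution, and coefficient alpha is compared.

```python
import numpy as np

rng = np.random.default_rng(1)

def coefficient_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Latent trait and noisy continuous item responses (invented loadings).
theta = rng.normal(size=5000)
continuous = theta[:, None] * 0.7 + rng.normal(0, 0.7, (5000, 6))

# Categorize with cutpoints matched to the (normal) latent shape vs.
# cutpoints that force a skewed observed distribution.
matched = np.digitize(continuous, [-1.5, -0.5, 0.5, 1.5])
skewed = np.digitize(continuous, [0.0, 0.7, 1.2, 1.8])

print(coefficient_alpha(matched), coefficient_alpha(skewed))
```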
Peer reviewed
De Champlain, Andre; Gessaroli, Marc E. – Applied Measurement in Education, 1998
Type I error rates and rejection rates for three dimensionality-assessment procedures were studied with data sets simulated to reflect short tests and small samples. Results show that the G-squared difference test (D. Bock, R. Gibbons, and E. Muraki, 1988) suffered from a severely inflated Type I error rate under all simulated conditions. (SLD)
Descriptors: Item Response Theory, Matrices, Sample Size, Simulation
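For readers unfamiliar with the statistic under study, this is a minimal sketch of a G-squared (likelihood-ratio) difference test between nested models; the deviances and degrees of freedom are invented, and this is not the specific full-information item factor analysis the article evaluates.

```python
from scipy.stats import chi2

# Hypothetical deviances (-2 log likelihood) from nested item factor
# models fit to the same data; all values here are invented.
g2_one_factor = 10523.4   # restricted (unidimensional) model
g2_two_factor = 10497.8   # general (two-dimensional) model
df_difference = 18        # extra parameters in the two-factor model

g2_diff = g2_one_factor - g2_two_factor
p_value = chi2.sf(g2_diff, df_difference)
print(f"G^2 difference = {g2_diff:.1f}, p = {p_value:.3f}")
```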
Peer reviewed
Jodoin, Michael G.; Zenisky, April; Hambleton, Ronald K. – Applied Measurement in Education, 2006
Many credentialing agencies today are either administering their examinations by computer or are likely to be doing so in the coming years. Unfortunately, although several promising computer-based test designs are available, little is known about how well they function in examination settings. The goal of this study was to compare fixed-length…
Descriptors: Computers, Test Results, Psychometrics, Computer Simulation
Peer reviewed
Enders, Craig K.; Bandalos, Deborah L. – Applied Measurement in Education, 1999
Examined the degree to which coefficient alpha is affected by including items with different distribution shapes within a unidimensional scale. Computer simulation results indicate that reliability does not increase dramatically as a result of using differentially shaped items within a scale. Discusses implications for test construction. (SLD)
Descriptors: Computer Simulation, Reliability, Scaling, Statistical Distributions
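A compact sketch of the specific question studied above, with invented cutpoints: items with symmetric and with skewed observed distributions are mixed within one unidimensional scale, and coefficient alpha is compared against an all-symmetric version of the same scale.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5000, 8
theta = rng.normal(size=n)
cont = theta[:, None] * 0.7 + rng.normal(0, 0.7, (n, k))

# Same scale, two categorization schemes (cutpoints invented): four items
# kept symmetric, four forced into a skewed observed shape.
symmetric = np.digitize(cont[:, :4], [-1.0, 0.0, 1.0])
skewed = np.digitize(cont[:, 4:], [0.5, 1.0, 1.5])
mixed_scale = np.hstack([symmetric, skewed])
uniform_scale = np.digitize(cont, [-1.0, 0.0, 1.0])

def alpha(x):
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

print(alpha(mixed_scale), alpha(uniform_scale))
```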
Peer reviewed
Custer, Michael; Omar, Md Hafidz; Pomplun, Mark – Applied Measurement in Education, 2006
This study compared vertical scaling results for the Rasch model from BILOG-MG and WINSTEPS. The item and ability parameters for the simulated vocabulary tests were scaled across 11 grades, kindergarten through Grade 10. The data were based on real test data and were simulated under normal and skewed distribution assumptions. WINSTEPS and BILOG-MG were each…
Descriptors: Models, Scaling, Computer Software, Vocabulary
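Neither BILOG-MG nor WINSTEPS is scriptable here, so the following only sketches the data-generation side under assumptions like those described: dichotomous Rasch responses simulated for a normal and for a positively skewed ability distribution. All values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

def rasch_responses(theta, b):
    """Simulate dichotomous Rasch responses for abilities theta, difficulties b."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

b = np.linspace(-2, 2, 30)                       # invented difficulty range
theta_normal = rng.normal(0, 1, 2000)
theta_skewed = rng.gamma(2.0, 1.0, 2000) - 2.0   # positively skewed, mean near 0

x_normal = rasch_responses(theta_normal, b)
x_skewed = rasch_responses(theta_skewed, b)
print(x_normal.mean(), x_skewed.mean())
```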
Peer reviewed
van der Linden, Wim J.; Glas, Cees A. W. – Applied Measurement in Education, 2000
Performed a simulation study to demonstrate the dramatic impact that capitalization on item parameter estimation errors has on ability estimation in adaptive testing. Discusses four different strategies to minimize the likelihood of capitalization in computerized adaptive testing. (SLD)
Descriptors: Ability, Adaptive Testing, Computer Assisted Testing, Estimation (Mathematics)
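A small illustration of the capitalization phenomenon the abstract refers to, with invented numbers: when maximum-information item selection operates on estimated 2PL discriminations, the items selected first are disproportionately those whose discriminations were overestimated.

```python
import numpy as np

rng = np.random.default_rng(4)

# All items truly equal in discrimination; calibration adds noise.
true_a = np.full(500, 1.0)
est_a = true_a + rng.normal(0, 0.2, 500)

# 2PL information at theta = b is a^2 * p * q with p = 0.5, i.e. a^2 / 4.
# Selection by maximum information uses the *estimated* a, so the items
# ranked highest are systematically those with positive estimation errors.
info = est_a**2 * 0.25
chosen = np.argsort(info)[::-1][:20]
print("mean estimation error among selected items:",
      (est_a - true_a)[chosen].mean())
```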
Peer reviewed
Bolt, Daniel M. – Applied Measurement in Education, 1999
Examined whether the item response theory (IRT) true-score equating method is more adversely affected by the presence of multidimensionality than two conventional equating methods, linear and equipercentile equating. Results of two simulation studies suggest that the IRT method performs as well as the conventional methods when the correlation…
Descriptors: Correlation, Equated Scores, Item Response Theory, Simulation
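As context for the comparison above, here is a bare-bones equipercentile equating function for number-correct scores: each Form X score is mapped to the Form Y score with the same percentile rank. The score distributions are invented and no smoothing is applied, unlike operational practice.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical number-correct scores on two 40-item forms.
x = rng.binomial(40, 0.60, 3000)   # Form X
y = rng.binomial(40, 0.55, 3000)   # Form Y

def equate(score, from_scores, to_scores):
    """Equipercentile equate one score from one form to the other."""
    pr = (from_scores < score).mean() + 0.5 * (from_scores == score).mean()
    return np.quantile(to_scores, pr)

for s in (20, 24, 28):
    print(s, "->", equate(s, x, y))
```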
Peer reviewed
Jodoin, Michael G.; Gierl, Mark J. – Applied Measurement in Education, 2001
Developed a new classification method for the logistic regression (LR) procedure for differential item functioning (DIF) based on methods used in the Simultaneous Item Bias test and conducted a simulation study to determine if the effect size measure affects the Type I error and power rates for the LR DIF procedure. Results show that inclusion of…
Descriptors: Classification, Effect Size, Item Bias, Power (Statistics)
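A hedged sketch of the logistic regression DIF procedure discussed above, assuming statsmodels is available: a compact model conditioning on the matching score is compared with an augmented model adding group and interaction terms. McFadden's pseudo-R-squared difference is used here as a stand-in for the Zumbo-Thomas effect size the authors build on, and the DIF effect is invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 4000
group = rng.integers(0, 2, n)            # 0 = reference, 1 = focal
theta = rng.normal(size=n)
total = theta + rng.normal(0, 0.5, n)    # matching (total-score) variable

# Item response with uniform DIF against the focal group (invented size).
logit = 1.2 * theta - 0.4 * group
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

def fit(X):
    return sm.Logit(y, sm.add_constant(X)).fit(disp=0)

compact = fit(np.column_stack([total]))
augmented = fit(np.column_stack([total, group, total * group]))

# 2-df likelihood-ratio test plus a pseudo-R^2 difference as the
# effect-size screen.
lr = 2 * (augmented.llf - compact.llf)
r2_diff = augmented.prsquared - compact.prsquared
print(f"LR chi2(2) = {lr:.2f}, pseudo-R2 difference = {r2_diff:.4f}")
```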
Peer reviewed
Walker, Cindy M.; Beretvas, S. Natasha; Ackerman, Terry – Applied Measurement in Education, 2001
Conducted a simulation study of differential item functioning (DIF) to compare power and Type I error rates for two conditions: using an examinee's ability estimate as the conditioning variable with the CATSIB program, either with or without CATSIB's regression correction. Discusses implications of the findings for DIF detection. (SLD)
Descriptors: Ability, Adaptive Testing, Computer Assisted Testing, Item Bias
Peer reviewed
Gao, Furong; Chen, Lisue – Applied Measurement in Education, 2005
Through a large-scale simulation study, this article compares item parameter estimates obtained by the marginal maximum likelihood estimation (MMLE) and marginal Bayes modal estimation (MBME) procedures in the 3-parameter logistic model. The impact of different prior specifications on the MBME estimates is also investigated using carefully…
Descriptors: Simulation, Computation, Bayesian Statistics, Item Analysis
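For orientation, a minimal statement of the model estimated above: the 3-parameter logistic item characteristic curve, plus the kind of prior that distinguishes Bayes modal estimation from marginal maximum likelihood. The Beta hyperparameters below are invented, not the article's prior specifications.

```python
import numpy as np
from scipy.stats import beta

def p_3pl(theta, a, b, c):
    """3PL ICC: c + (1 - c) / (1 + exp(-1.7 a (theta - b)))."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

# Bayes modal estimation adds a log-prior to the marginal log-likelihood;
# a Beta prior on the pseudo-guessing parameter is a common choice.
# Beta(5, 17) has its mode at 0.2, a typical 5-option guessing level.
c_prior = beta(5, 17)
print(p_3pl(0.0, a=1.0, b=0.0, c=0.2), c_prior.pdf(0.2))
```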
Peer reviewed
Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004
Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, like test translation and…
Descriptors: True Scores, Simulation, Test Bias, Student Evaluation
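A toy version of the SIBTEST statistic referenced above, with an invented uniform DIF effect: the item-score difference between reference and focal examinees is computed within matching-score strata and averaged with focal-group weights. The operational procedure also applies a regression correction for measurement error in the matching score, omitted here.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6000
group = rng.integers(0, 2, n)    # 0 = reference, 1 = focal
theta = rng.normal(size=n)
match = np.clip(np.round(theta * 5 + 15), 0, 30).astype(int)  # matching score

# Studied item with uniform DIF against the focal group (invented size).
p = 1 / (1 + np.exp(-(theta - 0.3 * group)))
y = (rng.random(n) < p).astype(int)

beta_hat, weight = 0.0, 0.0
for k in np.unique(match[group == 1]):
    ref = y[(group == 0) & (match == k)]
    foc = y[(group == 1) & (match == k)]
    if len(ref) and len(foc):
        w = (match[group == 1] == k).mean()   # focal-group weight for stratum k
        beta_hat += w * (ref.mean() - foc.mean())
        weight += w
print("beta-hat ~", beta_hat / weight)
```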
Peer reviewed
Mroch, Andrew A.; Bolt, Daniel M. – Applied Measurement in Education, 2006
Recently, nonparametric methods have been proposed that provide a dimensionality-based description of test structure for tests with dichotomous items. Because such methods are based on different notions of dimensionality than are assumed when using a psychometric model, it remains unclear whether these procedures might lead to a different…
Descriptors: Simulation, Comparative Analysis, Psychometrics, Methods Research
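One notion of dimensionality such nonparametric methods rely on is conditional covariance: item pairs sharing a dimension remain positively associated after conditioning on the remaining items, while cross-dimension pairs do not. A minimal simulated illustration of that signal, not any specific published procedure:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5000

# Two independent latent dimensions; items 0-1 load on the first,
# items 2-3 on the second (all loadings invented).
t1, t2 = rng.normal(size=(2, n))

def item(t):
    return (rng.random(n) < 1 / (1 + np.exp(-t))).astype(int)

items = np.column_stack([item(t1), item(t1), item(t2), item(t2)])

def cond_cov(i, j):
    """Covariance of items i and j given the rest score, averaged over strata."""
    rest = items.sum(axis=1) - items[:, i] - items[:, j]
    vals = []
    for k in np.unique(rest):
        m = rest == k
        if m.sum() > 30:
            vals.append(np.cov(items[m, i], items[m, j])[0, 1])
    return float(np.mean(vals))

# Same-dimension pair stays positive; cross-dimension pair falls to ~0.
print(cond_cov(0, 1), cond_cov(0, 2))
```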
Peer reviewed
Kingsbury, G. Gage; Zara, Anthony R. – Applied Measurement in Education, 1991
This simulation investigated two procedures that reduce differences between paper-and-pencil testing and computerized adaptive testing (CAT) by making CAT content sensitive. Results indicate that the price, in terms of additional test items, of using constrained CAT for content balancing is much smaller than that of using testlets. (SLD)
Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Computer Simulation
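A sketch of one step of constrained CAT content balancing in the spirit of the procedure studied above: maximum-information selection restricted to the content area currently furthest below its blueprint target. Pool sizes, targets, and item parameters are all invented.

```python
import numpy as np

rng = np.random.default_rng(8)

# Invented 2PL pool tagged with 4 content areas and blueprint targets.
n_pool = 300
a = rng.uniform(0.5, 2.0, n_pool)
b = rng.normal(0, 1, n_pool)
area = rng.integers(0, 4, n_pool)
target = np.array([0.4, 0.3, 0.2, 0.1])            # blueprint proportions

administered = list(rng.integers(0, n_pool, 10))   # items given so far
theta_hat = 0.3                                    # current ability estimate

# Find the content area furthest below its target proportion.
counts = np.bincount([area[i] for i in administered], minlength=4)
deficit = target - counts / max(len(administered), 1)
needed = int(np.argmax(deficit))

# Maximum 2PL information at theta_hat, within the needed area only.
p = 1 / (1 + np.exp(-a * (theta_hat - b)))
info = a**2 * p * (1 - p)
info[administered] = -np.inf                       # no item re-use
info[area != needed] = -np.inf                     # content constraint
print("next item:", int(np.argmax(info)), "area:", needed)
```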