ERIC - Search Results

Showing 1 to 15 of 35 results

Confirming Testlet Effects

Peer reviewed

DeMars, Christine E. – Applied Psychological Measurement, 2012

A testlet is a cluster of items that share a common passage, scenario, or other context. These items might measure something in common beyond the trait measured by the test as a whole; if so, the model for the item responses should allow for this testlet trait. But modeling testlet effects that are negligible makes the model unnecessarily…

Descriptors: Test Items, Item Response Theory, Comparative Analysis, Models

Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

Peer reviewed

He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei – Applied Psychological Measurement, 2013

Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

Descriptors: Regression (Statistics), Item Response Theory, Test Items, Equated Scores

Iterative Linking with the Differential Functioning of Items and Tests (DFIT) Method: Comparison of Testwide and Item Parameter Replication (IPR) Critical Values

Peer reviewed

Seybert, Jacob; Stark, Stephen – Applied Psychological Measurement, 2012

A Monte Carlo study was conducted to examine the accuracy of differential item functioning (DIF) detection using the differential functioning of items and tests (DFIT) method. Specifically, the performance of DFIT was compared using "testwide" critical values suggested by Flowers, Oshima, and Raju, based on simulations involving large numbers of…

Descriptors: Test Bias, Monte Carlo Methods, Form Classes (Languages), Simulation

The MIMIC Model as a Tool for Differential Bundle Functioning Detection

Peer reviewed

Finch, W. Holmes – Applied Psychological Measurement, 2012

Increasingly, researchers interested in identifying potentially biased test items are encouraged to use a confirmatory, rather than exploratory, approach. One such method for confirmatory testing is rooted in differential bundle functioning (DBF), where hypotheses regarding potential differential item functioning (DIF) for sets of items (bundles)…

Descriptors: Test Bias, Test Items, Statistical Analysis, Models

DIF Testing for Ordinal Items with Poly-SIBTEST, the Mantel and GMH Tests, and IRT-LR-DIF when the Latent Distribution Is Nonnormal for Both Groups

Peer reviewed

Woods, Carol M. – Applied Psychological Measurement, 2011

Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another. One way to test items with ordinal response scales for DIF is likelihood ratio (LR) testing using item response theory (IRT), or IRT-LR-DIF. Despite the various advantages of…

Descriptors: Test Bias, Test Items, Item Response Theory, Nonparametric Statistics

Comparing Methods for Item Analysis: The Impact of Different Item-Selection Statistics on Test Difficulty

Peer reviewed

Jones, Andrew T. – Applied Psychological Measurement, 2011

Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…

Descriptors: Test Items, Item Analysis, Cutting Scores, Statistics

Coefficient Alpha and Reliability of Scale Scores

Peer reviewed

Almehrizi, Rashid S. – Applied Psychological Measurement, 2013

The majority of large-scale assessments develop various score scales that are either linear or nonlinear transformations of raw scores for better interpretations and uses of assessment results. The current formula for coefficient alpha (α; the commonly used reliability coefficient) only provides internal consistency reliability estimates of raw…

Descriptors: Raw Scores, Scaling, Reliability, Computation

An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models with Computerized Adaptive Testing

Peer reviewed

Yen, Yung-Chin; Ho, Rong-Guey; Laio, Wen-Wei; Chen, Li-Ju; Kuo, Ching-Chin – Applied Psychological Measurement, 2012

In a selected response test, aberrant responses such as careless errors and lucky guesses might cause error in ability estimation because these responses do not actually reflect the knowledge that examinees possess. In a computerized adaptive test (CAT), these aberrant responses could further cause serious estimation error due to dynamic item…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Response Style (Tests)

A Comparison of Anchor-Item Designs for the Concurrent Calibration of Large Banks of Likert-Type Items

Peer reviewed

Garcia-Perez, Miguel A.; Alcala-Quintana, Rocio; Garcia-Cueto, Eduardo – Applied Psychological Measurement, 2010

Current interest in measuring quality of life is generating interest in the construction of computerized adaptive tests (CATs) with Likert-type items. Calibration of an item bank for use in CAT requires collecting responses to a large number of candidate items. However, the number is usually too large to administer to each subject in the…

Descriptors: Comparative Analysis, Test Items, Equated Scores, Item Banks

A Comparison of Item Selection Techniques for Testlets

Peer reviewed

Murphy, Daniel L.; Dodd, Barbara G.; Vaughn, Brandon K. – Applied Psychological Measurement, 2010

This study examined the performance of the maximum Fisher's information, the maximum posterior weighted information, and the minimum expected posterior variance methods for selecting items in a computerized adaptive testing system when the items were grouped in testlets. A simulation study compared the efficiency of ability estimation among the…

Descriptors: Simulation, Adaptive Testing, Item Analysis, Item Response Theory

A Method for the Comparison of Item Selection Rules in Computerized Adaptive Testing

Peer reviewed

Barrada, Juan Ramon; Olea, Julio; Ponsoda, Vicente; Abad, Francisco Jose – Applied Psychological Measurement, 2010

In a typical study comparing the relative efficiency of two item selection rules in computerized adaptive testing, the common result is that they simultaneously differ in accuracy and security, making it difficult to reach a conclusion on which is the more appropriate rule. This study proposes a strategy to conduct a global comparison of two or…

Descriptors: Test Items, Simulation, Adaptive Testing, Item Analysis

Comparison of CAT Item Selection Criteria for Polytomous Items

Peer reviewed

Choi, Seung W.; Swartz, Richard J. – Applied Psychological Measurement, 2009

Item selection is a core component in computerized adaptive testing (CAT). Several studies have evaluated new and classical selection methods; however, the few that have applied such methods to the use of polytomous items have reported conflicting results. To clarify these discrepancies and further investigate selection method properties, six…

Descriptors: Adaptive Testing, Item Analysis, Comparative Analysis, Test Items

Item Selection and Hypothesis Testing for the Adaptive Measurement of Change

Peer reviewed

Finkelman, Matthew D.; Weiss, David J.; Kim-Kang, Gyenam – Applied Psychological Measurement, 2010

Assessing individual change is an important topic in both psychological and educational measurement. An adaptive measurement of change (AMC) method had previously been shown to exhibit greater efficiency in detecting change than conventional nonadaptive methods. However, little work had been done to compare different procedures within the AMC…

Descriptors: Computer Assisted Testing, Hypothesis Testing, Measurement, Item Analysis

Three Classes of Nonparametric Differential Step Functioning Effect Estimators

Peer reviewed

Penfield, Randall D. – Applied Psychological Measurement, 2008

The examination of measurement invariance in polytomous items is complicated by the possibility that the magnitude and sign of lack of invariance may vary across the steps underlying the set of polytomous response options, a concept referred to as differential step functioning (DSF). This article describes three classes of nonparametric DSF effect…

Descriptors: Simulation, Nonparametric Statistics, Item Response Theory, Computation

Comparison of Parametric and Nonparametric Bootstrap Methods for Estimating Random Error in Equipercentile Equating

Peer reviewed

Cui, Zhongmin; Kolen, Michael J. – Applied Psychological Measurement, 2008

This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…

Descriptors: Test Length, Test Content, Simulation, Computation
