Showing 1 to 15 of 122 results
Peer reviewed
Direct link
Culpepper, Steven Andrew – Applied Psychological Measurement, 2013
A classic topic in the fields of psychometrics and measurement has been the impact of the number of scale categories on test score reliability. This study builds on previous research by further articulating the relationship between item response theory (IRT) and classical test theory (CTT). Equations are presented for comparing the reliability and…
Descriptors: Item Response Theory, Reliability, Scores, Error of Measurement
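A minimal simulation of the classic result the abstract refers to, with invented data: coarser response scales discard information, so coefficient alpha (a CTT reliability estimate) falls as the number of categories shrinks. All values are illustrative.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(0)
n, k_items = 2000, 10
theta = rng.normal(size=(n, 1))                 # common true score
latent = theta + rng.normal(size=(n, k_items))  # add item-level noise

for n_cats in (2, 3, 5, 7):
    # common thresholds; coarser scales discard more information
    cuts = np.quantile(latent, np.linspace(0, 1, n_cats + 1)[1:-1])
    print(n_cats, round(cronbach_alpha(np.digitize(latent, cuts)), 3))
```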
Peer reviewed
Direct link
Brossman, Bradley G.; Lee, Won-Chan – Applied Psychological Measurement, 2013
The purpose of this research was to develop observed score and true score equating procedures to be used in conjunction with the multidimensional item response theory (MIRT) framework. Three equating procedures--two observed score procedures and one true score procedure--were created and described in detail. One observed score procedure was…
Descriptors: Equated Scores, True Scores, Item Response Theory, Mathematics Tests
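The article works in the MIRT framework; a compact way to see the true score equating idea is the familiar unidimensional version: invert one form's test characteristic curve (TCC) to recover theta, then evaluate the other form's TCC there. A sketch with made-up 2PL parameters:

```python
import numpy as np
from scipy.optimize import brentq

def tcc(theta, a, b):
    """Test characteristic curve: expected number-correct under a 2PL."""
    return (1 / (1 + np.exp(-a * (theta - b)))).sum()

# made-up 2PL parameters for forms X and Y, already on a common scale
a_x, b_x = np.array([1.0, 1.2, 0.8]), np.array([-0.5, 0.0, 0.7])
a_y, b_y = np.array([0.9, 1.1, 1.0]), np.array([-0.3, 0.2, 0.5])

def x_to_y(tau_x):
    """Map a form X true score to form Y: invert X's TCC, evaluate Y's."""
    theta = brentq(lambda t: tcc(t, a_x, b_x) - tau_x, -8, 8)
    return tcc(theta, a_y, b_y)

print(round(x_to_y(1.5), 3))
```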
Peer reviewed
Direct link
DeMars, Christine E. – Applied Psychological Measurement, 2012
A testlet is a cluster of items that share a common passage, scenario, or other context. These items might measure something in common beyond the trait measured by the test as a whole; if so, the model for the item responses should allow for this testlet trait. But modeling testlet effects that are negligible makes the model unnecessarily…
Descriptors: Test Items, Item Response Theory, Comparative Analysis, Models
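A quick simulation, under assumed parameter values, of why a nonzero testlet trait matters: a person-by-testlet random effect makes coefficient alpha overstate the reliability of the general trait, which shows up when alpha is compared against a parallel-forms correlation that shares theta but not the testlet effects.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_testlets, items_per = 2000, 4, 5
theta = rng.normal(size=n)

def simulate(testlet_sd):
    """Dichotomous responses with a person-by-testlet random effect."""
    resp = np.empty((n, n_testlets * items_per))
    for t in range(n_testlets):
        gamma = rng.normal(scale=testlet_sd, size=n)  # shared within a testlet
        p = 1 / (1 + np.exp(-(theta + gamma)))        # Rasch-like, b = 0
        for j in range(items_per):
            resp[:, t * items_per + j] = rng.random(n) < p
    return resp

def alpha(x):
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(0, ddof=1).sum() / x.sum(1).var(ddof=1))

sd = 1.0
total_1, total_2 = simulate(sd).sum(1), simulate(sd).sum(1)  # parallel forms
print("alpha:", round(alpha(simulate(sd)), 3))
print("parallel-forms r:", round(np.corrcoef(total_1, total_2)[0, 1], 3))
```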
Peer reviewed
Direct link
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei – Applied Psychological Measurement, 2013
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Descriptors: Regression (Statistics), Item Response Theory, Test Items, Equated Scores
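One simple screen in the spirit of the abstract, with made-up difficulty estimates: regress the new-form anchor difficulties on the old-form values and flag large standardized residuals. The numbers and the |z| > 2 rule are illustrative only.

```python
import numpy as np

# made-up difficulty estimates for the same anchor items on two forms;
# the third item has drifted
b_old = np.array([-1.2, -0.6, -0.1, 0.3, 0.8, 1.4])
b_new = np.array([-1.1, -0.5,  0.6, 0.4, 0.9, 1.5])

slope, intercept = np.polyfit(b_old, b_new, 1)
resid = b_new - (slope * b_old + intercept)
z = (resid - resid.mean()) / resid.std(ddof=1)
print("flagged anchor items:", np.where(np.abs(z) > 2)[0])
```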
Peer reviewed
Direct link
Harik, Polina; Baldwin, Peter; Clauser, Brian – Applied Psychological Measurement, 2013
Growing reliance on complex constructed response items has generated considerable interest in automated scoring solutions. Many of these solutions are described in the literature; however, relatively few studies have been published that "compare" automated scoring strategies. Here, comparisons are made among five strategies for…
Descriptors: Computer Assisted Testing, Automation, Scoring, Comparative Analysis
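The strategies compared in the article are specific scoring engines; a generic way to benchmark any automated strategy against human ratings is quadratically weighted kappa. A sketch with invented scores, using scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

human  = [3, 2, 4, 1, 3, 2, 4, 3, 1, 2]   # invented human ratings
engine = [3, 2, 3, 1, 4, 2, 4, 3, 2, 2]   # one automated strategy's scores
print(cohen_kappa_score(human, engine, weights="quadratic"))
```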
Peer reviewed
Direct link
de la Torre, Jimmy; Song, Hao; Hong, Yuan – Applied Psychological Measurement, 2011
Lack of sufficient reliability is the primary impediment to generating and reporting subtest scores. Several current methods estimate subscores either by incorporating the correlational structure among the subtest abilities or by using the examinee's performance on the overall test. This article presents a systematic comparison of four…
Descriptors: Item Response Theory, Scoring, Methods, Comparative Analysis
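The simplest member of this family of subscore estimators is Kelley's regressed score, which shrinks an unreliable observed subscore toward the group mean (the methods compared in the article are more elaborate, but build on the same idea). A sketch with hypothetical numbers:

```python
def kelley_subscore(x, reliability, group_mean):
    """Shrink an observed subscore toward the group mean (Kelley, 1927)."""
    return reliability * x + (1 - reliability) * group_mean

# an unreliable subscore is pulled strongly toward the mean
print(kelley_subscore(x=18, reliability=0.55, group_mean=14.0))  # -> 16.2
```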
Peer reviewed
Direct link
Padilla, Miguel A.; Divers, Jasmin; Newton, Matthew – Applied Psychological Measurement, 2012
Three different bootstrap methods for estimating confidence intervals (CIs) for coefficient alpha were investigated. In addition, the bootstrap methods were compared with the most promising coefficient alpha CI estimation methods reported in the literature. The CI methods were assessed through a Monte Carlo simulation utilizing conditions…
Descriptors: Intervals, Monte Carlo Methods, Computation, Sampling
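A minimal percentile-bootstrap CI for coefficient alpha, one of the variants such studies examine; the data and settings here are invented:

```python
import numpy as np

def cronbach_alpha(x):
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum()
                          / x.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(42)
data = rng.normal(size=(200, 1)) + rng.normal(scale=1.2, size=(200, 8))

boots = np.array([
    cronbach_alpha(data[rng.integers(0, len(data), len(data))])
    for _ in range(2000)
])
lo, hi = np.percentile(boots, [2.5, 97.5])   # percentile bootstrap
print(round(cronbach_alpha(data), 3), round(lo, 3), round(hi, 3))
```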
Peer reviewed
Direct link
Moses, Tim; von Davier, Alina – Applied Psychological Measurement, 2011
Polynomial loglinear models for one-, two-, and higher-way contingency tables have important applications to measurement and assessment. They serve essentially as a smoothing technique, commonly referred to as loglinear smoothing. A SAS IML (SAS Institute, 2002a) macro was created to implement loglinear smoothing according to…
Descriptors: Statistical Analysis, Computer Software, Algebra, Mathematical Formulas
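The macro itself is SAS IML, but the underlying model is easy to sketch in Python: fit a Poisson loglinear model with polynomial score terms, so the smoothed distribution preserves the first few raw-score moments. Frequencies below are invented, using statsmodels:

```python
import numpy as np
import statsmodels.api as sm

freq = np.array([3, 8, 19, 31, 42, 38, 27, 14, 6, 2])  # counts for scores 0-9
scores = np.arange(len(freq))

degree = 3  # a degree-3 fit preserves the first three raw-score moments
X = sm.add_constant(np.column_stack([scores ** d for d in range(1, degree + 1)]))
smoothed = sm.GLM(freq, X, family=sm.families.Poisson()).fit().fittedvalues
print(np.round(smoothed, 1))
```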
Peer reviewed
Direct link
Fidalgo, Angel M. – Applied Psychological Measurement, 2011
Mantel-Haenszel (MH) methods constitute one of the most popular nonparametric differential item functioning (DIF) detection procedures. GMHDIF has been developed to provide an easy-to-use program for conducting DIF analyses. Some of the advantages of this program are that (a) it performs two-stage DIF analyses in multiple groups simultaneously;…
Descriptors: Test Bias, Computer Software, Statistics, Comparative Analysis
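GMHDIF is a standalone program, but the dichotomous MH statistic it builds on is compact enough to show directly. A sketch with fabricated 2x2 tables per ability stratum (A, B = reference right/wrong; C, D = focal right/wrong):

```python
import numpy as np

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio over (A, B, C, D) strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

strata = [(40, 10, 30, 20), (35, 15, 25, 25), (20, 30, 10, 40)]  # invented
alpha_mh = mh_odds_ratio(strata)
print(round(alpha_mh, 3), round(-2.35 * np.log(alpha_mh), 3))  # ETS delta
```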
Peer reviewed
Direct link
Seybert, Jacob; Stark, Stephen – Applied Psychological Measurement, 2012
A Monte Carlo study was conducted to examine the accuracy of differential item functioning (DIF) detection using the differential functioning of items and tests (DFIT) method. Specifically, the performance of DFIT was compared using "testwide" critical values suggested by Flowers, Oshima, and Raju, based on simulations involving large numbers of…
Descriptors: Test Bias, Monte Carlo Methods, Form Classes (Languages), Simulation
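The quantity at the heart of DFIT's item-level index, NCDIF, is the focal-group expectation of the squared gap between the two groups' expected item scores. A sketch for one 2PL item with invented parameters:

```python
import numpy as np

def p2pl(theta, a, b):
    return 1 / (1 + np.exp(-a * (theta - b)))

# invented parameters for one item, estimated separately per group
a_ref, b_ref = 1.1, 0.0
a_foc, b_foc = 1.1, 0.4

theta_focal = np.random.default_rng(0).normal(size=5000)  # focal ability draws
gap = p2pl(theta_focal, a_foc, b_foc) - p2pl(theta_focal, a_ref, b_ref)
print(round(np.mean(gap ** 2), 4))   # NCDIF: focal-group mean squared gap
```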
Peer reviewed
Direct link
Finch, W. Holmes – Applied Psychological Measurement, 2012
Increasingly, researchers interested in identifying potentially biased test items are encouraged to use a confirmatory, rather than exploratory, approach. One such method for confirmatory testing is rooted in differential bundle functioning (DBF), where hypotheses regarding potential differential item functioning (DIF) for sets of items (bundles)…
Descriptors: Test Bias, Test Items, Statistical Analysis, Models
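A stripped-down illustration of the bundle idea: compare focal and reference examinees' mean scores on the hypothesized bundle within total-score strata and average the gaps. This omits the regression correction that SIBTEST-type DBF procedures apply, so treat it as a sketch only; the data are invented.

```python
import numpy as np

def bundle_dif(total, bundle, group):
    """Weighted focal-minus-reference gap in bundle score across strata."""
    num = wt = 0.0
    for s in np.unique(total):
        ref = bundle[(total == s) & (group == 0)]
        foc = bundle[(total == s) & (group == 1)]
        if len(ref) and len(foc):
            n = len(ref) + len(foc)
            num += n * (foc.mean() - ref.mean())
            wt += n
    return num / wt

rng = np.random.default_rng(2)
group = rng.integers(0, 2, 1000)
total = rng.integers(0, 31, 1000)                 # matching score
bundle = rng.binomial(4, 0.4 + 0.05 * group)      # 4-item bundle, slight DIF
print(round(bundle_dif(total, bundle, group), 3))
```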
Peer reviewed
Direct link
Fidalgo, Angel M.; Bartram, Dave – Applied Psychological Measurement, 2010
The main objective of this study was to establish the relative efficacy of the generalized Mantel-Haenszel test (GMH) and the Mantel test for detecting large numbers of differential item functioning (DIF) patterns. To this end, this study considered a topic not dealt with in the literature to date: the possible differential effect of the type of scores…
Descriptors: Test Bias, Statistics, Scoring, Comparative Analysis
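For ordered-category items, the Mantel test compares the focal group's observed category-score sum in each stratum with its hypergeometric expectation. A sketch of the Mantel (1963) statistic with invented counts:

```python
import numpy as np
from scipy.stats import chi2

def mantel_test(tables, y):
    """Mantel (1963) statistic for an ordered-category item.
    tables: one (2, J) count array per stratum, row 0 = reference,
    row 1 = focal; y: the J category scores."""
    y = np.asarray(y, float)
    F = E = V = 0.0
    for t in tables:
        t = np.asarray(t, float)
        n_ref, n_foc = t[0].sum(), t[1].sum()
        n, col = n_ref + n_foc, t.sum(axis=0)
        F += (y * t[1]).sum()                 # focal group's score sum
        E += n_foc * (y * col).sum() / n      # hypergeometric mean
        V += n_ref * n_foc / (n ** 2 * (n - 1)) * (
            n * (y ** 2 * col).sum() - (y * col).sum() ** 2)
    stat = (F - E) ** 2 / V
    return stat, chi2.sf(stat, 1)

tables = [[[20, 30, 10], [25, 25, 10]],   # invented counts, two strata
          [[10, 25, 25], [15, 25, 20]]]
print(mantel_test(tables, y=[0, 1, 2]))
```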
Peer reviewed
Direct link
Woods, Carol M. – Applied Psychological Measurement, 2011
Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another. One way to test items with ordinal response scales for DIF is likelihood ratio (LR) testing using item response theory (IRT), or IRT-LR-DIF. Despite the various advantages of…
Descriptors: Test Bias, Test Items, Item Response Theory, Nonparametric Statistics
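Mechanically, IRT-LR-DIF reduces to a chi-square test on twice the log-likelihood difference between a constrained fit (the studied item's parameters equal across groups) and a free fit. The log-likelihoods and degrees of freedom below are placeholders:

```python
from scipy.stats import chi2

# placeholder log-likelihoods from two IRT fits of the same data
ll_constrained, ll_free = -10234.7, -10228.1

lr = 2 * (ll_free - ll_constrained)   # likelihood-ratio statistic
df = 3                                # e.g., one slope + two thresholds freed
print(round(lr, 1), chi2.sf(lr, df))  # small p -> evidence of DIF
```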
Peer reviewed
Direct link
Tay, Louis; Ali, Usama S.; Drasgow, Fritz; Williams, Bruce – Applied Psychological Measurement, 2011
This study investigated the relative model-data fit of an ideal point item response theory (IRT) model (the generalized graded unfolding model [GGUM]) and dominance IRT models (e.g., the two-parameter logistic model [2PLM] and Samejima's graded response model [GRM]) to simulated dichotomous and polytomous data generated from each of these models.…
Descriptors: Item Response Theory, Data, Models, Goodness of Fit
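The contrast being fit-tested: dominance response functions rise monotonically in theta, while ideal point functions peak at the item's location. Below, a 2PL next to a toy single-peaked curve standing in for the GGUM (the Gaussian shape is illustrative, not the GGUM formula):

```python
import numpy as np

theta = np.linspace(-3, 3, 7)

# dominance (2PL): endorsement keeps rising with theta
p_dominance = 1 / (1 + np.exp(-1.2 * theta))

# toy single-peaked curve in the ideal point spirit (NOT the GGUM formula):
# endorsement peaks at the item location (0 here) and falls off on both sides
p_ideal = np.exp(-0.5 * theta ** 2)

print(np.round(p_dominance, 2))
print(np.round(p_ideal, 2))
```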
Peer reviewed
Direct link
Jones, Andrew T. – Applied Psychological Measurement, 2011
Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research to date has…
Descriptors: Test Items, Item Analysis, Cutting Scores, Statistics
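The four statistics named in the abstract are all short computations on a dichotomous item vector; a sketch with simulated scores and an arbitrary cut of 24 out of 40:

```python
import numpy as np

def item_stats(item, total, cut):
    """Point-biserial, B index, agreement, and phi for a dichotomous item."""
    pbis = np.corrcoef(item, total)[0, 1]
    master = total >= cut
    b_index = item[master].mean() - item[~master].mean()
    agreement = ((item == 1) & master).mean() + ((item == 0) & ~master).mean()
    phi = np.corrcoef(item, master.astype(int))[0, 1]
    return pbis, b_index, agreement, phi

rng = np.random.default_rng(3)
total = rng.integers(0, 41, 300)                   # simulated total scores
item = (rng.random(300) < total / 40).astype(int)  # item tracks total score
print(np.round(item_stats(item, total, cut=24), 3))
```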
Previous Page | Next Page »
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9