ERIC - Search Results

Showing all 10 results

The Performance of IRT Model Selection Methods with Mixed-Format Tests

Peer reviewed

Whittaker, Tiffany A.; Chang, Wanchen; Dodd, Barbara G. – Applied Psychological Measurement, 2012

When tests consist of multiple-choice and constructed-response items, researchers are confronted with the question of which item response theory (IRT) model combination will appropriately represent the data collected from these mixed-format tests. This simulation study examined the performance of six model selection criteria, including the…

Descriptors: Item Response Theory, Models, Selection, Criteria

Computerized Adaptive Testing for Polytomous Motivation Items: Administration Mode Effects and a Comparison with Short Forms

Peer reviewed

Hol, A. Michiel; Vorst, Harrie C. M.; Mellenbergh, Gideon J. – Applied Psychological Measurement, 2007

In a randomized experiment (n = 515), a computerized and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible…

Descriptors: Student Motivation, Simulation, Adaptive Testing, Computer Assisted Testing

An Investigation of the Sampling Distributions of Equating Coefficients.

Peer reviewed

Baker, Frank B. – Applied Psychological Measurement, 1996

Using the characteristic curve method for dichotomously scored test items, the sampling distributions of equating coefficients were examined. Simulations indicate that for the equating conditions studied, the sampling distributions of the equating coefficients appear to have acceptable characteristics, suggesting confidence in the values obtained…

Descriptors: Equated Scores, Item Response Theory, Sampling, Statistical Distributions

Standard Errors of Levine Linear Equating.

Peer reviewed

Hanson, Bradley A.; And Others – Applied Psychological Measurement, 1993

The delta method was used to derive standard errors (SEs) of the Levine observed score and Levine true score linear test equating methods using data from two test forms. SEs derived without the normality assumption and bootstrap SEs were very close. The situation with skewed score distributions is also discussed. (SLD)

Descriptors: Equated Scores, Equations (Mathematics), Error of Measurement, Sampling

Model-Based Versus Empirical Equating of Test Forms

Peer reviewed

Quenette, Mary A.; Nicewander, W. Alan; Thomasson, Gary L. – Applied Psychological Measurement, 2006

Model-based equating was compared to empirical equating of an Armed Services Vocational Aptitude Battery (ASVAB) test form. The model-based equating was done using item pretest data to derive item response theory (IRT) item parameter estimates for those items that were retained in the final version of the test. The analysis of an ASVAB test form…

Descriptors: Item Response Theory, Multiple Choice Tests, Test Items, Computation

Ordering Power of Separate versus Grouped True-False Tests: Interaction of Type of Test with Knowledge Levels of Examinees.

Peer reviewed

Hsu, Louis M. – Applied Psychological Measurement, 1979

A comparison of the relative ordering power of separate and grouped-items true-false tests indicated that neither type of test was uniformly superior to the other across all levels of knowledge of examinees. Grouped-item tests were found superior for examinees with low levels of knowledge. (Author/CTM)

Descriptors: Academic Ability, Knowledge Level, Multiple Choice Tests, Scores

A Multidimensional Partial Credit Model with Associated Item and Test Statistics: An Application to Mixed-Format Tests

Peer reviewed

Yao, Lihua; Schwarz, Richard D. – Applied Psychological Measurement, 2006

Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…

Descriptors: Models, Item Response Theory, Markov Processes, Monte Carlo Methods

A Psychometric Evaluation of 4-Point and 6-Point Likert-Type Scales in Relation to Reliability and Validity.

Peer reviewed

Chang, Lei – Applied Psychological Measurement, 1994

Reliability and validity of 4-point and 6-point scales were assessed using a new model-based approach to fit empirical data from 165 graduate students completing an attitude measure. Results suggest that the issue of four- versus six-point scales may depend on the empirical setting. (SLD)

Descriptors: Attitude Measures, Goodness of Fit, Graduate Students, Graduate Study

On the Feasibility of Multiple Matching Tests--Variations on a Theme by Gulliksen.

Peer reviewed

Budescu, David V. – Applied Psychological Measurement, 1988

A multiple matching test--a 24-item Hebrew vocabulary test--was examined, in which distractors from several items are pooled into one list at the test's end. Construction of such tests was feasible. Reliability, validity, and reduction of random guessing were satisfactory when applied to data from 717 applicants to Israeli universities. (SLD)

Descriptors: College Applicants, Feasibility Studies, Foreign Countries, Guessing (Tests)

Analysis of Differential Item Functioning in Translated Assessment Instruments.

Peer reviewed

Budgell, Glen R.; And Others – Applied Psychological Measurement, 1995

The usefulness of three item response theory-based methods and the Mantel-Haenszel technique in evaluating the measurement equivalence of translated assessment instruments was demonstrated in a study involving 2,000 French-speaking Canadian adults who took a French test translation and 2,000 English-speaking adults who took the English original.…

Descriptors: Adults, Chi Square, Cultural Awareness, Culture Fair Tests