Showing all 13 results
Peer reviewed
Diao, Qi; van der Linden, Wim J. – Applied Psychological Measurement, 2013
Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…
Descriptors: Automation, Test Construction, Test Format, Item Banks
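To make the mixed integer programming approach described in this entry concrete, here is a minimal item-selection sketch using the PuLP modeling library. The item bank, information values, content areas, and constraints are all invented for illustration; the article's actual models additionally handle item ordering and form formatting.

```python
import pulp

# Hypothetical item bank: Fisher information at one ability point plus a
# content area per item; all values are invented for illustration.
bank = [
    {"id": i, "info": inf, "area": area}
    for i, (inf, area) in enumerate([
        (0.9, "algebra"), (0.4, "algebra"), (0.7, "geometry"),
        (0.6, "geometry"), (0.8, "stats"), (0.3, "stats"),
    ])
]

prob = pulp.LpProblem("test_assembly", pulp.LpMaximize)
x = {b["id"]: pulp.LpVariable(f"x_{b['id']}", cat="Binary") for b in bank}

# Objective: maximize total information of the selected items.
prob += pulp.lpSum(b["info"] * x[b["id"]] for b in bank)

# Fixed test length.
prob += pulp.lpSum(x.values()) == 3

# At least one item from each content area.
for area in ("algebra", "geometry", "stats"):
    prob += pulp.lpSum(x[b["id"]] for b in bank if b["area"] == area) >= 1

prob.solve()
print("selected:", [b["id"] for b in bank if x[b["id"]].value() == 1])
```

The same machinery extends toward full form generation by adding position variables and layout constraints, which is the step the article takes.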
Peer reviewed
Wyse, Adam E. – Applied Psychological Measurement, 2011
In many practical testing situations, alternate test forms from the same testing program are not strictly parallel to each other and instead the test forms exhibit small psychometric differences. This article investigates the potential practical impact that these small psychometric differences can have on expected classification accuracy. Ten…
Descriptors: Test Format, Test Construction, Testing Programs, Psychometrics
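As a rough illustration of how small psychometric differences can shift expected classification accuracy, here is a sketch under a simple normal model: true scores are normally distributed and observed scores add normal measurement error. The cut score, moments, and the normal-error model itself are assumptions for illustration, not the article's procedure.

```python
import numpy as np
from scipy.stats import norm

def expected_accuracy(cut, true_mean, true_sd, sem):
    """Expected classification accuracy: the probability that the observed
    score falls on the same side of the cut as the true score, averaged
    over a normal true-score distribution."""
    taus = np.linspace(true_mean - 4 * true_sd, true_mean + 4 * true_sd, 2001)
    density = norm.pdf(taus, true_mean, true_sd)
    p_pass = 1.0 - norm.cdf(cut, taus, sem)   # P(observed >= cut | true = tau)
    p_correct = np.where(taus >= cut, p_pass, 1.0 - p_pass)
    return np.trapz(p_correct * density, taus)

# Two nearly parallel forms differing slightly in measurement error:
print(expected_accuracy(cut=60, true_mean=55, true_sd=10, sem=3.0))
print(expected_accuracy(cut=60, true_mean=55, true_sd=10, sem=3.4))
```

Comparing the two calls shows how a modest change in the standard error of measurement moves the expected proportion of correct pass/fail classifications.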
Peer reviewed
Belov, Dmitry I.; Armstrong, Ronald D. – Applied Psychological Measurement, 2008
This article presents an application of Monte Carlo methods for developing and assembling multistage adaptive tests (MSTs). A major advantage of the Monte Carlo assembly over other approaches (e.g., integer programming or enumerative heuristics) is that it provides a uniform sampling from all MSTs (or MST paths) available from a given item pool.…
Descriptors: Monte Carlo Methods, Adaptive Testing, Sampling, Item Response Theory
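The core Monte Carlo idea, sampling assemblies uniformly at random and rejecting those that violate constraints, can be sketched in a few lines. The pool, form length, and constraints below are invented, and the sketch draws single linear forms; the article applies the idea to multistage modules and paths.

```python
import random

# Illustrative pool: (item_id, content_area, difficulty); all values invented.
pool = [(i, random.choice(["A", "B"]), random.uniform(-2, 2)) for i in range(200)]

def random_form(pool, length=20, max_tries=10_000):
    """Rejection sampling: draw candidate forms uniformly at random and keep
    the first that satisfies all constraints. Every feasible form has equal
    probability of being drawn, so accepted forms are a uniform sample from
    the set of all feasible forms."""
    for _ in range(max_tries):
        form = random.sample(pool, length)
        n_a = sum(1 for _, area, _ in form if area == "A")
        mean_b = sum(b for _, _, b in form) / length
        # Invented constraints: balanced content, near-zero mean difficulty.
        if 8 <= n_a <= 12 and abs(mean_b) < 0.2:
            return form
    raise RuntimeError("no feasible form found in max_tries draws")

print(sorted(i for i, _, _ in random_form(pool)))
```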
Peer reviewed
Wang, Tianyou; Kolen, Michael J. – Applied Psychological Measurement, 1996
A quadratic curve test equating method is proposed for equating different test forms under a random-groups data collection design; it equates the first three central moments of the test forms. When applied to real test data, the method performs as well as other equating methods. Procedures for implementing the method are described. (SLD)
Descriptors: Data Collection, Equated Scores, Standardized Tests, Test Construction
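A sketch of the moment-matching idea: choose a quadratic transformation q(x) = a + bx + cx² so that the equated form X scores match form Y's mean, standard deviation, and skewness. The simulated random-groups data and the use of a generic root finder are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import skew

rng = np.random.default_rng(0)
# Simulated random-groups data (invented): number-correct scores on X and Y.
x = rng.binomial(40, 0.55, size=2000).astype(float)
y = rng.binomial(40, 0.60, size=2000).astype(float)

def moment_gap(params):
    """Differences in mean, SD, and skewness between q(x) and y."""
    a, b, c = params
    q = a + b * x + c * x**2
    return [q.mean() - y.mean(), q.std() - y.std(), skew(q) - skew(y)]

# Start from the linear equating solution (c = 0) and solve the 3x3 system.
b0 = y.std() / x.std()
a0 = y.mean() - b0 * x.mean()
a, b, c = fsolve(moment_gap, [a0, b0, 0.0])
print(f"equated score for x = 30: {a + b * 30 + c * 30**2:.2f}")
```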
Peer reviewed
Armstrong, Ronald D.; Jones, Douglas H.; Kunce, Charles S. – Applied Psychological Measurement, 1998
Investigated the use of mathematical programming techniques to generate parallel test forms with passages and items based on item-response theory (IRT) using the Fundamentals of Engineering Examination. Generated four parallel test forms from the item bank of almost 1,100 items. Comparison with human-generated forms supports the mathematical…
Descriptors: Engineering, Item Banks, Item Response Theory, Test Construction
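A minimal sketch of parallel-form assembly as a mathematical program, again using the PuLP library: two forms draw disjoint items from a bank while minimizing the gap between their total information at one ability point. The bank, single-point information target, and form length are invented; the article's models also handle passage-based item sets.

```python
import pulp

# Invented bank: item information values at a single ability point.
info = {i: 0.2 + 0.01 * (i % 37) for i in range(60)}
items, forms, length = list(info), (0, 1), 15

prob = pulp.LpProblem("parallel_forms", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (items, forms), cat="Binary")
gap = pulp.LpVariable("gap", lowBound=0)
prob += gap  # objective: minimize the information gap between forms

for i in items:                                   # each item on at most one form
    prob += pulp.lpSum(x[i][f] for f in forms) <= 1
for f in forms:                                   # fixed form length
    prob += pulp.lpSum(x[i][f] for i in items) == length

# Linearize |info(form 0) - info(form 1)| <= gap.
diff = pulp.lpSum(info[i] * (x[i][0] - x[i][1]) for i in items)
prob += diff <= gap
prob += -diff <= gap

prob.solve()
for f in forms:
    chosen = [i for i in items if x[i][f].value() == 1]
    print(f"form {f}: {sum(info[i] for i in chosen):.2f} total information")
```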
Peer reviewed
Berger, Martijn P. F. – Applied Psychological Measurement, 1994
This paper focuses on similarities in the optimal design of fixed-form tests, adaptive tests, and testlets within the framework of the general theory of optimal designs. A sequential design procedure is proposed that uses these similarities to obtain consistent estimates of the trait level distribution. (SLD)
Descriptors: Achievement Tests, Adaptive Testing, Algorithms, Estimation (Mathematics)
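One common instance of a sequential design rule, sketched here for a 2PL item bank: at each step, administer the unused item with the greatest Fisher information at the current trait estimate. The bank, the fixed trait estimate, and the 2PL information formula I(θ) = a²P(θ)(1 − P(θ)) are illustrative assumptions; Berger's procedure concerns consistent estimation of the trait distribution, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(1)
# Invented 2PL bank: discriminations a and difficulties b for 100 items.
a = rng.uniform(0.8, 2.0, 100)
b = rng.uniform(-2.5, 2.5, 100)

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

# Maximum-information sequential selection at a fixed illustrative trait
# estimate; a real adaptive procedure would re-estimate theta after each
# response and could add exposure control.
theta_hat, used = 0.4, []
for _ in range(10):
    information = info_2pl(theta_hat, a, b)
    information[used] = -np.inf          # never repeat an administered item
    used.append(int(np.argmax(information)))
print("administered items:", used)
```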
Peer reviewed
Wilson, Mark; Wang, Wen-chung – Applied Psychological Measurement, 1995
Data from the California Learning Assessment System mathematics assessment were used to examine issues that arise when scores from different assessment modes are combined. Multiple-choice, open-ended, and investigation items were combined in a test across three test forms. Results illustrate the difficulties faced in evaluating combined…
Descriptors: Educational Assessment, Equated Scores, Evaluation Methods, Item Response Theory
Peer reviewed
Quenette, Mary A.; Nicewander, W. Alan; Thomasson, Gary L. – Applied Psychological Measurement, 2006
Model-based equating was compared to empirical equating of an Armed Services Vocational Aptitude Battery (ASVAB) test form. The model-based equating was done using item pretest data to derive item response theory (IRT) item parameter estimates for those items that were retained in the final version of the test. The analysis of an ASVAB test form…
Descriptors: Item Response Theory, Multiple Choice Tests, Test Items, Computation
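The article's exact procedure isn't spelled out in the abstract, but a standard form of model-based equating is IRT true-score equating: invert form X's test characteristic curve at a number-correct score, then evaluate form Y's curve at the resulting ability. The 3PL parameters below are invented, and the 1.7 scaling constant is a common convention rather than something from the article.

```python
import numpy as np
from scipy.optimize import brentq

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (1.7 is the usual scaling)."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def tcc(theta, params):
    """Test characteristic curve: expected number-correct score."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in params)

# Invented 3PL parameter estimates for two 20-item forms.
rng = np.random.default_rng(2)
form_x = list(zip(rng.uniform(0.8, 1.8, 20), rng.uniform(-1.5, 1.5, 20), [0.2] * 20))
form_y = list(zip(rng.uniform(0.8, 1.8, 20), rng.uniform(-1.2, 1.8, 20), [0.2] * 20))

def equate(score_x):
    """Map a form X number-correct score to the form Y scale via theta."""
    theta = brentq(lambda t: tcc(t, form_x) - score_x, -6.0, 6.0)
    return tcc(theta, form_y)

print(f"form X score 12 -> form Y score {equate(12):.2f}")
```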
Peer reviewed
Hsu, Louis M. – Applied Psychological Measurement, 1979
A comparison of the relative ordering power of separate-item and grouped-item true-false tests indicated that neither type of test was uniformly superior to the other across all levels of examinee knowledge. Grouped-item tests were found superior for examinees with low levels of knowledge. (Author/CTM)
Descriptors: Academic Ability, Knowledge Level, Multiple Choice Tests, Scores
Peer reviewed
Yao, Lihua; Schwarz, Richard D. – Applied Psychological Measurement, 2006
Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…
Descriptors: Models, Item Response Theory, Markov Processes, Monte Carlo Methods
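A sketch of category probabilities under a generic compensatory multidimensional partial credit parameterization, where the ability dimensions combine through a single linear form a′θ. The article's M-2PPC may parameterize the steps differently, so treat this as the general shape rather than the authors' exact model.

```python
import numpy as np

def m2ppc_probs(theta, a, b):
    """Category probabilities for a compensatory multidimensional partial
    credit item: P(k) is proportional to exp(sum_{j<=k} (a'theta - b_j)),
    with an empty sum (logit 0) for category 0."""
    steps = a @ theta - b                    # one term per step 1..K
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    logits -= logits.max()                   # numerical stability
    expl = np.exp(logits)
    return expl / expl.sum()

theta = np.array([0.5, -0.3])   # two ability dimensions
a = np.array([1.2, 0.8])        # compensatory discrimination vector
b = np.array([-0.5, 0.2, 1.0])  # step parameters for a 0-3 scored item
print(m2ppc_probs(theta, a, b)) # probabilities of scores 0, 1, 2, 3
```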
Peer reviewed
Barnes, Janet L.; Landy, Frank J. – Applied Psychological Measurement, 1979
Although behaviorally anchored rating scales have both intuitive and empirical appeal, they have not always yielded superior results compared with graphic rating scales. Results indicate that the choice of an anchoring procedure will depend on the nature of the actual rating process. (Author/JKS)
Descriptors: Behavior Rating Scales, Comparative Testing, Higher Education, Rating Scales
Peer reviewed
Mann, Irene T.; And Others – Applied Psychological Measurement, 1979
Several methodological problems (particularly the assumed bipolarity of scales, instructions regarding use of the midpoint, and concept-scale interaction) which may contribute to a lack of precision in the semantic differential technique were investigated. Results generally supported the use of the semantic differential. (Author/JKS)
Descriptors: Analysis of Variance, Computer Assisted Testing, Higher Education, Rating Scales
Peer reviewed
Budescu, David V. – Applied Psychological Measurement, 1988
A multiple matching test, a 24-item Hebrew vocabulary test in which distractors from several items are pooled into one list at the end of the test, was examined. Construction of such tests proved feasible, and reliability, validity, and reduction of random guessing were satisfactory when the format was applied to data from 717 applicants to Israeli universities. (SLD)
Descriptors: College Applicants, Feasibility Studies, Foreign Countries, Guessing (Tests)
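A small sketch of the test format itself: every item's options (keys plus distractors) are pooled into one shared list printed at the end of the test. The vocabulary items and scoring rule are invented for illustration.

```python
# Invented vocabulary items: (stem, key, item-specific distractors).
items = [
    ("ubiquitous", "everywhere", ["ancient", "hostile"]),
    ("ephemeral", "short-lived", ["shiny", "enormous"]),
    ("lucid", "clear", ["heavy", "bitter"]),
]

# Multiple matching format: options from every item are pooled into one
# alphabetized list at the end of the test, so each stem is answered from
# the shared pool rather than from its own small option set.
pooled = sorted({opt for _, key, ds in items for opt in [key, *ds]})
print("shared option list:", pooled)

def score(responses):
    """Number correct; the long shared list reduces the payoff of guessing."""
    return sum(r == key for r, (_, key, _) in zip(responses, items))

print("score:", score(["everywhere", "clear", "clear"]))  # -> 2
```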