Showing all 13 results
Peer reviewed
Diao, Qi; van der Linden, Wim J. – Applied Psychological Measurement, 2013
Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…
Descriptors: Automation, Test Construction, Test Format, Item Banks
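To make the mixed integer programming approach described in this entry concrete, here is a minimal item-selection sketch using the PuLP modeling library. The item bank, information values, content areas, and constraints are all invented for illustration; the article's actual models additionally handle item ordering and form formatting.

```python
import pulp

# Hypothetical item bank: Fisher information at one ability point plus a
# content area per item; all values are invented for illustration.
bank = [
    {"id": i, "info": inf, "area": area}
    for i, (inf, area) in enumerate([
        (0.9, "algebra"), (0.4, "algebra"), (0.7, "geometry"),
        (0.6, "geometry"), (0.8, "stats"), (0.3, "stats"),
    ])
]

prob = pulp.LpProblem("test_assembly", pulp.LpMaximize)
x = {b["id"]: pulp.LpVariable(f"x_{b['id']}", cat="Binary") for b in bank}

# Objective: maximize total information of the selected items.
prob += pulp.lpSum(b["info"] * x[b["id"]] for b in bank)

# Fixed test length.
prob += pulp.lpSum(x.values()) == 3

# At least one item from each content area.
for area in ("algebra", "geometry", "stats"):
    prob += pulp.lpSum(x[b["id"]] for b in bank if b["area"] == area) >= 1

prob.solve()
print("selected:", [b["id"] for b in bank if x[b["id"]].value() == 1])
```

The same machinery extends toward full form generation by adding position variables and layout constraints, which is the step the article takes.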
Peer reviewed
Wyse, Adam E. – Applied Psychological Measurement, 2011
In many practical testing situations, alternate test forms from the same testing program are not strictly parallel to each other and instead the test forms exhibit small psychometric differences. This article investigates the potential practical impact that these small psychometric differences can have on expected classification accuracy. Ten…
Descriptors: Test Format, Test Construction, Testing Programs, Psychometrics
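As a rough illustration of how small psychometric differences can shift expected classification accuracy, here is a sketch under a simple normal model: true scores are normally distributed and observed scores add normal measurement error. The cut score, moments, and the normal-error model itself are assumptions for illustration, not the article's procedure.

```python
import numpy as np
from scipy.stats import norm

def expected_accuracy(cut, true_mean, true_sd, sem):
    """Expected classification accuracy: the probability that the observed
    score falls on the same side of the cut as the true score, averaged
    over a normal true-score distribution."""
    taus = np.linspace(true_mean - 4 * true_sd, true_mean + 4 * true_sd, 2001)
    density = norm.pdf(taus, true_mean, true_sd)
    p_pass = 1.0 - norm.cdf(cut, taus, sem)   # P(observed >= cut | true = tau)
    p_correct = np.where(taus >= cut, p_pass, 1.0 - p_pass)
    return np.trapz(p_correct * density, taus)

# Two nearly parallel forms differing slightly in measurement error:
print(expected_accuracy(cut=60, true_mean=55, true_sd=10, sem=3.0))
print(expected_accuracy(cut=60, true_mean=55, true_sd=10, sem=3.4))
```

Comparing the two calls shows how a modest change in the standard error of measurement moves the expected proportion of correct pass/fail classifications.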
Peer reviewed
Belov, Dmitry I.; Armstrong, Ronald D. – Applied Psychological Measurement, 2008
This article presents an application of Monte Carlo methods for developing and assembling multistage adaptive tests (MSTs). A major advantage of the Monte Carlo assembly over other approaches (e.g., integer programming or enumerative heuristics) is that it provides a uniform sampling from all MSTs (or MST paths) available from a given item pool.…
Descriptors: Monte Carlo Methods, Adaptive Testing, Sampling, Item Response Theory
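The core Monte Carlo idea, sampling assemblies uniformly at random and rejecting those that violate constraints, can be sketched in a few lines. The pool, form length, and constraints below are invented, and the sketch draws single linear forms; the article applies the idea to multistage modules and paths.

```python
import random

# Illustrative pool: (item_id, content_area, difficulty); all values invented.
pool = [(i, random.choice(["A", "B"]), random.uniform(-2, 2)) for i in range(200)]

def random_form(pool, length=20, max_tries=10_000):
    """Rejection sampling: draw candidate forms uniformly at random and keep
    the first that satisfies all constraints. Every feasible form has equal
    probability of being drawn, so accepted forms are a uniform sample from
    the set of all feasible forms."""
    for _ in range(max_tries):
        form = random.sample(pool, length)
        n_a = sum(1 for _, area, _ in form if area == "A")
        mean_b = sum(b for _, _, b in form) / length
        # Invented constraints: balanced content, near-zero mean difficulty.
        if 8 <= n_a <= 12 and abs(mean_b) < 0.2:
            return form
    raise RuntimeError("no feasible form found in max_tries draws")

print(sorted(i for i, _, _ in random_form(pool)))
```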
Peer reviewed
Wang, Tianyou; Kolen, Michael J. – Applied Psychological Measurement, 1996
A quadratic curve test equating method is proposed for equating different test forms under a random-groups data collection design; it equates the first three central moments of the test forms. When applied to real test data, the method performs as well as other equating methods. Procedures for implementing the method are described. (SLD)
Descriptors: Data Collection, Equated Scores, Standardized Tests, Test Construction
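A sketch of the moment-matching idea: choose a quadratic transformation q(x) = a + bx + cx² so that the equated form X scores match form Y's mean, standard deviation, and skewness. The simulated random-groups data and the use of a generic root finder are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import skew

rng = np.random.default_rng(0)
# Simulated random-groups data (invented): number-correct scores on X and Y.
x = rng.binomial(40, 0.55, size=2000).astype(float)
y = rng.binomial(40, 0.60, size=2000).astype(float)

def moment_gap(params):
    """Differences in mean, SD, and skewness between q(x) and y."""
    a, b, c = params
    q = a + b * x + c * x**2
    return [q.mean() - y.mean(), q.std() - y.std(), skew(q) - skew(y)]

# Start from the linear equating solution (c = 0) and solve the 3x3 system.
b0 = y.std() / x.std()
a0 = y.mean() - b0 * x.mean()
a, b, c = fsolve(moment_gap, [a0, b0, 0.0])
print(f"equated score for x = 30: {a + b * 30 + c * 30**2:.2f}")
```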
Peer reviewed
Armstrong, Ronald D.; Jones, Douglas H.; Kunce, Charles S. – Applied Psychological Measurement, 1998
Investigated the use of mathematical programming techniques to generate parallel test forms with passages and items based on item-response theory (IRT) using the Fundamentals of Engineering Examination. Generated four parallel test forms from the item bank of almost 1,100 items. Comparison with human-generated forms supports the mathematical…
Descriptors: Engineering, Item Banks, Item Response Theory, Test Construction
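A minimal sketch of parallel-form assembly as a mathematical program, again using the PuLP library: two forms draw disjoint items from a bank while minimizing the gap between their total information at one ability point. The bank, single-point information target, and form length are invented; the article's models also handle passage-based item sets.

```python
import pulp

# Invented bank: item information values at a single ability point.
info = {i: 0.2 + 0.01 * (i % 37) for i in range(60)}
items, forms, length = list(info), (0, 1), 15

prob = pulp.LpProblem("parallel_forms", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (items, forms), cat="Binary")
gap = pulp.LpVariable("gap", lowBound=0)
prob += gap  # objective: minimize the information gap between forms

for i in items:                                   # each item on at most one form
    prob += pulp.lpSum(x[i][f] for f in forms) <= 1
for f in forms:                                   # fixed form length
    prob += pulp.lpSum(x[i][f] for i in items) == length

# Linearize |info(form 0) - info(form 1)| <= gap.
diff = pulp.lpSum(info[i] * (x[i][0] - x[i][1]) for i in items)
prob += diff <= gap
prob += -diff <= gap

prob.solve()
for f in forms:
    chosen = [i for i in items if x[i][f].value() == 1]
    print(f"form {f}: {sum(info[i] for i in chosen):.2f} total information")
```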
Peer reviewed
Berger, Martijn P. F. – Applied Psychological Measurement, 1994
This paper focuses on similarities in the optimal design of fixed-form tests, adaptive tests, and testlets within the framework of the general theory of optimal designs. A sequential design procedure is proposed that uses these similarities to obtain consistent estimates of the trait level distribution. (SLD)
Descriptors: Achievement Tests, Adaptive Testing, Algorithms, Estimation (Mathematics)
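One common instance of a sequential design rule, sketched here for a 2PL item bank: at each step, administer the unused item with the greatest Fisher information at the current trait estimate. The bank, the fixed trait estimate, and the 2PL information formula I(θ) = a²P(θ)(1 − P(θ)) are illustrative assumptions; Berger's procedure concerns consistent estimation of the trait distribution, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(1)
# Invented 2PL bank: discriminations a and difficulties b for 100 items.
a = rng.uniform(0.8, 2.0, 100)
b = rng.uniform(-2.5, 2.5, 100)

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

# Maximum-information sequential selection at a fixed illustrative trait
# estimate; a real adaptive procedure would re-estimate theta after each
# response and could add exposure control.
theta_hat, used = 0.4, []
for _ in range(10):
    information = info_2pl(theta_hat, a, b)
    information[used] = -np.inf          # never repeat an administered item
    used.append(int(np.argmax(information)))
print("administered items:", used)
```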
Peer reviewed
Wilson, Mark; Wang, Wen-chung – Applied Psychological Measurement, 1995
Data from the California Learning Assessment System mathematics assessment were used to examine issues that arise when scores from different assessment modes are combined. Multiple-choice, open-ended, and investigation items were combined in a test across three test forms. Results illustrate the difficulties faced in evaluating combined…
Descriptors: Educational Assessment, Equated Scores, Evaluation Methods, Item Response Theory
Peer reviewed
Quenette, Mary A.; Nicewander, W. Alan; Thomasson, Gary L. – Applied Psychological Measurement, 2006
Model-based equating was compared to empirical equating of an Armed Services Vocational Aptitude Battery (ASVAB) test form. The model-based equating was done using item pretest data to derive item response theory (IRT) item parameter estimates for those items that were retained in the final version of the test. The analysis of an ASVAB test form…
Descriptors: Item Response Theory, Multiple Choice Tests, Test Items, Computation
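The article's exact procedure isn't spelled out in the abstract, but a standard form of model-based equating is IRT true-score equating: invert form X's test characteristic curve at a number-correct score, then evaluate form Y's curve at the resulting ability. The 3PL parameters below are invented, and the 1.7 scaling constant is a common convention rather than something from the article.

```python
import numpy as np
from scipy.optimize import brentq

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (1.7 is the usual scaling)."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def tcc(theta, params):
    """Test characteristic curve: expected number-correct score."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in params)

# Invented 3PL parameter estimates for two 20-item forms.
rng = np.random.default_rng(2)
form_x = list(zip(rng.uniform(0.8, 1.8, 20), rng.uniform(-1.5, 1.5, 20), [0.2] * 20))
form_y = list(zip(rng.uniform(0.8, 1.8, 20), rng.uniform(-1.2, 1.8, 20), [0.2] * 20))

def equate(score_x):
    """Map a form X number-correct score to the form Y scale via theta."""
    theta = brentq(lambda t: tcc(t, form_x) - score_x, -6.0, 6.0)
    return tcc(theta, form_y)

print(f"form X score 12 -> form Y score {equate(12):.2f}")
```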
Peer reviewed
Hsu, Louis M. – Applied Psychological Measurement, 1979
A comparison of the relative ordering power of separate-item and grouped-item true-false tests indicated that neither type of test was uniformly superior to the other across all levels of examinee knowledge. Grouped-item tests were found superior for examinees with low levels of knowledge. (Author/CTM)
Descriptors: Academic Ability, Knowledge Level, Multiple Choice Tests, Scores
Peer reviewed
Yao, Lihua; Schwarz, Richard D. – Applied Psychological Measurement, 2006
Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…
Descriptors: Models, Item Response Theory, Markov Processes, Monte Carlo Methods
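A sketch of category probabilities under a generic compensatory multidimensional partial credit parameterization, where the ability dimensions combine through a single linear form a′θ. The article's M-2PPC may parameterize the steps differently, so treat this as the general shape rather than the authors' exact model.

```python
import numpy as np

def m2ppc_probs(theta, a, b):
    """Category probabilities for a compensatory multidimensional partial
    credit item: P(k) is proportional to exp(sum_{j<=k} (a'theta - b_j)),
    with an empty sum (logit 0) for category 0."""
    steps = a @ theta - b                    # one term per step 1..K
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    logits -= logits.max()                   # numerical stability
    expl = np.exp(logits)
    return expl / expl.sum()

theta = np.array([0.5, -0.3])   # two ability dimensions
a = np.array([1.2, 0.8])        # compensatory discrimination vector
b = np.array([-0.5, 0.2, 1.0])  # step parameters for a 0-3 scored item
print(m2ppc_probs(theta, a, b)) # probabilities of scores 0, 1, 2, 3
```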
Peer reviewed
Barnes, Janet L.; Landy, Frank J. – Applied Psychological Measurement, 1979
Although behaviorally anchored rating scales have both intuitive and empirical appeal, they have not always yielded superior results compared with graphic rating scales. Results indicate that the choice of an anchoring procedure will depend on the nature of the actual rating process. (Author/JKS)
Descriptors: Behavior Rating Scales, Comparative Testing, Higher Education, Rating Scales
Peer reviewed
Mann, Irene T.; And Others – Applied Psychological Measurement, 1979
Several methodological problems (particularly the assumed bipolarity of scales, instructions regarding use of the midpoint, and concept-scale interaction) which may contribute to a lack of precision in the semantic differential technique were investigated. Results generally supported the use of the semantic differential. (Author/JKS)
Descriptors: Analysis of Variance, Computer Assisted Testing, Higher Education, Rating Scales
Peer reviewed
Budescu, David V. – Applied Psychological Measurement, 1988
A multiple matching test, a 24-item Hebrew vocabulary test in which distractors from several items are pooled into one list at the end of the test, was examined. Construction of such tests proved feasible, and reliability, validity, and reduction of random guessing were satisfactory when the format was applied to data from 717 applicants to Israeli universities. (SLD)
Descriptors: College Applicants, Feasibility Studies, Foreign Countries, Guessing (Tests)
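A small sketch of the test format itself: every item's options (keys plus distractors) are pooled into one shared list printed at the end of the test. The vocabulary items and scoring rule are invented for illustration.

```python
# Invented vocabulary items: (stem, key, item-specific distractors).
items = [
    ("ubiquitous", "everywhere", ["ancient", "hostile"]),
    ("ephemeral", "short-lived", ["shiny", "enormous"]),
    ("lucid", "clear", ["heavy", "bitter"]),
]

# Multiple matching format: options from every item are pooled into one
# alphabetized list at the end of the test, so each stem is answered from
# the shared pool rather than from its own small option set.
pooled = sorted({opt for _, key, ds in items for opt in [key, *ds]})
print("shared option list:", pooled)

def score(responses):
    """Number correct; the long shared list reduces the payoff of guessing."""
    return sum(r == key for r, (_, key, _) in zip(responses, items))

print("score:", score(["everywhere", "clear", "clear"]))  # -> 2
```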