Publication Date
  In 2025: 0
  Since 2024: 0
  Since 2021 (last 5 years): 0
  Since 2016 (last 10 years): 0
  Since 2006 (last 20 years): 5
Descriptor
  Multiple Choice Tests: 13
  Test Items: 13
  Item Response Theory: 4
  Test Construction: 4
  Test Reliability: 4
  Guessing (Tests): 3
  Higher Education: 3
  Models: 3
  Scoring: 3
  Test Format: 3
  Classification: 2
Source
  Applied Psychological Measurement: 13
Author
  Bennett, Randy Elliot: 1
  Budescu, David V.: 1
  Chen, Hanwei: 1
  Corter, James E.: 1
  Cui, Zhongmin: 1
  Dorans, Neil J.: 1
  Fang, Yu: 1
  Geisinger, Kurt F.: 1
  He, Yong: 1
  Hsu, Louis M.: 1
  Kane, Michael: 1
Publication Type
  Journal Articles: 11
  Reports - Research: 8
  Reports - Evaluative: 4
  Speeches/Meeting Papers: 1
Assessments and Surveys
  Advanced Placement…: 1
  Armed Services Vocational…: 1
  Iowa Tests of Basic Skills: 1
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei – Applied Psychological Measurement, 2013
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Descriptors: Regression (Statistics), Item Response Theory, Test Items, Equated Scores
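
As a rough illustration of the kind of consistency check described in this abstract, the sketch below regresses new-form IRT difficulty estimates on old-form estimates for the common items and flags items that fall far from the fitted line. The data, the focus on the b parameter, and the 0.3-logit tolerance are assumptions made for illustration, not the authors' procedure.

    # Illustrative sketch only (assumed data and tolerance, not the published method):
    # screen common items whose IRT difficulty (b) estimates disagree across forms.
    import numpy as np

    b_old = np.array([-1.2, -0.5, 0.0, 0.4, 0.9, 1.5, -0.8, 0.2])   # old-form estimates (made up)
    b_new = np.array([-1.1, -0.4, 0.1, 0.5, 1.0, 2.3, -0.7, 0.3])   # new-form estimates; index 5 disagrees noticeably

    slope, intercept = np.polyfit(b_old, b_new, 1)   # least-squares line through the common items
    resid = b_new - (slope * b_old + intercept)      # how far each item sits from that line
    tolerance = 0.3                                  # assumed screening tolerance in logits
    print("Flagged common items:", np.where(np.abs(resid) > tolerance)[0])

Items flagged by a screen of this kind would be candidates for removal from the common-item set before equating.
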
Penfield, Randall D. – Applied Psychological Measurement, 2010
In 2008, Penfield showed that measurement invariance across all response options of a multiple-choice item (the correct option and the "J" distractors) can be modeled using a nominal response model that includes a differential distractor functioning (DDF) effect for each of the "J" distractors. This article extends this concept to consider how the…
Descriptors: Test Bias, Test Items, Models, Multiple Choice Tests
Lee, Jihyun; Corter, James E. – Applied Psychological Measurement, 2011
Diagnosis of misconceptions or "bugs" in procedural skills is difficult because of their unstable nature. This study addresses this problem by proposing and evaluating a probability-based approach to the diagnosis of bugs in children's multicolumn subtraction performance using Bayesian networks. This approach assumes a causal network relating…
Descriptors: Misconceptions, Probability, Children, Subtraction
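
The probability-based flavor of such a diagnosis can be suggested with a two-node Bayes-rule toy; the full causal network in the article is not reproduced here, and all probabilities below are invented:

    # Toy illustration, not the authors' network: posterior probability that a child
    # holds a particular subtraction "bug" after one characteristic error is observed.
    p_bug = 0.10            # assumed prior probability of the bug
    p_err_given_bug = 0.85  # assumed chance of the characteristic error if the bug is present
    p_err_given_ok = 0.05   # assumed chance of the same error arising from a slip without the bug
    p_err = p_bug * p_err_given_bug + (1 - p_bug) * p_err_given_ok
    print("P(bug | error) =", round(p_bug * p_err_given_bug / p_err, 3))

A Bayesian network extends this same calculation across many interrelated skills, bugs, and observed item responses.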

Potenza, Maria T.; Dorans, Neil J. – Applied Psychological Measurement, 1995
A classification scheme is presented for procedures that detect differential item functioning (DIF) in dichotomously scored items; the scheme is also applicable to newer DIF procedures for polytomously scored items. A formal development of a polytomous version of a dichotomous DIF technique is presented. (SLD)
Descriptors: Classification, Evaluation Methods, Identification, Item Bias
Sotaridona, Leonardo S.; van der Linden, Wim J.; Meijer, Rob R. – Applied Psychological Measurement, 2006
A statistical test for detecting answer copying on multiple-choice tests based on Cohen's kappa is proposed. The test is free of any assumptions on the response processes of the examinees suspected of copying and having served as the source, except for the usual assumption that these processes are probabilistic. Because the asymptotic null and…
Descriptors: Cheating, Test Items, Simulation, Statistical Analysis
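
The index in this abstract is built on Cohen's kappa; a minimal sketch of the raw kappa computation for two examinees' answer strings follows. The answer strings are invented, and the formal null distribution developed in the article is not reproduced here.

    # Toy example: Cohen's kappa as an agreement measure between a suspected copier
    # and a source examinee (hypothetical responses; the published test adds a
    # formal null distribution on top of this statistic).
    from collections import Counter

    source = "ABCDABCDABCDABCDABCD"   # source examinee's responses, one letter per item
    copier = "ABCDABCDABCAABCDABCD"   # suspected copier's responses
    n = len(source)

    p_obs = sum(s == c for s, c in zip(source, copier)) / n          # observed agreement
    cs, cc = Counter(source), Counter(copier)
    p_exp = sum(cs[k] * cc[k] for k in set(cs) | set(cc)) / n ** 2   # agreement expected by chance
    kappa = (p_obs - p_exp) / (1 - p_exp)
    print(f"kappa = {kappa:.3f}")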

Kane, Michael; Moloney, James – Applied Psychological Measurement, 1978
The answer-until-correct (AUC) procedure requires that examinees respond to a multiple-choice item until they answer it correctly. Using a modified version of Horst's model for examinee behavior, this paper compares the effect of guessing on item reliability for the AUC procedure and the zero-one scoring procedure. (Author/CTM)
Descriptors: Guessing (Tests), Item Analysis, Mathematical Models, Multiple Choice Tests
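
To make the contrast between the two scoring rules concrete, here is a small sketch under an assumed linear-credit rule for answer-until-correct scoring; the specific formula is illustrative and is not taken from Kane and Moloney's model.

    # Assumed scoring rules for illustration: zero-one credits only a correct first
    # attempt, while answer-until-correct (AUC) gives partial credit that shrinks
    # with each additional attempt on a k-option item.
    def zero_one(attempts: int) -> int:
        return 1 if attempts == 1 else 0

    def auc_score(attempts: int, options: int = 4) -> float:
        return (options - attempts) / (options - 1)   # 1.0 for one attempt, 0.0 if every option was tried

    for attempts in range(1, 5):
        print(f"attempts={attempts}  zero-one={zero_one(attempts)}  AUC={auc_score(attempts):.2f}")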

Sireci, Stephen G.; Geisinger, Kurt F. – Applied Psychological Measurement, 1992
A new method for evaluating the content representation of a test is illustrated. Item similarity ratings were obtained from three content domain experts to assess whether ratings corresponded to item groupings specified in the test blueprint. Multidimensional scaling and cluster analysis provided substantial information about the test's content…
Descriptors: Cluster Analysis, Content Analysis, Multidimensional Scaling, Multiple Choice Tests
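
One way such expert similarity ratings could be turned into item groupings is sketched below with hierarchical clustering on a dissimilarity matrix; the ratings are invented, and the multidimensional scaling step used in the article is omitted for brevity.

    # Hypothetical sketch: convert mean expert similarity ratings (1 = very dissimilar,
    # 5 = very similar) into dissimilarities and recover item groupings by clustering.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    sim = np.array([
        [5, 4, 4, 1, 1, 2],
        [4, 5, 4, 1, 2, 1],
        [4, 4, 5, 2, 1, 1],
        [1, 1, 2, 5, 4, 4],
        [1, 2, 1, 4, 5, 4],
        [2, 1, 1, 4, 4, 5],
    ], dtype=float)                                   # made-up ratings for six items
    dissim = sim.max() - sim                          # similarity -> dissimilarity
    np.fill_diagonal(dissim, 0.0)
    Z = linkage(squareform(dissim), method="average") # average-linkage hierarchical clustering
    print(fcluster(Z, t=2, criterion="maxclust"))     # two recovered item groups

The recovered groups could then be compared with the item groupings specified in the test blueprint.
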
Quenette, Mary A.; Nicewander, W. Alan; Thomasson, Gary L. – Applied Psychological Measurement, 2006
Model-based equating was compared to empirical equating of an Armed Services Vocational Aptitude Battery (ASVAB) test form. The model-based equating was done using item pretest data to derive item response theory (IRT) item parameter estimates for those items that were retained in the final version of the test. The analysis of an ASVAB test form…
Descriptors: Item Response Theory, Multiple Choice Tests, Test Items, Computation

Poizner, Sharon B.; And Others – Applied Psychological Measurement, 1978
Binary, probability, and ordinal scoring procedures for multiple-choice items were examined. In two situations, it was found that both the probability and ordinal scoring systems were more reliable than the binary scoring method. (Author/CTM)
Descriptors: Confidence Testing, Guessing (Tests), Higher Education, Multiple Choice Tests
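
A reliability comparison of this kind could be sketched with coefficient alpha, as below; the simulated item scores and the use of alpha are assumptions made for illustration and do not reproduce the study's data or analysis.

    # Illustrative only: coefficient alpha for dichotomous (0/1) scores versus
    # graded "probability"-style scores computed on the same simulated responses.
    import numpy as np

    def cronbach_alpha(scores: np.ndarray) -> float:
        k = scores.shape[1]                            # number of items
        item_var = scores.var(axis=0, ddof=1).sum()    # sum of item variances
        total_var = scores.sum(axis=1).var(ddof=1)     # variance of total scores
        return k / (k - 1) * (1 - item_var / total_var)

    rng = np.random.default_rng(0)
    ability = rng.normal(size=(50, 1))                                    # latent trait for 50 examinees
    graded = 1 / (1 + np.exp(-(ability + rng.normal(size=(50, 20)))))     # graded item scores in (0, 1)
    binary = (graded > 0.5).astype(float)                                 # dichotomized 0/1 scores
    print("alpha, binary scoring:", round(cronbach_alpha(binary), 3))
    print("alpha, graded scoring:", round(cronbach_alpha(graded), 3))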

Hsu, Louis M. – Applied Psychological Measurement, 1979
A comparison of the relative ordering power of separate and grouped-item true-false tests indicated that neither type of test was uniformly superior to the other across all levels of examinee knowledge. Grouped-item tests were found superior for examinees with low levels of knowledge. (Author/CTM)
Descriptors: Academic Ability, Knowledge Level, Multiple Choice Tests, Scores

Bennett, Randy Elliot; And Others – Applied Psychological Measurement, 1990
The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Descriptors: College Students, Comparative Testing, Computer Assisted Testing, Computer Science

Samejima, Fumiko – Applied Psychological Measurement, 1994
The Level 11 vocabulary subtest of the Iowa Tests of Basic Skills was analyzed using a two-stage latent trait approach and a data set of 2,356 examinees, approximately 11 years of age. It is concluded that the nonparametric approach leads to efficient estimation of the latent trait. (SLD)
Descriptors: Achievement Tests, Distractors (Tests), Elementary Education, Elementary School Students

Budescu, David V. – Applied Psychological Measurement, 1988
A multiple matching test, in which distractors from several items are pooled into one list at the end of the test, was examined using a 24-item Hebrew vocabulary test. Construction of such tests proved feasible, and reliability, validity, and the reduction of random guessing were satisfactory in data from 717 applicants to Israeli universities. (SLD)
Descriptors: College Applicants, Feasibility Studies, Foreign Countries, Guessing (Tests)