Publication Date
  In 2025: 0
  Since 2024: 0
  Since 2021 (last 5 years): 0
  Since 2016 (last 10 years): 0
  Since 2006 (last 20 years): 5
Descriptor
  Multiple Choice Tests: 13
  Test Items: 13
  Item Response Theory: 4
  Test Construction: 4
  Test Reliability: 4
  Guessing (Tests): 3
  Higher Education: 3
  Models: 3
  Scoring: 3
  Test Format: 3
  Classification: 2
Source
  Applied Psychological Measurement: 13
Author
  Bennett, Randy Elliot: 1
  Budescu, David V.: 1
  Chen, Hanwei: 1
  Corter, James E.: 1
  Cui, Zhongmin: 1
  Dorans, Neil J.: 1
  Fang, Yu: 1
  Geisinger, Kurt F.: 1
  He, Yong: 1
  Hsu, Louis M.: 1
  Kane, Michael: 1
Publication Type
  Journal Articles: 11
  Reports - Research: 8
  Reports - Evaluative: 4
  Speeches/Meeting Papers: 1
Assessments and Surveys
  Advanced Placement…: 1
  Armed Services Vocational…: 1
  Iowa Tests of Basic Skills: 1
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei – Applied Psychological Measurement, 2013
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Descriptors: Regression (Statistics), Item Response Theory, Test Items, Equated Scores
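
As a rough illustration of the kind of consistency check described in this abstract, the sketch below regresses new-form IRT difficulty estimates on old-form estimates for the common items and flags items that fall far from the fitted line. The data, the focus on the b parameter, and the 0.3-logit tolerance are assumptions made for illustration, not the authors' procedure.

    # Illustrative sketch only (assumed data and tolerance, not the published method):
    # screen common items whose IRT difficulty (b) estimates disagree across forms.
    import numpy as np

    b_old = np.array([-1.2, -0.5, 0.0, 0.4, 0.9, 1.5, -0.8, 0.2])   # old-form estimates (made up)
    b_new = np.array([-1.1, -0.4, 0.1, 0.5, 1.0, 2.3, -0.7, 0.3])   # new-form estimates; index 5 disagrees noticeably

    slope, intercept = np.polyfit(b_old, b_new, 1)   # least-squares line through the common items
    resid = b_new - (slope * b_old + intercept)      # how far each item sits from that line
    tolerance = 0.3                                  # assumed screening tolerance in logits
    print("Flagged common items:", np.where(np.abs(resid) > tolerance)[0])

Items flagged by a screen of this kind would be candidates for removal from the common-item set before equating.
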
Penfield, Randall D. – Applied Psychological Measurement, 2010
In 2008, Penfield showed that measurement invariance across all response options of a multiple-choice item (the correct option and the "J" distractors) can be modeled using a nominal response model that includes a differential distractor functioning (DDF) effect for each of the "J" distractors. This article extends this concept to consider how the…
Descriptors: Test Bias, Test Items, Models, Multiple Choice Tests
Lee, Jihyun; Corter, James E. – Applied Psychological Measurement, 2011
Diagnosis of misconceptions or "bugs" in procedural skills is difficult because of their unstable nature. This study addresses this problem by proposing and evaluating a probability-based approach to the diagnosis of bugs in children's multicolumn subtraction performance using Bayesian networks. This approach assumes a causal network relating…
Descriptors: Misconceptions, Probability, Children, Subtraction
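
The probability-based flavor of such a diagnosis can be suggested with a two-node Bayes-rule toy; the full causal network in the article is not reproduced here, and all probabilities below are invented:

    # Toy illustration, not the authors' network: posterior probability that a child
    # holds a particular subtraction "bug" after one characteristic error is observed.
    p_bug = 0.10            # assumed prior probability of the bug
    p_err_given_bug = 0.85  # assumed chance of the characteristic error if the bug is present
    p_err_given_ok = 0.05   # assumed chance of the same error arising from a slip without the bug
    p_err = p_bug * p_err_given_bug + (1 - p_bug) * p_err_given_ok
    print("P(bug | error) =", round(p_bug * p_err_given_bug / p_err, 3))

A Bayesian network extends this same calculation across many interrelated skills, bugs, and observed item responses.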

Potenza, Maria T.; Dorans, Neil J. – Applied Psychological Measurement, 1995
A classification scheme is presented for procedures that detect differential item functioning (DIF) in dichotomously scored items; the scheme is also applicable to newer DIF procedures for polytomously scored items. A formal development of a polytomous version of a dichotomous DIF technique is presented. (SLD)
Descriptors: Classification, Evaluation Methods, Identification, Item Bias
Sotaridona, Leonardo S.; van der Linden, Wim J.; Meijer, Rob R. – Applied Psychological Measurement, 2006
A statistical test for detecting answer copying on multiple-choice tests based on Cohen's kappa is proposed. The test is free of any assumptions on the response processes of the examinees suspected of copying and having served as the source, except for the usual assumption that these processes are probabilistic. Because the asymptotic null and…
Descriptors: Cheating, Test Items, Simulation, Statistical Analysis
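
The index in this abstract is built on Cohen's kappa; a minimal sketch of the raw kappa computation for two examinees' answer strings follows. The answer strings are invented, and the formal null distribution developed in the article is not reproduced here.

    # Toy example: Cohen's kappa as an agreement measure between a suspected copier
    # and a source examinee (hypothetical responses; the published test adds a
    # formal null distribution on top of this statistic).
    from collections import Counter

    source = "ABCDABCDABCDABCDABCD"   # source examinee's responses, one letter per item
    copier = "ABCDABCDABCAABCDABCD"   # suspected copier's responses
    n = len(source)

    p_obs = sum(s == c for s, c in zip(source, copier)) / n          # observed agreement
    cs, cc = Counter(source), Counter(copier)
    p_exp = sum(cs[k] * cc[k] for k in set(cs) | set(cc)) / n ** 2   # agreement expected by chance
    kappa = (p_obs - p_exp) / (1 - p_exp)
    print(f"kappa = {kappa:.3f}")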

Kane, Michael; Moloney, James – Applied Psychological Measurement, 1978
The answer-until-correct (AUC) procedure requires that examinees respond to a multiple-choice item until they answer it correctly. Using a modified version of Horst's model for examinee behavior, this paper compares the effect of guessing on item reliability for the AUC procedure and the zero-one scoring procedure. (Author/CTM)
Descriptors: Guessing (Tests), Item Analysis, Mathematical Models, Multiple Choice Tests
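
To make the contrast between the two scoring rules concrete, here is a small sketch under an assumed linear-credit rule for answer-until-correct scoring; the specific formula is illustrative and is not taken from Kane and Moloney's model.

    # Assumed scoring rules for illustration: zero-one credits only a correct first
    # attempt, while answer-until-correct (AUC) gives partial credit that shrinks
    # with each additional attempt on a k-option item.
    def zero_one(attempts: int) -> int:
        return 1 if attempts == 1 else 0

    def auc_score(attempts: int, options: int = 4) -> float:
        return (options - attempts) / (options - 1)   # 1.0 for one attempt, 0.0 if every option was tried

    for attempts in range(1, 5):
        print(f"attempts={attempts}  zero-one={zero_one(attempts)}  AUC={auc_score(attempts):.2f}")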

Sireci, Stephen G.; Geisinger, Kurt F. – Applied Psychological Measurement, 1992
A new method for evaluating the content representation of a test is illustrated. Item similarity ratings were obtained from three content domain experts to assess whether ratings corresponded to item groupings specified in the test blueprint. Multidimensional scaling and cluster analysis provided substantial information about the test's content…
Descriptors: Cluster Analysis, Content Analysis, Multidimensional Scaling, Multiple Choice Tests
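
One way such expert similarity ratings could be turned into item groupings is sketched below with hierarchical clustering on a dissimilarity matrix; the ratings are invented, and the multidimensional scaling step used in the article is omitted for brevity.

    # Hypothetical sketch: convert mean expert similarity ratings (1 = very dissimilar,
    # 5 = very similar) into dissimilarities and recover item groupings by clustering.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    sim = np.array([
        [5, 4, 4, 1, 1, 2],
        [4, 5, 4, 1, 2, 1],
        [4, 4, 5, 2, 1, 1],
        [1, 1, 2, 5, 4, 4],
        [1, 2, 1, 4, 5, 4],
        [2, 1, 1, 4, 4, 5],
    ], dtype=float)                                   # made-up ratings for six items
    dissim = sim.max() - sim                          # similarity -> dissimilarity
    np.fill_diagonal(dissim, 0.0)
    Z = linkage(squareform(dissim), method="average") # average-linkage hierarchical clustering
    print(fcluster(Z, t=2, criterion="maxclust"))     # two recovered item groups

The recovered groups could then be compared with the item groupings specified in the test blueprint.
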
Quenette, Mary A.; Nicewander, W. Alan; Thomasson, Gary L. – Applied Psychological Measurement, 2006
Model-based equating was compared to empirical equating of an Armed Services Vocational Aptitude Battery (ASVAB) test form. The model-based equating was done using item pretest data to derive item response theory (IRT) item parameter estimates for those items that were retained in the final version of the test. The analysis of an ASVAB test form…
Descriptors: Item Response Theory, Multiple Choice Tests, Test Items, Computation

Poizner, Sharon B.; And Others – Applied Psychological Measurement, 1978
Binary, probability, and ordinal scoring procedures for multiple-choice items were examined. In two situations, it was found that both the probability and ordinal scoring systems were more reliable than the binary scoring method. (Author/CTM)
Descriptors: Confidence Testing, Guessing (Tests), Higher Education, Multiple Choice Tests
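
A reliability comparison of this kind could be sketched with coefficient alpha, as below; the simulated item scores and the use of alpha are assumptions made for illustration and do not reproduce the study's data or analysis.

    # Illustrative only: coefficient alpha for dichotomous (0/1) scores versus
    # graded "probability"-style scores computed on the same simulated responses.
    import numpy as np

    def cronbach_alpha(scores: np.ndarray) -> float:
        k = scores.shape[1]                            # number of items
        item_var = scores.var(axis=0, ddof=1).sum()    # sum of item variances
        total_var = scores.sum(axis=1).var(ddof=1)     # variance of total scores
        return k / (k - 1) * (1 - item_var / total_var)

    rng = np.random.default_rng(0)
    ability = rng.normal(size=(50, 1))                                    # latent trait for 50 examinees
    graded = 1 / (1 + np.exp(-(ability + rng.normal(size=(50, 20)))))     # graded item scores in (0, 1)
    binary = (graded > 0.5).astype(float)                                 # dichotomized 0/1 scores
    print("alpha, binary scoring:", round(cronbach_alpha(binary), 3))
    print("alpha, graded scoring:", round(cronbach_alpha(graded), 3))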

Hsu, Louis M. – Applied Psychological Measurement, 1979
A comparison of the relative ordering power of separate and grouped-item true-false tests indicated that neither type of test was uniformly superior to the other across all levels of examinee knowledge. Grouped-item tests were found superior for examinees with low levels of knowledge. (Author/CTM)
Descriptors: Academic Ability, Knowledge Level, Multiple Choice Tests, Scores

Bennett, Randy Elliot; And Others – Applied Psychological Measurement, 1990
The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Descriptors: College Students, Comparative Testing, Computer Assisted Testing, Computer Science

Samejima, Fumiko – Applied Psychological Measurement, 1994
The Level 11 vocabulary subtest of the Iowa Tests of Basic Skills was analyzed using a two-stage latent trait approach and a data set of 2,356 examinees, approximately 11 years of age. It is concluded that the nonparametric approach leads to efficient estimation of the latent trait. (SLD)
Descriptors: Achievement Tests, Distractors (Tests), Elementary Education, Elementary School Students

Budescu, David V. – Applied Psychological Measurement, 1988
A multiple matching test, in which distractors from several items are pooled into one list at the end of the test, was examined using a 24-item Hebrew vocabulary test. Construction of such tests proved feasible, and reliability, validity, and the reduction of random guessing were satisfactory in data from 717 applicants to Israeli universities. (SLD)
Descriptors: College Applicants, Feasibility Studies, Foreign Countries, Guessing (Tests)