ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	0
Since 2006 (last 20 years)	3

Descriptor

Test Items	13
Test Reliability	13
Higher Education	5
Item Analysis	5
Test Validity	5
Item Response Theory	4
Multiple Choice Tests	4
Scores	4
Scoring	4
Scoring Formulas	4
Test Construction	4
Difficulty Level	3
Guessing (Tests)	3
Models	3
Psychometrics	3
Simulation	3
Weighted Scores	3
Error of Measurement	2
Technical Reports	2
Test Format	2
Test Length	2
Academic Ability	1
Achievement Tests	1
Adaptive Testing	1
Answer Keys	1
More ▼

Source

Applied Psychological…

Author

Bejar, Isaac I.	1
Budescu, David V.	1
Claudy, John G.	1
Downey, Ronald G.	1
Embretson, Susan E.	1
Goh, David S.	1
Gorin, Joanna S.	1
Hsu, Louis M.	1
Johnson, Richard W.	1
Kane, Michael	1
Lee, Won-Chan	1
Meijer, Rob R.	1
Moloney, James	1
Poizner, Sharon B.	1
Yao, Lihua	1
Yocom, Peter	1
More ▼

Publication Type

Journal Articles	10
Reports - Research	5
Reports - Evaluative	4
Reports - Descriptive	2

Education Level

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

Armed Forces Qualification…	1
Graduate Record Examinations	1
Hidden Figures Test	1
Strong Campbell Interest…	1

What Works Clearinghouse Rating

Showing all 13 results Save | Export

Comparing the Performance of Five Multidimensional CAT Selection Procedures with Different Stopping Rules

Peer reviewed

Direct link

Yao, Lihua – Applied Psychological Measurement, 2013

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection

Multinomial and Compound Multinomial Error Models for Tests with Complex Item Scoring

Peer reviewed

Direct link

Lee, Won-Chan – Applied Psychological Measurement, 2007

This article introduces a multinomial error model, which models an examinee's test scores obtained over repeated measurements of an assessment that consists of polytomously scored items. A compound multinomial error model is also introduced for situations in which items are stratified according to content categories and/or prespecified numbers of…

Descriptors: Simulation, Error of Measurement, Scoring, Test Items

Biserial Weights: A New Approach to Test Item Option Weighting

Peer reviewed

Claudy, John G. – Applied Psychological Measurement, 1978

Option weighting is an alternative to increasing test length as a means of improving the reliability of a test. The effects on test reliability of option weighting procedures were compared in two empirical studies using four independent sets of items. Biserial weights were found to be superior. (Author/CTM)

Descriptors: Higher Education, Item Analysis, Scoring Formulas, Test Items

The Effect of Guessing on Item Reliability under Answer-Until-Correct Scoring

Peer reviewed

Kane, Michael; Moloney, James – Applied Psychological Measurement, 1978

The answer-until-correct (AUC) procedure requires that examinees respond to a multi-choice item until they answer it correctly. Using a modified version of Horst's model for examinee behavior, this paper compares the effect of guessing on item reliability for the AUC procedure and the zero-one scoring procedure. (Author/CTM)

Descriptors: Guessing (Tests), Item Analysis, Mathematical Models, Multiple Choice Tests

Influence of Test and Person Characteristics on Nonparametric Appropriateness Measurement.

Peer reviewed

Meijer, Rob R.; And Others – Applied Psychological Measurement, 1994

The power of the nonparametric person-fit statistic, U3, is investigated through simulations as a function of item characteristics, test characteristics, person characteristics, and the group to which examinees belong. Results suggest conditions under which relatively short tests can be used for person-fit analysis. (SLD)

Descriptors: Difficulty Level, Group Membership, Item Response Theory, Nonparametric Statistics

Alternative Response and Scoring Methods for Multiple Choice Items: An Empirical Study of Probabilistic and Ordinal Response Modes

Peer reviewed

Poizner, Sharon B.; And Others – Applied Psychological Measurement, 1978

Binary, probability, and ordinal scoring procedures for multiple-choice items were examined. In two situations, it was found that both the probability and ordinal scoring systems were more reliable than the binary scoring method. (Author/CTM)

Descriptors: Confidence Testing, Guessing (Tests), Higher Education, Multiple Choice Tests

Ordering Power of Separate versus Grouped True-False Tests: Interaction of Type of Test with Knowledge Levels of Examinees.

Peer reviewed

Hsu, Louis M. – Applied Psychological Measurement, 1979

A comparison of the relative ordering power of separate and grouped-items true-false tests indicated that neither type of test was uniformly superior to the other across all levels of knowledge of examinees. Grouped-item tests were found superior for examinees with low levels of knowledge. (Author/CTM)

Descriptors: Academic Ability, Knowledge Level, Multiple Choice Tests, Scores

Empirical versus Random Item Selection in the Design of Intelligence Test Short Forms--The WISC-R Example.

Peer reviewed

Goh, David S. – Applied Psychological Measurement, 1979

The advantages of using psychometric thoery to design short forms of intelligence tests are demonstrated by comparing such usage to a systematic random procedure that has previously been used. The Wechsler Intelligence Scale for Children Revised (WISC-R) Short Form is presented as an example. (JKS)

Descriptors: Elementary Secondary Education, Intelligence Tests, Item Analysis, Psychometrics

A "Unisex" Occupational Scale for the Strong-Campbell Interest Inventory.

Peer reviewed

Johnson, Richard W. – Applied Psychological Measurement, 1979

Strong-Campbell Interest Inventory items which differentiated between males and females by more than nine percentage points were removed in an attempt to develop a unisex occupational scale for pharmacists. The remaining items formed a unisex scale nearly as reliable and valid as the original, over short term periods. (MH)

Descriptors: Females, Graduate Students, Higher Education, Interest Inventories

A Generative Approach to the Modeling of Isomorphic Hidden-Figure Items.

Peer reviewed

Bejar, Isaac I.; Yocom, Peter – Applied Psychological Measurement, 1991

An approach to test modeling is illustrated that encompasses both response consistency and response difficulty. This generative approach makes validation an ongoing process. An analysis of hidden figure items with 60 high school students supports the feasibility of the method. (SLD)

Descriptors: Construct Validity, Difficulty Level, Evaluation Methods, High School Students

Item Difficulty Modeling of Paragraph Comprehension Items

Peer reviewed

Direct link

Gorin, Joanna S.; Embretson, Susan E. – Applied Psychological Measurement, 2006

Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…

Descriptors: Difficulty Level, Test Items, Modeling (Psychology), Paragraph Composition

On the Feasibility of Multiple Matching Tests--Variations on a Theme by Gulliksen.

Peer reviewed

Budescu, David V. – Applied Psychological Measurement, 1988

A multiple matching test--a 24-item Hebrew vocabulary test--was examined, in which distractors from several items are pooled into one list at the test's end. Construction of such tests was feasible. Reliability, validity, and reduction of random guessing were satisfactory when applied to data from 717 applicants to Israeli universities. (SLD)

Descriptors: College Applicants, Feasibility Studies, Foreign Countries, Guessing (Tests)

Item-Option Weighting of Achievement Tests: Comparative Study of Methods.

Peer reviewed

Downey, Ronald G. – Applied Psychological Measurement, 1979

This research attempted to interrelate several methods of producing option weights (i.e., Guttman internal and external weights and judges' weights) and examined their effects on reliability and on concurrent, predictive, and face validity. It was concluded that option weighting offered limited, if any, improvement over unit weighting. (Author/CTM)

Descriptors: Achievement Tests, Answer Keys, Comparative Testing, High Schools