Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
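
Background, not from the abstract: in the effort-moderated approach as usually described in the response-time literature, a response whose latency falls below an item's time threshold is flagged as a rapid guess and dropped from the scoring likelihood. A sketch of the unidimensional form, assuming a per-item threshold $\tau_i$:

$$L(\theta) = \prod_{i=1}^{k} \left[ P_i(\theta)^{x_i} \left(1 - P_i(\theta)\right)^{1-x_i} \right]^{\mathbb{1}(t_i > \tau_i)}$$

Items with response time $t_i \le \tau_i$ contribute nothing to the estimate of $\theta$; the study above asks how this unidimensional scheme holds up when rapid guessing is itself multidimensional.
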
Hamby, Tyler; Taylor, Wyn – Educational and Psychological Measurement, 2016
This study examined the predictors and psychometric outcomes of survey satisficing, wherein respondents provide quick, "good enough" answers (satisficing) rather than carefully considered answers (optimizing). We administered surveys to university students and to respondents (half of whom held college degrees) from a for-pay survey website,…
Descriptors: Surveys, Test Reliability, Test Validity, Comparative Analysis

Huck, Schuyler W. – Educational and Psychological Measurement, 1978
A modification of Hoyt's analysis of variance model for test analysis was proposed by Lu. A difficulty that may be encountered in using Lu's modification is examined, and a solution is proposed. (JKS)
Descriptors: Analysis of Variance, Difficulty Level, Item Analysis, Test Items
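
For readers unfamiliar with the model being modified: Hoyt's procedure estimates reliability from the mean squares of a persons-by-items ANOVA, in the standard form

$$r_{\mathrm{Hoyt}} = \frac{MS_{\mathrm{persons}} - MS_{\mathrm{residual}}}{MS_{\mathrm{persons}}},$$

which is algebraically equivalent to Cronbach's alpha.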

Lu, K. H. – Educational and Psychological Measurement, 1971
Descriptors: Difficulty Level, Statistical Analysis, Statistical Significance, Test Items

Shoemaker, David M. – Educational and Psychological Measurement, 1972
Descriptors: Difficulty Level, Error of Measurement, Item Sampling, Simulation

Tollefson, Nona – Educational and Psychological Measurement, 1987
This study compared the item difficulty, item discrimination, and test reliability of three forms of multiple-choice items: (1) one correct answer; (2) "none of the above" as a foil; and (3) "none of the above" as the correct answer. Twelve items in the three formats were administered in a college statistics examination. (BS)
Descriptors: Difficulty Level, Higher Education, Item Analysis, Multiple Choice Tests
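
As a reminder of the classical indices compared here: item difficulty is the proportion of examinees answering correctly, and item discrimination is commonly the point-biserial correlation between the item and the total score,

$$p_i = \frac{n_{i,\text{correct}}}{N}, \qquad r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X}\,\sqrt{p_i\,(1-p_i)},$$

where $\bar{X}_1$ and $\bar{X}_0$ are the mean total scores of examinees answering item $i$ correctly and incorrectly, and $s_X$ is the standard deviation of total scores.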

Cizek, Gregory J.; Robinson, K. Lynne; O'Day, Denis M. – Educational and Psychological Measurement, 1998
The effect of removing nonfunctioning items from multiple-choice tests was studied by examining change in difficulty, discrimination, and dimensionality. Results provide additional support for the benefits of eliminating nonfunctioning options, such as enhanced score reliability, reduced testing time, potential for broader domain sampling, and…
Descriptors: Difficulty Level, Multiple Choice Tests, Sampling, Scores

Feldt, Leonard S. – Educational and Psychological Measurement, 1984
The binomial error model includes form-to-form difficulty differences as error variance and leads to Kuder-Richardson formula 21 (KR-21) as an estimate of reliability. If the form-to-form component is removed from the estimate of error variance, the binomial model leads to KR-20 as the reliability estimate. (Author/BW)
Descriptors: Achievement Tests, Difficulty Level, Error of Measurement, Mathematical Formulas
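
For reference, the two estimates contrasted above, in their standard forms (with $k$ items, item difficulties $p_i$, mean total score $\bar{X}$, and total-score variance $\sigma_X^2$):

$$r_{\mathrm{KR20}} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i(1-p_i)}{\sigma_X^2}\right), \qquad r_{\mathrm{KR21}} = \frac{k}{k-1}\left(1 - \frac{\bar{X}\,(k-\bar{X})}{k\,\sigma_X^2}\right)$$

KR-21 assumes all items are equally difficult, which is why form-to-form difficulty differences fold into its error term.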

Willoughby, T. Lee – Educational and Psychological Measurement, 1980
The reliability and validity of a priori estimates of item characteristics are assessed. Results suggest that judges can make a modest contribution to estimation prior to actual administration. (Author/GK)
Descriptors: Difficulty Level, Higher Education, Item Analysis, Medical School Faculty

Straton, Ralph G.; Catts, Ralph M. – Educational and Psychological Measurement, 1980
Multiple-choice tests composed entirely of two-, three-, or four-choice items were investigated. Results indicated that the number of alternatives per item was inversely related to item difficulty but directly related to item discrimination. The reliability and standard error of measurement of three-choice item tests were equivalent or superior…
Descriptors: Difficulty Level, Error of Measurement, Foreign Countries, Higher Education
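
The standard error of measurement in the descriptors is the usual classical quantity

$$SEM = s_X\,\sqrt{1 - r_{XX'}},$$

with $s_X$ the standard deviation of observed scores and $r_{XX'}$ the test reliability.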

Huck, Schuyler W.; And Others – Educational and Psychological Measurement, 1981
Believing that examinee-by-item interaction should be conceptualized as true score variability rather than as a result of errors of measurement, Lu proposed a modification of Hoyt's analysis of variance reliability procedure. Via a computer simulation study, it is shown that Lu's approach does not separate interaction from error. (Author/RL)
Descriptors: Analysis of Variance, Comparative Analysis, Computer Programs, Difficulty Level
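
A one-line reason the separation fails: in a persons × items design with one observation per cell, the expected interaction mean square is

$$E(MS_{p \times i}) = \sigma^2_{pi} + \sigma^2_e,$$

so without replication no rearrangement of the ANOVA can split the interaction component $\sigma^2_{pi}$ from the error component $\sigma^2_e$; the simulation result above is consistent with that.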

Aiken, Lewis R. – Educational and Psychological Measurement, 1989
Two alternatives to traditional item analysis and reliability estimation procedures are considered for determining the difficulty, discrimination, and reliability of optional items on essay and other tests. A computer program to compute these measures is described, and illustrations are given. (SLD)
Descriptors: College Entrance Examinations, Computer Software, Difficulty Level, Essay Tests
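
One natural baseline for optional items (an illustration, not necessarily Aiken's procedure) is to compute each index over only the examinees who attempted the item:

$$p_i = \frac{c_i}{n_i}, \qquad d_i = \mathrm{corr}\!\left(x_i,\, X_{-i}\right)\ \text{over the } n_i \text{ attempters},$$

where $c_i$ is the number correct among the $n_i$ examinees electing item $i$, $x_i$ the item score, and $X_{-i}$ the rest score.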