Publication Date
In 2025 (0)
Since 2024 (0)
Since 2021, last 5 years (2)
Since 2016, last 10 years (2)
Since 2006, last 20 years (2)
Descriptor
Difficulty Level (12)
Item Analysis (12)
Test Length (12)
Test Items (9)
Test Construction (5)
Simulation (4)
Latent Trait Theory (3)
Scoring (3)
Statistical Analysis (3)
Test Reliability (3)
Testing (3)
Author
Hambleton, Ronald K. (2)
Arikan, Serkan (1)
Aybek, Eren Can (1)
Cliff, Norman (1)
Cook, Linda L. (1)
Dorans, Neil J. (1)
Forsyth, Robert A. (1)
Guo, Hongwen (1)
Harris, Dickie A. (1)
Lu, Ru (1)
Melican, Gerald J. (1)
Publication Type
Reports - Research (11)
Journal Articles (3)
Speeches/Meeting Papers (2)
Guides - Non-Classroom (1)
Audience
Researchers (2)
Assessments and Surveys
SAT (College Admission Test) (1)
Stanford Binet Intelligence… (1)
Arikan, Serkan; Aybek, Eren Can – Educational Measurement: Issues and Practice, 2022
Many scholars have compared various item discrimination indices using real or simulated data. Item discrimination indices, such as the item-total correlation, the item-rest correlation, and the IRT item discrimination parameter, provide information about individual differences among all participants. However, there are tests that aim to select a very limited number…
Descriptors: Monte Carlo Methods, Item Analysis, Correlation, Individual Differences
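The two classical indices named in the abstract differ only in whether the item is included in its own criterion score. A minimal sketch, not from the article (the simulated data and parameter values are illustrative assumptions):

```python
import numpy as np

def item_total_correlation(responses: np.ndarray, item: int) -> float:
    """Correlate one 0/1 item with the total score (item included)."""
    total = responses.sum(axis=1)
    return float(np.corrcoef(responses[:, item], total)[0, 1])

def item_rest_correlation(responses: np.ndarray, item: int) -> float:
    """Correlate the item with the rest score (item excluded), which
    removes the item's spurious contribution to its own criterion."""
    rest = responses.sum(axis=1) - responses[:, item]
    return float(np.corrcoef(responses[:, item], rest)[0, 1])

# Illustrative simulated data: 500 examinees, 10 binary items,
# higher ability -> higher probability of a correct response.
rng = np.random.default_rng(0)
ability = rng.normal(size=500)
probs = 1 / (1 + np.exp(-(ability[:, None] - rng.normal(size=10))))
data = (rng.random((500, 10)) < probs).astype(int)

r_total = item_total_correlation(data, 0)
r_rest = item_rest_correlation(data, 0)
```

Because the item sits inside its own total, `r_total` is inflated relative to `r_rest`, which is one reason the rest-score variant is often preferred.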
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
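The Mantel-Haenszel procedure mentioned in the abstract pools 2x2 tables across strata matched on observed total score. A hedged sketch, not the authors' code; the counts below are invented for illustration:

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio over matched-score strata.
    Each stratum is (A, B, C, D):
    A = reference correct, B = reference incorrect,
    C = focal correct,     D = focal incorrect."""
    num = den = 0.0
    for A, B, C, D in strata:
        N = A + B + C + D
        if N == 0:
            continue
        num += A * D / N
        den += B * C / N
    return num / den

# Two illustrative strata with identical odds in both groups (no DIF).
no_dif = [(40, 10, 40, 10), (30, 30, 30, 30)]
alpha = mh_odds_ratio(no_dif)

# ETS delta-scale report: D-DIF = -2.35 * ln(alpha); 0 indicates no DIF.
d_dif = -2.35 * math.log(alpha)
```

When the groups share the same conditional odds of success, `alpha` is 1 and the delta-scale statistic is 0.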
Harris, Dickie A.; Penell, Roger J. – 1977
This study used a series of simulations to answer questions about the efficacy of adaptive testing raised by empirical studies. The first study showed that for reasonably high entry points, parameters estimated from paper-and-pencil test protocols cross-validated remarkably well to groups actually tested at a computer terminal. This suggested that…
Descriptors: Adaptive Testing, Computer Assisted Testing, Cost Effectiveness, Difficulty Level
Hambleton, Ronald K.; Cook, Linda L. – 1978
The purpose of the present research was to study, systematically, the "goodness-of-fit" of the one-, two-, and three-parameter logistic models. We studied, using computer-simulated test data, the effects of four variables: variation in item discrimination parameters, the average value of the pseudo-chance level parameters, test length,…
Descriptors: Career Development, Difficulty Level, Goodness of Fit, Item Analysis
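The one-, two-, and three-parameter logistic models compared in this study form a nested family. A minimal sketch of their item characteristic curves, with illustrative parameter values and the conventional 1.7 scaling constant (an assumption, not taken from the report):

```python
import math

def logistic_3pl(theta: float, a: float = 1.0, b: float = 0.0,
                 c: float = 0.0) -> float:
    """P(correct | theta) under the 3PL model.

    a = item discrimination, b = item difficulty,
    c = pseudo-chance (lower asymptote).
    Setting c = 0 gives the 2PL; additionally fixing a across items
    gives the 1PL (Rasch-type) model."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

# At theta == b with no guessing, the probability of success is 0.5.
p_at_difficulty = logistic_3pl(0.0)

# Far below the difficulty, the 3PL curve approaches its guessing
# floor c rather than zero.
p_low_ability = logistic_3pl(-10.0, a=1.0, b=0.0, c=0.2)
```

The nesting is what makes a goodness-of-fit comparison like the one described meaningful: each simpler model is a constrained version of the 3PL.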
Wainer, Howard – 1985
It is important to estimate the number of examinees who reached a test item, because item difficulty is defined by the number who answered correctly divided by the number who reached the item. A new method is presented and compared to the previously used definition of three categories of response to an item: (1) answered; (2) omitted--a…
Descriptors: College Entrance Examinations, Difficulty Level, Estimation (Mathematics), High Schools
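The definition the abstract starts from, difficulty as the proportion correct among examinees who reached the item, can be sketched directly. This is a simplified illustration (the "reached" rule and the data are assumptions, not Wainer's method): an item counts as reached if it or any later item was attempted.

```python
def reached_flags(row):
    """Mark every position up to the last non-blank response as reached.
    Responses: 1 = correct, 0 = incorrect, None = left blank."""
    last = max((i for i, r in enumerate(row) if r is not None), default=-1)
    return [i <= last for i in range(len(row))]

def item_difficulty(responses, item):
    """Proportion correct among examinees who reached the item."""
    correct = reached = 0
    for row in responses:
        if reached_flags(row)[item]:
            reached += 1
            correct += row[item] == 1
    return correct / reached

data = [
    [1, 0, 1, None],     # item 3 never reached
    [1, 1, 1, 1],        # reached everything
    [0, None, 1, None],  # item 1 omitted (a later item was answered)
]
p0 = item_difficulty(data, 0)  # 2 correct of 3 who reached item 0
p3 = item_difficulty(data, 3)  # only one examinee reached item 3
```

The third row shows why the omitted/not-reached distinction matters: a blank followed by an answered item is an omission and stays in the denominator, while trailing blanks do not.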
Scheetz, James P.; Forsyth, Robert A. – 1977
Empirical evidence is presented related to the effects of using a stratified sampling of items in multiple matrix sampling on the accuracy of estimates of the population mean. Data were obtained from a sample of 600 high school students for a 36-item mathematics test and a 40-item vocabulary test, both subtests of the Iowa Tests of Educational…
Descriptors: Achievement Tests, Difficulty Level, Item Analysis, Item Sampling
Robertson, David W.; And Others – 1977
A comparative study of item analysis was conducted on the basis of race to determine whether alternative test construction or processing might increase the proportion of black enlisted personnel among those passing various military technical knowledge examinations. The study used data from six specialists at four grade levels and investigated item…
Descriptors: Difficulty Level, Enlisted Personnel, Item Analysis, Occupational Tests
Hambleton, Ronald K.; And Others – 1987
The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…
Descriptors: Comparative Analysis, Content Validity, Cutting Scores, Difficulty Level
Samejima, Fumiko – 1986
Item analysis data fitting the normal ogive model were simulated in order to investigate the problems encountered when applying the three-parameter logistic model. Binary item tests containing 10 and 35 items were created, and Monte Carlo methods simulated the responses of 2,000 and 500 examinees. Item parameters were obtained using Logist 5.…
Descriptors: Computer Simulation, Difficulty Level, Guessing (Tests), Item Analysis
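Generating binary responses that fit the normal ogive model, as in the simulation described, can be sketched in a few lines. A minimal illustration, not Samejima's procedure; the parameters and sample size are assumptions:

```python
import math
import random

def normal_ogive_p(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """P(correct) = Phi(a * (theta - b)), the normal ogive ICC,
    with Phi computed from the error function."""
    z = a * (theta - b)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(1)
thetas = [random.gauss(0, 1) for _ in range(2000)]

# Draw a 0/1 response for each simulated examinee on one item.
responses = [int(random.random() < normal_ogive_p(t)) for t in thetas]
rate = sum(responses) / len(responses)  # near 0.5 when b = 0
```

Fitting a 3PL model to data generated this way is exactly the kind of model-mismatch exercise the abstract describes.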

Plake, Barbara S.; Melican, Gerald J. – Educational and Psychological Measurement, 1989
The impact of overall test length and difficulty on experts' Nedelsky-method judgments of item performance was studied. Five university-level instructors predicting the performance of minimally competent candidates on a mathematics examination were fairly consistent in their assessments regardless of the length or difficulty of the test.…
Descriptors: Difficulty Level, Estimation (Mathematics), Evaluators, Higher Education
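The Nedelsky procedure behind these judgments has a simple arithmetic core: for each multiple-choice item the judge marks the options a minimally competent candidate could eliminate, the item's expected chance score is one over the options remaining, and the cut score is the sum across items. A hedged sketch with invented judgments:

```python
def nedelsky_cut_score(judgments):
    """judgments: list of (n_options, n_eliminated) per item.
    Each item contributes 1 / (options remaining) to the cut score."""
    return sum(1 / (n_opts - n_elim) for n_opts, n_elim in judgments)

# Five 4-option items; the judge eliminates 2, 2, 1, 3, and 0
# distractors respectively (illustrative values only).
cut = nedelsky_cut_score([(4, 2), (4, 2), (4, 1), (4, 3), (4, 0)])
# cut = 1/2 + 1/2 + 1/3 + 1/1 + 1/4
```

Because each item's contribution depends only on that item's options, the method's per-item judgments are, in principle, independent of test length, which is consistent with the stability the study reports.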

Manpower Administration (DOL), Washington, DC. – 1972
The Basic Occupational Literacy Test (BOLT) was developed as an achievement test of basic skills in reading and arithmetic, for educationally disadvantaged adults. The objective was to develop a test appropriate for this population with regard to content, format, instructions, timing, norms, and difficulty level. A major issue, the use of grade…
Descriptors: Achievement Tests, Adult Basic Education, Adults, Basic Skills
Cliff, Norman; And Others – 1977
TAILOR is a computer program that uses the implied orders concept as the basis for computerized adaptive testing. The basic characteristics of TAILOR, which does not involve pretesting, are reviewed here and two studies of it are reported. One is a Monte Carlo simulation based on the four-parameter Birnbaum model and the other uses a matrix of…
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Programs, Difficulty Level