Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 0
Since 2006 (last 20 years): 5
Descriptor
Difficulty Level: 10
Test Construction: 10
Test Items: 8
Error of Measurement: 3
Item Response Theory: 3
Multiple Choice Tests: 3
Psychometrics: 3
Cutting Scores: 2
Equated Scores: 2
Scores: 2
Scoring: 2
Source
Applied Measurement in Education: 10
Author
Antal, Judit: 1
Ascalon, M. Evelina: 1
Davis, Bruce W.: 1
Feldt, Leonard S.: 1
Frary, Robert B.: 1
Goodwin, Laura D.: 1
Green, Donald Ross: 1
Hein, Serge F.: 1
Marco, Gary L.: 1
Melican, Gerald J.: 1
Meyers, Jason L.: 1
Publication Type
Journal Articles: 10
Reports - Research: 7
Reports - Evaluative: 3
Speeches/Meeting Papers: 1
Education Level
Elementary Education: 1
Grade 5: 1
High Schools: 1
Assessments and Surveys
SAT (College Admission Test): 1
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014
In common-item equating, the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory
Sass, D. A.; Schmitt, T. A.; Walker, C. M. – Applied Measurement in Education, 2008
Item response theory (IRT) procedures have been used extensively to study normal latent trait distributions and have been shown to perform well; however, less is known concerning the performance of IRT with non-normal latent trait distributions. This study investigated the degree of latent trait estimation error under normal and non-normal…
Descriptors: Difficulty Level, Item Response Theory, Test Items, Computation
Meyers, Jason L.; Miller, G. Edward; Way, Walter D. – Applied Measurement in Education, 2009
In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change,…
Descriptors: Test Items, Test Content, Testing Programs, Simulation
Hein, Serge F.; Skaggs, Gary E. – Applied Measurement in Education, 2009
Only a small number of qualitative studies have investigated panelists' experiences during standard-setting activities or the thought processes associated with panelists' actions. This qualitative study examined the experiences of 11 panelists who participated in a prior one-day standard-setting meeting in which either the…
Descriptors: Focus Groups, Standard Setting, Cutting Scores, Cognitive Processes
Ascalon, M. Evelina; Meyers, Lawrence S.; Davis, Bruce W.; Smits, Niels – Applied Measurement in Education, 2007
This article examined two item-writing guidelines: the format of the item stem and the homogeneity of the answer set. Answering the call of Haladyna, Downing, and Rodriguez (2002) for empirical tests of item-writing guidelines and extending the work of Smith and Smith (1988) on differential use of item characteristics, a mock multiple-choice driver's…
Descriptors: Guidelines, Difficulty Level, Standard Setting, Driver Education

Goodwin, Laura D. – Applied Measurement in Education, 1999
The relations between Angoff ratings (minimum passing levels) and the actual "p" values for borderline examinees were studied with 115 examinees taking the Certified Financial Planner examination. Findings do not suggest that the Angoff judges' task is nearly impossible, but they do suggest the need to improve standard-setting…
Descriptors: Cutting Scores, Difficulty Level, Judges, Licensing Examinations (Professions)

Green, Donald Ross; And Others – Applied Measurement in Education, 1989
Potential benefits of using item response theory in test construction are evaluated in light of the experience and evidence accumulated over nine years of using a three-parameter model in the development of major achievement batteries. Topics addressed include error of measurement, test equating, item bias, and item difficulty. (TJH)
Descriptors: Achievement Tests, Computer Assisted Testing, Difficulty Level, Equated Scores

Feldt, Leonard S. – Applied Measurement in Education, 1993
The recommendation that the reliability of multiple-choice tests will be enhanced if the distribution of item difficulties is concentrated at approximately 0.50 is reinforced and extended in this article by viewing the 0/1 item scoring as a dichotomization of an underlying normally distributed ability score. (SLD)
Descriptors: Ability, Difficulty Level, Guessing (Tests), Mathematical Models

Frary, Robert B. – Applied Measurement in Education, 1991
The use of the "none-of-the-above" option (NOTA) in 20 college-level multiple-choice tests was evaluated for classes with 100 or more students. Eight academic disciplines were represented, and 295 NOTA and 724 regular test items were used. It appears that the NOTA can be compatible with good classroom measurement. (TJH)
Descriptors: College Students, Comparative Testing, Difficulty Level, Discriminant Analysis

Marco, Gary L. – Applied Measurement in Education, 1988
Four simulated mathematical and verbal test forms were produced by test assembly procedures proposed in legislative bills in California and New York in 1986 to minimize differences between majority and minority scores. Item response theory analyses of data for about 22,000 Black and 28,000 White high school students were conducted. (SLD)
Descriptors: Black Students, College Entrance Examinations, Comparative Analysis, Culture Fair Tests