ERIC - Search Results

Showing 1 to 15 of 33 results

Scoring Tests with Contaminated Response Vectors

Peer reviewed

Sakworawich, Arnond; Wainer, Howard – Journal of Educational and Behavioral Statistics, 2020

Test scoring models vary in their generality; some even adjust for examinees answering multiple-choice items correctly by accident (guessing). But no models that we are aware of automatically adjust an examinee's score when there is internal evidence of cheating. In this study, we use a combination of jackknife technology with an adaptive robust…

Descriptors: Scoring, Cheating, Test Items, Licensing Examinations (Professions)

When Can We Improve Subscores by Making Them Shorter?: The Case against Subscores with Overlapping Items

Peer reviewed

Feinberg, Richard A.; Wainer, Howard – Educational Measurement: Issues and Practice, 2014

Subscores can be of diagnostic value for tests that cover multiple underlying traits. Some items require knowledge or ability that spans more than a single trait. It is thus natural for such items to be included on more than a single subscore. Subscores only have value if they are reliable enough to justify conclusions drawn from them and if they…

Descriptors: Scores, Test Items, Reliability

Detecting DIF: Many Paths to Salvation

Peer reviewed

Wainer, Howard; Bradlow, Eric; Wang, Xiaohui – Journal of Educational and Behavioral Statistics, 2010

Confucius pointed out that the first step toward wisdom is calling things by the right name. The term "Differential Item Functioning" (DIF) did not arise fully formed from the miasma of psychometrics; it evolved from a variety of less accurate terms. Among its forebears was "item bias", but that term has a pejorative connotation…

Descriptors: Test Bias, Difficulty Level, Test Items, Statistical Analysis

A Bayesian Method for Studying DIF: A Cautionary Tale Filled with Surprises and Delights

Peer reviewed

Wang, Xiaohui; Bradlow, Eric T.; Wainer, Howard; Muller, Eric S. – Journal of Educational and Behavioral Statistics, 2008

In the course of screening a form of a medical licensing exam for items that function differentially (DIF) between men and women, the authors used the traditional Mantel-Haenszel (MH) statistic for initial screening and a Bayesian method for deeper analysis. For very easy items, the MH statistic unexpectedly often found DIF where there was none.…

Descriptors: Bayesian Statistics, Licensing Examinations (Professions), Medicine, Test Items

Profiles in Research: Fumiko Samejima

Peer reviewed

Wainer, Howard; Robinson, Daniel H. – Journal of Educational and Behavioral Statistics, 2007

Fumiko Samejima is best known for her pioneering work in polytomous response item response theory (IRT), yielding the eponymous model that has been used broadly for more than 30 years. In this interview, Samejima, on the verge of retiring from her faculty position at the University of Tennessee, discusses her life and career. She also describes…

Descriptors: Foreign Countries, Psychometrics, Item Response Theory, Test Items

Rescuing Computerized Testing by Breaking Zipf's Law.

Peer reviewed

Wainer, Howard – Journal of Educational and Behavioral Statistics, 2000

Suggests that because of the nonlinear relationship between item usage and item security, the problems of test security posed by continuous administration of standardized tests cannot be resolved merely by increasing the size of the item pool. Offers alternative strategies to overcome these problems, distributing test items so as to avoid the…

Descriptors: Computer Assisted Testing, Standardized Tests, Test Items, Testing Problems

Using a New Statistical Model for Testlets To Score TOEFL.

Peer reviewed

Wainer, Howard; Wang, Xiaohui – Journal of Educational Measurement, 2000

Modified the three-parameter model to include an additional random effect for items nested within the same testlet. Fitted the new model to 86 testlets from the Test of English as a Foreign Language (TOEFL) and compared standard parameters (discrimination, difficulty, and guessing) with those obtained through traditional modeling. Discusses the…

Descriptors: English (Second Language), Language Tests, Scoring, Statistical Analysis

Choosing: A Test. ETS Program Statistics Research.

Wainer, Howard; Thissen, David – 1992

If examinees are permitted to choose to answer a subset of the questions on a test, just knowing which questions were chosen can provide a measure of proficiency that may be as reliable as would have been obtained from the test graded traditionally. This new method of scoring is much less time consuming and expensive for both the examinee and the…

Descriptors: Adaptive Testing, Cost Effectiveness, Responses, Scoring

Testing and Test Theory: Whither and Whence.

Wainer, Howard – 1982

This paper is the transcript of a talk given to those who use test information but who have little technical background in test theory. The concepts of modern test theory are compared with traditional test theory, as well as a probable future test theory. The explanations given are couched within an extended metaphor that allows a full description…

Descriptors: Difficulty Level, Latent Trait Theory, Metaphors, Test Items

Managing the Influence of DIF from Big Items: The 1988 Advanced Placement History Test as an Example.

Peer reviewed

Wainer, Howard; Lukhele, Robert – Applied Measurement in Education, 1997

The screening for flaws done for multiple-choice items is often not done for large items. Examines continuous item weighting as a way to manage the influence of differential item functioning (DIF). Data from the College Board Advanced Placement History Test are used to illustrate the method. (SLD)

Descriptors: Advanced Placement, College Entrance Examinations, History, Item Bias

Comparing the Incomparable: An Essay on the Importance of Big Assumptions and Scant Evidence.

Peer reviewed

Wainer, Howard – Educational Measurement: Issues and Practice, 1999

Discusses the comparison of groups of individuals who were administered different forms of a test. Focuses on the situation in which there is little overlap in content between the test forms. Reviews equating problems in national tests in Canada and Israel. (SLD)

Descriptors: Comparative Analysis, Equated Scores, Foreign Countries, National Competency Tests

Precision and Differential Item Functioning on a Testlet-Based Test: The 1991 Law School Admissions Test as an Example.

Peer reviewed

Wainer, Howard – Applied Measurement in Education, 1995

Analysis of the 1991 Law School Admission Test (LSAT) shows that the testlet structure of the reading comprehension and analytic reasoning sections has a significant effect on the statistical characteristics of the test. The testlet-based reliability of these two sections is lower than had been previously calculated. (SLD)

Descriptors: Admission (School), Item Bias, Law Schools, Psychometrics

On Examinee Choice in Educational Testing.

Peer reviewed

Wainer, Howard; Thissen, David – Review of Educational Research, 1994

This article summarizes results from tests that have allowed examinee choice of test items. It paints a bleak psychometric picture for the use of examinee choice within fair tests. Choice is anathema to standardized testing unless the aspects that characterize the test are irrelevant to what is being tested. (SLD)

Descriptors: Adaptive Testing, Educational Assessment, Elementary Secondary Education, Equal Education

Was There One Distractor Too Many?

Peer reviewed

Wainer, Howard; And Others – Journal of Educational Statistics, 1984

A mathematics item on the Scholastic Aptitude Test (SAT) was found to be faulty and received wide publicity. A detailed investigation into its mathematical and psychometric properties is presented. It was found that the problem could be considered ambiguous but that almost no one noticed the ambiguity. (Author/JKS)

Descriptors: Classification, College Entrance Examinations, Geometry, High Schools

How Reliable Are TOEFL Scores?

Peer reviewed

Wainer, Howard; Lukhele, Robert – Educational and Psychological Measurement, 1997

The reliability of scores from four forms of the Test of English as a Foreign Language (TOEFL) was estimated using a hybrid item response theory model. It was found that there was very little difference between overall reliability when the testlet items were assumed to be independent and when their dependence was modeled. (Author/SLD)

Descriptors: English (Second Language), Item Response Theory, Scores, Second Language Learning
