Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 6 |
Author
Hambleton, Ronald K. | 45 |
Rogers, H. Jane | 7 |
Jones, Russell W. | 4 |
Sireci, Stephen G. | 3 |
Wells, Craig S. | 3 |
Zenisky, April L. | 3 |
Cook, Linda L. | 2 |
Robin, Frederic | 2 |
Xing, Dehui | 2 |
Allalouf, Avi | 1 |
Baldwin, Peter | 1 |
Education Level
Elementary Secondary Education | 2 |
Grade 5 | 1 |
Grade 6 | 1 |
Grade 8 | 1 |
Higher Education | 1 |
Audience
Researchers | 4 |
Practitioners | 1 |
Location
California | 1 |
Estonia | 1 |
Florida | 1 |
Israel | 1 |
Massachusetts | 1 |
Netherlands | 1 |
New York | 1 |
North Carolina | 1 |
Oklahoma | 1 |
Texas | 1 |
Assessments and Surveys
Graduate Management Admission… | 1 |
Medical College Admission Test | 1 |
National Assessment of… | 1 |
United States Medical… | 1 |
Yoo, Hanwook; Hambleton, Ronald K. – Educational Measurement: Issues and Practice, 2019
Item analysis is an integral part of operational test development and is typically conducted within two popular statistical frameworks: classical test theory (CTT) and item response theory (IRT). In this digital ITEMS module, Hanwook Yoo and Ronald K. Hambleton provide an accessible overview of operational item analysis approaches within these…
Descriptors: Item Analysis, Item Response Theory, Guidelines, Test Construction
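The module itself is interactive; as a rough companion, here is a minimal sketch (not taken from the module) of the two classical item statistics such an analysis typically reports, item difficulty (p-value) and corrected point-biserial discrimination, computed on a hypothetical 0/1 response matrix.

    import numpy as np

    def ctt_item_analysis(responses):
        """Classical item statistics for a 0/1-scored matrix (examinees x items):
        item difficulty (proportion correct) and corrected point-biserial
        discrimination against the rest score."""
        responses = np.asarray(responses, dtype=float)
        total = responses.sum(axis=1)
        difficulty = responses.mean(axis=0)
        discrimination = []
        for j in range(responses.shape[1]):
            rest = total - responses[:, j]      # exclude the item from its own criterion
            discrimination.append(np.corrcoef(responses[:, j], rest)[0, 1])
        return difficulty, np.array(discrimination)

    # Illustrative data: 5 examinees, 3 items
    p, rpb = ctt_item_analysis([[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 1]])
    print(p, rpb)
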
Clauser, Jerome C.; Hambleton, Ronald K.; Baldwin, Peter – Educational and Psychological Measurement, 2017
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or…
Descriptors: Scores, Item Analysis, Classification, Decision Making
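For readers unfamiliar with the method, a minimal worked sketch of one common Angoff variant (illustrative numbers, not data from the study): each judge estimates the probability that a minimally proficient examinee answers each item correctly, and the raw cut score is the sum of the item-level mean ratings.

    import numpy as np

    # Hypothetical ratings: rows are judges, columns are items
    ratings = np.array([
        [0.60, 0.75, 0.40, 0.85],
        [0.55, 0.80, 0.45, 0.90],
        [0.65, 0.70, 0.50, 0.80],
    ])
    item_means = ratings.mean(axis=0)   # expected performance of the borderline examinee
    cut_score = item_means.sum()        # cut score on the raw (number-correct) scale
    print(item_means, round(cut_score, 2))
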
Wells, Craig S.; Hambleton, Ronald K.; Kirkpatrick, Robert; Meng, Yu – Applied Measurement in Education, 2014
The purpose of the present study was to develop and evaluate two procedures for flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD using a critical value that was defined to represent barely tolerable IPD. The second procedure…
Descriptors: Test Items, Test Bias, Equated Scores, Item Response Theory
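A minimal sketch of the flagging logic in the first procedure, with placeholder numbers; the study's actual criterion for "barely tolerable" drift is not reproduced here.

    # Compare item difficulty (b) estimates across administrations and flag
    # items whose shift exceeds a tolerance; 0.3 is an arbitrary placeholder.
    old_b = {"item1": -0.50, "item2": 0.20, "item3": 1.10}
    new_b = {"item1": -0.45, "item2": 0.65, "item3": 1.05}
    TOLERANCE = 0.3
    flagged = [item for item, b in old_b.items() if abs(new_b[item] - b) > TOLERANCE]
    print(flagged)  # ['item2']
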
Han, Kyung T.; Wells, Craig S.; Hambleton, Ronald K. – Practical Assessment, Research & Evaluation, 2015
In item response theory test scaling/equating with the three-parameter model, the scaling coefficients A and B have no impact on the c-parameter estimates of the test items since the c-parameter estimates are not adjusted in the scaling/equating procedure. The main research question in this study concerned how serious the consequences would be if…
Descriptors: Item Response Theory, Monte Carlo Methods, Scaling, Test Items
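The point about the c-parameter follows from the standard linear scale transformation under the three-parameter model; a minimal sketch (not the study's code):

    # If theta* = A * theta + B, the 3PL item parameters transform as
    # a* = a / A, b* = A * b + B, c* = c; the lower asymptote is untouched,
    # which is why errors in A and B cannot be absorbed by the c-estimates.
    def rescale_3pl(a, b, c, A, B):
        return a / A, A * b + B, c

    print(rescale_3pl(a=1.2, b=0.5, c=0.2, A=1.1, B=-0.3))
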
Lyren, Per-Erik; Hambleton, Ronald K. – International Journal of Testing, 2011
The equal ability distribution assumption associated with the equivalent groups equating design was investigated in the context of a selection test for admission to higher education. The purpose was to assess the consequences for the test-takers in terms of receiving improperly high or low scores compared to their peers, and to find strong…
Descriptors: Evidence, Test Items, Ability Grouping, Item Response Theory
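As context for the assumption being tested, a toy sketch of mean equating under the equivalent (random) groups design, which only holds if the two groups really are of equal ability (illustrative scores, not data from the study):

    import numpy as np

    form_x = np.array([22, 25, 30, 28, 26])   # reference form, group 1
    form_y = np.array([20, 24, 27, 25, 23])   # new form, group 2 (assumed equal ability)
    shift = form_x.mean() - form_y.mean()      # any mean difference is attributed to form difficulty
    equated_y = form_y + shift                 # Form Y scores placed on the Form X scale
    print(shift, equated_y)
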
Wells, Craig S.; Baldwin, Su; Hambleton, Ronald K.; Sireci, Stephen G.; Karatonis, Ana; Jirka, Stephen – Applied Measurement in Education, 2009
Score equity assessment is an important analysis to ensure inferences drawn from test scores are comparable across subgroups of examinees. The purpose of the present evaluation was to assess the extent to which the Grade 8 NAEP Math and Reading assessments for 2005 were equivalent across selected states. More specifically, the present study…
Descriptors: National Competency Tests, Test Bias, Equated Scores, Grade 8

Mazor, Kathleen M.; Hambleton, Ronald K.; Clauser, Brian E. – Applied Psychological Measurement, 1998
Used simulation to study whether matching on multiple test scores would reduce false-positive error rates compared with matching on a single number-correct score. False-positive error rates were reduced for most datasets. Findings suggest that assessing the dimensional structure of a test can be important in analysis of differential item functioning…
Descriptors: Error of Measurement, Item Bias, Scores, Test Items
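A minimal Mantel-Haenszel sketch of the matching idea (not the authors' code): the conditioning score is passed in directly, so it can be a single number-correct score or a level formed from multiple subscores, which is the comparison the study makes.

    import numpy as np

    def mh_odds_ratio(item, match, group):
        """item: 0/1 responses to the studied item; match: matching-score level
        per examinee; group: 0 = reference, 1 = focal."""
        item, match, group = map(np.asarray, (item, match, group))
        num = den = 0.0
        for k in np.unique(match):
            m = match == k
            A = np.sum(m & (group == 0) & (item == 1))   # reference correct
            B = np.sum(m & (group == 0) & (item == 0))   # reference incorrect
            C = np.sum(m & (group == 1) & (item == 1))   # focal correct
            D = np.sum(m & (group == 1) & (item == 0))   # focal incorrect
            T = A + B + C + D
            if T:
                num += A * D / T
                den += B * C / T
        return num / den if den else float("nan")
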

Muniz, Jose; Hambleton, Ronald K.; Xing, Dehui – International Journal of Testing, 2001
Studied two procedures for detecting potentially flawed items in translated tests with small samples: (1) conditional item "p" value comparisons; and (2) delta plots. Varied several factors in this simulation study. Findings show that the two procedures can be valuable in identifying flawed test items, especially when the size of the…
Descriptors: Identification, Sample Size, Simulation, Test Items
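For reference, a minimal sketch of the delta transform behind the delta-plot procedure (not the authors' implementation): each group's item p-values are converted to the delta scale, and items falling far from the principal-axis line through the two groups' deltas are flagged.

    from scipy.stats import norm

    def delta(p):
        """ETS delta scale: Delta = 13 + 4 * z, where z cuts off the upper
        p proportion of the normal curve; harder items get larger deltas."""
        return 13.0 + 4.0 * norm.ppf(1.0 - p)

    print(delta(0.50))   # 13.0
    print(delta(0.84))   # about 9.0 (an easy item)
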

Zenisky, April L.; Hambleton, Ronald K.; Robin, Frederic – Educational and Psychological Measurement, 2003
Studied a two-stage methodology for evaluating differential item functioning (DIF) in large-scale assessment data using a sample of 60,000 students taking a large-scale assessment. Findings illustrate the merit of iterative approaches for DIF detection, since items identified at one stage were not necessarily the same as those identified at the…
Descriptors: Item Bias, Large Scale Assessment, Research Methodology, Test Items
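A compact sketch of the two-stage (purification) idea the findings support, under the assumption of a generic DIF index; the crude conditional p-value difference below is a stand-in, not the statistic used in the study.

    import numpy as np

    def dif_index(item, match, group):
        # Stand-in DIF index: mean group difference in proportion correct
        # within matched score levels.
        diffs = []
        for k in np.unique(match):
            m = match == k
            ref, foc = item[m & (group == 0)], item[m & (group == 1)]
            if len(ref) and len(foc):
                diffs.append(ref.mean() - foc.mean())
        return abs(np.mean(diffs)) if diffs else 0.0

    def iterative_dif(responses, group, threshold=0.10, max_iter=10):
        """Stage 1 flags items against the total score; later stages re-form the
        matching score from non-flagged items only, until the flagged set stabilizes."""
        responses, group = np.asarray(responses, float), np.asarray(group)
        flagged = set()
        for _ in range(max_iter):
            keep = [j for j in range(responses.shape[1]) if j not in flagged]
            match = responses[:, keep].sum(axis=1)
            new_flags = {j for j in range(responses.shape[1])
                         if dif_index(responses[:, j], match, group) > threshold}
            if new_flags == flagged:
                break
            flagged = new_flags
        return sorted(flagged)
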

Oakland, Thomas; Poortinga, Ype H.; Schlegel, Justin; Hambleton, Ronald K. – International Journal of Testing, 2001
Traces the history of the International Test Commission (ITC), reviewing the context in which it was formed, its goals, and major milestones in its development. Suggests ways the ITC may continue to impact test development positively, and introduces this inaugural journal issue. (SLD)
Descriptors: Educational History, Educational Testing, International Education, Test Construction
Xing, Dehui; Hambleton, Ronald K. – Educational and Psychological Measurement, 2004
Computer-based testing by credentialing agencies has become common; however, selecting a test design is difficult because several good ones are available: parallel forms, computer adaptive (CAT), and multistage (MST). In this study, three computer-based test designs under some common examination conditions were investigated. Item bank size and…
Descriptors: Test Construction, Psychometrics, Item Banks, Computer Assisted Testing
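As a toy illustration of what distinguishes the multistage design from a fixed parallel form (not the configurations studied here), routing decisions send an examinee to a harder or easier second-stage module based on the first-stage score:

    def route(stage1_score, cut_low=4, cut_high=7):
        """Hypothetical two-cut routing rule for a second-stage module."""
        if stage1_score <= cut_low:
            return "easier module"
        if stage1_score >= cut_high:
            return "harder module"
        return "middle module"

    print(route(3), route(5), route(8))
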
Zenisky, April L.; Hambleton, Ronald K.; Robin, Frederic – Educational Assessment, 2004
Differential item functioning (DIF) analyses are a routine part of the development of large-scale assessments. Less common are studies to understand the potential sources of DIF. The goals of this study were (a) to identify gender DIF in a large-scale science assessment and (b) to look for trends in the DIF and non-DIF items due to content,…
Descriptors: Program Effectiveness, Test Format, Science Tests, Test Items
Zenisky, April L.; Hambleton, Ronald K.; Sireci, Stephen G. – 2001
Measurement specialists routinely assume examinee responses to test items are independent of one another. However, previous research has shown that many contemporary tests contain item dependencies, and that not accounting for these dependencies leads to misleading estimates of item, test, and ability parameters. In this study, methods for detecting…
Descriptors: Ability, College Applicants, College Entrance Examinations, Higher Education
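One widely used index for such dependencies is Yen's Q3, the correlation between model residuals for pairs of items; a minimal sketch assuming known 2PL item parameters and abilities (the paper's own detection methods may differ):

    import numpy as np

    def q3_matrix(responses, theta, a, b):
        """Q3 for a 0/1 response matrix under a 2PL with known parameters:
        correlate the examinee-by-item residuals (observed minus expected)."""
        responses = np.asarray(responses, float)
        theta = np.asarray(theta, float)[:, None]
        p = 1.0 / (1.0 + np.exp(-np.asarray(a) * (theta - np.asarray(b))))
        residuals = responses - p
        return np.corrcoef(residuals, rowvar=False)   # off-diagonal entries are Q3

    # Illustrative call with 3 examinees and 2 items
    print(q3_matrix([[0, 0], [1, 0], [1, 1]], theta=[-1.0, 0.0, 1.0], a=[1.0, 1.2], b=[0.0, 0.5]))
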

Hambleton, Ronald K.; Jones, Russell W. – Educational Research Quarterly, 1994
A judgmental method for determining item bias was applied to test data from 2,000 Native American and 2,000 Anglo-American students for a statewide proficiency test. Results indicated some shortcomings of the judgmental method but supported the use of cross-validation in empirically identifying potential bias. (SLD)
Descriptors: American Indians, Anglo Americans, Comparative Analysis, Decision Making
Hambleton, Ronald K.; And Others – 1990
Item response theory (IRT) model parameter estimates have considerable merit and open up new directions for test development, but misleading results are often obtained because of errors in the item parameter estimates. The problem of the effects of item parameter estimation errors on the test development process is discussed, and the seriousness…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Response Theory, Sampling
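To make the concern concrete, a toy sketch (parameter values and error sizes are illustrative only) showing how test information computed from error-laden 3PL estimates can diverge from information under the true parameters, the kind of distortion that misleads test assembly:

    import numpy as np

    def p3pl(theta, a, b, c):
        return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

    def item_info(theta, a, b, c):
        P = p3pl(theta, a, b, c)
        return (1.7 * a) ** 2 * ((P - c) ** 2 / (1 - c) ** 2) * ((1 - P) / P)

    theta = 0.0
    true_params = [(1.0, -0.5, 0.20), (1.3, 0.0, 0.20), (0.8, 0.6, 0.20)]
    est_params  = [(1.2, -0.3, 0.25), (1.1, 0.2, 0.15), (0.9, 0.4, 0.20)]   # with estimation error
    print(sum(item_info(theta, *p) for p in true_params))   # "true" test information at theta = 0
    print(sum(item_info(theta, *p) for p in est_params))    # information implied by the estimates
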