Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 1 |
Since 2006 (last 20 years) | 4 |
Descriptor
Comparative Testing | 26 |
Item Response Theory | 26 |
Test Items | 26 |
Item Bias | 10 |
Test Construction | 8 |
Estimation (Mathematics) | 7 |
Higher Education | 7 |
Mathematical Models | 7 |
Difficulty Level | 6 |
Computer Assisted Testing | 5 |
Foreign Countries | 5 |
Author
Clauser, Brian E. | 2 |
De Ayala, R. J. | 2 |
Ellis, Barbara B. | 2 |
Sykes, Robert C. | 2 |
Wise, Steven L. | 2 |
Agus Santoso | 1 |
Bhola, Dennison S. | 1 |
Bontempo, Robert | 1 |
Chan, Jason C. | 1 |
Cohen, Allan S. | 1 |
Davey, Beth | 1 |
Publication Type
Journal Articles | 15 |
Reports - Research | 13 |
Reports - Evaluative | 12 |
Speeches/Meeting Papers | 9 |
Reports - Descriptive | 1 |
Education Level
Higher Education | 3 |
Elementary Secondary Education | 1 |
Grade 3 | 1 |
Grade 4 | 1 |
Grade 7 | 1 |
Grade 8 | 1 |
Postsecondary Education | 1 |
Location
United States | 4 |
Germany | 2 |
China | 1 |
France | 1 |
Indonesia | 1 |
Taiwan (Taipei) | 1 |
Assessments and Surveys
National Assessment of… | 1 |
Raven Progressive Matrices | 1 |
SAT (College Admission Test) | 1 |
Trends in International… | 1 |
Agus Santoso; Heri Retnawati; Timbul Pardede; Ibnu Rafi; Munaya Nikma Rosyada; Gulzhaina K. Kassymova; Xu Wenxin – Practical Assessment, Research & Evaluation, 2024
The test blueprint is important in test development: it guides the item writer in creating test items that match the desired objectives and specifications (so-called a priori item characteristics), such as the difficulty level of items within each category and the distribution of items across difficulty levels.…
Descriptors: Foreign Countries, Undergraduate Students, Business English, Test Construction
Wang, Jianjun – School Science and Mathematics, 2011
As the largest international study ever undertaken, the Trends in International Mathematics and Science Study (TIMSS) has been held as a benchmark to measure U.S. student performance in the global context. In-depth analyses of the TIMSS project are conducted in this study to examine key issues of the comparative investigation: (1) item flaws in mathematics…
Descriptors: Test Items, Figurative Language, Item Response Theory, Benchmarking
Schulz, Wolfram; Fraillon, Julian – Educational Research and Evaluation, 2011
When comparing data derived from tests or questionnaires in cross-national studies, researchers commonly assume measurement invariance in their underlying scaling models. However, different cultural contexts, languages, and curricula can have powerful effects on how students respond in different countries. This article illustrates how the…
Descriptors: Citizenship Education, International Studies, Item Response Theory, International Education
Clauser, Brian E.; And Others – 1991
Item bias has been a major concern for test developers during recent years. The Mantel-Haenszel statistic has been among the preferred methods for identifying biased items. The statistic's performance in identifying uniform bias in simulated data modeled by producing various levels of difference in the (item difficulty) b-parameter for reference…
Descriptors: Comparative Testing, Difficulty Level, Item Bias, Item Response Theory
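The Mantel-Haenszel procedure referenced in the Clauser et al. abstract stratifies examinees on total score and combines the per-stratum 2x2 (group x correct/incorrect) tables into a common odds ratio. A minimal sketch follows; the data layout (0/1 scored responses, 0 = reference group, 1 = focal group) and the stratification on raw total score are standard MH practice, not details taken from the study itself:

```python
import numpy as np

def mantel_haenszel_dif(scores, group, item_resp):
    """Estimate the Mantel-Haenszel common odds ratio for one item.

    scores    : total test score per examinee (the matching criterion)
    group     : 0 = reference group, 1 = focal group
    item_resp : 0/1 response to the studied item
    Returns alpha_MH (> 1 favors the reference group) and the ETS
    delta metric MH D-DIF = -2.35 * ln(alpha_MH).
    """
    scores = np.asarray(scores)
    group = np.asarray(group)
    item_resp = np.asarray(item_resp)
    num = den = 0.0
    for k in np.unique(scores):                 # stratify on total score
        at_k = scores == k
        A = np.sum(at_k & (group == 0) & (item_resp == 1))  # ref correct
        B = np.sum(at_k & (group == 0) & (item_resp == 0))  # ref incorrect
        C = np.sum(at_k & (group == 1) & (item_resp == 1))  # focal correct
        D = np.sum(at_k & (group == 1) & (item_resp == 0))  # focal incorrect
        N = A + B + C + D
        if N == 0:
            continue
        num += A * D / N
        den += B * C / N
    alpha = num / den
    return alpha, -2.35 * np.log(alpha)
```

Because the statistic pools strata-level odds ratios, it picks up uniform DIF (a constant advantage across ability levels) well; as the Mazor et al. entry below notes, it is much less sensitive to non-uniform DIF, where the group difference reverses across strata.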
Kong, Xiaojing J.; Wise, Steven L.; Bhola, Dennison S. – Educational and Psychological Measurement, 2007
This study compared four methods for setting item response time thresholds to differentiate rapid-guessing behavior from solution behavior. Thresholds were either (a) common for all test items, (b) based on item surface features such as the amount of reading required, (c) based on visually inspecting response time frequency distributions, or (d)…
Descriptors: Test Items, Reaction Time, Timed Tests, Item Response Theory
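Whatever method sets the thresholds in the Kong, Wise, and Bhola study, applying them reduces to the same operation: flag any response faster than its item's threshold as rapid guessing. A minimal sketch, with an assumed 3-second common threshold for illustration (the study's actual threshold values and data are not reproduced here):

```python
import numpy as np

def flag_rapid_guesses(resp_times, thresholds):
    """Flag item responses faster than each item's time threshold.

    resp_times : (examinees x items) array of response times in seconds
    thresholds : scalar (a common threshold for every item) or a
                 per-item vector, e.g. one derived from item surface
                 features or from inspecting time distributions.
    Returns a boolean array (True = rapid guess) and each examinee's
    response-time effort: the proportion of solution-behavior responses.
    """
    rt = np.asarray(resp_times, dtype=float)
    rapid = rt < np.asarray(thresholds)   # broadcasts scalar or vector
    effort = 1.0 - rapid.mean(axis=1)
    return rapid, effort

# Illustrative data: two examinees, three items, common 3 s threshold.
times = np.array([[1.2, 10.5, 8.0],
                  [2.9,  2.1, 1.5]])
rapid, effort = flag_rapid_guesses(times, 3.0)
```

Passing a per-item vector instead of the scalar covers the item-specific threshold methods (surface features, visual inspection) with no change to the flagging logic.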
Mazor, Kathleen M.; And Others – 1993
The Mantel-Haenszel (MH) procedure has become one of the most popular procedures for detecting differential item functioning (DIF). One of the most troublesome criticisms of this procedure is that while detection rates for uniform DIF are very good, the procedure is not sensitive to non-uniform DIF. In this study, examinee responses were generated…
Descriptors: Comparative Testing, Computer Simulation, Item Bias, Item Response Theory
Sykes, Robert C. – 1989
An analysis-of-covariance methodology was used to investigate whether there were population differences between tryout and operational Rasch item b-values relative to differences between pairs of item response theory (IRT) b-values from consecutive operational item administrations. This methodology allowed the evaluation of whether any such…
Descriptors: Analysis of Covariance, Certification, Comparative Testing, Item Response Theory
Nandakumar, Ratna – 1992
The performance of the following four methodologies for assessing unidimensionality was examined: (1) DIMTEST; (2) the approach of P. W. Holland and P. R. Rosenbaum; (3) linear factor analysis; and (4) non-linear factor analysis. Each method is examined and compared with other methods using simulated data sets and real data sets. Seven data sets,…
Descriptors: Ability, Comparative Testing, Correlation, Equations (Mathematics)
Clauser, Brian E.; And Others – 1991
This paper explores the effectiveness of the Mantel-Haenszel (MH) statistic in detecting differentially functioning test items when the internal criterion is varied. Using a data set from the 1982 statewide administration of a 150-item life skills examination (the New Mexico High School Proficiency Examination), a randomly selected sample of 1,000…
Descriptors: American Indians, Anglo Americans, Comparative Testing, High School Students

De Ayala, R. J. – Applied Psychological Measurement, 1992
A computerized adaptive test (CAT) based on the nominal response model (NR CAT) was implemented, and the performance of the NR CAT and a CAT based on the three-parameter logistic model was compared. The NR CAT produced trait estimates comparable to those of the three-parameter test. (SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Equations (Mathematics)
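The three-parameter logistic (3PL) model that serves as the comparison CAT in the De Ayala study has a standard item response function, and a common CAT item-selection rule is to administer the pool item with maximum Fisher information at the current trait estimate. A sketch under those standard definitions; the item parameters below are illustrative, not from the study's pool:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response:
    P = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a**2 * (q / p) * ((p - c) / (1.0 - c))**2

# One CAT step: given a current trait estimate, pick the unadministered
# item with maximum information (parameters here are made up).
a = np.array([1.0, 1.8, 0.7])
b = np.array([-1.0, 0.2, 1.5])
c = np.array([0.2, 0.15, 0.25])
theta_hat = 0.0
next_item = int(np.argmax(info_3pl(theta_hat, a, b, c)))
```

The guessing parameter c depresses information at low abilities, which is one reason highly discriminating items near the current estimate (here, the second item) dominate selection.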
Assessing the Effects of Computer Administration on Scores and Parameter Estimates Using IRT Models.
Sykes, Robert C.; And Others – 1991
To investigate the psychometric feasibility of replacing a paper-and-pencil licensing examination with a computer-administered test, a validity study was conducted. The computer-administered test (Cadm) was a common set of items for all test takers, distinct from computerized adaptive testing, in which test takers receive items appropriate to…
Descriptors: Adults, Certification, Comparative Testing, Computer Assisted Testing

Ellis, Barbara B. – Intelligence, 1990
Intellectual abilities were measured for 217 German and 205 American college students using tests (in the subjects' native languages) in which equivalence was established by an item-response theory-based differential-item-functioning (DIF) analysis. Comparisons between groups were not the same before and after removal of DIF items. (SLD)
Descriptors: College Students, Comparative Testing, Cross Cultural Studies, Culture Fair Tests
De Ayala, R. J. – 1992
One important and promising application of item response theory (IRT) is computerized adaptive testing (CAT). The implementation of a nominal response model-based CAT (NRCAT) was studied. Item pool characteristics for the NRCAT as well as the comparative performance of the NRCAT and a CAT based on the three-parameter logistic (3PL) model were…
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation
Kim, Haeok; Plake, Barbara S. – 1993
A two-stage testing strategy is one method of adapting the difficulty of a test to an individual's ability level in an effort to achieve more precise measurement. A routing test provides an initial estimate of ability level, and a second-stage measurement test then evaluates the examinee further. The measurement accuracy and efficiency of item…
Descriptors: Ability, Adaptive Testing, Comparative Testing, Computer Assisted Testing
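The routing logic of a two-stage test like the one Kim and Plake describe can be reduced to a score-to-form lookup: the routing test yields an initial ability estimate, which selects the second-stage measurement test. A minimal sketch; the cutoffs and form labels are purely illustrative, not taken from the study:

```python
def route(routing_score, cutoffs=(5, 10)):
    """Pick a second-stage form from a routing-test number-correct score.

    cutoffs : (easy_max, medium_max) score boundaries -- illustrative
              values, in practice set from the routing test's measurement
              properties and the pool's difficulty distribution.
    """
    easy_max, medium_max = cutoffs
    if routing_score <= easy_max:
        return "easy"
    if routing_score <= medium_max:
        return "medium"
    return "hard"
```

The design question the study examines is precisely how much measurement accuracy this coarse one-shot adaptation gives up relative to item-level adaptive testing.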

Kim, Seock-Ho; Cohen, Allan S. – Applied Psychological Measurement, 1991
The exact and closed-interval area measures for detecting differential item functioning are compared for actual data from 1,000 African-American and 1,000 white college students taking a vocabulary test with items intentionally constructed to favor 1 set of examinees. No real differences in detection of biased items were found. (SLD)
Descriptors: Black Students, College Students, Comparative Testing, Equations (Mathematics)
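The "closed-interval" area measure the Kim and Cohen entry compares with the exact measure integrates the unsigned gap between the two groups' item characteristic curves over a bounded theta range. A numerical sketch under the usual 3PL parameterization (with the D = 1.7 scaling constant); the integration bounds and step count are illustrative choices, not the study's:

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve with the D = 1.7 scaling constant."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def closed_interval_area(a1, b1, c1, a2, b2, c2, lo=-3.0, hi=3.0, n=2001):
    """Unsigned area between two ICCs over [lo, hi], by the trapezoidal
    rule. (The 'exact' measure integrates over the whole real line and
    has a closed form when the c parameters are equal.)"""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc_3pl(theta, a1, b1, c1) - icc_3pl(theta, a2, b2, c2))
    return float(np.sum((gap[:-1] + gap[1:]) * np.diff(theta)) / 2.0)
```

Identical parameter sets give zero area; a shift in the b parameter between groups (uniform DIF) produces a positive area roughly proportional to the shift, which is what the measure is meant to detect.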