Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 1 |
Since 2006 (last 20 years) | 4 |
Descriptor
Comparative Testing | 26 |
Item Response Theory | 26 |
Test Items | 26 |
Item Bias | 10 |
Test Construction | 8 |
Estimation (Mathematics) | 7 |
Higher Education | 7 |
Mathematical Models | 7 |
Difficulty Level | 6 |
Computer Assisted Testing | 5 |
Foreign Countries | 5 |
Author
Clauser, Brian E. | 2 |
De Ayala, R. J. | 2 |
Ellis, Barbara B. | 2 |
Sykes, Robert C. | 2 |
Wise, Steven L. | 2 |
Agus Santoso | 1 |
Bhola, Dennison S. | 1 |
Bontempo, Robert | 1 |
Chan, Jason C. | 1 |
Cohen, Allan S. | 1 |
Davey, Beth | 1 |
Publication Type
Journal Articles | 15 |
Reports - Research | 13 |
Reports - Evaluative | 12 |
Speeches/Meeting Papers | 9 |
Reports - Descriptive | 1 |
Education Level
Higher Education | 3 |
Elementary Secondary Education | 1 |
Grade 3 | 1 |
Grade 4 | 1 |
Grade 7 | 1 |
Grade 8 | 1 |
Postsecondary Education | 1 |
Location
United States | 4 |
Germany | 2 |
China | 1 |
France | 1 |
Indonesia | 1 |
Taiwan (Taipei) | 1 |
Assessments and Surveys
National Assessment of… | 1 |
Raven Progressive Matrices | 1 |
SAT (College Admission Test) | 1 |
Trends in International… | 1 |
Agus Santoso; Heri Retnawati; Timbul Pardede; Ibnu Rafi; Munaya Nikma Rosyada; Gulzhaina K. Kassymova; Xu Wenxin – Practical Assessment, Research & Evaluation, 2024
The test blueprint is important in test development: it guides the item writer in creating test items that match the desired objectives and specifications (so-called a priori item characteristics), such as the difficulty level of items within each category and the distribution of items across difficulty levels.…
Descriptors: Foreign Countries, Undergraduate Students, Business English, Test Construction
Wang, Jianjun – School Science and Mathematics, 2011
As the largest international study ever undertaken, the Trends in International Mathematics and Science Study (TIMSS) has been held as a benchmark to measure U.S. student performance in the global context. In-depth analyses of the TIMSS project are conducted in this study to examine key issues of the comparative investigation: (1) item flaws in mathematics…
Descriptors: Test Items, Figurative Language, Item Response Theory, Benchmarking
Schulz, Wolfram; Fraillon, Julian – Educational Research and Evaluation, 2011
When comparing data derived from tests or questionnaires in cross-national studies, researchers commonly assume measurement invariance in their underlying scaling models. However, different cultural contexts, languages, and curricula can have powerful effects on how students respond in different countries. This article illustrates how the…
Descriptors: Citizenship Education, International Studies, Item Response Theory, International Education
Clauser, Brian E.; And Others – 1991
Item bias has been a major concern for test developers during recent years. The Mantel-Haenszel statistic has been among the preferred methods for identifying biased items. The statistic's performance in identifying uniform bias in simulated data modeled by producing various levels of difference in the (item difficulty) b-parameter for reference…
Descriptors: Comparative Testing, Difficulty Level, Item Bias, Item Response Theory
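The Mantel-Haenszel procedure referenced in the Clauser et al. abstract stratifies examinees on total score and combines the per-stratum 2x2 (group x correct/incorrect) tables into a common odds ratio. A minimal sketch follows; the data layout (0/1 scored responses, 0 = reference group, 1 = focal group) and the stratification on raw total score are standard MH practice, not details taken from the study itself:

```python
import numpy as np

def mantel_haenszel_dif(scores, group, item_resp):
    """Estimate the Mantel-Haenszel common odds ratio for one item.

    scores    : total test score per examinee (the matching criterion)
    group     : 0 = reference group, 1 = focal group
    item_resp : 0/1 response to the studied item
    Returns alpha_MH (> 1 favors the reference group) and the ETS
    delta metric MH D-DIF = -2.35 * ln(alpha_MH).
    """
    scores = np.asarray(scores)
    group = np.asarray(group)
    item_resp = np.asarray(item_resp)
    num = den = 0.0
    for k in np.unique(scores):                 # stratify on total score
        at_k = scores == k
        A = np.sum(at_k & (group == 0) & (item_resp == 1))  # ref correct
        B = np.sum(at_k & (group == 0) & (item_resp == 0))  # ref incorrect
        C = np.sum(at_k & (group == 1) & (item_resp == 1))  # focal correct
        D = np.sum(at_k & (group == 1) & (item_resp == 0))  # focal incorrect
        N = A + B + C + D
        if N == 0:
            continue
        num += A * D / N
        den += B * C / N
    alpha = num / den
    return alpha, -2.35 * np.log(alpha)
```

Because the statistic pools strata-level odds ratios, it picks up uniform DIF (a constant advantage across ability levels) well; as the Mazor et al. entry below notes, it is much less sensitive to non-uniform DIF, where the group difference reverses across strata.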
Kong, Xiaojing J.; Wise, Steven L.; Bhola, Dennison S. – Educational and Psychological Measurement, 2007
This study compared four methods for setting item response time thresholds to differentiate rapid-guessing behavior from solution behavior. Thresholds were either (a) common for all test items, (b) based on item surface features such as the amount of reading required, (c) based on visually inspecting response time frequency distributions, or (d)…
Descriptors: Test Items, Reaction Time, Timed Tests, Item Response Theory
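Whatever method sets the thresholds in the Kong, Wise, and Bhola study, applying them reduces to the same operation: flag any response faster than its item's threshold as rapid guessing. A minimal sketch, with an assumed 3-second common threshold for illustration (the study's actual threshold values and data are not reproduced here):

```python
import numpy as np

def flag_rapid_guesses(resp_times, thresholds):
    """Flag item responses faster than each item's time threshold.

    resp_times : (examinees x items) array of response times in seconds
    thresholds : scalar (a common threshold for every item) or a
                 per-item vector, e.g. one derived from item surface
                 features or from inspecting time distributions.
    Returns a boolean array (True = rapid guess) and each examinee's
    response-time effort: the proportion of solution-behavior responses.
    """
    rt = np.asarray(resp_times, dtype=float)
    rapid = rt < np.asarray(thresholds)   # broadcasts scalar or vector
    effort = 1.0 - rapid.mean(axis=1)
    return rapid, effort

# Illustrative data: two examinees, three items, common 3 s threshold.
times = np.array([[1.2, 10.5, 8.0],
                  [2.9,  2.1, 1.5]])
rapid, effort = flag_rapid_guesses(times, 3.0)
```

Passing a per-item vector instead of the scalar covers the item-specific threshold methods (surface features, visual inspection) with no change to the flagging logic.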
Mazor, Kathleen M.; And Others – 1993
The Mantel-Haenszel (MH) procedure has become one of the most popular procedures for detecting differential item functioning (DIF). One of the most troublesome criticisms of this procedure is that while detection rates for uniform DIF are very good, the procedure is not sensitive to non-uniform DIF. In this study, examinee responses were generated…
Descriptors: Comparative Testing, Computer Simulation, Item Bias, Item Response Theory
Sykes, Robert C. – 1989
An analysis-of-covariance methodology was used to investigate whether there were population differences between tryout and operational Rasch item b-values relative to differences between pairs of item response theory (IRT) b-values from consecutive operational item administrations. This methodology allowed the evaluation of whether any such…
Descriptors: Analysis of Covariance, Certification, Comparative Testing, Item Response Theory
Nandakumar, Ratna – 1992
The performance of the following four methodologies for assessing unidimensionality was examined: (1) DIMTEST; (2) the approach of P. W. Holland and P. R. Rosenbaum; (3) linear factor analysis; and (4) non-linear factor analysis. Each method is examined and compared with other methods using simulated data sets and real data sets. Seven data sets,…
Descriptors: Ability, Comparative Testing, Correlation, Equations (Mathematics)
Clauser, Brian E.; And Others – 1991
This paper explores the effectiveness of the Mantel-Haenszel (MH) statistic in detecting differentially functioning test items when the internal criterion is varied. Using a data set from the 1982 statewide administration of a 150-item life skills examination (the New Mexico High School Proficiency Examination), a randomly selected sample of 1,000…
Descriptors: American Indians, Anglo Americans, Comparative Testing, High School Students

De Ayala, R. J. – Applied Psychological Measurement, 1992
A computerized adaptive test (CAT) based on the nominal response model (NR CAT) was implemented, and the performance of the NR CAT and a CAT based on the three-parameter logistic model was compared. The NR CAT produced trait estimates comparable to those of the three-parameter test. (SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Equations (Mathematics)
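The three-parameter logistic (3PL) model that serves as the comparison CAT in the De Ayala study has a standard item response function, and a common CAT item-selection rule is to administer the pool item with maximum Fisher information at the current trait estimate. A sketch under those standard definitions; the item parameters below are illustrative, not from the study's pool:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response:
    P = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a**2 * (q / p) * ((p - c) / (1.0 - c))**2

# One CAT step: given a current trait estimate, pick the unadministered
# item with maximum information (parameters here are made up).
a = np.array([1.0, 1.8, 0.7])
b = np.array([-1.0, 0.2, 1.5])
c = np.array([0.2, 0.15, 0.25])
theta_hat = 0.0
next_item = int(np.argmax(info_3pl(theta_hat, a, b, c)))
```

The guessing parameter c depresses information at low abilities, which is one reason highly discriminating items near the current estimate (here, the second item) dominate selection.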
Assessing the Effects of Computer Administration on Scores and Parameter Estimates Using IRT Models.
Sykes, Robert C.; And Others – 1991
To investigate the psychometric feasibility of replacing a paper-and-pencil licensing examination with a computer-administered test, a validity study was conducted. The computer-administered test (Cadm) was a common set of items for all test takers, distinct from computerized adaptive testing, in which test takers receive items appropriate to…
Descriptors: Adults, Certification, Comparative Testing, Computer Assisted Testing

Ellis, Barbara B. – Intelligence, 1990
Intellectual abilities were measured for 217 German and 205 American college students using tests (in the subjects' native languages) in which equivalence was established by an item-response theory-based differential-item-functioning (DIF) analysis. Comparisons between groups were not the same before and after removal of DIF items. (SLD)
Descriptors: College Students, Comparative Testing, Cross Cultural Studies, Culture Fair Tests
De Ayala, R. J. – 1992
One important and promising application of item response theory (IRT) is computerized adaptive testing (CAT). The implementation of a nominal response model-based CAT (NRCAT) was studied. Item pool characteristics for the NRCAT as well as the comparative performance of the NRCAT and a CAT based on the three-parameter logistic (3PL) model were…
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation
Kim, Haeok; Plake, Barbara S. – 1993
A two-stage testing strategy is one method of adapting the difficulty of a test to an individual's ability level in an effort to achieve more precise measurement. A routing test provides an initial estimate of ability level, and a second-stage measurement test then evaluates the examinee further. The measurement accuracy and efficiency of item…
Descriptors: Ability, Adaptive Testing, Comparative Testing, Computer Assisted Testing
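The routing logic of a two-stage test like the one Kim and Plake describe can be reduced to a score-to-form lookup: the routing test yields an initial ability estimate, which selects the second-stage measurement test. A minimal sketch; the cutoffs and form labels are purely illustrative, not taken from the study:

```python
def route(routing_score, cutoffs=(5, 10)):
    """Pick a second-stage form from a routing-test number-correct score.

    cutoffs : (easy_max, medium_max) score boundaries -- illustrative
              values, in practice set from the routing test's measurement
              properties and the pool's difficulty distribution.
    """
    easy_max, medium_max = cutoffs
    if routing_score <= easy_max:
        return "easy"
    if routing_score <= medium_max:
        return "medium"
    return "hard"
```

The design question the study examines is precisely how much measurement accuracy this coarse one-shot adaptation gives up relative to item-level adaptive testing.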

Kim, Seock-Ho; Cohen, Allan S. – Applied Psychological Measurement, 1991
The exact and closed-interval area measures for detecting differential item functioning are compared for actual data from 1,000 African-American and 1,000 white college students taking a vocabulary test with items intentionally constructed to favor 1 set of examinees. No real differences in detection of biased items were found. (SLD)
Descriptors: Black Students, College Students, Comparative Testing, Equations (Mathematics)
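The "closed-interval" area measure the Kim and Cohen entry compares with the exact measure integrates the unsigned gap between the two groups' item characteristic curves over a bounded theta range. A numerical sketch under the usual 3PL parameterization (with the D = 1.7 scaling constant); the integration bounds and step count are illustrative choices, not the study's:

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve with the D = 1.7 scaling constant."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def closed_interval_area(a1, b1, c1, a2, b2, c2, lo=-3.0, hi=3.0, n=2001):
    """Unsigned area between two ICCs over [lo, hi], by the trapezoidal
    rule. (The 'exact' measure integrates over the whole real line and
    has a closed form when the c parameters are equal.)"""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc_3pl(theta, a1, b1, c1) - icc_3pl(theta, a2, b2, c2))
    return float(np.sum((gap[:-1] + gap[1:]) * np.diff(theta)) / 2.0)
```

Identical parameter sets give zero area; a shift in the b parameter between groups (uniform DIF) produces a positive area roughly proportional to the shift, which is what the measure is meant to detect.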