Publication Date: In 2025 (0); Since 2024 (3); Since 2021, last 5 years (3); Since 2016, last 10 years (3); Since 2006, last 20 years (4)
Descriptor: Comparative Testing (13); Item Response Theory (13); Test Construction (13); Test Items (8); Difficulty Level (4); Mathematical Models (4); Test Format (4); Computer Assisted Testing (3); Foreign Countries (3); Higher Education (3); Item Bias (3)
Source: Educational Measurement:… (1); Journal of Cross-Cultural… (1); Journal of Educational… (1); Journal of Educational and… (1); Practical Assessment,… (1); Research Papers in Education (1)
Author: Agus Santoso (1); Ang, Cheng (1); Clauser, Brian E. (1); Crisp, Victoria (1); Ellis, Barbara B. (1); Green, Kathy E. (1); Gulzhaina K. Kassymova (1); Heri Retnawati (1); Ibnu Rafi (1); Jimmy de la Torre (1); Jinran Wu (1)
Publication Type: Reports - Research (9); Journal Articles (6); Speeches/Meeting Papers (5); Reports - Evaluative (4); Tests/Questionnaires (1)
Education Level: High Schools (1); Higher Education (1); Postsecondary Education (1)
Location: Germany (1); Indonesia (1); United Kingdom (1); United States (1)
Assessments and Surveys: Raven Progressive Matrices (1)
Wim J. van der Linden; Luping Niu; Seung W. Choi – Journal of Educational and Behavioral Statistics, 2024
A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint…
Descriptors: Adaptive Testing, Test Construction, Test Format, Test Reliability
Agus Santoso; Heri Retnawati; Timbul Pardede; Ibnu Rafi; Munaya Nikma Rosyada; Gulzhaina K. Kassymova; Xu Wenxin – Practical Assessment, Research & Evaluation, 2024
The test blueprint is important in test development: it guides the item writer in creating test items that match the desired objectives and specifications, or a priori item characteristics, such as the targeted difficulty category of each item and the distribution of items across difficulty levels.…
Descriptors: Foreign Countries, Undergraduate Students, Business English, Test Construction
Xuelan Qiu; Jimmy de la Torre; You-Gan Wang; Jinran Wu – Educational Measurement: Issues and Practice, 2024
Multidimensional forced-choice (MFC) items have been found to be useful to reduce response biases in personality assessments. However, conventional scoring methods for the MFC items result in ipsative data, hindering the wider applications of the MFC format. In the last decade, a number of item response theory (IRT) models have been developed,…
Descriptors: Item Response Theory, Personality Traits, Personality Measures, Personality Assessment
Crisp, Victoria – Research Papers in Education, 2008
This research set out to compare the quality, length, and nature of (1) exam responses in combined question-and-answer booklets with (2) responses in separate answer booklets, in order to inform choices about response format. Combined booklets are thought to support candidates by giving more information on what is expected of them. Anecdotal…
Descriptors: Geography Instruction, High School Students, Test Format, Test Construction
Clauser, Brian E.; And Others – 1991
Item bias has been a major concern for test developers during recent years. The Mantel-Haenszel statistic has been among the preferred methods for identifying biased items. The statistic's performance in identifying uniform bias in simulated data modeled by producing various levels of difference in the (item difficulty) b-parameter for reference…
Descriptors: Comparative Testing, Difficulty Level, Item Bias, Item Response Theory
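The Mantel-Haenszel procedure this abstract evaluates pools 2x2 (group by correct/incorrect) tables across total-score strata into a common odds ratio; values far from 1 flag uniform DIF. A minimal sketch with made-up stratum counts (the function name, toy data, and the ETS delta rescaling are illustrative, not taken from the study):

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds-ratio estimate for one item.

    strata: one 2x2 table per total-score stratum, given as
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n   # reference-correct * focal-incorrect
        den += b * c / n   # reference-incorrect * focal-correct
    return num / den

# Toy data: three score strata with similar group performance
tables = [(30, 10, 28, 12), (25, 15, 24, 16), (40, 5, 39, 6)]
alpha = mantel_haenszel_dif(tables)     # near 1.0 -> little evidence of DIF
delta = -2.35 * math.log(alpha)         # ETS delta scale for flagging severity
```

With these counts alpha stays close to 1, so the item would not be flagged; large simulated differences in the b-parameter, as in the study, push alpha (and delta) away from the null value.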
Ang, Cheng; Miller, M. David – 1993
The power of W. Stout's procedure to detect deviations from essential unidimensionality in two-dimensional data was investigated for minor, moderate, and large deviations, using criteria based on prior research. Test lengths of 20 and 40 items and sample sizes of 700 and 1,500 were…
Descriptors: Ability, Comparative Testing, Correlation, Item Response Theory
Nandakumar, Ratna – 1992
The performance of the following four methodologies for assessing unidimensionality was examined: (1) DIMTEST; (2) the approach of P. W. Holland and P. R. Rosenbaum; (3) linear factor analysis; and (4) non-linear factor analysis. Each method is examined and compared with other methods using simulated data sets and real data sets. Seven data sets,…
Descriptors: Ability, Comparative Testing, Correlation, Equations (Mathematics)
Assessing the Effects of Computer Administration on Scores and Parameter Estimates Using IRT Models.
Sykes, Robert C.; And Others – 1991
To investigate the psychometric feasibility of replacing a paper-and-pencil licensing examination with a computer-administered test, a validity study was conducted. The computer-administered test (Cadm) was a common set of items for all test takers, distinct from computerized adaptive testing, in which test takers receive items appropriate to…
Descriptors: Adults, Certification, Comparative Testing, Computer Assisted Testing
Smith, Michael W. – 1990
The insights provided by Rasch analysis of results from a literature test were explored. Students in grades 9, 10, and 11 (n=261) responded to a 28-item test before they received one of three treatments: (1) direct instruction, based on research on metacognition in reading, which attempts to give conscious control of strategies used to understand…
Descriptors: Comparative Testing, High School Students, High Schools, Irony
Miao, Chang Y.; Kramer, Gene A. – 1992
An approach to detecting differential item functioning using the Rasch model with equivalent-group cross-validation was investigated. College students taking the Dental Admission Test were divided by gender (936 females and 1,537 males) into two different samples. Rasch analyses were performed on both samples. Data were recalibrated after…
Descriptors: College Entrance Examinations, College Students, Comparative Testing, Dental Schools
Green, Kathy E.; Kluever, Raymond C. – 1991
Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…
Descriptors: Academically Gifted, Children, Comparative Testing, Difficulty Level
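Rasch studies of item difficulty like this one place each item on a logit scale. As a rough illustration of the idea only (not the authors' method), the classic PROX heuristic approximates Rasch difficulties from proportion-correct values; the function name and data below are hypothetical:

```python
import math

def prox_difficulties(pvalues):
    """Crude Rasch item difficulties via the PROX heuristic:
    the logit of the proportion incorrect, centered so the
    difficulties sum to zero (the usual Rasch identification).
    pvalues: proportion-correct for each item."""
    logits = [math.log((1.0 - p) / p) for p in pvalues]
    mean = sum(logits) / len(logits)
    return [x - mean for x in logits]

# Easier items (high p-value) get lower difficulty estimates
diffs = prox_difficulties([0.9, 0.7, 0.5, 0.3])
```

The centered logits preserve the difficulty ordering, which is all the sketch is meant to show; operational Rasch calibration uses maximum-likelihood estimation rather than this shortcut.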

Ellis, Barbara B.; And Others – Journal of Cross-Cultural Psychology, 1993
Evaluates the measurement equivalence of an English-language version of the Trier Personality Inventory, using statistical methods based on item response theory to identify items displaying differential item functioning (DIF). Results with 295 U.S. and 213 West German undergraduates and 203 U.S. college students indicate significant agreement in…
Descriptors: College Students, Comparative Testing, Cross Cultural Studies, Cultural Differences

Wise, Steven L.; And Others – Journal of Educational Measurement, 1992
Performance of 156 undergraduate and 48 graduate students on a self-adapted test (SFAT), in which students choose the difficulty level of their test items, was compared with performance on a computer-adapted test (CAT). Those taking the SFAT obtained higher ability scores and reported lower posttest state anxiety than did CAT takers. (SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Difficulty Level
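Several of the adaptive-testing entries above (van der Linden, Niu, and Choi; Wise and others) rest on the same core step: selecting the next item to maximize Fisher information at the current ability estimate. A minimal sketch under the 2PL model, with a made-up item pool (parameter values and function names are illustrative, not from any of the studies):

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response at ability theta
    for an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_next_item(theta, pool, used):
    """Return the index of the unused item with maximum information."""
    best, best_info = None, -1.0
    for idx, (a, b) in enumerate(pool):
        if idx in used:
            continue
        info = fisher_info(theta, a, b)
        if info > best_info:
            best, best_info = idx, info
    return best

# Hypothetical pool of (a, b) pairs; item 1 was already administered
pool = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.5), (1.5, 0.2)]
nxt = pick_next_item(0.1, pool, used={1})
```

Information peaks for items whose difficulty sits near theta and whose discrimination is high, so the highly discriminating item with b = 0.2 wins here; a self-adapted test replaces this maximization with the examinee's own difficulty choice.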