Jiayi Deng – ProQuest LLC, 2024
Test score comparability in international large-scale assessments (LSA) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To effectively compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic…
Descriptors: Item Response Theory, Scoring Rubrics, Scoring, Error of Measurement
Wim J. van der Linden; Luping Niu; Seung W. Choi – Journal of Educational and Behavioral Statistics, 2024
A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint…
Descriptors: Adaptive Testing, Test Construction, Test Format, Test Reliability
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis
Kong, Xiaojing J.; Wise, Steven L.; Bhola, Dennison S. – Educational and Psychological Measurement, 2007
This study compared four methods for setting item response time thresholds to differentiate rapid-guessing behavior from solution behavior. Thresholds were either (a) common for all test items, (b) based on item surface features such as the amount of reading required, (c) based on visually inspecting response time frequency distributions, or (d)…
Descriptors: Test Items, Reaction Time, Timed Tests, Item Response Theory
DeMars, Christine E. – Online Submission, 2005
Several methods for estimating item response theory scores for multiple subtests were compared. These methods included two multidimensional item response theory models: a bi-factor model where each subtest was a composite score based on the primary trait measured by the set of tests and a secondary trait measured by the individual subtest, and a…
Descriptors: Item Response Theory, Multidimensional Scaling, Correlation, Scoring Rubrics

Stone, Clement A.; Lane, Suzanne – Applied Measurement in Education, 1991
A model-testing approach for evaluating the stability of item response theory item parameter estimates (IPEs) in a pretest-posttest design is illustrated. Nineteen items from the Head Start Measures Battery were used. A moderately high degree of stability in the IPEs for 5,510 children assessed on 2 occasions was found. (TJH)
Descriptors: Comparative Testing, Compensatory Education, Computer Assisted Testing, Early Childhood Education
Lunz, Mary E.; And Others – 1990
This study explores the test-retest consistency of computer adaptive tests of varying lengths. The testing model used was designed as a mastery model to determine whether an examinee's estimated ability level is above or below a pre-established criterion expressed in the metric (logits) of the calibrated item pool scale. The Rasch model was used…
Descriptors: Ability Identification, Adaptive Testing, College Students, Comparative Testing

Bontempo, Robert – Journal of Cross-Cultural Psychology, 1993
Describes a method for assessing the quality of translations based on item response theory (IRT). Results from the IRT technique with French and Chinese versions of a scale measuring individualism-collectivism for samples of 250 U.S., 357 French, and 290 Chinese undergraduates show how several biased items are detected. (SLD)
Descriptors: Chinese, Comparative Testing, Cross Cultural Studies, Foreign Countries
Melancon, Janet G.; Thompson, Bruce – 1990
Latent trait measurement theory was used to investigate the measurement characteristics of both parts of a multiple-choice measure of field-independence, the Finding Embedded Figures Test (FEFT). Analysis was based on data provided by 1,528 students enrolled in one of two middle schools located in the southern United States. Of the subjects, 731…
Descriptors: Cognitive Processes, Comparative Testing, Field Dependence Independence, Item Response Theory