ERIC - Search Results

Showing all 9 results

Linking Errors Introduced by Rapid Guessing Responses When Employing Multigroup Concurrent IRT Scaling

Jiayi Deng – ProQuest LLC, 2024

Test score comparability in international large-scale assessments (LSA) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To effectively compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic…

Descriptors: Item Response Theory, Scoring Rubrics, Scoring, Error of Measurement

A Two-Level Adaptive Test Battery

Peer reviewed

Wim J. van der Linden; Luping Niu; Seung W. Choi – Journal of Educational and Behavioral Statistics, 2024

A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint…

Descriptors: Adaptive Testing, Test Construction, Test Format, Test Reliability

Stability of Rasch Scales over Time

Peer reviewed

Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010

Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…

Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis

Setting the Response Time Threshold Parameter to Differentiate Solution Behavior from Rapid-Guessing Behavior

Peer reviewed

Kong, Xiaojing J.; Wise, Steven L.; Bhola, Dennison S. – Educational and Psychological Measurement, 2007

This study compared four methods for setting item response time thresholds to differentiate rapid-guessing behavior from solution behavior. Thresholds were either (a) common for all test items, (b) based on item surface features such as the amount of reading required, (c) based on visually inspecting response time frequency distributions, or (d)…

Descriptors: Test Items, Reaction Time, Timed Tests, Item Response Theory

Scoring Subscales Using Multidimensional Item Response Theory Models

DeMars, Christine E. – Online Submission, 2005

Several methods for estimating item response theory scores for multiple subtests were compared. These methods included two multidimensional item response theory models: a bi-factor model where each subtest was a composite score based on the primary trait measured by the set of tests and a secondary trait measured by the individual subtest, and a…

Descriptors: Item Response Theory, Multidimensional Scaling, Correlation, Scoring Rubrics

Use of Restricted Item Response Theory Models for Examining the Stability of Item Parameter Estimates over Time

Peer reviewed

Stone, Clement A.; Lane, Suzanne – Applied Measurement in Education, 1991

A model-testing approach for evaluating the stability of item response theory item parameter estimates (IPEs) in a pretest-posttest design is illustrated. Nineteen items from the Head Start Measures Battery were used. A moderately high degree of stability in the IPEs for 5,510 children assessed on 2 occasions was found. (TJH)

Descriptors: Comparative Testing, Compensatory Education, Computer Assisted Testing, Early Childhood Education

Test-Retest Consistency of Computer Adaptive Tests

Lunz, Mary E.; And Others – 1990

This study explores the test-retest consistency of computer adaptive tests of varying lengths. The testing model used was designed as a mastery model to determine whether an examinee's estimated ability level is above or below a pre-established criterion expressed in the metric (logits) of the calibrated item pool scale. The Rasch model was used…

Descriptors: Ability Identification, Adaptive Testing, College Students, Comparative Testing

Translation Fidelity of Psychological Scales: An Item Response Theory Analysis of an Individualism-Collectivism Scale

Peer reviewed

Bontempo, Robert – Journal of Cross-Cultural Psychology, 1993

Describes a method for assessing the quality of translations based on item response theory (IRT). Results from the IRT technique with French and Chinese versions of a scale measuring individualism-collectivism for samples of 250 U.S., 357 French, and 290 Chinese undergraduates show how several biased items are detected. (SLD)

Descriptors: Chinese, Comparative Testing, Cross Cultural Studies, Foreign Countries

Latent Trait Calibrations for the Finding Embedded Figures Test: A Study with Middle School Students

Melancon, Janet G.; Thompson, Bruce – 1990

Latent trait measurement theory was used to investigate the measurement characteristics of both parts of a multiple-choice measure of field-independence, the Finding Embedded Figures Test (FEFT). Analysis was based on data provided by 1,528 students enrolled in one of two middle schools located in the southern United States. Of the subjects, 731…

Descriptors: Cognitive Processes, Comparative Testing, Field Dependence Independence, Item Response Theory