ERIC - Search Results

Publication Date

In 2025	1
Since 2024	1
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	11
Since 2006 (last 20 years)	22

Descriptor

Difficulty Level	23
Test Items	17
Foreign Countries	10
Item Response Theory	9
Test Bias	7
Models	5
Psychometrics	5
Cognitive Processes	4
Reading Tests	4
Scores	4
Adults	3
Comparative Analysis	3
Error of Measurement	3
Grade 4	3
Measurement	3
Reading Comprehension	3
Regression (Statistics)	3
Secondary School Students	3
Simulation	3
Test Construction	3
Test Format	3
Test Validity	3
Academic Standards	2
Achievement Tests	2
Attention	2
More ▼

Source

International Journal of…

Publication Type

Journal Articles	23
Reports - Research	16
Reports - Evaluative	6
Reports - Descriptive	1

Education Level

Secondary Education	5
Elementary Education	4
Grade 4	3
Intermediate Grades	2
Elementary Secondary Education	1
Grade 3	1
Grade 5	1
Grade 6	1
Grade 7	1
High Schools	1
Higher Education	1
Junior High Schools	1
Middle Schools	1
Postsecondary Education	1
More ▼

Audience

Location

United Kingdom (England)	2
Canada	1
China	1
Colombia	1
Iran	1
Malaysia	1
Netherlands	1
Philippines	1
Singapore	1
Slovakia	1
Turkey	1
United States	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

Progress in International…	2
Big Five Inventory	1
International English…	1
National Assessment of…	1
Program for International…	1
Test of English for…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 23 results Save | Export

Interaction of Social Deference and Cognitive Processing in the Prediction of Acquiescence

Peer reviewed

Direct link

Patrik Havan; Michal Kohút; Peter Halama – International Journal of Testing, 2025

Acquiescence is the tendency of participants to shift their responses to agreement. Lechner et al. (2019) introduced the following mechanisms of acquiescence: social deference and cognitive processing. We added their interaction into a theoretical framework. The sample consists of 557 participants. We found significant medium strong relationship…

Descriptors: Cognitive Processes, Attention, Difficulty Level, Reflection

Exploring a Source of Uneven Score Equity across the Test Score Range

Peer reviewed

Direct link

Huggins-Manley, Anne Corinne; Qiu, Yuxi; Penfield, Randall D. – International Journal of Testing, 2018

Score equity assessment (SEA) refers to an examination of population invariance of equating across two or more subpopulations of test examinees. Previous SEA studies have shown that score equity may be present for examinees scoring at particular test score ranges but absent for examinees scoring at other score ranges. No studies to date have…

Descriptors: Equated Scores, Test Bias, Test Items, Difficulty Level

Exploring Task Features That Predict Psychometric Quality of Test Items: The Case for the Dutch Driving Theory Exam

Peer reviewed

Direct link

Roelofs, Erik C.; Emons, Wilco H. M.; Verschoor, Angela J. – International Journal of Testing, 2021

This study reports on an Evidence Centered Design (ECD) project in the Netherlands, involving the theory exam for prospective car drivers. In particular, we illustrate how cognitive load theory, task-analysis, response process models, and explanatory item-response theory can be used to systematically develop and refine task models. Based on a…

Descriptors: Foreign Countries, Psychometrics, Test Items, Evidence Based Practice

FIPC Linking across Multidimensional Test Forms: Effects of Confounding Difficulty within Dimensions

Peer reviewed

Direct link

Kim, Sohee; Cole, Ki Lynn; Mwavita, Mwarumba – International Journal of Testing, 2018

This study investigated the effects of linking potentially multidimensional test forms using the fixed item parameter calibration. Forms had equal or unequal total test difficulty with and without confounding difficulty. The mean square errors and bias of estimated item and ability parameters were compared across the various confounding tests. The…

Descriptors: Test Items, Item Response Theory, Test Format, Difficulty Level

Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling

Peer reviewed

Direct link

Holmes, Stephen D.; Meadows, Michelle; Stockford, Ian; He, Qingping – International Journal of Testing, 2018

The relationship of expected and actual difficulty of items on six mathematics question papers designed for 16-year olds in England was investigated through paired comparison using experts and testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers…

Descriptors: Foreign Countries, Secondary School Students, Mathematics Tests, Test Items

Investigating How Test-Takers Change Their Strategies to Handle Difficulty in Taking a Reading Comprehension Test: Implications for Score Validation

Peer reviewed

Direct link

Wu, Amery D.; Chen, Michelle Y.; Stone, Jake E. – International Journal of Testing, 2018

This article investigates how test-takers change their strategies to handle increased test difficulty. An adult sample reported their test-taking strategies immediately after completing the tasks in a reading test. Data were analyzed using structural equation modeling specifying a measurement-invariant, ability-moderated, latent transition…

Descriptors: Test Wiseness, Reading Tests, Reading Comprehension, Difficulty Level

Invariance Properties for General Diagnostic Classification Models

Peer reviewed

Direct link

Bradshaw, Laine P.; Madison, Matthew J. – International Journal of Testing, 2016

In item response theory (IRT), the invariance property states that item parameter estimates are independent of the examinee sample, and examinee ability estimates are independent of the test items. While this property has long been established and understood by the measurement community for IRT models, the same cannot be said for diagnostic…

Descriptors: Classification, Models, Simulation, Psychometrics

The Effect of Sequential Cues of Item Contexts in Science Assessment

Peer reviewed

Direct link

Wang, Ting; Li, Min; Thummaphan, Phonraphee; Ruiz-Primo, Maria Araceli – International Journal of Testing, 2017

Contextualized items have been widely used in science testing. Despite common use of item contexts, how the influence of a chosen context on the reliability and validity of the score inferences remains unclear. We focused on sequential cues of contextual information, referring to the order of events or descriptions presented in item contexts. We…

Descriptors: Science Tests, Cues, Difficulty Level, Test Items

International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

Peer reviewed

Direct link

Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey – International Journal of Testing, 2016

We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Program of International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…

Descriptors: Semiotics, Difficulty Level, Test Items, Science Tests

Use of Evidence-Centered Design to Develop Learning Maps-Based Assessments

Peer reviewed

Direct link

Sue Bechard; Amy Clark; Russell Swinburne Romine; Meagan Karvonen; Neal Kingston; Karen Erickson – International Journal of Testing, 2019

Evidence-based approaches to assessment design, development, and administration provide a strong foundation for an assessment's validity argument but can be time consuming, resource intensive, and complex to implement. This article describes an evidence-based approach used for one assessment that addresses these challenges. Evidence-centered…

Descriptors: Evidence Based Practice, Test Construction, Test Validity, Measurement

Modeling Local Item Dependence Due to Common Test Format with a Multidimensional Rasch Model

Peer reviewed

Direct link

Baghaei, Purya; Aryadoust, Vahid – International Journal of Testing, 2015

Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a shared…

Descriptors: Test Format, Item Response Theory, Models, Test Items

Recursive Partitioning to Identify Potential Causes of Differential Item Functioning in Cross-National Data

Peer reviewed

Direct link

Finch, W. Holmes; Hernández Finch, Maria E.; French, Brian F. – International Journal of Testing, 2016

Differential item functioning (DIF) assessment is key in score validation. When DIF is present scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration, and the increased reliance of educational policymakers on cross-national assessments…

Descriptors: Test Bias, Scores, Native Language, Language Usage

Examining the Relationship between Gender DIF and Language Complexity in Mathematics Assessments

Peer reviewed

Direct link

Kan, Adnan; Bulut, Okan – International Journal of Testing, 2014

This study investigated whether the linguistic complexity of items leads to gender differential item functioning (DIF) on mathematics assessments. Two forms of a mathematics test were developed. The first form consisted of algebra items based on mathematical expressions, terms, and equations. In the second form, the same items were written as word…

Descriptors: Gender Differences, Test Bias, Difficulty Level, Test Items

Evaluating the Bookmark Standard Setting Method: The Impact of Random Item Ordering

Peer reviewed

Direct link

Davis-Becker, Susan L.; Buckendahl, Chad W.; Gerrow, Jack – International Journal of Testing, 2011

Throughout the world, cut scores are an important aspect of a high-stakes testing program because they are a key operational component of the interpretation of test scores. One method for setting standards that is prevalent in educational testing programs--the Bookmark method--is intended to be a less cognitively complex alternative to methods…

Descriptors: Standard Setting (Scoring), Cutting Scores, Educational Testing, Licensing Examinations (Professions)

Applying Rasch Model and Generalizability Theory to Study Modified-Angoff Cut Scores

Peer reviewed

Direct link

Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012

The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…

Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling

Previous Page | Next Page »

Pages: 1 | 2

Allalouf, Avi	1
Amy Clark	1
Arce, Alvaro J.	1
Aryadoust, Vahid	1
Baghaei, Purya	1
Bradshaw, Laine P.	1
Bryant, Damon U.	1
Buckendahl, Chad W.	1
Bulut, Okan	1
Chen, Michelle Y.	1
Cole, Ki Lynn	1
Davis-Becker, Susan L.	1
DeMars, Christine E.	1
Duong, Minh Q.	1
Emons, Wilco H. M.	1
Ercikan, Kadriye	1
Finch, W. Holmes	1
French, Brian F.	1
Garcia, Paula	1
Gerrow, Jack	1
Gorin, Joanna S.	1
He, Qingping	1
Hernández Finch, Maria E.	1
Holmes, Stephen D.	1
Huggins-Manley, Anne Corinne	1
More ▼