Publication Date
In 2025: 1
Since 2024: 1
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 7
Since 2006 (last 20 years): 16
Descriptor
Difficulty Level: 17
Test Items: 17
Item Response Theory: 8
Foreign Countries: 7
Test Bias: 5
Models: 4
Cognitive Processes: 3
Psychometrics: 3
Test Format: 3
Adults: 2
Attention: 2
Source
International Journal of…: 17
Author
Allalouf, Avi: 1
Arce, Alvaro J.: 1
Aryadoust, Vahid: 1
Baghaei, Purya: 1
Bryant, Damon U.: 1
Buckendahl, Chad W.: 1
Bulut, Okan: 1
Cole, Ki Lynn: 1
Davis-Becker, Susan L.: 1
DeMars, Christine E.: 1
Emons, Wilco H. M.: 1
Publication Type
Journal Articles: 17
Reports - Research: 11
Reports - Evaluative: 6
Education Level
Secondary Education: 4
Elementary Education: 1
Elementary Secondary Education: 1
Grade 3: 1
Grade 4: 1
Grade 5: 1
Grade 6: 1
Grade 7: 1
High Schools: 1
Higher Education: 1
Intermediate Grades: 1
Location
China: 1
Iran: 1
Malaysia: 1
Netherlands: 1
Philippines: 1
Singapore: 1
Slovakia: 1
Turkey: 1
United Kingdom (England): 1
Assessments and Surveys
Big Five Inventory: 1
International English…: 1
National Assessment of…: 1
Program for International…: 1
Test of English for…: 1
Patrik Havan; Michal Kohút; Peter Halama – International Journal of Testing, 2025
Acquiescence is the tendency of participants to shift their responses toward agreement regardless of item content. Lechner et al. (2019) introduced the following mechanisms of acquiescence: social deference and cognitive processing. We added their interaction to a theoretical framework. The sample consists of 557 participants. We found a significant, moderately strong relationship…
Descriptors: Cognitive Processes, Attention, Difficulty Level, Reflection
Huggins-Manley, Anne Corinne; Qiu, Yuxi; Penfield, Randall D. – International Journal of Testing, 2018
Score equity assessment (SEA) refers to an examination of population invariance of equating across two or more subpopulations of test examinees. Previous SEA studies have shown that score equity may be present for examinees scoring at particular test score ranges but absent for examinees scoring at other score ranges. No studies to date have…
Descriptors: Equated Scores, Test Bias, Test Items, Difficulty Level
Roelofs, Erik C.; Emons, Wilco H. M.; Verschoor, Angela J. – International Journal of Testing, 2021
This study reports on an Evidence Centered Design (ECD) project in the Netherlands, involving the theory exam for prospective car drivers. In particular, we illustrate how cognitive load theory, task-analysis, response process models, and explanatory item-response theory can be used to systematically develop and refine task models. Based on a…
Descriptors: Foreign Countries, Psychometrics, Test Items, Evidence Based Practice
FIPC Linking across Multidimensional Test Forms: Effects of Confounding Difficulty within Dimensions
Kim, Sohee; Cole, Ki Lynn; Mwavita, Mwarumba – International Journal of Testing, 2018
This study investigated the effects of linking potentially multidimensional test forms using the fixed item parameter calibration. Forms had equal or unequal total test difficulty with and without confounding difficulty. The mean square errors and bias of estimated item and ability parameters were compared across the various confounding tests. The…
Descriptors: Test Items, Item Response Theory, Test Format, Difficulty Level
Holmes, Stephen D.; Meadows, Michelle; Stockford, Ian; He, Qingping – International Journal of Testing, 2018
The relationship of expected and actual difficulty of items on six mathematics question papers designed for 16-year-olds in England was investigated through paired comparison using experts and through testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers…
Descriptors: Foreign Countries, Secondary School Students, Mathematics Tests, Test Items
Wang, Ting; Li, Min; Thummaphan, Phonraphee; Ruiz-Primo, Maria Araceli – International Journal of Testing, 2017
Contextualized items have been widely used in science testing. Despite the common use of item contexts, the influence of a chosen context on the reliability and validity of score inferences remains unclear. We focused on sequential cues of contextual information, referring to the order of events or descriptions presented in item contexts. We…
Descriptors: Science Tests, Cues, Difficulty Level, Test Items
Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey – International Journal of Testing, 2016
We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Programme for International Student Assessment (PISA) 2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…
Descriptors: Semiotics, Difficulty Level, Test Items, Science Tests
Baghaei, Purya; Aryadoust, Vahid – International Journal of Testing, 2015
Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a shared…
Descriptors: Test Format, Item Response Theory, Models, Test Items
Kan, Adnan; Bulut, Okan – International Journal of Testing, 2014
This study investigated whether the linguistic complexity of items leads to gender differential item functioning (DIF) on mathematics assessments. Two forms of a mathematics test were developed. The first form consisted of algebra items based on mathematical expressions, terms, and equations. In the second form, the same items were written as word…
Descriptors: Gender Differences, Test Bias, Difficulty Level, Test Items
Davis-Becker, Susan L.; Buckendahl, Chad W.; Gerrow, Jack – International Journal of Testing, 2011
Throughout the world, cut scores are an important aspect of a high-stakes testing program because they are a key operational component of the interpretation of test scores. One method for setting standards that is prevalent in educational testing programs--the Bookmark method--is intended to be a less cognitively complex alternative to methods…
Descriptors: Standard Setting (Scoring), Cutting Scores, Educational Testing, Licensing Examinations (Professions)
Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012
The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…
Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling
DeMars, Christine E.; Wise, Steven L. – International Journal of Testing, 2010
This investigation examined whether different rates of rapid guessing between groups could lead to detectable levels of differential item functioning (DIF) in situations where the item parameters were the same for both groups. Two simulation studies were designed to explore this possibility. The groups in Study 1 were simulated to reflect…
Descriptors: Guessing (Tests), Test Bias, Motivation, Gender Differences
Svetina, Dubravka; Gorin, Joanna S.; Tatsuoka, Kikumi K. – International Journal of Testing, 2011
As a construct definition, the current study develops a cognitive model describing the knowledge, skills, and abilities measured by critical reading test items on a high-stakes assessment used for selection decisions in the United States. Additionally, in order to establish generalizability of the construct meaning to other similarly structured…
Descriptors: Reading Tests, Reading Comprehension, Critical Reading, Test Items
Lamprianou, Iasonas – International Journal of Testing, 2008
This study investigates the effect of reporting the unadjusted raw scores in a high-stakes language exam when raters differ significantly in severity and self-selected questions differ significantly in difficulty. More sophisticated models, introducing meaningful facets and parameters, are successively used to investigate the characteristics of…
Descriptors: High Stakes Tests, Raw Scores, Item Response Theory, Language Tests
Allalouf, Avi; Rapp, Joel; Stoller, Reuven – International Journal of Testing, 2009
When a test is adapted from a source language (SL) into a target language (TL), the two forms are usually not psychometrically equivalent. If linking between test forms is necessary, those items that have had their psychometric characteristics altered by the translation (differential item functioning [DIF] items) should be eliminated from the…
Descriptors: Test Items, Test Format, Verbal Tests, Psychometrics