Publication Date
In 2025 | 3 |
Since 2024 | 5 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 11 |
Since 2006 (last 20 years) | 24 |
Descriptor
Error of Measurement | 26 |
Item Response Theory | 10 |
Test Items | 9 |
Scores | 8 |
Foreign Countries | 7 |
Psychometrics | 7 |
Evaluation Methods | 6 |
Comparative Analysis | 5 |
Factor Analysis | 5 |
Reliability | 5 |
Simulation | 5 |
Source
International Journal of… | 26 |
Author
Sijtsma, Klaas | 2 |
Affum-Osei, Emmanuel | 1 |
Aksu Dunya, Beyza | 1 |
Arce, Alvaro J. | 1 |
Asante, Eric Adom | 1 |
Asil, Mustafa | 1 |
Backhoff, Eduardo | 1 |
Benjamin Lugu | 1 |
Brown, Allison R. | 1 |
Brown, Gavin T. L. | 1 |
Cole, Ki Lynn | 1 |
Publication Type
Journal Articles | 26 |
Reports - Research | 19 |
Reports - Evaluative | 4 |
Reports - Descriptive | 2 |
Book/Product Reviews | 1 |
Education Level
Higher Education | 4 |
Secondary Education | 3 |
Elementary Secondary Education | 2 |
Postsecondary Education | 2 |
Grade 3 | 1 |
Grade 5 | 1 |
Grade 7 | 1 |
Location
Australia | 1 |
Ghana | 1 |
Greece | 1 |
Ireland (Dublin) | 1 |
Assessments and Surveys
Program for International… | 3 |
ACT Assessment | 1 |
Cognitive Abilities Test | 1 |
Depression Anxiety and Stress… | 1 |
Iowa Tests of Basic Skills | 1 |
Stefanie A. Wind; Benjamin Lugu; Yurou Wang – International Journal of Testing, 2025
Mokken Scale Analysis (MSA) is a nonparametric approach that offers exploratory tools for understanding the nature of item responses while emphasizing invariance requirements. MSA is often discussed as it relates to Rasch measurement theory, which also emphasizes invariance, but uses parametric models. Researchers who have compared and combined…
Descriptors: Item Response Theory, Scaling, Surveys, Evaluation Methods
Xiaowen Liu – International Journal of Testing, 2024
Differential item functioning (DIF) often arises from multiple sources. Within the context of multidimensional item response theory, this study examined DIF items with varying secondary dimensions using the three DIF methods: SIBTEST, Mantel-Haenszel, and logistic regression. The effect of the number of secondary dimensions on DIF detection rates…
Descriptors: Item Analysis, Test Items, Item Response Theory, Correlation
Cristian Zanon; Nan Zhao; Nursel Topkaya; Ertugrul Sahin; David L. Vogel; Melissa M. Ertl; Samineh Sanatkar; Hsin-Ya Liao; Mark Rubin; Makilim N. Baptista; Winnie W. S. Mak; Fatima Rashed Al-Darmaki; Georg Schomerus; Ying-Fen Wang; Dalia Nasvytiene – International Journal of Testing, 2025
Examinations of the internal structure of the Depression, Anxiety, and Stress Scale-21 (DASS-21) have yielded inconsistent conclusions within and across cultural contexts. This study examined the dimensionality and reliability of the DASS-21 across three theoretically plausible factor structures (i.e., unidimensional, oblique three-factor, and…
Descriptors: Anxiety, Depression (Psychology), Psychometrics, Cultural Context
Maritza Casas; Stephen G. Sireci – International Journal of Testing, 2025
In this study, we take a critical look at the degree to which the measurement of bullying and sense of belonging at school is invariant across groups of students defined by immigrant status. Our study focuses on the invariance of these constructs as measured on a recent PISA administration and includes a discussion of two statistical methods for…
Descriptors: Error of Measurement, Immigrants, Peer Groups, Bullying
Affum-Osei, Emmanuel; Mensah, Henry Kofi; Forkuoh, Solomon Kwarteng; Asante, Eric Adom – International Journal of Testing, 2021
The purpose of this study was to examine the psychometric properties of the goal orientation (GO) scale across job search contexts to facilitate its use in large and varied search settings. A sample of 720 job losers and new entrants' job seekers in Ghana completed the survey. Confirmatory factor analysis supported the three-factor theoretical…
Descriptors: Goal Orientation, Job Search Methods, Psychometrics, Factor Analysis
Rujun Xu; James Soland – International Journal of Testing, 2024
International surveys are increasingly being used to understand nonacademic outcomes like math and science motivation, and to inform education policy changes within countries. Such instruments assume that the measure works consistently across countries, ethnicities, and languages--that is, they assume measurement invariance. While studies have…
Descriptors: Surveys, Statistical Bias, Achievement Tests, Foreign Countries
FIPC Linking across Multidimensional Test Forms: Effects of Confounding Difficulty within Dimensions
Kim, Sohee; Cole, Ki Lynn; Mwavita, Mwarumba – International Journal of Testing, 2018
This study investigated the effects of linking potentially multidimensional test forms using the fixed item parameter calibration. Forms had equal or unequal total test difficulty with and without confounding difficulty. The mean square errors and bias of estimated item and ability parameters were compared across the various confounding tests. The…
Descriptors: Test Items, Item Response Theory, Test Format, Difficulty Level
Aksu Dunya, Beyza – International Journal of Testing, 2018
This study was conducted to analyze potential item parameter drift (IPD) impact on person ability estimates and classification accuracy when drift affects an examinee subgroup. Using a series of simulations, three factors were manipulated: (a) percentage of IPD items in the CAT exam, (b) percentage of examinees affected by IPD, and (c) item pool…
Descriptors: Adaptive Testing, Classification, Accuracy, Computer Assisted Testing
Karakolidis, Anastasios; O'Leary, Michael; Scully, Darina – International Journal of Testing, 2021
The linguistic complexity of many text-based tests can be a source of construct-irrelevant variance, as test-takers' performance may be affected by factors that are beyond the focus of the assessment itself, such as reading comprehension skills. This experimental study examined the extent to which the use of animated videos, as opposed to written…
Descriptors: Animation, Vignettes, Video Technology, Test Format
Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Descriptors: Test Bias, Test Reliability, Performance, Scores
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan – International Journal of Testing, 2012
Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…
Descriptors: Mathematics Tests, Achievement Tests, College Entrance Examinations, Error of Measurement
Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012
Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…
Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement
Asil, Mustafa; Brown, Gavin T. L. – International Journal of Testing, 2016
The use of the Programme for International Student Assessment (PISA) across nations, cultures, and languages has been criticized. The key criticisms point to the linguistic and cultural biases potentially underlying the design of reading comprehension tests, raising doubts about the legitimacy of comparisons across economies. Our research focused…
Descriptors: Comparative Analysis, Reading Achievement, Achievement Tests, Secondary School Students
In'nami, Yo; Koizumi, Rie – International Journal of Testing, 2013
The importance of sample size, although widely discussed in the literature on structural equation modeling (SEM), has not been widely recognized among applied SEM researchers. To narrow this gap, we focus on second language testing and learning studies and examine the following: (a) Is the sample size sufficient in terms of precision and power of…
Descriptors: Structural Equation Models, Sample Size, Second Language Instruction, Monte Carlo Methods