ERIC - Search Results

Showing 1 to 15 of 26 results

Combining Mokken Scale Analysis with Rasch Measurement Theory to Explore Differences in Measurement Quality between Subgroups

Peer reviewed

Stefanie A. Wind; Benjamin Lugu; Yurou Wang – International Journal of Testing, 2025

Mokken Scale Analysis (MSA) is a nonparametric approach that offers exploratory tools for understanding the nature of item responses while emphasizing invariance requirements. MSA is often discussed as it relates to Rasch measurement theory, which also emphasizes invariance, but uses parametric models. Researchers who have compared and combined…

Descriptors: Item Response Theory, Scaling, Surveys, Evaluation Methods

Detecting Differential Item Functioning with Multiple Causes: A Comparison of Three Methods

Peer reviewed

Xiaowen Liu – International Journal of Testing, 2024

Differential item functioning (DIF) often arises from multiple sources. Within the context of multidimensional item response theory, this study examined DIF items with varying secondary dimensions using the three DIF methods: SIBTEST, Mantel-Haenszel, and logistic regression. The effect of the number of secondary dimensions on DIF detection rates…

Descriptors: Item Analysis, Test Items, Item Response Theory, Correlation

Psychometric Properties of the Depression, Anxiety, and Stress Scale-21 (DASS-21) across Nine Countries/Regions

Peer reviewed

Cristian Zanon; Nan Zhao; Nursel Topkaya; Ertugrul Sahin; David L. Vogel; Melissa M. Ertl; Samineh Sanatkar; Hsin-Ya Liao; Mark Rubin; Makilim N. Baptista; Winnie W. S. Mak; Fatima Rashed Al-Darmaki; Georg Schomerus; Ying-Fen Wang; Dalia Nasvytiene – International Journal of Testing, 2025

Examinations of the internal structure of the Depression, Anxiety, and Stress Scale-21 (DASS-21) have yielded inconsistent conclusions within and across cultural contexts. This study examined the dimensionality and reliability of the DASS-21 across three theoretically plausible factor structures (i.e., unidimensional, oblique three-factor, and…

Descriptors: Anxiety, Depression (Psychology), Psychometrics, Cultural Context

Measurement Invariance across Immigrant and Nonimmigrant Populations on PISA Non-Cognitive Scales

Peer reviewed

Maritza Casas; Stephen G. Sireci – International Journal of Testing, 2025

In this study, we take a critical look at the degree to which the measurement of bullying and sense of belonging at school is invariant across groups of students defined by immigrant status. Our study focuses on the invariance of these constructs as measured on a recent PISA administration and includes a discussion of two statistical methods for…

Descriptors: Error of Measurement, Immigrants, Peer Groups, Bullying

Goal Orientation in Job Search: Psychometric Characteristics and Construct Validation across Job Search Contexts

Peer reviewed

Affum-Osei, Emmanuel; Mensah, Henry Kofi; Forkuoh, Solomon Kwarteng; Asante, Eric Adom – International Journal of Testing, 2021

The purpose of this study was to examine the psychometric properties of the goal orientation (GO) scale across job search contexts to facilitate its use in large and varied search settings. A sample of 720 job seekers in Ghana, comprising job losers and new entrants, completed the survey. Confirmatory factor analysis supported the three-factor theoretical…

Descriptors: Goal Orientation, Job Search Methods, Psychometrics, Factor Analysis

Beyond Group Comparisons: Accounting for Intersectional Sources of Bias in International Survey Measures

Peer reviewed

Rujun Xu; James Soland – International Journal of Testing, 2024

International surveys are increasingly being used to understand nonacademic outcomes like math and science motivation, and to inform education policy changes within countries. Such instruments assume that the measure works consistently across countries, ethnicities, and languages--that is, they assume measurement invariance. While studies have…

Descriptors: Surveys, Statistical Bias, Achievement Tests, Foreign Countries

FIPC Linking across Multidimensional Test Forms: Effects of Confounding Difficulty within Dimensions

Peer reviewed

Kim, Sohee; Cole, Ki Lynn; Mwavita, Mwarumba – International Journal of Testing, 2018

This study investigated the effects of linking potentially multidimensional test forms using the fixed item parameter calibration. Forms had equal or unequal total test difficulty with and without confounding difficulty. The mean square errors and bias of estimated item and ability parameters were compared across the various confounding tests. The…

Descriptors: Test Items, Item Response Theory, Test Format, Difficulty Level

Item Parameter Drift in Computer Adaptive Testing Due to Lack of Content Knowledge

Peer reviewed

Aksu Dunya, Beyza – International Journal of Testing, 2018

This study was conducted to analyze potential item parameter drift (IPD) impact on person ability estimates and classification accuracy when drift affects an examinee subgroup. Using a series of simulations, three factors were manipulated: (a) percentage of IPD items in the CAT exam, (b) percentage of examinees affected by IPD, and (c) item pool…

Descriptors: Adaptive Testing, Classification, Accuracy, Computer Assisted Testing

Animated Videos in Assessment: Comparing Validity Evidence from and Test-Takers' Reactions to an Animated and a Text-Based Situational Judgment Test

Peer reviewed

Karakolidis, Anastasios; O'Leary, Michael; Scully, Darina – International Journal of Testing, 2021

The linguistic complexity of many text-based tests can be a source of construct-irrelevant variance, as test-takers' performance may be affected by factors that are beyond the focus of the assessment itself, such as reading comprehension skills. This experimental study examined the extent to which the use of animated videos, as opposed to written…

Descriptors: Animation, Vignettes, Video Technology, Test Format

Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

Peer reviewed

Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017

Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…

Descriptors: Test Bias, Test Reliability, Performance, Scores

Differential Item Functioning Detection with the Mantel-Haenszel Procedure: The Effects of Matching Types and Other Factors

Peer reviewed

Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015

The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…

Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping

Conditional Standard Errors of Measurement for Composite Scores Using IRT

Peer reviewed

Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan – International Journal of Testing, 2012

Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…

Descriptors: Mathematics Tests, Achievement Tests, College Entrance Examinations, Error of Measurement

Test Length and Decision Quality in Personnel Selection: When Is Short Too Short?

Peer reviewed

Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012

Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…

Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement

Comparing OECD PISA Reading in English to Other Languages: Identifying Potential Sources of Non-Invariance

Peer reviewed

Asil, Mustafa; Brown, Gavin T. L. – International Journal of Testing, 2016

The use of the Programme for International Student Assessment (PISA) across nations, cultures, and languages has been criticized. The key criticisms point to the linguistic and cultural biases potentially underlying the design of reading comprehension tests, raising doubts about the legitimacy of comparisons across economies. Our research focused…

Descriptors: Comparative Analysis, Reading Achievement, Achievement Tests, Secondary School Students

Review of Sample Size for Structural Equation Models in Second Language Testing and Learning Research: A Monte Carlo Approach

Peer reviewed

In'nami, Yo; Koizumi, Rie – International Journal of Testing, 2013

The importance of sample size, although widely discussed in the literature on structural equation modeling (SEM), has not been widely recognized among applied SEM researchers. To narrow this gap, we focus on second language testing and learning studies and examine the following: (a) Is the sample size sufficient in terms of precision and power of…

Descriptors: Structural Equation Models, Sample Size, Second Language Instruction, Monte Carlo Methods
