Showing 1 to 15 of 48 results
Peer reviewed
Stefanie A. Wind; Benjamin Lugu; Yurou Wang – International Journal of Testing, 2025
Mokken Scale Analysis (MSA) is a nonparametric approach that offers exploratory tools for understanding the nature of item responses while emphasizing invariance requirements. MSA is often discussed as it relates to Rasch measurement theory, which also emphasizes invariance, but uses parametric models. Researchers who have compared and combined…
Descriptors: Item Response Theory, Scaling, Surveys, Evaluation Methods
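For readers unfamiliar with MSA's machinery, its central statistic, Loevinger's scalability coefficient H, can be sketched in a few lines. The helper function and toy data below are hypothetical illustrations, not material from the article.

```python
import numpy as np

def loevinger_H(X):
    """Scale-level Loevinger H for a persons-by-items 0/1 matrix.

    H = 1 - (observed Guttman errors, summed over item pairs)
          / (errors expected under marginal independence).
    By convention, H >= .3 is the minimum for a Mokken scale and
    H >= .5 marks a strong scale.
    """
    X = np.asarray(X)
    n, k = X.shape
    p = X.mean(axis=0)                 # item popularities
    obs = exp = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            # order the pair: e = easier item (higher p), h = harder
            e, h = (i, j) if p[i] >= p[j] else (j, i)
            # a Guttman error: passing the harder item, failing the easier
            obs += np.sum((X[:, h] == 1) & (X[:, e] == 0))
            exp += n * p[h] * (1 - p[e])
    return 1.0 - obs / exp

# toy data: a perfect Guttman pattern, so H = 1.0
X = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0],
     [1, 0, 0],
     [0, 0, 0]]
print(loevinger_H(X))
```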
Peer reviewed
Sohee Kim; Ki Lynn Cole – International Journal of Testing, 2025
This study conducted a comprehensive comparison of Item Response Theory (IRT) linking methods applied to a bifactor model, examining their performance on both multiple choice (MC) and mixed format tests within the common item nonequivalent group design framework. Four distinct multidimensional IRT linking approaches were explored, consisting of…
Descriptors: Item Response Theory, Comparative Analysis, Models, Item Analysis
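The bifactor linking methods the study compares do not reduce to a short example, but the basic idea of estimating linking constants from common items can be shown with the far simpler unidimensional mean/sigma method. The function and common-item difficulties below are hypothetical stand-ins, not the multidimensional procedures evaluated in the article.

```python
import numpy as np

def mean_sigma_linking(b_new, b_old):
    """Mean/sigma linking constants from common-item difficulties
    estimated on two forms, so that theta_old = A * theta_new + B."""
    b_new, b_old = np.asarray(b_new, float), np.asarray(b_old, float)
    A = b_old.std(ddof=1) / b_new.std(ddof=1)
    B = b_old.mean() - A * b_new.mean()
    return A, B

# hypothetical common-item difficulties from two calibrations
A, B = mean_sigma_linking(b_new=[-1.2, -0.3, 0.4, 1.1],
                          b_old=[-1.0, -0.1, 0.6, 1.4])
print(A, B)   # then rescale new-form parameters: b* = A*b + B, a* = a/A
```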
Peer reviewed
Badham, Louise; Furlong, Antony – International Journal of Testing, 2023
Multilingual summative assessments face significant challenges due to tensions that exist between multiple language provision and comparability. Yet, conventional approaches for investigating comparability in multilingual assessments fail to accommodate assessments that comprise extended responses that target complex constructs. This article…
Descriptors: Summative Evaluation, Multilingualism, Comparative Analysis, Literature
Peer reviewed
Maritza Casas; Stephen G. Sireci – International Journal of Testing, 2025
In this study, we take a critical look at the degree to which the measurement of bullying and sense of belonging at school is invariant across groups of students defined by immigrant status. Our study focuses on the invariance of these constructs as measured on a recent PISA administration and includes a discussion of two statistical methods for…
Descriptors: Error of Measurement, Immigrants, Peer Groups, Bullying
Peer reviewed
Morris, Scott B.; Bass, Michael; Howard, Elizabeth; Neapolitan, Richard E. – International Journal of Testing, 2020
The standard error (SE) stopping rule, which terminates a computer adaptive test (CAT) when the SE is less than a threshold, is effective when there are informative questions for all trait levels. However, in domains such as patient-reported outcomes, the items in a bank might all target one end of the trait continuum (e.g., negative…
Descriptors: Computer Assisted Testing, Adaptive Testing, Item Banks, Item Response Theory
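A minimal sketch of the SE stopping rule under a Rasch model follows, assuming a hypothetical item bank and maximum-information selection. With a one-sided bank, the loop below can exhaust its item budget without the SE ever crossing the threshold for examinees at the unmeasured end of the continuum, which is the failure mode the article addresses.

```python
import numpy as np

rng = np.random.default_rng(7)

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def mle_theta(responses, bs, lo=-4.0, hi=4.0):
    """Grid-search ML estimate of theta, clamped to [lo, hi] so that
    all-correct / all-incorrect patterns stay finite."""
    r = np.asarray(responses, float)
    grid = np.linspace(lo, hi, 161)
    p = p_correct(grid[:, None], np.asarray(bs, float)[None, :])
    return grid[np.argmax((r * np.log(p) + (1 - r) * np.log1p(-p)).sum(axis=1))]

def run_cat(true_theta, bank_b, se_threshold=0.5, max_items=30):
    """Administer items until SE(theta) < se_threshold or max_items."""
    available = list(range(len(bank_b)))
    used, resp, theta, se = [], [], 0.0, np.inf
    while available and len(used) < max_items:
        # max-information selection: Rasch info p(1-p) peaks at b = theta
        nxt = min(available, key=lambda i: abs(bank_b[i] - theta))
        available.remove(nxt)
        used.append(nxt)
        resp.append(int(rng.random() < p_correct(true_theta, bank_b[nxt])))
        theta = mle_theta(resp, [bank_b[i] for i in used])
        p = p_correct(theta, np.array([bank_b[i] for i in used]))
        se = 1.0 / np.sqrt(np.sum(p * (1 - p)))
        if se < se_threshold:          # the SE stopping rule
            break
    return theta, se, len(used)

bank = rng.normal(0.0, 1.0, size=200)  # two-sided bank; skew it to see the failure
print(run_cat(true_theta=1.0, bank_b=bank))
```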
Peer reviewed
Oliveri, María Elena; Lawless, René; Mislevy, Robert J. – International Journal of Testing, 2019
Collaborative problem solving (CPS) ranks among the top five most critical skills necessary for college graduates to meet workforce demands (Hart Research Associates, 2015). It is also deemed a critical skill for educational success (Beaver, 2013). It thus deserves more prominence in the suite of courses and subjects assessed in K-16. Such…
Descriptors: Cooperation, Problem Solving, Evidence Based Practice, 21st Century Skills
Peer reviewed
Holmes, Stephen D.; Meadows, Michelle; Stockford, Ian; He, Qingping – International Journal of Testing, 2018
The relationship between expected and actual difficulty of items on six mathematics question papers designed for 16-year-olds in England was investigated through paired comparison using experts and testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers…
Descriptors: Foreign Countries, Secondary School Students, Mathematics Tests, Test Items
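The Rasch variant used for such expert judgments is closely related to the Bradley-Terry model for paired comparisons; the sketch below fits that simpler cousin to a hypothetical wins matrix and is not a reproduction of the article's model or data.

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """Log-strengths from a paired-comparison wins matrix, where
    wins[i][j] counts how often item i was judged harder than item j.
    Uses the minorization-maximization updates of Hunter (2004)."""
    wins = np.asarray(wins, float)
    n = wins + wins.T                  # comparisons per pair
    w = wins.sum(axis=1)               # total "wins" per item
    p = np.ones(len(w))
    for _ in range(iters):
        p = w / (n / (p[:, None] + p[None, :])).sum(axis=1)
        p /= p.mean()                  # fix the scale indeterminacy
    return np.log(p)

# hypothetical expert judgments over three items
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
print(bradley_terry(wins))   # higher log-strength = expected to be harder
```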
Peer reviewed
Guo, Xiuyan; Lei, Pui-Wa – International Journal of Testing, 2020
Little research has been done on the effects of peer raters' quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters' qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment…
Descriptors: Peer Evaluation, Error Patterns, Correlation, Knowledge Level
Peer reviewed
Briggs, Derek C.; Circi, Ruhan – International Journal of Testing, 2017
Artificial Neural Networks (ANNs) have been proposed as a promising approach for the classification of students into different levels of a psychological attribute hierarchy. Unfortunately, because such classifications typically rely upon internally produced item response patterns that have not been externally validated, the instability of ANN…
Descriptors: Artificial Intelligence, Classification, Student Evaluation, Tests
Peer reviewed
Rutkowski, Leslie; Rutkowski, David; Zhou, Yan – International Journal of Testing, 2016
Using an empirically based simulation study, we show that typically used methods of choosing an item calibration sample have significant impacts on achievement bias and system rankings. We examine whether recent PISA accommodations, especially for lower performing participants, can mitigate some of this bias. Our findings indicate that standard…
Descriptors: Simulation, International Programs, Adolescents, Student Evaluation
Peer reviewed
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
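A minimal sketch of the MH procedure with thick matching (one stratum per distinct total score) is given below. The simulated responses and the ETS delta transformation are textbook conventions used for illustration, not the article's analysis.

```python
import numpy as np

def mh_d_dif(score, group, item):
    """Mantel-Haenszel D-DIF for one studied item.

    score : total-score matching variable (thick matching: one
            stratum per distinct score)
    group : 0 = reference, 1 = focal
    item  : 0/1 response to the studied item
    Returns -2.35 * ln(alpha_MH); negative values flag DIF against
    the focal group (ETS convention).
    """
    score, group, item = map(np.asarray, (score, group, item))
    num = den = 0.0
    for s in np.unique(score):
        m = score == s
        a = np.sum(m & (group == 0) & (item == 1))   # ref correct
        b = np.sum(m & (group == 0) & (item == 0))   # ref incorrect
        c = np.sum(m & (group == 1) & (item == 1))   # focal correct
        d = np.sum(m & (group == 1) & (item == 0))   # focal incorrect
        t = a + b + c + d
        if t:
            num += a * d / t
            den += b * c / t
    return -2.35 * np.log(num / den)

# simulate an item that disadvantages the focal group by 0.2 logits
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)
theta = rng.normal(0.0, 1.0, n)
item = (rng.random(n) < 1 / (1 + np.exp(-(theta - 0.2 * group)))).astype(int)
score = item + rng.binomial(20, 1 / (1 + np.exp(-theta)))  # crude total score
print(mh_d_dif(score, group, item))   # expect a negative D-DIF
```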
Peer reviewed
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Peer reviewed
In'nami, Yo; Koizumi, Rie – International Journal of Testing, 2013
The importance of sample size, although widely discussed in the literature on structural equation modeling (SEM), has not been widely recognized among applied SEM researchers. To narrow this gap, we focus on second language testing and learning studies and examine the following: (a) Is the sample size sufficient in terms of precision and power of…
Descriptors: Structural Equation Models, Sample Size, Second Language Instruction, Monte Carlo Methods
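The Monte Carlo logic behind such sample-size checks can be sketched without an SEM package: simulate data at a candidate n, fit the model, and record how often the target effect is detected. The toy below does this for a single standardized regression path rather than a full structural model, so it stands in for, rather than reproduces, the authors' SEM simulations.

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_power(n, beta=0.3, reps=2000):
    """Monte Carlo power to detect a standardized slope beta at
    sample size n (two-sided test, alpha = .05). In the SEM case,
    each replication would instead fit the full model."""
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(scale=np.sqrt(1 - beta**2), size=n)
        b = x @ y / (x @ x)                       # no-intercept OLS slope
        se = np.sqrt((y - b * x).var(ddof=1) / (x @ x))
        hits += abs(b / se) > 1.96
    return hits / reps

for n in (50, 100, 200):
    print(n, mc_power(n))   # power passes ~.80 near n = 85 for beta = .3
```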
Peer reviewed
Lindley, Patricia A.; Bartram, Dave – International Journal of Testing, 2012
In this article, we present the background to the development of test reviewing by the British Psychological Society (BPS) in the United Kingdom. We also describe the role played by the BPS in the development of the EFPA test review model and its adaptation for use in test reviewing in the United Kingdom. We conclude with a discussion of lessons…
Descriptors: Test Reviews, Professional Associations, Psychology, Global Approach
Peer reviewed
Carlson, Janet F.; Geisinger, Kurt F. – International Journal of Testing, 2012
The test review process used by the Buros Center for Testing is described as a series of 11 steps: (1) identifying tests to be reviewed, (2) obtaining tests and preparing test descriptions, (3) determining whether tests meet review criteria, (4) identifying appropriate reviewers, (5) selecting reviewers, (6) sending instructions and materials to…
Descriptors: Testing, Test Reviews, Evaluation Methods, Evaluation Criteria