ERIC - Search Results

Publication Date

In 2025	3
Since 2024	3
Since 2021 (last 5 years)	4
Since 2016 (last 10 years)	11
Since 2006 (last 20 years)	38

Descriptor

Evaluation Methods	48
Comparative Analysis	15
Foreign Countries	15
Psychometrics	14
Global Approach	11
Item Response Theory	10
Models	10
Testing	10
Measurement	9
Measurement Techniques	9
Scores	9
Student Evaluation	9
Test Items	9
Test Bias	8
Cultural Differences	7
Evaluation Research	7
Test Interpretation	7
Test Norms	7
Business Administration	6
Computer Assisted Testing	6
Cultural Context	6
Error of Measurement	6
Evaluation Criteria	6
Managerial Occupations	6
Personality Measures	6
More ▼

Source

International Journal of…

Publication Type

Journal Articles	48
Reports - Research	25
Reports - Descriptive	16
Reports - Evaluative	4
Opinion Papers	2
Guides - General	1
Information Analyses	1
Tests/Questionnaires	1

Education Level

Higher Education	7
Elementary Secondary Education	4
Elementary Education	3
Secondary Education	3
Adult Education	2
High Schools	2
Grade 3	1
Grade 4	1
Grade 6	1

Audience

Location

United States	4
United Kingdom (England)	3
Australia	2
Canada	2
Netherlands	2
United Kingdom	2
Belgium	1
China	1
Colombia	1
Denmark	1
Germany	1
Iowa	1
Poland	1
South Africa	1
Spain	1
Sweden	1
Zimbabwe	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

Program for International…	3
Graduate Management Admission…	1
Progress in International…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 48 results Save | Export

Combining Mokken Scale Analysis with Rasch Measurement Theory to Explore Differences in Measurement Quality between Subgroups

Peer reviewed

Direct link

Stefanie A. Wind; Benjamin Lugu; Yurou Wang – International Journal of Testing, 2025

Mokken Scale Analysis (MSA) is a nonparametric approach that offers exploratory tools for understanding the nature of item responses while emphasizing invariance requirements. MSA is often discussed as it relates to Rasch measurement theory, which also emphasizes invariance, but uses parametric models. Researchers who have compared and combined…

Descriptors: Item Response Theory, Scaling, Surveys, Evaluation Methods

IRT Linking Methods for the Bifactor Model with Mixed Format Tests

Peer reviewed

Direct link

Sohee Kim; Ki Lynn Cole – International Journal of Testing, 2025

This study conducted a comprehensive comparison of Item Response Theory (IRT) linking methods applied to a bifactor model, examining their performance on both multiple choice (MC) and mixed format tests within the common item nonequivalent group design framework. Four distinct multidimensional IRT linking approaches were explored, consisting of…

Descriptors: Item Response Theory, Comparative Analysis, Models, Item Analysis

Summative Assessments in a Multilingual Context: What Comparative Judgment Reveals about Comparability across Different Languages in Literature

Peer reviewed

Direct link

Badham, Louise; Furlong, Antony – International Journal of Testing, 2023

Multilingual summative assessments face significant challenges due to tensions that exist between multiple language provision and comparability. Yet, conventional approaches for investigating comparability in multilingual assessments fail to accommodate assessments that comprise extended responses that target complex constructs. This article…

Descriptors: Summative Evaluation, Multilingualism, Comparative Analysis, Literature

Measurement Invariance across Immigrant and Nonimmigrant Populations on PISA Non-Cognitive Scales

Peer reviewed

Direct link

Maritza Casas; Stephen G. Sireci – International Journal of Testing, 2025

In this study, we take a critical look at the degree to which the measurement of bullying and sense of belonging at school is invariant across groups of students defined by immigrant status. Our study focuses on the invariance of these constructs as measured on a recent PISA administration and includes a discussion of two statistical methods for…

Descriptors: Error of Measurement, Immigrants, Peer Groups, Bullying

Stopping Rules for Computer Adaptive Testing When Item Banks Have Nonuniform Information

Peer reviewed

Direct link

Morris, Scott B.; Bass, Michael; Howard, Elizabeth; Neapolitan, Richard E. – International Journal of Testing, 2020

The standard error (SE) stopping rule, which terminates a computer adaptive test (CAT) when the "SE" is less than a threshold, is effective when there are informative questions for all trait levels. However, in domains such as patient-reported outcomes, the items in a bank might all target one end of the trait continuum (e.g., negative…

Descriptors: Computer Assisted Testing, Adaptive Testing, Item Banks, Item Response Theory

Using Evidence-Centered Design to Support the Development of Culturally and Linguistically Sensitive Collaborative Problem-Solving Assessments

Peer reviewed

Direct link

Oliveri, María Elena; Lawless, René; Mislevy, Robert J. – International Journal of Testing, 2019

Collaborative problem solving (CPS) ranks among the top five most critical skills necessary for college graduates to meet workforce demands (Hart Research Associates, 2015). It is also deemed a critical skill for educational success (Beaver, 2013). It thus deserves more prominence in the suite of courses and subjects assessed in K-16. Such…

Descriptors: Cooperation, Problem Solving, Evidence Based Practice, 21st Century Skills

Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling

Peer reviewed

Direct link

Holmes, Stephen D.; Meadows, Michelle; Stockford, Ian; He, Qingping – International Journal of Testing, 2018

The relationship of expected and actual difficulty of items on six mathematics question papers designed for 16-year olds in England was investigated through paired comparison using experts and testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers…

Descriptors: Foreign Countries, Secondary School Students, Mathematics Tests, Test Items

Effect of Quality Characteristics of Peer Raters on Rating Errors in Peer Assessment

Peer reviewed

Direct link

Guo, Xiuyan; Lei, Pui-Wa – International Journal of Testing, 2020

Little research has been done on the effects of peer raters' quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters' qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment…

Descriptors: Peer Evaluation, Error Patterns, Correlation, Knowledge Level

Challenges to the Use of Artificial Neural Networks for Diagnostic Classifications with Student Test Data

Peer reviewed

Direct link

Briggs, Derek C.; Circi, Ruhan – International Journal of Testing, 2017

Artificial Neural Networks (ANNs) have been proposed as a promising approach for the classification of students into different levels of a psychological attribute hierarchy. Unfortunately, because such classifications typically rely upon internally produced item response patterns that have not been externally validated, the instability of ANN…

Descriptors: Artificial Intelligence, Classification, Student Evaluation, Tests

Item Calibration Samples and the Stability of Achievement Estimates and System Rankings: Another Look at the PISA Model

Peer reviewed

Direct link

Rutkowski, Leslie; Rutkowski, David; Zhou, Yan – International Journal of Testing, 2016

Using an empirically-based simulation study, we show that typically used methods of choosing an item calibration sample have significant impacts on achievement bias and system rankings. We examine whether recent PISA accommodations, especially for lower performing participants, can mitigate some of this bias. Our findings indicate that standard…

Descriptors: Simulation, International Programs, Adolescents, Student Evaluation

Differential Item Functioning Detection with the Mantel-Haenszel Procedure: The Effects of Matching Types and Other Factors

Peer reviewed

Direct link

Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015

The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…

Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping

ITC Guidelines for the Large-Scale Assessment of Linguistically and Culturally Diverse Populations

Peer reviewed

Direct link

International Journal of Testing, 2019

These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…

Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage

Review of Sample Size for Structural Equation Models in Second Language Testing and Learning Research: A Monte Carlo Approach

Peer reviewed

Direct link

In'nami, Yo; Koizumi, Rie – International Journal of Testing, 2013

The importance of sample size, although widely discussed in the literature on structural equation modeling (SEM), has not been widely recognized among applied SEM researchers. To narrow this gap, we focus on second language testing and learning studies and examine the following: (a) Is the sample size sufficient in terms of precision and power of…

Descriptors: Structural Equation Models, Sample Size, Second Language Instruction, Monte Carlo Methods

Use of the EFPA Test Review Model by the UK and Issues Relating to the Internationalization of Test Standards

Peer reviewed

Direct link

Lindley, Patricia A.; Bartram, Dave – International Journal of Testing, 2012

In this article, we present the background to the development of test reviewing by the British Psychological Society (BPS) in the United Kingdom. We also describe the role played by the BPS in the development of the EFPA test review model and its adaptation for use in test reviewing in the United Kingdom. We conclude with a discussion of lessons…

Descriptors: Test Reviews, Professional Associations, Psychology, Global Approach

Test Reviewing at the Buros Center for Testing

Peer reviewed

Direct link

Carlson, Janet F.; Geisinger, Kurt F. – International Journal of Testing, 2012

The test review process used by the Buros Center for Testing is described as a series of 11 steps: (1) identifying tests to be reviewed, (2) obtaining tests and preparing test descriptions, (3) determining whether tests meet review criteria, (4) identifying appropriate reviewers, (5) selecting reviewers, (6) sending instructions and materials to…

Descriptors: Testing, Test Reviews, Evaluation Methods, Evaluation Criteria

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4

Bartram, Dave	3
Evers, Arne	3
Mislevy, Robert J.	3
Bauer, Malcolm	2
Behrens, John T.	2
Geisinger, Kurt F.	2
Williamson, David M.	2
Zumbo, Bruno D.	2
Allahyar, Maryam	1
Alves, Cecilia	1
Badham, Louise	1
Bank, Jurgen	1
Bass, Michael	1
Beland, Sebastien	1
Benjamin Lugu	1
Bridgeman, Brent	1
Briggs, Derek C.	1
Campillo-Alvarez, Angela	1
Carlson, Janet F.	1
Chernyshenko, Oleksandr S.	1
Circi, Ruhan	1
Cui, Ying	1
DeMark, Sarah F.	1
DeMars, Christine E.	1
Decady, Yves J.	1
More ▼