ERIC - Search Results

Showing all 9 results

Detecting Differential Item Functioning with Multiple Causes: A Comparison of Three Methods

Peer reviewed

Liu, Xiaowen – International Journal of Testing, 2024

Differential item functioning (DIF) often arises from multiple sources. Within the context of multidimensional item response theory, this study examined DIF items with varying secondary dimensions using the three DIF methods: SIBTEST, Mantel-Haenszel, and logistic regression. The effect of the number of secondary dimensions on DIF detection rates…

Descriptors: Item Analysis, Test Items, Item Response Theory, Correlation

Effects of Situational Judgment Test Format on Reliability and Validity

Peer reviewed

Martin-Raugh, Michelle P.; Anguiano-Carrasco, Cristina; Jackson, Teresa; Brenneman, Meghan W.; Carney, Lauren; Barnwell, Patrick; Kochert, Jonathan – International Journal of Testing, 2018

Single-response situational judgment tests (SRSJTs) differ from multiple-response SJTs (MRSJTs) in that they present test takers with edited critical incidents and simply ask test takers to read over the action described and evaluate it according to its effectiveness. Research comparing the reliability and validity of SRSJTs and MRSJTs is thus far…

Descriptors: Test Format, Test Reliability, Test Validity, Predictive Validity

Effect of Quality Characteristics of Peer Raters on Rating Errors in Peer Assessment

Peer reviewed

Guo, Xiuyan; Lei, Pui-Wa – International Journal of Testing, 2020

Little research has been done on the effects of peer raters' quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters' qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment…

Descriptors: Peer Evaluation, Error Patterns, Correlation, Knowledge Level

Reading Proficiency and Comparability of Mathematics and Science Scores for Students from English and Non-English Backgrounds: An International Perspective

Peer reviewed

Ercikan, Kadriye; Chen, Michelle Y.; Lyons-Thomas, Juliette; Goodrich, Shawna; Sandilands, Debra; Roth, Wolff-Michael; Simon, Marielle – International Journal of Testing, 2015

The purpose of this research is to examine the comparability of mathematics and science scores for students from English language backgrounds (ELB) and non-English language backgrounds (NELB). We examine the relationship between English reading proficiency and performance on mathematics and science assessments in Australia, Canada, the United…

Descriptors: Scores, Mathematics Tests, Science Tests, Native Speakers

An Empirical Investigation of Population Invariance in the Value of Subscores

Peer reviewed

Sinharay, Sandip; Haberman, Shelby J. – International Journal of Testing, 2014

Recently there has been an increasing level of interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups--for example, those based on gender or…

Descriptors: Scores, Achievement Tests, Language Tests, English (Second Language)

Toward Increasing Fairness in Score Scale Calibrations Employed in International Large-Scale Assessments

Peer reviewed

Oliveri, Maria Elena; von Davier, Matthias – International Journal of Testing, 2014

In this article, we investigate the creation of comparable score scales across countries in international assessments. We examine potential improvements to current score scale calibration procedures used in international large-scale assessments. Our approach seeks to improve fairness in scoring international large-scale assessments, which often…

Descriptors: Test Bias, Scores, International Programs, Educational Assessment

Comparing OECD PISA Reading in English to Other Languages: Identifying Potential Sources of Non-Invariance

Peer reviewed

Asil, Mustafa; Brown, Gavin T. L. – International Journal of Testing, 2016

The use of the Programme for International Student Assessment (PISA) across nations, cultures, and languages has been criticized. The key criticisms point to the linguistic and cultural biases potentially underlying the design of reading comprehension tests, raising doubts about the legitimacy of comparisons across economies. Our research focused…

Descriptors: Comparative Analysis, Reading Achievement, Achievement Tests, Secondary School Students

The Applicability of Multidimensional Computerized Adaptive Testing for Cognitive Ability Measurement in Organizational Assessment

Peer reviewed

Makransky, Guido; Glas, Cees A. W. – International Journal of Testing, 2013

Cognitive ability tests are widely used in organizations around the world because they have high predictive validity in selection contexts. Although these tests typically measure several subdomains, testing is usually carried out for a single subdomain at a time. This can be ineffective when the subdomains assessed are highly correlated. This…

Descriptors: Foreign Countries, Cognitive Ability, Adaptive Testing, Feedback (Response)

Linking Scores from Tests of Similar Content Given in Different Languages: An Illustration Involving Methodological Alternatives

Peer reviewed

Cascallar, Alicia S.; Dorans, Neil J. – International Journal of Testing, 2005

This study compares two methods commonly used (concordance and prediction) to establish linkages between scores from tests of similar content given in different languages. Score linkages between the Verbal and Math sections of the SAT I and the corresponding sections of the Spanish-language admissions test, the Prueba de Aptitud Academica (PAA),…

Descriptors: Prediction, Correlation, Scores, Multilingual Materials
