ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	6
Since 2016 (last 10 years)	15
Since 2006 (last 20 years)	39

Descriptor

Comparative Analysis	63
Equated Scores	32
Scores	24
Item Response Theory	18
Test Items	15
Statistical Analysis	10
Simulation	9
Test Format	9
College Entrance Examinations	8
Mathematical Models	7
Testing	7
Achievement Tests	6
Cutting Scores	6
Error of Measurement	6
Estimation (Mathematics)	6
Evaluation Methods	6
Mathematics Tests	6
Test Construction	6
Accuracy	5
Difficulty Level	5
Item Analysis	5
Latent Trait Theory	5
Multiple Choice Tests	5
Reading Tests	5
Scoring	5

Source

Journal of Educational…

Publication Type

Journal Articles	60
Reports - Research	39
Reports - Evaluative	19
Reports - Descriptive	3
Information Analyses	1

Education Level

Secondary Education	2
High Schools	1
Higher Education	1
Postsecondary Education	1

Audience

Practitioners

Assessments and Surveys

SAT (College Admission Test)	3
General Educational…	1
Indiana Statewide Testing for…	1
Iowa Tests of Educational…	1
Program for International…	1

Showing 1 to 15 of 63 results

Linking and Comparability across Conditions of Measurement: Established Frameworks and Proposed Updates

Peer reviewed

Moses, Tim – Journal of Educational Measurement, 2022

One result of recent changes in testing is that previously established linking frameworks may not adequately address challenges in current linking situations. Test linking through equating, concordance, vertical scaling or battery scaling may not represent linkings for the scores of tests developed to measure constructs differently for different…

Descriptors: Measures (Individuals), Educational Assessment, Test Construction, Comparative Analysis

Score Comparability between Online Proctored and In-Person Credentialing Exams

Peer reviewed

Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022

This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The…

Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing

Historical Perspectives on Score Comparability Issues Raised by Innovations in Testing

Peer reviewed

Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022

While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…

Descriptors: Scoring, Testing, Test Items, Test Format

The Impact of Cheating on Score Comparability via Pool-Based IRT Pre-Equating

Peer reviewed

Liu, Jinghua; Becker, Kirk – Journal of Educational Measurement, 2022

For any testing programs that administer multiple forms across multiple years, maintaining score comparability via equating is essential. With continuous testing and high-stakes results, especially with less secure online administrations, testing programs must consider the potential for cheating on their exams. This study used empirical and…

Descriptors: Cheating, Item Response Theory, Scores, High Stakes Tests

Score Comparability Issues with At-Home Testing and How to Address Them

Peer reviewed

Puhan, Gautam; Kim, Sooyeon – Journal of Educational Measurement, 2022

As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be…

Descriptors: Scores, Scoring, Comparative Analysis, Testing

Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests

Peer reviewed

Shear, Benjamin R. – Journal of Educational Measurement, 2023

Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…

Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests

IRT Approaches to Modeling Scores on Mixed-Format Tests

Peer reviewed

Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020

This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…

Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests

Scale Alignment in Between-Item Multidimensional Rasch Models

Peer reviewed

Feuerstahler, Leah; Wilson, Mark – Journal of Educational Measurement, 2019

Scores estimated from multidimensional item response theory (IRT) models are not necessarily comparable across dimensions. In this article, the concept of aligned dimensions is formalized in the context of Rasch models, and two methods are described--delta dimensional alignment (DDA) and logistic regression alignment (LRA)--to transform estimated…

Descriptors: Item Response Theory, Models, Scores, Comparative Analysis

Structural Zeros and Their Implications with Log-Linear Bivariate Presmoothing under the Internal-Anchor Design

Peer reviewed

Kim, Hyung Jin; Brennan, Robert L.; Lee, Won-Chan – Journal of Educational Measurement, 2017

In equating, when common items are internal and scoring is conducted in terms of the number of correct items, some pairs of total scores ("X") and common-item scores ("V") can never be observed in a bivariate distribution of "X" and "V"; these pairs are called "structural zeros." This simulation…

Descriptors: Test Items, Equated Scores, Comparative Analysis, Methods

Stabilizing Conditional Standard Errors of Measurement in Scale Score Transformations

Peer reviewed

Moses, Tim; Kim, YoungKoung – Journal of Educational Measurement, 2017

The focus of this article is on scale score transformations that can be used to stabilize conditional standard errors of measurement (CSEMs). Three transformations for stabilizing the estimated CSEMs are reviewed, including the traditional arcsine transformation, a recently developed general variance stabilization transformation, and a new method…

Descriptors: Error of Measurement, Scores, Comparative Analysis, Item Response Theory

A Comparison of Experimental and Observational Approaches to Assessing the Effects of Time Constraints in a Medical Licensing Examination

Peer reviewed

Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Baldwin, Peter; Margolis, Melissa J.; Bucak, Deniz; Jodoin, Michael; Walsh, William; Haist, Steven – Journal of Educational Measurement, 2018

Test administrators are appropriately concerned about the potential for time constraints to impact the validity of score interpretations; psychometric efforts to evaluate the impact of speededness date back more than half a century. The widespread move to computerized test delivery has led to the development of new approaches to evaluating how…

Descriptors: Comparative Analysis, Observation, Medical Education, Licensing Examinations (Professions)

Diagnostic Profiles: A Standard Setting Method for Use with a Cognitive Diagnostic Model

Peer reviewed

Skaggs, Gary; Hein, Serge F.; Wilkins, Jesse L. M. – Journal of Educational Measurement, 2016

This article introduces the Diagnostic Profiles (DP) standard setting method for setting a performance standard on a test developed from a cognitive diagnostic model (CDM), the outcome of which is a profile of mastered and not-mastered skills or attributes rather than a single test score. In the DP method, the key judgment task for panelists is a…

Descriptors: Models, Standard Setting, Profiles, Diagnostic Tests

How to Compare Parametric and Nonparametric Person-Fit Statistics Using Real Data

Peer reviewed

Sinharay, Sandip – Journal of Educational Measurement, 2017

Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed…

Descriptors: Goodness of Fit, Testing, Test Items, Scores

Equating with Miditests Using IRT

Peer reviewed

Fitzpatrick, Joseph; Skorupski, William P. – Journal of Educational Measurement, 2016

The equating performance of two internal anchor test structures--miditests and minitests--is studied for four IRT equating methods using simulated data. Originally proposed by Sinharay and Holland, miditests are anchors that have the same mean difficulty as the overall test but less variance in item difficulties. Four popular IRT equating methods…

Descriptors: Difficulty Level, Test Items, Comparative Analysis, Test Construction

Non-Numeric Intrajudge Consistency Feedback in an Angoff Procedure

Peer reviewed

Harrison, George M. – Journal of Educational Measurement, 2015

The credibility of standard-setting cut scores depends in part on two sources of consistency evidence: intrajudge and interjudge consistency. Although intrajudge consistency feedback has often been provided to Angoff judges in practice, more evidence is needed to determine whether it achieves its intended effect. In this randomized experiment with…

Descriptors: Interrater Reliability, Standard Setting (Scoring), Cutting Scores, Feedback (Response)

Author

Kolen, Michael J.	5
Liu, Jinghua	5
Moses, Tim	4
Sinharay, Sandip	4
Kim, Sooyeon	3
Puhan, Gautam	3
von Davier, Alina A.	3
Baldwin, Peter	2
Brennan, Robert L.	2
Chen, Haiwen	2
Clauser, Brian E.	2
Harris, Deborah J.	2
Holland, Paul W.	2
Kane, Michael T.	2
Lee, Won-Chan	2
Wiberg, Marie	2
Al-Karni, Ali	1
Albano, Anthony D.	1
Baker, Frank B.	1
Becker, Kirk	1
Borglum, Joshua	1
Bucak, Deniz	1
Cahn, Miriam F.	1
Choi, Jiwon	1