Publication Date
In 2025: 3
Since 2024: 11
Since 2021 (last 5 years): 29
Since 2016 (last 10 years): 56
Since 2006 (last 20 years): 105
Source
Journal of Educational…: 208
Author
Sinharay, Sandip: 9
Bridgeman, Brent: 8
Clauser, Brian E.: 6
Dorans, Neil J.: 5
Lee, Won-Chan: 5
Lewis, Charles: 5
McCaffrey, Daniel F.: 5
Kolen, Michael J.: 4
Wainer, Howard: 4
Brennan, Robert L.: 3
DeCarlo, Lawrence T.: 3
Publication Type
Journal Articles: 192
Reports - Research: 115
Reports - Evaluative: 48
Reports - Descriptive: 21
Opinion Papers: 9
Speeches/Meeting Papers: 3
Book/Product Reviews: 2
Information Analyses: 2
Reports - General: 1
Education Level
Higher Education: 10
Postsecondary Education: 9
Secondary Education: 8
High Schools: 7
Elementary Secondary Education: 1
Grade 10: 1
Grade 4: 1
Grade 8: 1
Grade 9: 1
Audience
Researchers: 3
Location
United States: 2
Belgium: 1
Colombia: 1
Turkey: 1
United Kingdom: 1
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although extensive research exists on subscores and their properties, little research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Ercikan, Kadriye; McCaffrey, Daniel F. – Journal of Educational Measurement, 2022
Artificial-intelligence-based automated scoring is often an afterthought, considered only after assessments have been developed, which limits the possibilities for implementing automated scoring solutions. In this article, we provide a review of artificial intelligence (AI)-based methodologies for scoring in educational assessments. We then…
Descriptors: Artificial Intelligence, Automation, Scores, Educational Assessment
Sinharay, Sandip – Journal of Educational Measurement, 2023
Technical difficulties and other unforeseen events occasionally lead to incomplete data on educational tests, which necessitates reporting imputed scores to some examinees. While several approaches exist for reporting imputed scores, there is little guidance on reporting their uncertainty. In this paper,…
Descriptors: Evaluation Methods, Scores, Standardized Tests, Simulation
Jianbin Fu; Xuan Tan; Patrick C. Kyllonen – Journal of Educational Measurement, 2024
This paper presents the item and test information functions of the Rank two-parameter logistic model (Rank-2PLM) for items with two (pair) or three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, the Triplet-2PLM. Fisher's…
Descriptors: Questionnaires, Test Items, Item Response Theory, Models
Johnson, Matthew S.; Liu, Xiang; McCaffrey, Daniel F. – Journal of Educational Measurement, 2022
With the increasing use of automated scores in operational testing settings comes the need to understand the ways in which they can yield biased and unfair results. In this paper, we provide a brief survey of some of the ways in which the predictive methods used in automated scoring can lead to biased, and thus unfair, automated scores. After…
Descriptors: Psychometrics, Measurement Techniques, Bias, Automation
Gorney, Kylie; Wollack, James A. – Journal of Educational Measurement, 2023
In order to detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the l_z and l*_z person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can be used as indicators of aberrance. Through…
Descriptors: Test Items, Scores, Goodness of Fit, Statistics
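The distractor-based extensions proposed in the paper are not shown in the snippet; for orientation, this is a minimal sketch of the classical dichotomous l_z statistic (Drasgow, Levine, and Williams) that the paper extends: the standardized log-likelihood of a response pattern under the model-implied response probabilities.

```python
import math

def lz_statistic(responses, probs):
    """Classical standardized log-likelihood person-fit statistic l_z
    for dichotomous item scores.
    responses: 0/1 item scores; probs: model-implied P(correct) per item."""
    # Observed log-likelihood of the response pattern.
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    # Expectation and variance of l0 under the model.
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - mean) / math.sqrt(var)
```

Large negative values of l_z flag aberrant patterns, e.g. an examinee who misses easy items but answers hard ones correctly.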
Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025
Educational tests often have a cluster of items linked by a common stimulus (a "testlet"). In such a design, the dependencies induced among items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
Choe, Edison M.; Han, Kyung T. – Journal of Educational Measurement, 2022
In operational testing, item response theory (IRT) models for dichotomous responses are popular for measuring a single latent construct θ, such as cognitive ability in a content domain. Estimates of θ, also called IRT scores or θ̂, can be computed using estimators based on the likelihood function, such as maximum likelihood…
Descriptors: Scores, Item Response Theory, Test Items, Test Format
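The abstract's likelihood-based θ estimation can be sketched for the 2PL case with a few lines of Newton-Raphson, using the standard score function Σ a_i(u_i − P_i) and information Σ a_i² P_i(1 − P_i). This is a generic textbook sketch, not the estimators compared in the paper; names and parameters are illustrative.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_theta(responses, items, theta=0.0, iters=25):
    """Newton-Raphson maximum likelihood estimate of theta under the 2PL.
    Assumes a mixed response pattern (all-correct or all-incorrect
    patterns have no finite MLE).
    responses: 0/1 scores; items: (a, b) parameter pairs."""
    for _ in range(iters):
        probs = [p_2pl(theta, a, b) for a, b in items]
        grad = sum(a * (u - p) for (a, _), u, p in zip(items, responses, probs))
        info = sum(a * a * p * (1 - p) for (a, _), p in zip(items, probs))
        theta += grad / info  # Newton step: score / information
    return theta

# Illustrative 3-item test with equal discriminations.
theta_hat = mle_theta([1, 1, 0], [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)])
```

At convergence the score function is zero, so the returned θ̂ satisfies the likelihood equation; its standard error is approximately 1/√I(θ̂).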
Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020
The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…
Descriptors: Classification, Accuracy, Scores, Cutting Scores
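The specific non-IRT procedures evaluated above (normal approximation, Livingston-Lewis, compound multinomial) are not reproduced here; this is a minimal Monte Carlo sketch of the underlying quantity, classification accuracy, under a simple classical-test-theory normal model with an assumed reliability. The setup and function name are illustrative assumptions.

```python
import random

def classification_accuracy(cut, reliability, n=100_000, seed=1):
    """Monte Carlo sketch of classification accuracy under a normal
    classical-test-theory model: true score T ~ N(0, rho), error
    E ~ N(0, 1 - rho), observed X = T + E, so Var(X) = 1 and
    Corr(X, T)^2 = rho (the reliability). Returns the proportion of
    simulees whose observed pass/fail decision matches their true status."""
    rng = random.Random(seed)
    sd_t = reliability ** 0.5
    sd_e = (1.0 - reliability) ** 0.5
    hits = 0
    for _ in range(n):
        t = rng.gauss(0.0, sd_t)
        x = t + rng.gauss(0.0, sd_e)
        hits += (t >= cut) == (x >= cut)
    return hits / n
```

With the cut at the score distribution's center, accuracy rises from about 0.75 at reliability 0.5 to about 0.90 at reliability 0.9; analytic methods like Livingston-Lewis replace the simulation with distributional assumptions about true and observed scores.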
Sijia Huang; Seungwon Chung; Carl F. Falk – Journal of Educational Measurement, 2024
In this study, we introduced a cross-classified multidimensional nominal response model (CC-MNRM) to account for various response styles (RS) in the presence of cross-classified data. The proposed model allows slopes to vary across items and can explore impacts of observed covariates on latent constructs. We applied a recently developed variant of…
Descriptors: Response Style (Tests), Classification, Data, Models
Setzer, J. Carl; Cheng, Ying; Liu, Cheng – Journal of Educational Measurement, 2023
Test scores are often used to make decisions about examinees, such as in licensure and certification testing, as well as in many educational contexts. In some cases, these decisions are based upon compensatory scores, such as those from multiple sections or components of an exam. Classification accuracy and classification consistency are two…
Descriptors: Classification, Accuracy, Psychometrics, Scores
Corinne Huggins-Manley; Anthony W. Raborn; Peggy K. Jones; Ted Myers – Journal of Educational Measurement, 2024
The purpose of this study is to develop a nonparametric DIF method that (a) compares focal groups directly to the composite group that will be used to develop the reported test score scale, and (b) allows practitioners to explore DIF related to focal groups stemming from multicategorical variables that constitute a small proportion of the…
Descriptors: Nonparametric Statistics, Test Bias, Scores, Statistical Significance
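The composite-group method proposed in that study is not reproduced in the snippet; as a point of reference, this sketches the classical Mantel-Haenszel common odds ratio, a standard nonparametric DIF screen that stratifies examinees by matching score. The function name and data layout are illustrative.

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio for DIF screening.
    strata: one (ref_correct, ref_wrong, foc_correct, foc_wrong) tuple
    per matching-score level. Returns (alpha_MH, delta_MH), where
    delta_MH = -2.35 * ln(alpha_MH) is the ETS delta scale."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

An alpha of 1 (delta of 0) indicates no DIF; alpha above 1 favors the reference group. A limitation the study above targets is that small focal groups yield sparse strata and unstable estimates.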
Lee, Yi-Hsuan; Haberman, Shelby J. – Journal of Educational Measurement, 2021
For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some of which are expected while others may be…
Descriptors: Scores, Regression (Statistics), Demography, Data
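The regression-based scale-drift analyses of that paper are not reproduced here; for intuition, this sketches the simplest equating transformation they build on, mean-sigma linear equating, which maps a Form X score onto the Form Y scale by matching means and standard deviations: y(x) = μ_Y + (σ_Y/σ_X)(x − μ_X). The function name is illustrative.

```python
import statistics

def linear_equate(x_scores, y_scores):
    """Mean-sigma linear equating. Returns a function mapping a Form X
    raw score onto the Form Y scale:
    y(x) = mu_Y + (sigma_Y / sigma_X) * (x - mu_X)."""
    mu_x, mu_y = statistics.mean(x_scores), statistics.mean(y_scores)
    # Population SDs; sample SDs (statistics.stdev) would also work
    # as long as both forms use the same convention.
    sd_x, sd_y = statistics.pstdev(x_scores), statistics.pstdev(y_scores)
    slope = sd_y / sd_x
    return lambda x: mu_y + slope * (x - mu_x)
```

Scale drift of the kind studied above shows up when re-deriving this transformation in a later administration yields a materially different slope or intercept than the one in operational use.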
Sandip Sinharay; Matthew S. Johnson – Journal of Educational Measurement, 2024
Culturally responsive assessments have been proposed as potential tools to ensure equity and fairness for examinees from all backgrounds, including those from traditionally underserved or minoritized groups. However, these assessments are relatively new and, with few exceptions, are yet to be implemented on a large scale. Consequently, there is a…
Descriptors: Culturally Relevant Education, Evaluation, Equal Education, Disadvantaged
Lee, Yi-Hsuan; Haberman, Shelby J.; Dorans, Neil J. – Journal of Educational Measurement, 2019
In many educational tests, both multiple-choice (MC) and constructed-response (CR) sections are used to measure different constructs. In many common cases, security concerns lead to the use of form-specific CR items that cannot be used for equating test scores, along with MC sections that can be linked to previous test forms via common items. In…
Descriptors: Scores, Multiple Choice Tests, Test Items, Responses