Publication Date
In 2025: 2
Since 2024: 6
Since 2021 (last 5 years): 41
Since 2016 (last 10 years): 294
Since 2006 (last 20 years): 608
Descriptor
Statistical Analysis: 847
Test Items: 847
Item Response Theory: 225
Foreign Countries: 218
Item Analysis: 197
Difficulty Level: 177
Test Construction: 176
Comparative Analysis: 167
Scores: 153
Test Bias: 141
Correlation: 127
Author
Sinharay, Sandip: 14
Dorans, Neil J.: 8
von Davier, Alina A.: 7
Guo, Hongwen: 6
Holland, Paul W.: 6
Raykov, Tenko: 6
Chang, Hua-Hua: 5
Kim, Sooyeon: 5
Liu, Jinghua: 5
Livingston, Samuel A.: 5
Magis, David: 5
Audience
Researchers: 30
Practitioners: 3
Teachers: 3
Policymakers: 1
Location
Turkey: 31
Germany: 15
Australia: 13
Canada: 11
Netherlands: 11
Japan: 9
Taiwan: 8
United States: 8
Israel: 7
Sweden: 7
California: 6
What Works Clearinghouse Rating
Meets WWC Standards without Reservations: 1
Meets WWC Standards with or without Reservations: 1
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
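The extension itself is not reproduced in the abstract; as general background, classical linear equating places a form-X score x on the form-Y scale using only the two forms' score means and standard deviations:

```latex
% Classical linear equating function (background, not the paper's extension):
l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
```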
Jianbin Fu; TsungHan Ho; Xuan Tan – Practical Assessment, Research & Evaluation, 2025
Item parameter estimation using an item response theory (IRT) model with fixed ability estimates is useful in equating with small samples on anchor items. The current study explores the impact of three ability estimation methods (weighted likelihood estimation [WLE], maximum a posteriori [MAP], and posterior ability distribution estimation [PST])…
Descriptors: Item Response Theory, Test Items, Computation, Equated Scores
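As background on the ability estimation methods named above, here is a minimal grid-search sketch of MAP ability estimation under a 2PL model with fixed item parameters; the function name, prior settings, and example values are assumptions for illustration, not taken from the study, and WLE and PST involve additional corrections not shown.

```python
import numpy as np

def map_ability_2pl(responses, a, b, prior_mean=0.0, prior_sd=1.0):
    """MAP ability estimate under a 2PL model with item parameters held fixed.

    responses : 0/1 item scores (one examinee)
    a, b      : fixed discrimination and difficulty parameters
    The N(prior_mean, prior_sd) prior and the grid are illustrative choices.
    """
    theta = np.linspace(-4.0, 4.0, 401)                      # ability grid
    p = 1.0 / (1.0 + np.exp(-(a * (theta[:, None] - b))))    # P(correct | theta)
    loglik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    logprior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    return theta[np.argmax(loglik + logprior)]               # posterior mode

# Hypothetical anchor-item parameters and one response pattern
a = np.array([1.0, 1.2, 0.8, 1.5, 1.0])
b = np.array([-1.0, 0.0, 0.5, 1.0, 1.5])
print(map_ability_2pl(np.array([1, 1, 1, 0, 0]), a, b))
```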
Leighton, Elizabeth A. – ProQuest LLC, 2022
The use of unidimensional scales that contain both positively and negatively worded items is common in both the educational and psychological fields. However, dimensionality investigations of these instruments often lead to a rejection of the theorized unidimensional model in favor of multidimensional structures, leaving researchers at odds for…
Descriptors: Test Items, Language Usage, Models, Statistical Analysis
Weese, James D.; Turner, Ronna C.; Ames, Allison; Crawford, Brandon; Liang, Xinya – Educational and Psychological Measurement, 2022
A simulation study was conducted to investigate the heuristics of the SIBTEST procedure and how it compares with ETS classification guidelines used with the Mantel-Haenszel procedure. Prior heuristics have been used for nearly 25 years, but they are based on a simulation study that was restricted due to computer limitations and that modeled item…
Descriptors: Test Bias, Heuristics, Classification, Statistical Analysis
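For context on the ETS guidelines mentioned above, a minimal sketch of the Mantel-Haenszel delta statistic with a simplified A/B/C labeling rule follows; the full ETS rules also require significance tests, which are omitted, and all names here are illustrative.

```python
import numpy as np

def mh_delta(strata):
    """Mantel-Haenszel common odds ratio converted to the ETS delta metric.

    strata : list of 2x2 counts per matched score level,
             [[ref_correct, ref_incorrect], [focal_correct, focal_incorrect]]
    """
    num = sum(A * D / (A + B + C + D) for (A, B), (C, D) in strata)
    den = sum(B * C / (A + B + C + D) for (A, B), (C, D) in strata)
    return -2.35 * np.log(num / den)        # delta-MH

def ets_category(delta):
    """Simplified ETS labels based on |delta| alone (significance checks omitted)."""
    d = abs(delta)
    return "A" if d < 1.0 else ("B" if d < 1.5 else "C")

# Hypothetical counts for two score strata
strata = [[[30, 10], [22, 18]], [[25, 5], [20, 10]]]
print(ets_category(mh_delta(strata)))
```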
Su, Kun; Henson, Robert A. – Journal of Educational and Behavioral Statistics, 2023
This article provides a process to carefully evaluate the suitability of a content domain for which diagnostic classification models (DCMs) could be applicable and then optimized steps for constructing a test blueprint for applying DCMs and a real-life example illustrating this process. The content domains were carefully evaluated using a set of…
Descriptors: Classification, Models, Science Tests, Physics
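DCM test blueprints are typically organized around a Q-matrix mapping items to attributes; the sketch below is a generic, hypothetical illustration with a simple coverage check, not the evaluation criteria developed in the article.

```python
import numpy as np

# Hypothetical Q-matrix for a 6-item test measuring 3 attributes:
# rows = items, columns = attributes; 1 means the item requires that attribute.
Q = np.array([
    [1, 0, 0],   # item 1 isolates attribute 1
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],   # item 4 requires attributes 1 and 2
    [0, 1, 1],
    [1, 1, 1],
])

# Basic blueprint checks: how many items measure each attribute, and how many
# of those items isolate it (measure it alone).
items_per_attribute = Q.sum(axis=0)
isolating = Q[Q.sum(axis=1) == 1].sum(axis=0)
print(items_per_attribute, isolating)
```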
Wu, Tong; Kim, Stella Y.; Westine, Carl – Educational and Psychological Measurement, 2023
For large-scale assessments, data are often collected with missing responses. Despite the wide use of item response theory (IRT) in many testing programs, however, the existing literature offers little insight into the effectiveness of various approaches to handling missing responses in the context of scale linking. Scale linking is commonly used…
Descriptors: Data Analysis, Responses, Statistical Analysis, Measurement
Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022
Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well-understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…
Descriptors: Reliability, Scores, Scaling, Statistical Analysis
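For reference, the conventional summed-score expression of coefficient alpha that the paper scrutinizes can be computed as below; the function name and interface are illustrative.

```python
import numpy as np

def coefficient_alpha(X):
    """Coefficient (Cronbach's) alpha for summed scores.

    X : respondents-by-items matrix of item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of the summed score)
    """
    k = X.shape[1]
    item_var_sum = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)
```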
Guastadisegni, Lucia; Cagnone, Silvia; Moustaki, Irini; Vasdekis, Vassilis – Educational and Psychological Measurement, 2022
This article studies the Type I error, false positive rates, and power of four versions of the Lagrange multiplier test to detect measurement noninvariance in item response theory (IRT) models for binary data under model misspecification. The tests considered are the Lagrange multiplier test computed with the Hessian and cross-product approach,…
Descriptors: Measurement, Statistical Analysis, Item Response Theory, Test Items
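In generic form, the Lagrange multiplier (score) test evaluated at the restricted, invariance-constrained estimate can be written as below; the versions compared in the article differ mainly in how the information matrix is approximated (Hessian versus cross-product of scores).

```latex
% Generic Lagrange multiplier (score) test statistic at the restricted estimate:
LM = s(\tilde{\theta})^{\top}\, I(\tilde{\theta})^{-1}\, s(\tilde{\theta})
% s = score (gradient) vector, I = information matrix; under the null hypothesis
% it is asymptotically chi-square with df equal to the number of constraints.
```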
Ranger, Jochen; Brauer, Kay – Journal of Educational and Behavioral Statistics, 2022
The generalized S-X²-test is a test of item fit for items with a polytomous response format. The test is based on a comparison of the observed and expected number of responses in strata defined by the test score. In this article, we make four contributions. We demonstrate that the performance of the generalized S-X²-test…
Descriptors: Goodness of Fit, Test Items, Statistical Analysis, Item Response Theory
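For orientation, a minimal sketch of the dichotomous Orlando-Thissen form of S-X², which compares observed and model-expected proportions correct within summed-score strata; the generalized polytomous version examined in the article compares full category counts, and the names below are illustrative.

```python
import numpy as np

def s_x2(observed_correct, n_per_score, expected_p):
    """Dichotomous S-X2 item-fit statistic over summed-score groups.

    observed_correct : observed number correct in each summed-score group
    n_per_score      : number of examinees in each group
    expected_p       : model-implied proportion correct in each group
    """
    o = observed_correct / n_per_score
    return np.sum(n_per_score * (o - expected_p) ** 2 / (expected_p * (1 - expected_p)))
```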
Wang, Weimeng; Liu, Yang; Liu, Hongyun – Journal of Educational and Behavioral Statistics, 2022
Differential item functioning (DIF) occurs when the probability of endorsing an item differs across groups for individuals with the same latent trait level. The presence of DIF items may jeopardize the validity of an instrument; therefore, it is crucial to identify DIF items in routine operations of educational assessment. While DIF detection…
Descriptors: Test Bias, Test Items, Equated Scores, Regression (Statistics)
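One common regression-based DIF screen consistent with the descriptors above, though not necessarily the authors' procedure, is logistic regression in the style of Swaminathan and Rogers; a minimal sketch, assuming statsmodels is available and that inputs are NumPy arrays.

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif(item, matching_score, group):
    """Logistic-regression DIF screen for one dichotomous item.

    item           : 0/1 responses to the studied item
    matching_score : matching variable (e.g., total or rest score)
    group          : 0 = reference group, 1 = focal group
    Uniform DIF shows up in the group term; nonuniform DIF in the interaction.
    """
    X = sm.add_constant(np.column_stack([matching_score, group,
                                         matching_score * group]))
    fit = sm.Logit(item, X).fit(disp=0)
    return fit.params, fit.pvalues
```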
Lanrong Li – ProQuest LLC, 2021
When developing a test, it is essential to ensure that the test is free of items with differential item functioning (DIF). DIF occurs when examinees of equal ability, but from different examinee subgroups, have different chances of getting the item correct. According to the multidimensional perspective, DIF occurs because the test measures more…
Descriptors: Test Bias, Test Items, Meta Analysis, Effect Size
Lanrong Li; Betsy Jane Becker – Journal of Educational Measurement, 2021
Differential bundle functioning (DBF) has been proposed to quantify the accumulated amount of differential item functioning (DIF) in an item cluster/bundle (Douglas, Roussos, and Stout). The simultaneous item bias test (SIBTEST, Shealy and Stout) has been used to test for DBF (e.g., Walker, Zhang, and Surber). Research on DBF may have the…
Descriptors: Test Bias, Test Items, Meta Analysis, Effect Size
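In its usual form, the SIBTEST/DBF effect size referred to above is a weighted difference of regression-corrected bundle means between the reference (R) and focal (F) groups across matched score groups k:

```latex
% SIBTEST / DBF effect size (focal-group weights \hat{p}_k over score groups k):
\hat{\beta}_{\mathrm{UNI}} = \sum_{k} \hat{p}_k \left( \bar{Y}^{*}_{Rk} - \bar{Y}^{*}_{Fk} \right)
```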
Demirkaya, Onur; Bezirhan, Ummugul; Zhang, Jinming – Journal of Educational and Behavioral Statistics, 2023
Examinees with item preknowledge tend to obtain inflated test scores that undermine test score validity. With the availability of process data collected in computer-based assessments, the research on detecting item preknowledge has progressed toward using both item scores and response times. Item revisit patterns of examinees can also be utilized as…
Descriptors: Test Items, Prior Learning, Knowledge Level, Reaction Time
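As a toy illustration of combining item scores with response times, and emphatically not the detection method developed in the article, one might flag responses that are both correct and unusually fast for the item; all names and the cutoff below are assumptions.

```python
import numpy as np

def fast_correct_flags(correct, log_rt, z_cut=-2.0):
    """Flag responses that are correct and unusually fast for that item.

    correct : examinees-by-items 0/1 score matrix
    log_rt  : examinees-by-items log response times
    z_cut   : hypothetical cutoff on the within-item z-score of log RT
    """
    z = (log_rt - log_rt.mean(axis=0)) / log_rt.std(axis=0, ddof=1)
    return (correct == 1) & (z < z_cut)
```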
Mark Wilson – Journal of Educational and Behavioral Statistics, 2024
This article introduces a new framework for articulating how educational assessments can be related to teacher uses in the classroom. It articulates three levels of assessment: macro (use of standardized tests), meso (externally developed items), and micro (on-the-fly in the classroom). The first level is the usual context for educational…
Descriptors: Educational Assessment, Measurement, Standardized Tests, Test Items
Jiajing Huang – ProQuest LLC, 2022
The nonequivalent-groups anchor-test (NEAT) data-collection design is commonly used in large-scale assessments. Under this design, different test groups take different test forms. Each test form has its own unique items and all test forms share a set of common items. If item response theory (IRT) models are applied to analyze the test data, the…
Descriptors: Item Response Theory, Test Format, Test Items, Test Construction
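One common way to place new-form IRT item parameters on the base scale under a NEAT design is mean-sigma linking through the anchor items; a minimal sketch with illustrative function names (the dissertation may instead use characteristic-curve methods such as Stocking-Lord).

```python
import numpy as np

def mean_sigma_constants(b_anchor_base, b_anchor_new):
    """Mean-sigma linking constants from anchor-item difficulties estimated
    separately on the base and new forms."""
    A = np.std(b_anchor_base, ddof=1) / np.std(b_anchor_new, ddof=1)
    B = np.mean(b_anchor_base) - A * np.mean(b_anchor_new)
    return A, B

def to_base_scale(a_new, b_new, A, B):
    """Transform new-form parameters onto the base scale: b* = A*b + B, a* = a/A."""
    return np.asarray(a_new) / A, A * np.asarray(b_new) + B
```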