Publication Date
  In 2025: 22
  Since 2024: 51
  Since 2021 (last 5 years): 196
  Since 2016 (last 10 years): 452
  Since 2006 (last 20 years): 772
Descriptor
  Item Response Theory: 772
  Test Reliability: 492
  Test Validity: 294
  Foreign Countries: 263
  Test Items: 236
  Psychometrics: 220
  Reliability: 210
  Test Construction: 170
  Scores: 157
  Measures (Individuals): 131
  Validity: 100
Author
  Petscher, Yaacov: 11
  Schoen, Robert C.: 9
  Alonzo, Julie: 7
  Tindal, Gerald: 7
  Wang, Wen-Chung: 7
  Wind, Stefanie A.: 7
  Foorman, Barbara R.: 6
  Johnson, Evelyn S.: 6
  Moylan, Laura A.: 6
  Zheng, Yuzhu: 6
  Anderson, Daniel: 5
Audience
  Researchers: 3
  Administrators: 1
  Practitioners: 1
Location
  Indonesia: 30
  Turkey: 26
  Florida: 17
  Germany: 16
  Australia: 15
  China: 15
  Malaysia: 15
  Taiwan: 15
  United States: 12
  Hong Kong: 11
  California: 8
Laws, Policies, & Programs
  No Child Left Behind Act 2001: 2
  American Recovery and…: 1
  Elementary and Secondary…: 1
  Individuals with Disabilities…: 1
Kelsey Nason; Christine DeMars – Journal of Educational Measurement, 2025
This study examined the widely used threshold of 0.2 for Yen's Q3, an index for violations of local independence. Specifically, a simulation was conducted to investigate whether Q3 values were related to the magnitude of bias in estimates of reliability, item parameters, and examinee ability. Results showed that Q3 values below the typical cut-off…
Descriptors: Item Response Theory, Statistical Bias, Test Reliability, Test Items
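For reference, Yen's Q3 for an item pair (j, k) is the correlation, taken across examinees, of the residuals that remain after an IRT model has been fit; a minimal statement of the standard definition, with u_{ij} the scored response and \hat{P}_j the model-implied probability:

    d_{ij} = u_{ij} - \hat{P}_j(\hat{\theta}_i), \qquad Q_{3,jk} = \operatorname{corr}_i(d_{ij}, d_{ik})

Item pairs whose |Q3| exceeds a cut-off such as the 0.2 examined here are flagged as locally dependent.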
William C. M. Belzak; Daniel J. Bauer – Journal of Educational and Behavioral Statistics, 2024
Testing for differential item functioning (DIF) has seen rapid statistical development in recent years. Moderated nonlinear factor analysis (MNLFA) allows simultaneous testing of DIF across multiple categorical and continuous covariates (e.g., sex, age, ethnicity), and regularization has shown promising results for identifying DIF among…
Descriptors: Test Bias, Algorithms, Factor Analysis, Error of Measurement
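As background (a sketch of the general MNLFA idea, not the authors' exact specification), MNLFA lets item parameters vary as functions of covariates, and DIF testing asks whether the moderation terms differ from zero; with a single covariate x and linear moderation of an item's intercept and loading:

    \nu_j(x) = \nu_{j0} + \nu_{j1} x, \qquad \lambda_j(x) = \lambda_{j0} + \lambda_{j1} x

Regularized estimation then shrinks the DIF parameters \nu_{j1} and \lambda_{j1} toward zero, so that only effects with clear support in the data survive.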
Ryan M. Cook; Stefanie A. Wind – Measurement and Evaluation in Counseling and Development, 2024
The purpose of this article is to discuss reliability and precision through the lens of a modern measurement approach, item response theory (IRT). Reliability evidence in counseling is generated primarily with Classical Test Theory (CTT) approaches, although recent studies in the field have shown the benefits of using…
Descriptors: Item Response Theory, Measurement, Reliability, Accuracy
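A worked contrast with CTT's single coefficient: in IRT, precision varies along the latent trait, so reliability can be reported conditionally. Assuming a standard-normal trait, a common conversion from test information I(\theta) is

    \mathrm{SE}(\hat{\theta}) = 1 / \sqrt{I(\theta)}, \qquad r(\theta) = \frac{I(\theta)}{I(\theta) + 1}

so a region of the scale with I(\theta) = 4 has conditional reliability 0.80, while the same instrument may be far less precise elsewhere.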
Hwanggyu Lim; Danqi Zhu; Edison M. Choe; Kyung T. Han – Journal of Educational Measurement, 2024
This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…
Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction
Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025
Educational tests often include a cluster of items linked by a common stimulus (a "testlet"). In such a design, the dependencies induced among the items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
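For background, the standard (nondirectional) testlet model handles such dependence by adding a testlet-specific effect shared by all items in a cluster; in two-parameter form, with d(j) the testlet containing item j:

    P(u_{ij} = 1) = \operatorname{logit}^{-1}\!\big( a_j (\theta_i + \gamma_{i, d(j)}) - b_j \big)

The directional effect studied here goes further, allowing responses to earlier items in a testlet to feed into later ones.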
Chia-Lin Tsai; Stefanie Wind; Samantha Estrada – Measurement: Interdisciplinary Research and Perspectives, 2025
Researchers who work with ordinal rating scales sometimes encounter situations where the scale categories do not function in the intended or expected way. For example, participants' use of scale categories may result in an empirical difficulty ordering for the categories that does not match what was intended. Likewise, the level of distinction…
Descriptors: Rating Scales, Item Response Theory, Psychometrics, Self Efficacy
Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025
Response styles pose a serious threat to psychological measurement. This research compares IRTree models and anchoring vignettes for addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and the total-score level (ratios of extreme and middle responses to vignettes). Four models…
Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes
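To make the IRTree idea concrete, here is a minimal sketch (the helper name and coding scheme are illustrative, not the authors' code) of a common three-node decomposition of a 5-point Likert response into midpoint, direction, and extremity pseudo-items:

    import numpy as np

    def irtree_recode(responses):
        """Map 5-point Likert responses (1-5) onto three binary pseudo-items:
        midpoint selection, direction, and extremity. Nodes a response never
        reaches are coded NaN and treated as missing when the tree is fit."""
        r = np.asarray(responses, dtype=float)
        midpoint = (r == 3).astype(float)                             # node 1: midscale response?
        direction = np.where(r == 3, np.nan, (r >= 4).astype(float))  # node 2: agree (4,5) vs. disagree (1,2)
        extremity = np.where(r == 3, np.nan,
                             np.isin(r, (1.0, 5.0)).astype(float))    # node 3: endpoint category?
        return np.column_stack([midpoint, direction, extremity])

Each pseudo-item can then be fit with a binary IRT model, separating the target trait from extreme and midpoint response styles.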
Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…
Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory
Siqi Huang – North American Chapter of the International Group for the Psychology of Mathematics Education, 2023
The goal of this paper is twofold. First, it clarifies and elaborates an important theoretical construct called orientation with respect to understanding in mathematics, which denotes the degree to which students show an inclination toward, and an earnest concern for, understanding in mathematical learning. Second, the…
Descriptors: Mathematics Instruction, Teaching Methods, Problem Solving, Reliability
Thomas W. Frazier; Andrew J. O. Whitehouse; Susan R. Leekam; Sarah J. Carrington; Gail A. Alvares; David W. Evans; Antonio Y. Hardan; Mirko Uljarevic – Journal of Autism and Developmental Disorders, 2024
Purpose: The aim of the present study was to compare scale and conditional reliability derived from item response theory analyses among the most commonly used, as well as several newly developed, observation, interview, and parent-report autism instruments. Methods: When available, data sets were combined to facilitate large-sample evaluation…
Descriptors: Test Reliability, Item Response Theory, Autism Spectrum Disorders, Clinical Diagnosis
Hung-Yu Huang – Educational and Psychological Measurement, 2025
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent…
Descriptors: Response Style (Tests), Psychological Characteristics, Item Response Theory, Test Reliability
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
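Of the classical indices compared here, Cronbach's alpha is the easiest to state directly; a self-contained sketch (the function name is illustrative):

    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
        x = np.asarray(scores, dtype=float)
        k = x.shape[1]
        item_vars = x.var(axis=0, ddof=1)        # per-item sample variances
        total_var = x.sum(axis=1).var(ddof=1)    # variance of respondents' total scores
        return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

Rasch- and IRT-based reliability, by contrast, is built from model-implied measurement error, so it can disagree with alpha when model assumptions matter.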
Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…
Descriptors: Value Added Models, Tests, Testing, Scoring
Novina Sabila Zahra; Hillman Wirawan – Measurement: Interdisciplinary Research and Perspectives, 2025
Technological development has triggered digital transformation across organizations, influencing work processes, communication, and innovation. Digital leadership plays a crucial role in directing and managing this transformation. This research aims to develop a new measurement tool for assessing digital leadership using the Rasch Model for…
Descriptors: Leadership, Measures (Individuals), Test Validity, Item Response Theory
Lientje Maas; Matthew J. Madison; Matthieu J. S. Brinkhuis – Grantee Submission, 2024
Diagnostic classification models (DCMs) are psychometric models that yield probabilistic classifications of respondents according to a set of discrete latent variables. The current study examines the recently introduced one-parameter log-linear cognitive diagnosis model (1-PLCDM), which has increased interpretability compared with general DCMs due…
Descriptors: Clinical Diagnosis, Classification, Models, Psychometrics