Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 2 |
| Since 2022 (last 5 years) | 4 |
| Since 2017 (last 10 years) | 19 |
| Since 2007 (last 20 years) | 42 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Accuracy | 44 |
| Statistical Analysis | 44 |
| Test Items | 44 |
| Item Response Theory | 21 |
| Models | 11 |
| Classification | 10 |
| Comparative Analysis | 10 |
| Computation | 10 |
| Difficulty Level | 9 |
| Sample Size | 9 |
| Equated Scores | 8 |
Author
| Author | Count |
| --- | --- |
| Benton, Tom | 2 |
| Chang, Hua-Hua | 2 |
| Abad, Francisco José | 1 |
| Abayeva, Nella F. | 1 |
| Alahmadi, Sarah | 1 |
| Albano, Anthony D. | 1 |
| Ashwell, Tim | 1 |
| Attali, Yigal | 1 |
| Babcock, Ben | 1 |
| Barry, Carol L. | 1 |
| Bashkov, Bozhidar M. | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Journal Articles | 39 |
| Reports - Research | 39 |
| Dissertations/Theses -… | 4 |
| Reports - Evaluative | 1 |
| Tests/Questionnaires | 1 |
Location
| Location | Count |
| --- | --- |
| California | 1 |
| Germany | 1 |
| Japan | 1 |
| Japan (Tokyo) | 1 |
| Kyrgyzstan | 1 |
| Mississippi | 1 |
| Philippines | 1 |
| Tennessee | 1 |
| Turkey | 1 |
| United Kingdom (Wales) | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| Graduate Record Examinations | 2 |
| Test of English as a Foreign… | 2 |
| Program for International… | 1 |
| SAT (College Admission Test) | 1 |
Haeju Lee; Kyung Yong Kim – Journal of Educational Measurement, 2025
When no prior information about differential item functioning (DIF) exists for items in a test, either the rank-based or the iterative purification procedure may be preferred. Rank-based purification selects anchor items based on a preliminary DIF test. For that preliminary test, likelihood ratio test (LRT) based approaches (e.g.,…
Descriptors: Test Items, Equated Scores, Test Bias, Accuracy
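The specific LRT variants are elided in the snippet above, but the general form of the likelihood ratio DIF test is standard: a compact model constraining an item's parameters to be equal across groups is compared with an augmented model that frees them,

$$ G^2 = -2\left[\ln L_{\text{compact}} - \ln L_{\text{augmented}}\right], $$

which is referred to a chi-square distribution with degrees of freedom equal to the number of parameters freed in the augmented model.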
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
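The extension itself is not reproduced in the snippet, but the linear equating it builds on maps a score $x$ on form X to the form Y score with the same standardized deviate:

$$ l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X), $$

where the means and standard deviations come from the two forms' score distributions.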
Wu, Tong; Kim, Stella Y.; Westine, Carl – Educational and Psychological Measurement, 2023
For large-scale assessments, data are often collected with missing responses. Despite the wide use of item response theory (IRT) in many testing programs, the existing literature offers little insight into the effectiveness of various approaches to handling missing responses in the context of scale linking. Scale linking is commonly used…
Descriptors: Data Analysis, Responses, Statistical Analysis, Measurement
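The linking methods compared in the study are not listed in the snippet above. As one common point of reference, mean-sigma linking rescales a new form's IRT metric from anchor-item difficulty estimates; a minimal Python sketch, where the function name and example values are illustrative rather than taken from the paper:

```python
import numpy as np

def mean_sigma_link(b_base, b_new):
    """Mean-sigma IRT scale linking from common (anchor) items.

    b_base, b_new: difficulty estimates of the same anchor items on the
    base scale and on the new form's scale. Returns (A, B) such that
    theta_base = A * theta_new + B puts the new form on the base scale.
    """
    b_base, b_new = np.asarray(b_base), np.asarray(b_new)
    A = b_base.std(ddof=1) / b_new.std(ddof=1)   # match spreads
    B = b_base.mean() - A * b_new.mean()         # match centers
    return A, B

# Example: four anchor items whose difficulty estimates differ slightly
# across the two calibrations.
A, B = mean_sigma_link([-1.0, -0.2, 0.5, 1.3], [-0.8, 0.0, 0.7, 1.6])
```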
Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023
Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…
Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items
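The drift detection methods compared are not named in the snippet. One well-established screen is the robust-z statistic; a minimal sketch below, where the 0.74·IQR scaling and the 2.7 cutoff are the usual conventions rather than values taken from the paper:

```python
import numpy as np

def robust_z_flags(b_old, b_new, cut=2.7):
    """Flag drifting anchor items with the robust-z statistic.

    b_old, b_new: anchor-item difficulty estimates on a common scale
    from two administrations. An item is flagged when its drift sits
    more than `cut` robust SDs from the median drift.
    """
    d = np.asarray(b_new) - np.asarray(b_old)          # per-item drift
    iqr = np.subtract(*np.percentile(d, [75, 25]))     # robust spread
    z = (d - np.median(d)) / (0.74 * iqr)              # IQR -> ~sigma
    return np.abs(z) > cut

print(robust_z_flags([-1.0, 0.0, 0.6, 1.2], [-0.9, 0.1, 0.5, 2.0]))
```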
Benton, Tom – Research Matters, 2020
This article reviews the evidence on the extent to which experts' perceptions of item difficulties, captured using comparative judgement, can predict empirical item difficulties. This evidence is drawn from existing published studies on this topic and also from statistical analysis of data held by Cambridge Assessment. Having reviewed the…
Descriptors: Test Items, Difficulty Level, Expertise, Comparative Analysis
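Comparative judgement data of this kind are typically scaled with a Bradley-Terry model (an assumption here; the snippet does not name the model), under which the probability that an expert judges item $i$ harder than item $j$ is

$$ P(i \succ j) = \frac{e^{v_i}}{e^{v_i} + e^{v_j}}, $$

and the fitted perceived difficulties $v$ are then compared against empirical item difficulties.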
Nájera, Pablo; Sorrel, Miguel A.; Abad, Francisco José – Educational and Psychological Measurement, 2019
Cognitive diagnosis models (CDMs) are latent class multidimensional statistical models that help classify people accurately by using a set of discrete latent variables, commonly referred to as attributes. These models require a Q-matrix that indicates the attributes involved in each item. A potential problem is that the Q-matrix construction…
Descriptors: Matrices, Statistical Analysis, Models, Classification
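To make the Q-matrix's role concrete: in the DINA model, one widely used CDM, the entries $q_{jk}$ gate which attributes an item demands,

$$ \eta_{ij} = \prod_{k} \alpha_{ik}^{\,q_{jk}}, \qquad P(X_{ij}=1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\,\eta_{ij}}\, g_j^{\,1-\eta_{ij}}, $$

where $\alpha_{ik}$ indicates examinee $i$'s mastery of attribute $k$, and $s_j$ and $g_j$ are the item's slip and guessing parameters. A misspecified $q_{jk}$ therefore feeds directly into misclassification.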
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Chou, Winston; Imai, Kosuke; Rosenfeld, Bryn – Sociological Methods & Research, 2020
Scholars increasingly rely on indirect questioning techniques to reduce social desirability bias and item nonresponse for sensitive survey questions. The major drawback of these approaches, however, is their inefficiency relative to direct questioning. We show how to improve the statistical analysis of the list experiment, randomized response…
Descriptors: Surveys, Test Items, Questioning Techniques, Statistical Analysis
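The inefficiency referenced above stems from the standard list experiment estimator: with $J$ control items plus the sensitive item in the treatment list, prevalence is estimated as a simple difference in mean item counts,

$$ \hat{\tau} = \bar{Y}_{\text{treatment}} - \bar{Y}_{\text{control}}, $$

which is design-unbiased but has high variance relative to direct questioning.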
Sünbül, Seçil Ömür – International Journal of Evaluation and Research in Education, 2018
This study aimed to investigate the impact of different missing data handling methods on DINA model parameter estimation and classification accuracy. Simulated data were generated by manipulating the number of items and sample size. In the generated data, two different missing data mechanisms…
Descriptors: Data, Test Items, Sample Size, Statistical Analysis
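The two mechanisms are cut off in the snippet above; in simulation designs like this the usual pair is MCAR and MAR, though that is an assumption here rather than something the abstract confirms. A minimal sketch of the difference when imposing missingness on a simulated 0/1 response matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 20)).astype(float)   # 0/1 responses

# MCAR: every response is equally likely to go missing.
X_mcar = np.where(rng.random(X.shape) < 0.10, np.nan, X)

# MAR: missingness depends on something observed (here, the score on
# the first half of the test), not on the missing values themselves.
half = X[:, :10].sum(axis=1, keepdims=True)
p_miss = 0.20 / (1 + np.exp(-(half - half.mean()) / half.std()))
X_mar = np.where(rng.random(X.shape) < p_miss, np.nan, X)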
Qiu, Yuxi; Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2019
This study aimed to assess the accuracy of the empirical item characteristic curve (EICC) preequating method in the presence of test speededness. The simulation design considered the proportion of speededness, the speededness point, the speededness rate, the proportion of missing responses on speeded items, sample size, and test length. After crossing…
Descriptors: Accuracy, Equated Scores, Test Items, Nonparametric Statistics
Bashkov, Bozhidar M.; Clauser, Jerome C. – Practical Assessment, Research & Evaluation, 2019
Successful testing programs rely on high-quality test items to produce reliable scores and defensible exams. However, determining what statistical screening criteria are most appropriate to support these goals can be daunting. This study describes and demonstrates cost-benefit analysis as an empirical approach to determining appropriate screening…
Descriptors: Test Items, Test Reliability, Evaluation Criteria, Accuracy
Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua – Applied Measurement in Education, 2017
Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent…
Descriptors: Item Response Theory, Accuracy, Educational Assessment, Test Items
Kalkan, Ömür Kaya; Kara, Yusuf; Kelecioglu, Hülya – International Journal of Assessment Tools in Education, 2018
Missing data are a common problem in datasets obtained by administering educational and psychological tests. It is widely known that the existence of missing observations can lead to serious problems such as biased parameter estimates and inflated standard errors. Most missing data imputation methods focus on…
Descriptors: Item Response Theory, Statistical Analysis, Data, Test Items
Lee, Wooyeol; Cho, Sun-Joo – Applied Measurement in Education, 2017
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
Descriptors: Item Response Theory, Test Items, Bias, Computation
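Bias and RMSE here carry their standard Monte Carlo definitions: for a parameter with generating value $\xi$ and estimate $\hat{\xi}_r$ in replication $r$ of $R$,

$$ \text{Bias} = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\xi}_r - \xi\right), \qquad \text{RMSE} = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat{\xi}_r - \xi\right)^{2}}. $$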
Lee, HyeSun; Geisinger, Kurt F. – Educational and Psychological Measurement, 2016
The current study investigated the impact of matching criterion purification on the accuracy of differential item functioning (DIF) detection in large-scale assessments. The three matching approaches for DIF analyses (block-level matching, pooled booklet matching, and equated pooled booklet matching) were employed with the Mantel-Haenszel…
Descriptors: Test Bias, Measurement, Accuracy, Statistical Analysis
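The Mantel-Haenszel procedure pools the $2 \times 2$ correct/incorrect by reference/focal tables across strata $k$ of the matching score. With cell counts $A_k, B_k, C_k, D_k$ and stratum totals $T_k$,

$$ \hat{\alpha}_{MH} = \frac{\sum_k A_k D_k / T_k}{\sum_k B_k C_k / T_k}, \qquad \Delta_{MH} = -2.35 \ln \hat{\alpha}_{MH}, $$

so the choice and purification of the matching criterion determine the strata and, with them, the DIF estimate.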

