Publication Date
In 2025 | 1 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 10 |
Since 2016 (last 10 years) | 27 |
Since 2006 (last 20 years) | 51 |
Descriptor
Difficulty Level | 90 |
Simulation | 90 |
Test Items | 90 |
Item Response Theory | 45 |
Sample Size | 19 |
Comparative Analysis | 18 |
Item Analysis | 18 |
Computer Assisted Testing | 17 |
Correlation | 16 |
Statistical Analysis | 14 |
Guessing (Tests) | 13 |
Author
Jin, Kuan-Yu | 3 |
Wang, Wen-Chung | 3 |
Guo, Hongwen | 2 |
Holland, Paul | 2 |
Kamata, Akihito | 2 |
Reckase, Mark D. | 2 |
Schnipke, Deborah L. | 2 |
Sinharay, Sandip | 2 |
Spray, Judith A. | 2 |
Welch, Catherine J. | 2 |
Wise, Steven L. | 2 |
Publication Type
Reports - Research | 61 |
Journal Articles | 55 |
Reports - Evaluative | 22 |
Speeches/Meeting Papers | 20 |
Dissertations/Theses -… | 5 |
Reports - Descriptive | 3 |
Numerical/Quantitative Data | 1 |
Tests/Questionnaires | 1 |
Audience
Researchers | 2 |
Location
Minnesota | 1 |
Assessments and Surveys
Program for International… | 2 |
National Assessment of… | 1 |
Raven Advanced Progressive… | 1 |
SAT (College Admission Test) | 1 |
Stanford Binet Intelligence… | 1 |
Test of English as a Foreign… | 1 |
Trends in International… | 1 |
Aiman Mohammad Freihat; Omar Saleh Bani Yassin – Educational Process: International Journal, 2025
Background/purpose: This study aimed to determine how accurately the parameters of multiple-choice test items are estimated under item response theory models. Materials/methods: The researchers relied on measurement accuracy indicators, which express the absolute difference between the estimated and actual values of the…
Descriptors: Accuracy, Computation, Multiple Choice Tests, Test Items
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine the effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Wyse, Adam E.; McBride, James R. – Journal of Educational Measurement, 2021
A key consideration when giving any computerized adaptive test (CAT) is how much adaptation is present when the test is used in practice. This study introduces a new framework to measure the amount of adaptation of Rasch-based CATs based on looking at the differences between the selected item locations (Rasch item difficulty parameters) of the…
Descriptors: Item Response Theory, Computer Assisted Testing, Adaptive Testing, Test Items
DeCarlo, Lawrence T. – Journal of Educational Measurement, 2023
A conceptualization of multiple-choice exams in terms of signal detection theory (SDT) leads to simple measures of item difficulty and item discrimination that are closely related to, but also distinct from, those used in classical item analysis (CIA). The theory defines a "true split," depending on whether or not examinees know an item,…
Descriptors: Multiple Choice Tests, Test Items, Item Analysis, Test Wiseness
Derek Sauder – ProQuest LLC, 2020
The Rasch model is commonly used to calibrate multiple choice items. However, the sample sizes needed to estimate the Rasch model can be difficult to attain (e.g., consider a small testing company trying to pretest new items). With small sample sizes, auxiliary information besides the item responses may improve estimation of the item parameters.…
Descriptors: Item Response Theory, Sample Size, Computation, Test Length
Kárász, Judit T.; Széll, Krisztián; Takács, Szabolcs – Quality Assurance in Education: An International Perspective, 2023
Purpose: Based on the general formula, which depends on the length and difficulty of the test, the number of respondents and the number of ability levels, this study aims to provide a closed formula for the adaptive tests with medium difficulty (probability of solution is p = 1/2) to determine the accuracy of the parameters for each item and in…
Descriptors: Test Length, Probability, Comparative Analysis, Difficulty Level
Saatcioglu, Fatima Munevver; Atar, Hakan Yavuz – International Journal of Assessment Tools in Education, 2022
This study aims to examine the effects of mixture item response theory (IRT) models on item parameter estimation and classification accuracy under different conditions. The manipulated variables of the simulation study are set as mixture IRT models (Rasch, 2PL, 3PL); sample size (600, 1000); the number of items (10, 30); the number of latent…
Descriptors: Accuracy, Classification, Item Response Theory, Programming Languages
Berger, Stéphanie; Verschoor, Angela J.; Eggen, Theo J. H. M.; Moser, Urs – Journal of Educational Measurement, 2019
Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we investigated whether the efficiency of calibration under the Rasch model could be enhanced by improving the match between item difficulty and student ability. We introduced targeted multistage calibration designs, a design type that…
Descriptors: Simulation, Computer Assisted Testing, Test Items, Difficulty Level
Susanti, Yuni; Tokunaga, Takenobu; Nishikawa, Hitoshi – Research and Practice in Technology Enhanced Learning, 2020
The present study focuses on the integration of an automatic question generation (AQG) system and a computerised adaptive test (CAT). We conducted two experiments. In the first experiment, we administered sets of questions to English learners to gather their responses. We further used their responses in the second experiment, which is a…
Descriptors: Computer Assisted Testing, Test Items, Simulation, English Language Learners
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
Bramley, Tom – Research Matters, 2020
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy with a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…
Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy
Marcoulides, Katerina M. – Measurement: Interdisciplinary Research and Perspectives, 2018
This study examined the use of Bayesian analysis methods for the estimation of item parameters in a two-parameter logistic item response theory model. Using simulated data under various design conditions with both informative and non-informative priors, the parameter recovery of Bayesian analysis methods was examined. Overall results showed that…
Descriptors: Bayesian Statistics, Item Response Theory, Probability, Difficulty Level
Item Order and Speededness: Implications for Test Fairness in Higher Educational High-Stakes Testing
Becker, Benjamin; van Rijn, Peter; Molenaar, Dylan; Debeer, Dries – Assessment & Evaluation in Higher Education, 2022
A common approach to increase test security in higher educational high-stakes testing is the use of different test forms with identical items but different item orders. The effects of such varied item orders are relatively well studied, but findings have generally been mixed. When multiple test forms with different item orders are used, we argue…
Descriptors: Information Security, High Stakes Tests, Computer Security, Test Items
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick – ETS Research Report Series, 2018
For a multiple-choice test under development or redesign, it is important to choose the optimal number of options per item so that the test possesses the desired psychometric properties. On the basis of available data for a multiple-choice assessment with 8 options, we evaluated the effects of changing the number of options on test properties…
Descriptors: Multiple Choice Tests, Test Items, Simulation, Test Construction