Fellinghauer, Carolina; Debelak, Rudolf; Strobl, Carolin – Educational and Psychological Measurement, 2023
This simulation study investigated to what extent departures from construct similarity as well as differences in the difficulty and targeting of scales impact the score transformation when scales are equated by means of concurrent calibration using the partial credit model with a common person design. Practical implications of the simulation…
Descriptors: True Scores, Equated Scores, Test Items, Sample Size
Lang, Joseph B. – Journal of Educational and Behavioral Statistics, 2023
This article is concerned with the statistical detection of copying on multiple-choice exams. As an alternative to existing permutation- and model-based copy-detection approaches, a simple randomization p-value (RP) test is proposed. The RP test, which is based on an intuitive match-score statistic, makes no assumptions about the distribution of…
Descriptors: Identification, Cheating, Multiple Choice Tests, Item Response Theory
Yu, Albert; Douglas, Jeffrey A. – Journal of Educational and Behavioral Statistics, 2023
We propose a new item response theory growth model with item-specific learning parameters, or ISLP, and two variations of this model. In the ISLP model, either items or blocks of items have their own learning parameters. This model may be used to improve the efficiency of learning in a formative assessment. We show ways that the ISLP model's…
Descriptors: Item Response Theory, Learning, Markov Processes, Monte Carlo Methods
Sauder, Derek – ProQuest LLC, 2020
The Rasch model is commonly used to calibrate multiple choice items. However, the sample sizes needed to estimate the Rasch model can be difficult to attain (e.g., consider a small testing company trying to pretest new items). With small sample sizes, auxiliary information besides the item responses may improve estimation of the item parameters.…
Descriptors: Item Response Theory, Sample Size, Computation, Test Length
Kárász, Judit T.; Széll, Krisztián; Takács, Szabolcs – Quality Assurance in Education: An International Perspective, 2023
Purpose: Based on the general formula, which depends on the length and difficulty of the test, the number of respondents and the number of ability levels, this study aims to provide a closed formula for the adaptive tests with medium difficulty (probability of solution is p = 1/2) to determine the accuracy of the parameters for each item and in…
Descriptors: Test Length, Probability, Comparative Analysis, Difficulty Level
Arikan, Serkan; Aybek, Eren Can – Educational Measurement: Issues and Practice, 2022
Many scholars compared various item discrimination indices in real or simulated data. Item discrimination indices, such as item-total correlation, item-rest correlation, and IRT item discrimination parameter, provide information about individual differences among all participants. However, there are tests that aim to select a very limited number…
Descriptors: Monte Carlo Methods, Item Analysis, Correlation, Individual Differences
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
Sunbul, Onder; Yormaz, Seha – International Journal of Evaluation and Research in Education, 2018
In this study, Type I error and power rates of the omega (ω) and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40- and 80-item test lengths with 10,000-examinee sample size under several test level restrictions. As a result, Type I error rates of both indices were found to be below the acceptable…
Descriptors: Difficulty Level, Cheating, Duplication, Test Length
Hamby, Tyler – Journal of Psychoeducational Assessment, 2018
In this study, the author examined potential mediators of the negative relationship between the absolute difference in items' lengths and their inter-item correlation size. Fifty-two randomly ordered items from five personality scales were administered to 622 university students, and 46 respondents from a survey website rated the items'…
Descriptors: Correlation, Personality Traits, Undergraduate Students, Difficulty Level
Bazaldua, Diego A. Luna; Lee, Young-Sun; Keller, Bryan; Fellers, Lauren – Asia Pacific Education Review, 2017
The performance of various classical test theory (CTT) item discrimination estimators has been compared in the literature using both empirical and simulated data, resulting in mixed results regarding the preference of some discrimination estimators over others. This study analyzes the performance of various item discrimination estimators in CTT:…
Descriptors: Test Items, Monte Carlo Methods, Item Response Theory, Correlation
Wu, Yi-Fang – ProQuest LLC, 2015
Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. With a focus on the three-parameter logistic IRT (Birnbaum, 1968; Lord, 1980) model, the current study examines the accuracy and…
Descriptors: Item Response Theory, Test Items, Accuracy, Computation
Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Descriptors: Sample Size, Test Length, Correlation, Test Format
He, Wei; Reckase, Mark D. – Educational and Psychological Measurement, 2014
For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
Descriptors: Item Banks, Test Length, Computer Assisted Testing, Adaptive Testing
Sunnassee, Devdass – ProQuest LLC, 2011
Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under various test characteristic and sampling conditions. The equating methods considered are typically…
Descriptors: Test Length, Test Format, Sample Size, Simulation
Fu, Qiong – ProQuest LLC, 2010
This research investigated how the accuracy of person ability and item difficulty parameter estimation varied across five IRT models with respect to the presence of guessing, targeting, and varied combinations of sample sizes and test lengths. The data were simulated with 50 replications under each of the 18 combined conditions. Five IRT models…
Descriptors: Item Response Theory, Guessing (Tests), Accuracy, Computation