Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 3
Since 2016 (last 10 years): 6
Since 2006 (last 20 years): 15
Descriptor
Test Length: 28
Test Items: 15
Item Response Theory: 11
Sample Size: 7
Simulation: 7
Computer Assisted Testing: 6
Error of Measurement: 6
Scores: 6
Test Construction: 6
Adaptive Testing: 5
Item Banks: 5
Source
Journal of Educational…: 28
Author
Hambleton, Ronald K.: 3
Lee, Won-Chan: 3
Wainer, Howard: 2
Andersson, Björn: 1
Ankenman, Robert D.: 1
Bridgeman, Brent: 1
Budescu, David: 1
Budescu, David V.: 1
Chen, Shu-Ying: 1
Cheng, Ying: 1
Chon, Kyong Hee: 1
Publication Type
Journal Articles: 27
Reports - Research: 18
Reports - Evaluative: 8
Opinion Papers: 1
Reports - Descriptive: 1
Education Level
High Schools: 1
Location
Israel: 1
Assessments and Surveys
Graduate Record Examinations: 1
SAT (College Admission Test): 1
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validities. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residual (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
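The change-point idea behind such detection procedures can be sketched generically: scan every possible split of an examinee's residual sequence and flag the split where the mean residual shifts most. This is a plain mean-shift scan under assumed inputs, not the CPA-WR statistic from the article; the function name and the variance-style weighting are illustrative.

```python
def change_point(residuals):
    """Locate the most likely change point in a residual sequence.

    Generic mean-shift scan (illustrative, not CPA-WR): for each split
    k, compare the mean residual before and after k, weighted so that
    splits near the middle are not unfairly favored, and return the
    split with the largest statistic.
    """
    n = len(residuals)
    best_k, best_stat = None, -1.0
    for k in range(1, n):
        before = sum(residuals[:k]) / k
        after = sum(residuals[k:]) / (n - k)
        # Weight by sqrt(k*(n-k)/n), the usual CUSUM-style scaling.
        stat = abs(after - before) * ((k * (n - k)) / n) ** 0.5
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat
```

On a sequence whose residuals jump partway through (as when an examinee starts responding randomly), the scan returns the index of the jump.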
Kim, Hyung Jin; Lee, Won-Chan – Journal of Educational Measurement, 2022
Orlando and Thissen (2000) introduced the "S-X²" item-fit index for testing goodness-of-fit with dichotomous item response theory (IRT) models. This study considers and evaluates an alternative approach for computing "S-X²" values and other factors associated with collapsing tables of observed…
Descriptors: Goodness of Fit, Test Items, Item Response Theory, Computation
Wang, Shaojie; Zhang, Minqiang; Lee, Won-Chan; Huang, Feifei; Li, Zonglong; Li, Yixing; Yu, Sufang – Journal of Educational Measurement, 2022
Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take into account parameter estimation errors. The item- (IWCC) and test-information-weighted characteristic curve (TWCC) methods employ weighting…
Descriptors: Item Response Theory, Error of Measurement, Accuracy, Monte Carlo Methods
Svetina, Dubravka; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2019
This study investigates the effect of several design and administration choices on item exposure and person/item parameter recovery under a multistage test (MST) design. In a simulation study, we examine whether number-correct (NC) or item response theory (IRT) methods are differentially effective at routing students to the correct next stage(s)…
Descriptors: Measurement, Item Analysis, Test Construction, Item Response Theory
Hsu, Chia-Ling; Wang, Wen-Chung – Journal of Educational Measurement, 2015
Cognitive diagnosis models provide profile information about a set of latent binary attributes, whereas item response models yield a summary report on a latent continuous trait. To utilize the advantages of both models, higher order cognitive diagnosis models were developed in which information about both latent binary attributes and latent…
Descriptors: Computer Assisted Testing, Adaptive Testing, Models, Cognitive Measurement
Andersson, Björn – Journal of Educational Measurement, 2016
In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests
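The core of observed-score equipercentile equating described above can be sketched in a few lines: find each form-X score's percentile rank, then read off the form-Y score at that same percentile. This is a minimal empirical version under assumed score vectors; the function name is hypothetical, and it ignores the smoothing and polytomous IRT modeling the article actually studies.

```python
import numpy as np

def equipercentile_equate(scores_x, scores_y, grid=None):
    """Map scores on form X to the form-Y scale by matching percentiles.

    Minimal illustration: for each X score, compute its percentile rank
    in the X distribution, then return the Y score at that percentile.
    """
    scores_x = np.asarray(scores_x, dtype=float)
    scores_y = np.asarray(scores_y, dtype=float)
    if grid is None:
        grid = np.unique(scores_x)
    # Percentile rank of each grid point in the X distribution.
    ranks = np.array([np.mean(scores_x <= g) * 100 for g in grid])
    # Form-Y score at the matching percentile (linear interpolation).
    equated = np.percentile(scores_y, ranks)
    return dict(zip(grid.tolist(), equated.tolist()))
```

The resulting table is monotone in the X score, which is the defining property of an equipercentile conversion.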
Veldkamp, Bernard P. – Journal of Educational Measurement, 2016
Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second…
Descriptors: Computer Assisted Testing, Reaction Time, Standardized Tests, Difficulty Level
Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics
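Classification accuracy at a cut score has a simple definition worth making concrete: the average probability that an examinee's observed pass/fail decision matches the decision their expected (true) score would produce. The sketch below assumes those per-examinee quantities are already estimated; the article's contribution is estimating them without a parametric model, which this deliberately does not show.

```python
def classification_accuracy(pass_probs, expected_scores, cut):
    """Estimate classification accuracy (CA) at a cut score.

    `pass_probs[i]` is examinee i's probability of an observed score at
    or above the cut; `expected_scores[i]` is their expected score.
    CA averages, over examinees, the probability of a consistent
    decision. Illustrative names and inputs.
    """
    total = 0.0
    for p, mu in zip(pass_probs, expected_scores):
        # Consistent decision: observed pass if truly above the cut,
        # observed fail otherwise.
        total += p if mu >= cut else 1.0 - p
    return total / len(pass_probs)
```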
Liang, Tie; Wells, Craig S.; Hambleton, Ronald K. – Journal of Educational Measurement, 2014
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…
Descriptors: Item Response Theory, Measurement Techniques, Nonparametric Statistics, Models
Han, Kyung T. – Journal of Educational Measurement, 2012
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
Seo, Minhee; Roussos, Louis A. – Journal of Educational Measurement, 2010
DIMTEST is a widely used and studied method for testing the hypothesis of test unidimensionality as represented by local item independence. However, DIMTEST does not report the amount of multidimensionality that exists in data when rejecting its null. To provide more information regarding the degree to which data depart from unidimensionality, a…
Descriptors: Effect Size, Statistical Bias, Computation, Test Length
Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B. – Journal of Educational Measurement, 2010
In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G², Orlando and Thissen's S-X² and S-G², and Stone's χ²* and G²*. To investigate the…
Descriptors: Test Length, Goodness of Fit, Item Response Theory, Simulation
Klockars, Alan J.; Lee, Yoonsun – Journal of Educational Measurement, 2008
Monte Carlo simulations with 20,000 replications are reported to estimate the probability of rejecting the null hypothesis regarding DIF using SIBTEST when there is DIF present and/or when impact is present due to differences on the primary dimension to be measured. Sample sizes are varied from 250 to 2000 and test lengths from 10 to 40 items.…
Descriptors: Test Bias, Test Length, Reference Groups, Probability
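The Monte Carlo logic of a study like this, replicating data under a known condition and recording how often the null is rejected, can be shown with a deliberately simplified stand-in. The sketch below uses a two-sample z-test rather than SIBTEST, and its names and settings are assumptions: effect = 0 estimates the Type I error rate, effect > 0 estimates power.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(n_reps, n_per_group, effect, alpha=0.05, seed=1):
    """Estimate a two-sample z-test's rejection rate by simulation.

    Generic stand-in for a DIF power study (not SIBTEST): `effect` is
    the true reference/focal group difference on a unit-variance scale.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    se = math.sqrt(2.0 / n_per_group)               # known unit variances
    rejections = 0
    for _ in range(n_reps):
        ref_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        foc_mean = sum(rng.gauss(effect, 1.0) for _ in range(n_per_group)) / n_per_group
        if abs(foc_mean - ref_mean) / se > crit:
            rejections += 1
    return rejections / n_reps
```

With no true effect the rate hovers near alpha; with a moderate effect and the sample sizes varied in studies like this one, power climbs toward 1.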
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology