Publication Date
In 2025 | 3
Since 2024 | 18
Since 2021 (last 5 years) | 69
Since 2016 (last 10 years) | 161
Since 2006 (last 20 years) | 317
Descriptor
Test Length | 624
Test Items | 218
Item Response Theory | 197
Test Construction | 149
Sample Size | 137
Test Reliability | 130
Computer Assisted Testing | 117
Test Validity | 108
Simulation | 107
Adaptive Testing | 98
Comparative Analysis | 96
Author
Hambleton, Ronald K. | 15
Wang, Wen-Chung | 9
Livingston, Samuel A. | 6
Sijtsma, Klaas | 6
Wainer, Howard | 6
Weiss, David J. | 6
Wilcox, Rand R. | 6
Cheng, Ying | 5
Gessaroli, Marc E. | 5
Lee, Won-Chan | 5
Lewis, Charles | 5
Location
Turkey | 8
Australia | 7
Canada | 7
China | 5
Netherlands | 5
Japan | 4
Taiwan | 4
United Kingdom | 4
Germany | 3
Michigan | 3
Singapore | 3
Laws, Policies, & Programs
Americans with Disabilities… | 1
Equal Access | 1
Job Training Partnership Act… | 1
Race to the Top | 1
Rehabilitation Act 1973… | 1
Kabasakal, Kübra Atalay; Kelecioglu, Hülya – Educational Sciences: Theory and Practice, 2015
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Descriptors: Test Bias, Equated Scores, Item Response Theory, Simulation
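As background for how DIF typically enters a simulation of this kind, the sketch below shows the simplest representation: under a two-parameter logistic (2PL) model, a uniformly DIF item is given a shifted difficulty for the focal group. The parameter values and the shift are illustrative assumptions, not the multilevel models or the 24 simulation conditions examined in the study.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: uniform DIF is represented by shifting the difficulty
# parameter for the focal group (values are illustrative, not from the study).
a, b_reference, dif_shift = 1.2, 0.0, 0.5
b_focal = b_reference + dif_shift

for theta in (-1.0, 0.0, 1.0):
    p_ref = p_2pl(theta, a, b_reference)
    p_foc = p_2pl(theta, a, b_focal)
    print(f"theta={theta:+.1f}  P(ref)={p_ref:.3f}  P(focal)={p_foc:.3f}")
```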
van der Linden, Wim J.; Xiong, Xinhui – Journal of Educational and Behavioral Statistics, 2013
Two simple constraints on the item parameters in a response-time model are proposed to control the speededness of an adaptive test. As the constraints are additive, they can easily be included in the constraint set for a shadow-test approach (STA) to adaptive testing. Alternatively, a simple heuristic is presented to control speededness in plain…
Descriptors: Adaptive Testing, Heuristics, Test Length, Reaction Time
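One reason additive constraints on response-time parameters are convenient is that the expected total test time is just a sum of expected item times. The sketch below assumes a lognormal response-time model in the spirit of van der Linden's, with illustrative time-intensity values and a single common variance; it is not the constraint formulation from the article.

```python
import math

def expected_time(beta, tau, sigma2=0.25):
    """Expected response time (seconds) under a simplified lognormal model:
    log T ~ Normal(beta - tau, sigma2), so E[T] = exp(beta - tau + sigma2 / 2).
    beta is item time intensity, tau is examinee speed (values illustrative)."""
    return math.exp(beta - tau + sigma2 / 2.0)

# Hypothetical time-intensity parameters for a candidate set of items.
betas = [3.8, 4.1, 3.9, 4.3, 4.0]
tau = 0.0            # average examinee speed
time_limit = 300.0   # seconds available for these items

total = sum(expected_time(b, tau) for b in betas)
print(f"expected total time: {total:.1f}s, within limit: {total <= time_limit}")
```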
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
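For context on what the panelists' judgments produce, the following sketch computes a standard Angoff cut score: average each item's judged probabilities across panelists, then sum the per-item averages. The ratings are made-up values; the study's G-theory analysis of item subsets is not reproduced here.

```python
# Hypothetical Angoff ratings: each inner list holds one panelist's judged
# probability that a minimally competent examinee answers each item correctly.
ratings_by_panelist = [
    [0.6, 0.7, 0.4, 0.8, 0.5],
    [0.5, 0.8, 0.5, 0.7, 0.6],
    [0.7, 0.6, 0.3, 0.9, 0.5],
]

n_panelists = len(ratings_by_panelist)
n_items = len(ratings_by_panelist[0])

# Mean judged probability per item; the recommended raw cut score is the
# sum of those per-item means across the items reviewed.
item_means = [
    sum(panel[i] for panel in ratings_by_panelist) / n_panelists
    for i in range(n_items)
]
cut_score = sum(item_means)
print(f"recommended cut score: {cut_score:.2f} out of {n_items}")
```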
Anthony, Christopher James; DiPerna, James Clyde – School Psychology Quarterly, 2017
The Academic Competence Evaluation Scales-Teacher Form (ACES-TF; DiPerna & Elliott, 2000) was developed to measure student academic skills and enablers (interpersonal skills, engagement, motivation, and study skills). Although ACES-TF scores have demonstrated psychometric adequacy, the length of the measure may be prohibitive for certain…
Descriptors: Test Items, Efficiency, Item Response Theory, Test Length
Goegan, Lauren D.; Harrison, Gina L. – Learning Disabilities: A Contemporary Journal, 2017
The effects of extended time on the writing performance of university students with learning disabilities (LD) were examined. Thirty-eight students (19 LD; 19 non-LD) completed a collection of cognitive, linguistic, and literacy measures, and wrote essays under regular and extended time conditions. Limited evidence was found to support the…
Descriptors: Foreign Countries, Undergraduate Students, Testing Accommodations, Learning Disabilities
Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics
Lamsal, Sunil – ProQuest LLC, 2015
Different estimation procedures have been developed for the unidimensional three-parameter item response theory (IRT) model. These techniques include the marginal maximum likelihood estimation, the fully Bayesian estimation using Markov chain Monte Carlo simulation techniques, and the Metropolis-Hastings Robbins-Monro estimation. With each…
Descriptors: Item Response Theory, Monte Carlo Methods, Maximum Likelihood Statistics, Markov Processes
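All of the estimation procedures named above work with the likelihood implied by the three-parameter logistic (3PL) model. A minimal sketch of that building block, with hypothetical item parameters, is given below; it evaluates the log-likelihood of one response pattern at a fixed ability value rather than performing any of the estimation routines themselves.

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern given ability theta; this is
    the quantity that MML, MCMC, and MH-RM procedures handle in different ways."""
    ll = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_3pl(theta, a, b, c)
        ll += u * math.log(p) + (1 - u) * math.log(1.0 - p)
    return ll

# Hypothetical item parameters (a, b, c) and one examinee's responses.
items = [(1.2, -0.5, 0.20), (0.9, 0.0, 0.25), (1.5, 0.8, 0.15)]
responses = [1, 1, 0]
print(f"log-likelihood at theta=0: {log_likelihood(0.0, responses, items):.3f}")
```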
Doskey, Elena M.; Lagunas, Brenda; SooHoo, Michelle; Lomax, Amanda; Bullick, Stephanie – Journal of Psychoeducational Assessment, 2013
The Speed DIAL-4 was developed from the Developmental Indicators for the Assessment of Learning, Fourth Edition (DIAL-4), a screening measure designed to identify children ages 2 years, 6 months through 5 years, 11 months "who are in need of intervention or diagnostic assessment in the following areas: motor, concepts, language,…
Descriptors: Screening Tests, Young Children, Test Length, Scoring
Kahn, Josh; Nese, Joseph T.; Alonzo, Julie – Behavioral Research and Teaching, 2016
There is strong theoretical support for oral reading fluency (ORF) as an essential building block of reading proficiency. The current and standard ORF assessment procedure requires that students read aloud a grade-level passage (~250 words) in a one-to-one administration, with the number of words read correctly in 60 seconds constituting their…
Descriptors: Teacher Surveys, Oral Reading, Reading Tests, Computer Assisted Testing
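The score produced by such an administration is words correct per minute (WCPM). A minimal illustration of that arithmetic, with made-up counts, follows; it is not part of the study's procedures.

```python
# Illustrative words-correct-per-minute (WCPM) scoring for one timed ORF probe.
words_attempted = 148
errors = 6
seconds = 60  # standard one-minute administration

wcpm = (words_attempted - errors) * 60.0 / seconds
print(f"WCPM = {wcpm:.0f}")
```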
Lathrop, Quinn N.; Cheng, Ying – Applied Psychological Measurement, 2013
Within the framework of item response theory (IRT), there are two recent lines of work on the estimation of classification accuracy (CA) rate. One approach estimates CA when decisions are made based on total sum scores, the other based on latent trait estimates. The former is referred to as the Lee approach, and the latter, the Rudner approach,…
Descriptors: Item Response Theory, Accuracy, Classification, Computation
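As a rough illustration of the latent-trait (Rudner-style) line of work, the sketch below computes, for one examinee, the probability that the true ability falls on the same side of a cut score as the point estimate, assuming the true ability is normally distributed around the estimate with the estimate's standard error. The numbers are invented, and the Lee (sum-score) approach is not shown.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rudner_accuracy(theta_hat, se, cut):
    """Rudner-style expected classification accuracy for one examinee: the
    probability that true ability lies on the same side of the cut score as
    the point estimate, treating true ability as Normal(theta_hat, se**2)."""
    p_above = 1.0 - normal_cdf((cut - theta_hat) / se)
    return p_above if theta_hat >= cut else 1.0 - p_above

# Illustrative estimates (not from the article).
print(f"{rudner_accuracy(theta_hat=0.6, se=0.3, cut=0.0):.3f}")
print(f"{rudner_accuracy(theta_hat=-0.1, se=0.3, cut=0.0):.3f}")
```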
Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi – Applied Psychological Measurement, 2013
Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be "tailored" to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Length, Ability
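A common termination rule of the kind described stops testing once the standard error of the ability estimate falls below a threshold. The sketch below computes that standard error from 2PL test information at a fixed ability value; a real VL-CAT would re-estimate ability and select items adaptively after each response, and the item parameters and threshold here are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def standard_error(theta, administered):
    """SE of the ability estimate from 2PL test information:
    I(theta) = sum of a^2 * P * (1 - P) over administered items."""
    info = sum(a * a * p_2pl(theta, a, b) * (1.0 - p_2pl(theta, a, b))
               for a, b in administered)
    return 1.0 / math.sqrt(info)

# A variable-length CAT keeps adding items until the SE drops below a
# threshold (illustrative; operational thresholds are usually tighter).
threshold = 0.70
item_pool = [(1.4, 0.0), (1.2, 0.3), (1.6, -0.2), (1.1, 0.5), (1.3, 0.1), (1.5, -0.4)]
theta_hat = 0.0  # held fixed here; a real CAT re-estimates after each item
administered = []

for item in item_pool:
    administered.append(item)
    se = standard_error(theta_hat, administered)
    print(f"items={len(administered)}  SE={se:.3f}")
    if se < threshold:
        break
```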
Stucky, Brian D.; Thissen, David; Edelen, Maria Orlando – Applied Psychological Measurement, 2013
Test developers often need to create unidimensional scales from multidimensional data. For item analysis, "marginal trace lines" capture the relation with the general dimension while accounting for nuisance dimensions and may prove to be a useful technique for creating short-form tests. This article describes the computations needed to obtain…
Descriptors: Test Construction, Test Length, Item Analysis, Item Response Theory
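A marginal trace line can be thought of as a two-dimensional trace line with the nuisance dimension integrated out against its population density. The sketch below does this with a crude fixed quadrature grid and made-up loadings; it is a conceptual illustration, not the computations described in the article.

```python
import math

def p_2d(theta_g, theta_n, a_g, a_n, b):
    """Two-dimensional logistic trace line with a general and a nuisance dimension."""
    return 1.0 / (1.0 + math.exp(-(a_g * theta_g + a_n * theta_n - b)))

def marginal_trace(theta_g, a_g, a_n, b, nodes, weights):
    """Marginal trace line: integrate the nuisance dimension out against a
    normal density, approximated here by a small fixed quadrature grid."""
    return sum(w * p_2d(theta_g, t, a_g, a_n, b) for t, w in zip(nodes, weights))

# Crude quadrature over the nuisance dimension (illustrative values only).
nodes = [-2.0, -1.0, 0.0, 1.0, 2.0]
dens = [math.exp(-t * t / 2.0) for t in nodes]
weights = [d / sum(dens) for d in dens]

for tg in (-1.0, 0.0, 1.0):
    p = marginal_trace(tg, a_g=1.2, a_n=0.8, b=0.0, nodes=nodes, weights=weights)
    print(f"theta={tg:+.1f}  marginal P={p:.3f}")
```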
Liang, Tie; Wells, Craig S.; Hambleton, Ronald K. – Journal of Educational Measurement, 2014
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…
Descriptors: Item Response Theory, Measurement Techniques, Nonparametric Statistics, Models
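In the spirit of RISE, the sketch below measures misfit as the square root of a density-weighted, integrated squared difference between a fitted parametric item response function and an empirical (nonparametric) one evaluated on a common ability grid. The grid, weights, and "observed" curve are invented for illustration and do not reproduce the article's estimator.

```python
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rise(parametric, nonparametric, weights):
    """Root integrated squared error between two item response functions
    evaluated on a common theta grid, weighted by the examinee density."""
    num = sum(w * (p - q) ** 2 for p, q, w in zip(parametric, nonparametric, weights))
    return math.sqrt(num / sum(weights))

# Hypothetical grid, density weights, and a smoothed "observed" curve that
# departs from the fitted 2PL in the middle of the scale.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [0.05, 0.25, 0.40, 0.25, 0.05]
fitted = [p_2pl(t, a=1.0, b=0.0) for t in grid]
observed = [0.10, 0.30, 0.62, 0.78, 0.90]

print(f"RISE = {rise(fitted, observed, weights):.4f}")
```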
Lei, Pui-Wa; Zhao, Yu – Applied Psychological Measurement, 2012
Vertical scaling is necessary to facilitate comparison of scores from test forms of different difficulty levels. It is widely used to enable the tracking of student growth in academic performance over time. Most previous studies on vertical scaling methods assume relatively long tests and large samples. Little is known about their performance when…
Descriptors: Scaling, Item Response Theory, Test Length, Sample Size
Meriac, John P.; Woehr, David J.; Gorman, C. Allen; Thomas, Amanda L. E. – Journal of Vocational Behavior, 2013
The multidimensional work ethic profile (MWEP) has become one of the most widely-used inventories for measuring the work ethic construct. However, its length has been a potential barrier to even more widespread use. We developed a short form of the MWEP, the MWEP-SF. A subset of items from the original measure was identified, using item response…
Descriptors: Work Ethic, Profiles, Measures (Individuals), Test Construction