Publication Date
In 2025: 3
Since 2024: 18
Since 2021 (last 5 years): 69
Since 2016 (last 10 years): 161
Since 2006 (last 20 years): 317
Descriptor
Test Length: 624
Test Items: 218
Item Response Theory: 197
Test Construction: 149
Sample Size: 137
Test Reliability: 130
Computer Assisted Testing: 117
Test Validity: 108
Simulation: 107
Adaptive Testing: 98
Comparative Analysis: 96
Author
Hambleton, Ronald K.: 15
Wang, Wen-Chung: 9
Livingston, Samuel A.: 6
Sijtsma, Klaas: 6
Wainer, Howard: 6
Weiss, David J.: 6
Wilcox, Rand R.: 6
Cheng, Ying: 5
Gessaroli, Marc E.: 5
Lee, Won-Chan: 5
Lewis, Charles: 5
Location
Turkey: 8
Australia: 7
Canada: 7
China: 5
Netherlands: 5
Japan: 4
Taiwan: 4
United Kingdom: 4
Germany: 3
Michigan: 3
Singapore: 3
Laws, Policies, & Programs
Americans with Disabilities…: 1
Equal Access: 1
Job Training Partnership Act…: 1
Race to the Top: 1
Rehabilitation Act 1973…: 1
James, Syretta R.; Liu, Shihching Jessica; Maina, Nyambura; Wade, Julie; Wang, Helen; Wilson, Heather; Wolanin, Natalie – Montgomery County Public Schools, 2021
The impact of the COVID-19 pandemic continues to overwhelm the functioning and outcomes of educational systems throughout the nation. The public education system is under particular scrutiny given that students, families, and educators are under considerable stress to maintain academic progress. Since the beginning of the crisis, school systems…
Descriptors: Achievement Tests, COVID-19, Pandemics, Public Schools
Andersson, Björn – Journal of Educational Measurement, 2016
In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests
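The percentile-matching step the Andersson abstract describes can be written out directly. Below is a minimal sketch of observed-score equipercentile equating on simulated score distributions; the function name, the two simulated forms, and all parameter values are hypothetical and are not taken from the article.
```python
# A minimal sketch of observed-score equipercentile equating, assuming two
# hypothetical 1-D arrays of total scores on forms X and Y that measure the
# same construct. Not the authors' code.
import numpy as np

def equipercentile_equate(scores_x, scores_y, x_points):
    """Map each score in x_points to the Form Y score with the same percentile rank."""
    scores_x = np.sort(np.asarray(scores_x, dtype=float))
    scores_y = np.sort(np.asarray(scores_y, dtype=float))
    # Percentile rank of each x point within the Form X distribution.
    ranks = np.searchsorted(scores_x, x_points, side="right") / len(scores_x)
    # Invert the Form Y distribution at those ranks (empirical quantile function).
    return np.quantile(scores_y, np.clip(ranks, 0.0, 1.0))

# Example with simulated score distributions: Form Y is slightly harder.
rng = np.random.default_rng(0)
form_x = rng.binomial(40, 0.65, size=2000)   # 40-item test, easier form
form_y = rng.binomial(40, 0.60, size=2000)   # 40-item test, harder form
print(equipercentile_equate(form_x, form_y, x_points=[20, 26, 32]))
```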
Jacob, Brian A. – Center on Children and Families at Brookings, 2016
Contrary to popular belief, modern cognitive assessments--including the new Common Core tests--produce test scores based on sophisticated statistical models rather than the simple percent of items a student answers correctly. While there are good reasons for this, it means that reported test scores depend on many decisions made by test designers,…
Descriptors: Scores, Common Core State Standards, Test Length, Test Content
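To make the Jacob abstract's point concrete, the sketch below scores two hypothetical response patterns with the same percent correct under a two-parameter logistic (2PL) IRT model. The item parameters and the grid-search estimator are illustrative assumptions, not the scoring method of any particular assessment.
```python
# A minimal sketch of why model-based (IRT) scores can differ even when percent
# correct is identical, assuming a hypothetical 2PL item bank.
import numpy as np

def twopl_prob(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ml_theta(responses, a, b, grid=np.linspace(-4, 4, 801)):
    """Maximum-likelihood ability estimate found by a simple grid search."""
    p = twopl_prob(grid[:, None], a, b)   # shape (grid points, items)
    loglik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

a = np.array([1.2, 1.0, 0.8, 1.5, 1.1])      # discriminations (hypothetical)
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])    # difficulties (hypothetical)
easy_right = np.array([1, 1, 1, 0, 0])       # 60% correct on the easier items
hard_right = np.array([0, 0, 1, 1, 1])       # 60% correct on the harder items
# Same percent correct, different ability estimates.
print(ml_theta(easy_right, a, b), ml_theta(hard_right, a, b))
```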
Watson, Nicole; Wilkins, Roger – Field Methods, 2015
Computer-assisted personal interviewing (CAPI) offers many attractive benefits over paper-and-pencil interviewing. There is, however, mixed evidence on the impact of CAPI on interview "length," an important survey outcome in the context of length limits imposed by survey budgets and concerns over respondent burden. In this article,…
Descriptors: Interviews, Test Length, Computer Assisted Testing, National Surveys
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K. – Educational and Psychological Measurement, 2016
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
Descriptors: Item Response Theory, Test Bias, Simulation, College Entrance Examinations
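The IRT-C procedure itself is not reproduced here. As a rough stand-in, the sketch below screens one simulated item for DIF across two covariates simultaneously with an ordinary logistic-regression likelihood-ratio test; all variable names, sample sizes, and effect sizes are hypothetical.
```python
# A simplified logistic-regression analogue of joint DIF screening across
# multiple covariates, on simulated data. Not the IRT-C procedure of the article.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 2000
total = rng.normal(0, 1, n)                  # matching variable (ability proxy)
gender = rng.integers(0, 2, n)               # covariate 1 (hypothetical)
language = rng.integers(0, 2, n)             # covariate 2 (hypothetical)
# Simulate one item that shows DIF on language only.
logit = 1.2 * total - 0.2 + 0.6 * language
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

base = sm.Logit(y, sm.add_constant(np.column_stack([total]))).fit(disp=0)
full = sm.Logit(y, sm.add_constant(np.column_stack([total, gender, language]))).fit(disp=0)
lr = 2 * (full.llf - base.llf)               # joint test across both covariates
print("LR =", round(lr, 2), "p =", round(chi2.sf(lr, df=2), 4))
```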
Runco, Mark A.; Walczyk, Jeffrey John; Acar, Selcuk; Cowger, Ernest L.; Simundson, Melissa; Tripp, Sunny – Journal of Creative Behavior, 2014
This article describes an empirical refinement of the "Runco Ideational Behavior Scale" (RIBS). The RIBS seems to be associated with divergent thinking, and the potential for creative thinking, but it was possible that its validity could be improved. With this in mind, three new scales were developed and the unique benefit (or…
Descriptors: Behavior Rating Scales, Creative Thinking, Test Validity, Psychometrics
Li, Feifei – ETS Research Report Series, 2017
An information-correction method for testlet-based tests is introduced. This method takes advantage of both generalizability theory (GT) and item response theory (IRT). The measurement error for the examinee proficiency parameter is often underestimated when a unidimensional conditional-independence IRT model is specified for a testlet dataset. By…
Descriptors: Item Response Theory, Generalizability Theory, Tests, Error of Measurement
Makransky, Guido; Dale, Philip S.; Havmose, Philip; Bleses, Dorthe – Journal of Speech, Language, and Hearing Research, 2016
Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)-based computerized adaptive testing (CAT) version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining…
Descriptors: Item Response Theory, Computer Assisted Testing, Adaptive Testing, Language Tests
Veldkamp, Bernard P. – Journal of Educational Measurement, 2016
Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second…
Descriptors: Computer Assisted Testing, Reaction Time, Standardized Tests, Difficulty Level
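The adaptive step the Veldkamp abstract refers to, re-estimating ability after each response and then administering the most informative remaining item, can be sketched as follows. The 2PL item bank, the EAP estimator, and the fixed 15-item length are assumptions for illustration, not the operational algorithm of any cited testing program.
```python
# A minimal CAT sketch: maximum-information item selection with EAP scoring
# over a hypothetical 2PL item bank.
import numpy as np

def p2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap(responses, a_used, b_used, grid=np.linspace(-4, 4, 401)):
    """Expected a posteriori ability estimate with a standard-normal prior."""
    prior = np.exp(-0.5 * grid**2)
    p = p2pl(grid[:, None], a_used, b_used)
    lik = np.prod(np.where(responses, p, 1 - p), axis=1)
    post = prior * lik
    return np.sum(grid * post) / np.sum(post)

rng = np.random.default_rng(2)
a = rng.uniform(0.8, 2.0, 50)        # item bank discriminations (hypothetical)
b = rng.uniform(-2.5, 2.5, 50)       # item bank difficulties (hypothetical)
true_theta, used, resp = 0.7, [], []
theta_hat = 0.0
for _ in range(15):                  # 15-item adaptive test
    info = a**2 * p2pl(theta_hat, a, b) * (1 - p2pl(theta_hat, a, b))
    info[used] = -np.inf             # never reuse an administered item
    item = int(np.argmax(info))
    used.append(item)
    resp.append(rng.random() < p2pl(true_theta, a[item], b[item]))
    theta_hat = eap(np.array(resp), a[used], b[used])
print("estimate:", round(theta_hat, 2), "true:", true_theta)
```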
Bae, Minryoung; Lee, Byungmin – English Teaching, 2018
This study examines the effects of text length and question type on Korean EFL readers' comprehension of the fill-in-the-blank items in the Korean CSAT. A total of 100 Korean EFL college students participated in the study. After being divided into three proficiency groups, the participants took a reading comprehension test that consisted…
Descriptors: Test Items, Language Tests, Second Language Learning, Second Language Instruction
Sengul Avsar, Asiye; Tavsancil, Ezel – Educational Sciences: Theory and Practice, 2017
This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three sample sizes (100, 250 and 500)--were generated by conducting 20…
Descriptors: Test Items, Psychometrics, Nonparametric Statistics, Item Response Theory
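The simulation design described above (test lengths of 10, 20, and 30 items; normal and skewed trait distributions; samples of 100, 250, and 500) can be sketched roughly as below, here generating five-category responses under a graded response model. All generating parameters are hypothetical, and the NIRT analyses themselves are omitted.
```python
# A rough sketch of crossing test length x trait distribution x sample size and
# generating polytomous (5-category) data under a graded response model.
import numpy as np

def grm_responses(theta, a, thresholds, rng):
    """Draw graded-response-model item scores (0..K) for each person x item."""
    # P(X >= k) at each threshold; category probabilities are successive differences.
    z = a[None, :, None] * (theta[:, None, None] - thresholds[None, :, :])
    p_ge = 1.0 / (1.0 + np.exp(-z))                            # (n, items, K)
    cum = np.concatenate([np.ones(p_ge.shape[:2] + (1,)), p_ge,
                          np.zeros(p_ge.shape[:2] + (1,))], axis=2)
    probs = cum[:, :, :-1] - cum[:, :, 1:]                     # (n, items, K+1)
    u = rng.random(probs.shape[:2] + (1,))
    return (u > np.cumsum(probs, axis=2)).sum(axis=2)

rng = np.random.default_rng(3)
distributions = {"normal": lambda n: rng.normal(0, 1, n),
                 "right_skew": lambda n: rng.gamma(2, 1, n) - 2,
                 "left_skew": lambda n: 2 - rng.gamma(2, 1, n)}
for n_items in (10, 20, 30):
    for dist_name, draw in distributions.items():
        for n in (100, 250, 500):
            theta = draw(n)
            a = rng.uniform(1.0, 2.0, n_items)
            thresholds = np.sort(rng.normal(0, 1, (n_items, 4)), axis=1)
            data = grm_responses(theta, a, thresholds, rng)
            # `data` (n x n_items, scores 0-4) would feed the NIRT analyses here.
print(data.shape, data.min(), data.max())
```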
Lu, Ying – ETS Research Report Series, 2017
For standard- or criterion-based assessments, the use of cut scores to indicate mastery, nonmastery, or different levels of skill mastery is very common. As part of performance summary, it is of interest to examine the percentage of examinees at or above the cut scores (PAC) and how PAC evolves across administrations. This paper shows that…
Descriptors: Cutting Scores, Evaluation Methods, Mastery Learning, Performance Based Assessment
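Computing the percentage at or above the cut scores (PAC) for a given administration is straightforward; the sketch below uses hypothetical cut scores and simulated scaled scores.
```python
# A minimal sketch of PAC across administrations, with hypothetical cut scores
# and simulated score data.
import numpy as np

def pac(scores, cuts):
    """Percentage of examinees at or above each cut score."""
    scores = np.asarray(scores)
    return {c: 100.0 * np.mean(scores >= c) for c in cuts}

rng = np.random.default_rng(4)
cut_scores = [150, 170]                                   # hypothetical cuts
for admin in ("spring", "fall"):
    scores = rng.normal(160, 15, 5000).round()            # one administration's scaled scores
    print(admin, {c: round(p, 1) for c, p in pac(scores, cut_scores).items()})
```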
NWEA, 2018
Thousands of U.S. school districts and many international schools use MAP® Growth™ to monitor the academic growth of their students and to inform instruction. The MAP Growth assessment is untimed, meaning that limits are not placed on how much time a student has to respond to the items. However, to help schools understand the amount of time MAP…
Descriptors: Achievement Tests, Test Length, Achievement Gains, Mathematics Tests
Lee, Jihyun; Paek, Insu – Journal of Psychoeducational Assessment, 2014
Likert-type rating scales are still the most widely used method when measuring psychoeducational constructs. The present study investigates a long-standing issue of identifying the optimal number of response categories. A special emphasis is given to categorical data, which were generated by the Item Response Theory (IRT) Graded-Response Modeling…
Descriptors: Likert Scales, Responses, Item Response Theory, Classification
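A rough way to see the question of an optimal number of response categories is to discretize a continuous latent item response into k categories and check how well the resulting sum score tracks the trait, as in the simplified sketch below. This is an illustration under assumed normal data, not the IRT graded-response analysis the article reports.
```python
# A simplified look at how the number of response categories affects score
# quality: discretize latent responses into k categories and correlate the
# sum score with the generating trait. All data are simulated.
import numpy as np

rng = np.random.default_rng(5)
n, items = 1000, 10
theta = rng.normal(0, 1, n)
latent = theta[:, None] + rng.normal(0, 1, (n, items))    # latent item responses

for k in (2, 3, 5, 7):
    cutpoints = np.quantile(latent, np.linspace(0, 1, k + 1)[1:-1])
    scored = np.digitize(latent, cutpoints)               # 0..k-1 Likert-type scores
    sum_score = scored.sum(axis=1)
    r = np.corrcoef(sum_score, theta)[0, 1]
    print(f"{k} categories: r(sum score, trait) = {r:.3f}")
```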
Gelfand, Jessica T.; Christie, Robert E.; Gelfand, Stanley A. – Journal of Speech, Language, and Hearing Research, 2014
Purpose: Speech recognition may be analyzed in terms of recognition probabilities for perceptual wholes (e.g., words) and parts (e.g., phonemes), where j or the j-factor reveals the number of independent perceptual units required for recognition of the whole (Boothroyd, 1968b; Boothroyd & Nittrouer, 1988; Nittrouer & Boothroyd, 1990). For…
Descriptors: Phonemes, Word Recognition, Vowels, Syllables
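The j-factor relation mentioned in the abstract is commonly written as p_whole = p_part^j, so j = log(p_whole) / log(p_part); the sketch below computes it for hypothetical phoneme and word recognition probabilities.
```python
# A minimal sketch of the j-factor: the estimated number of independent
# perceptual units required to recognize the whole. Numbers are hypothetical.
import math

def j_factor(p_whole, p_part):
    """j such that p_whole = p_part ** j."""
    return math.log(p_whole) / math.log(p_part)

# e.g., phoneme recognition of 0.80 and word recognition of 0.55 imply roughly
# 2.7 independent units per word.
print(round(j_factor(p_whole=0.55, p_part=0.80), 2))
```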