Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 11 |
Since 2006 (last 20 years) | 18 |
Descriptor
Test Items | 43 |
Test Length | 43 |
Test Reliability | 43 |
Test Construction | 17 |
Test Validity | 16 |
Scores | 12 |
Item Response Theory | 11 |
Difficulty Level | 10 |
Test Format | 9 |
Error of Measurement | 8 |
Item Analysis | 8 |
More ▼ |
Source
Author
Burton, Richard F. | 3 |
Lee, Yi-Hsuan | 2 |
Zhang, Jinming | 2 |
Anthony, Christopher James | 1 |
Battista, Carmela | 1 |
Brooks, William S. | 1 |
Bruce, K. | 1 |
Bulut, Okan | 1 |
Byars, Alvin Gregg | 1 |
Byram, Jessica N. | 1 |
Camilli, Gregory | 1 |
More ▼ |
Publication Type
Reports - Research | 31 |
Journal Articles | 24 |
Reports - Evaluative | 7 |
Speeches/Meeting Papers | 6 |
Reports - Descriptive | 3 |
Guides - Non-Classroom | 1 |
Information Analyses | 1 |
Opinion Papers | 1 |
Reference Materials -… | 1 |
Education Level
Higher Education | 5 |
Postsecondary Education | 4 |
Elementary Education | 3 |
Early Childhood Education | 1 |
Elementary Secondary Education | 1 |
Grade 3 | 1 |
Grade 6 | 1 |
Intermediate Grades | 1 |
Middle Schools | 1 |
Primary Education | 1 |
Audience
Researchers | 2 |
Community | 1 |
Practitioners | 1 |
Laws, Policies, & Programs
Job Training Partnership Act… | 1 |
Assessments and Surveys
Test of English as a Foreign… | 2 |
Armed Forces Qualification… | 1 |
Comprehensive Tests of Basic… | 1 |
MacArthur Communicative… | 1 |
School and College Ability… | 1 |
Stanford Binet Intelligence… | 1 |
What Works Clearinghouse Rating
Raborn, Anthony W.; Leite, Walter L.; Marcoulides, Katerina M. – Educational and Psychological Measurement, 2020
This study compares automated methods to develop short forms of psychometric scales. Obtaining a short form that has both adequate internal structure and strong validity with respect to relationships with other variables is difficult with traditional methods of short-form development. Metaheuristic algorithms can select items for short forms while…
Descriptors: Test Construction, Automation, Heuristics, Mathematics
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Precision of Single-Skill Math CBM Time-Series Data: The Effect of Probe Stratification and Set Size
Solomon, Benjamin G.; Payne, Lexy L.; Campana, Kayla V.; Marr, Erin A.; Battista, Carmela; Silva, Alex; Dawes, Jillian M. – Journal of Psychoeducational Assessment, 2020
Comparatively little research exists on single-skill math (SSM) curriculum-based measurements (CBMs) for the purpose of monitoring growth, as may be done in practice or when monitoring intervention effectiveness within group or single-case research. Therefore, we examined a common variant of SSM-CBM: 1 digit × 1 digit multiplication. Reflecting…
Descriptors: Curriculum Based Assessment, Mathematics Tests, Mathematics Skills, Multiplication
Hamby, Tyler – Journal of Psychoeducational Assessment, 2018
In this study, the author examined potential mediators of the negative relationship between the absolute difference in items' lengths and their inter-item correlation size. Fifty-two randomly ordered items from five personality scales were administered to 622 university students, and 46 respondents from a survey website rated the items'…
Descriptors: Correlation, Personality Traits, Undergraduate Students, Difficulty Level
Kelly, William E.; Daughtry, Don – College Student Journal, 2018
This study developed an abbreviated form of Barron's (1953) Ego Strength Scale for use in research among college student samples. A version of Barron's scale was administered to 100 undergraduate college students. Using item-total score correlations and internal consistency, the scale was reduced to 18 items (Es18). The Es18 possessed adequate…
Descriptors: Undergraduate Students, Self Concept Measures, Test Length, Scores
Chai, Jun Ho; Lo, Chang Huan; Mayor, Julien – Journal of Speech, Language, and Hearing Research, 2020
Purpose: This study introduces a framework to produce very short versions of the MacArthur-Bates Communicative Development Inventories (CDIs) by combining the Bayesian-inspired approach introduced by Mayor and Mani (2019) with an item response theory-based computerized adaptive testing that adapts to the ability of each child, in line with…
Descriptors: Bayesian Statistics, Item Response Theory, Measures (Individuals), Language Skills
Smith, William Zachary; Dickenson, Tammiee S.; Rogers, Bradley David – AERA Online Paper Repository, 2017
Questionnaire refinement and a process for selecting items for elimination are important tools for survey developers. One of the major obstacles in questionnaire refinement and elimination in surveys lies in one's ability to adequately and appropriately reconstruct a survey. Often times, surveys can be long and strenuous on the respondent,…
Descriptors: Surveys, Psychometrics, Test Construction, Test Reliability
Byram, Jessica N.; Seifert, Mark F.; Brooks, William S.; Fraser-Cotlin, Laura; Thorp, Laura E.; Williams, James M.; Wilson, Adam B. – Anatomical Sciences Education, 2017
With integrated curricula and multidisciplinary assessments becoming more prevalent in medical education, there is a continued need for educational research to explore the advantages, consequences, and challenges of integration practices. This retrospective analysis investigated the number of items needed to reliably assess anatomical knowledge in…
Descriptors: Anatomy, Science Tests, Test Items, Test Reliability
Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Descriptors: Test Bias, Test Reliability, Performance, Scores
Sengul Avsar, Asiye; Tavsancil, Ezel – Educational Sciences: Theory and Practice, 2017
This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three samples sizes (100, 250 and 500)--were generated by conducting 20…
Descriptors: Test Items, Psychometrics, Nonparametric Statistics, Item Response Theory
Anthony, Christopher James; DiPerna, James Clyde – School Psychology Quarterly, 2017
The Academic Competence Evaluation Scales-Teacher Form (ACES-TF; DiPerna & Elliott, 2000) was developed to measure student academic skills and enablers (interpersonal skills, engagement, motivation, and study skills). Although ACES-TF scores have demonstrated psychometric adequacy, the length of the measure may be prohibitive for certain…
Descriptors: Test Items, Efficiency, Item Response Theory, Test Length
Sabatini, J.; O'Reilly, T.; Halderman, L.; Bruce, K. – Grantee Submission, 2014
Existing reading comprehension assessments have been criticized by researchers, educators, and policy makers, especially regarding their coverage, utility, and authenticity. The purpose of the current study was to evaluate a new assessment of reading comprehension that was designed to broaden the construct of reading. In light of these issues, we…
Descriptors: Reading Comprehension, Vignettes, Reading Tests, Elementary School Students
Yao, Lihua – Applied Psychological Measurement, 2013
Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2010
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Descriptors: Test Bias, Item Response Theory, Test Items, Scores