Publication Date

| Date range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 220 |
| Since 2022 (last 5 years) | 1089 |
| Since 2017 (last 10 years) | 2599 |
| Since 2007 (last 20 years) | 4960 |
Audience

| Audience | Count |
| --- | --- |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location

| Location | Count |
| --- | --- |
| Turkey | 226 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 66 |
What Works Clearinghouse Rating

| Rating | Count |
| --- | --- |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Clements, Andrea D.; Rothenberg, Lori – Research in the Schools, 1996
Undergraduate psychology examinations from 48 schools were analyzed to determine the proportion of items at each level of Bloom's Taxonomy, item format, and test length. Analyses indicated significant relationships between item complexity and test length even when taking format into account. Use of higher-level items may be related to shorter tests,…
Descriptors: Classification, Difficulty Level, Educational Objectives, Higher Education
Embretson, Susan E. – Journal of Educational Measurement, 1996 (peer reviewed)
Comparison of the correlates of two spatial ability tests that used the same item type but different test design principles (cognitive design versus psychometric design) indicated differences in the factorial complexity of the two tests. For the sample of 209 undergraduates, the impact of verbal abilities was substantially reduced by applying the…
Descriptors: Cognitive Processes, Correlation, Factor Structure, Higher Education
Eggen, T. J. H. M.; Straetmans, G. J. J. M. – Educational and Psychological Measurement, 2000 (peer reviewed)
Studied the use of adaptive testing when examinees are classified into three categories. Established testing algorithms with two different statistical computation procedures and evaluated them through simulation using an operational item bank from Dutch basic adult education. Results suggest a reduction of at least 22% in the mean number of items…
Descriptors: Adaptive Testing, Adult Education, Algorithms, Classification
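The Eggen and Straetmans entry above concerns adaptive testing that sorts examinees into three categories. As a rough, hypothetical illustration of that general idea (not the study's algorithms, computation procedures, or item bank), the Python sketch below administers Rasch items adaptively, estimates ability with a crude grid-search MLE, and assigns one of three made-up categories using two assumed cut scores.

```python
import math
import random

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mle_theta(responses, bs, grid=None):
    """Crude grid-search maximum-likelihood ability estimate."""
    if grid is None:
        grid = [x / 10.0 for x in range(-40, 41)]  # theta in [-4, 4]
    def loglik(theta):
        return sum(
            math.log(rasch_p(theta, b)) if u == 1 else math.log(1.0 - rasch_p(theta, b))
            for u, b in zip(responses, bs)
        )
    return max(grid, key=loglik)

def classify_adaptively(true_theta, item_bank, cuts=(-0.5, 0.5), max_items=20):
    """Administer items adaptively, then assign one of three categories."""
    administered, responses = [], []
    theta_hat = 0.0
    available = list(item_bank)
    for _ in range(max_items):
        # pick the unused item whose difficulty is closest to the current estimate
        b = min(available, key=lambda d: abs(d - theta_hat))
        available.remove(b)
        u = 1 if random.random() < rasch_p(true_theta, b) else 0
        administered.append(b)
        responses.append(u)
        theta_hat = mle_theta(responses, administered)
    low, high = cuts  # assumed cut scores, not from the study
    if theta_hat < low:
        return "below basic", theta_hat
    if theta_hat < high:
        return "basic", theta_hat
    return "above basic", theta_hat

random.seed(1)
bank = [random.uniform(-2.5, 2.5) for _ in range(200)]  # made-up difficulty bank
print(classify_adaptively(true_theta=0.8, item_bank=bank))
```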
Bramley, Tom – Evaluation & Research in Education, 2001 (peer reviewed)
Analyzed data from a session of the General Certificate of Secondary Education (GCSE) mathematics examination to identify items displaying a bi-modal expected score distribution, try to explain the bi-modality, rescore the items to remove under-used middle categories, and determine the effect on test reliability of rescoring the data. Discusses…
Descriptors: Foreign Countries, Mathematics Tests, Reliability, Scores
Ginther, April – Language Testing, 2002 (peer reviewed)
A nested cross-over design was used to examine the effects of visual condition, type of stimuli, and language proficiency on listening comprehension items of the Test of English as a Foreign Language. Three two-way interactions were significant: proficiency by type of stimuli, type of stimuli by visual condition, and type of stimuli by time.…
Descriptors: English (Second Language), Language Proficiency, Language Tests, Listening Comprehension
Scialfa, Charles; Legare, Connie; Wenger, Larry; Dingley, Louis – Teaching of Psychology, 2001 (peer reviewed)
Analyzes multiple-choice questions provided in test banks for introductory psychology textbooks. Study 1 offered a consistent picture of the objective difficulty of multiple-choice tests for introductory psychology students, while both Studies 1 and 2 indicated that test items taken from commercial test banks have poor psychometric properties.…
Descriptors: Difficulty Level, Educational Research, Higher Education, Introductory Courses
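The Scialfa et al. entry above reports poor psychometric properties for commercial test-bank items. The sketch below shows the kind of classical item analysis such claims typically rest on: proportion-correct difficulty and corrected point-biserial discrimination computed from a 0/1 response matrix. The data are made up, and this is not the analysis from the study.

```python
import statistics  # statistics.correlation requires Python 3.10+

def item_analysis(responses):
    """Classical item statistics from a persons x items matrix of 0/1 scores.

    Returns (difficulty, discrimination) per item, where difficulty is the
    proportion correct and discrimination is the corrected point-biserial
    (item score vs. total score on the remaining items).
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [t - u for t, u in zip(totals, item)]  # total excluding item j
        difficulty = sum(item) / len(item)
        discrimination = statistics.correlation(item, rest)
        stats.append((difficulty, discrimination))
    return stats

# Tiny made-up example: 6 examinees, 4 items
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
for j, (p, r) in enumerate(item_analysis(data), start=1):
    print(f"item {j}: difficulty={p:.2f}, discrimination={r:.2f}")
```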
Mareschal, Denis; Powell, Daisy; Westermann, Gert; Volein, Agnes – Infant and Child Development, 2005
Young infants are very sensitive to feature distribution information in the environment. However, existing work suggests that they do not make use of correlation information to form certain perceptual categories until at least 7 months of age. We suggest that the failure to use correlation information is a by-product of familiarization procedures…
Descriptors: Infants, Classification, Correlation, Familiarity
Pomplun, Mark; Custer, Michael – Applied Measurement in Education, 2005
In this study, we investigated possible context effects when students chose to defer items and answer those items later during a computerized test. In 4 primary school reading tests, 126 items were studied. Logistic regression analyses identified 4 items across 4 grade levels as statistically significant. However, follow-up analyses indicated that…
Descriptors: Psychometrics, Reading Tests, Effect Size, Test Items
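The Pomplun and Custer entry above uses logistic regression to flag items affected by deferral. As a hedged sketch of that general technique (not the authors' models, data, or significance tests), the code below fits a plain gradient-ascent logistic regression predicting an item score from a standardized total score and a 0/1 "deferred" flag, using simulated data with a small built-in context effect.

```python
import math
import random

def fit_logistic(X, y, lr=0.05, epochs=5000):
    """Plain full-batch gradient-ascent logistic regression (no regularization)."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # intercept + one coefficient per predictor
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            resid = yi - p
            grad[0] += resid
            for j, x in enumerate(xi, start=1):
                grad[j] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Simulated data: item score predicted by a standardized total score
# and a 0/1 flag for whether the examinee deferred the item.
random.seed(0)
rows, outcomes = [], []
for _ in range(400):
    total = random.gauss(0, 1)
    deferred = 1 if random.random() < 0.3 else 0
    z = 0.2 + 1.0 * total + 0.4 * deferred  # assumed generating coefficients
    p = 1.0 / (1.0 + math.exp(-z))
    rows.append([total, deferred])
    outcomes.append(1 if random.random() < p else 0)

b0, b_total, b_deferred = fit_logistic(rows, outcomes)
print(f"intercept={b0:.2f}, total={b_total:.2f}, deferred={b_deferred:.2f}")
```

A nonzero coefficient on the deferred flag, after conditioning on total score, is the kind of signal that would mark an item as sensitive to deferral in this style of analysis.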
Venkateswaran, Uma – History Teacher, 2004
Over the past two decades, remarkable strides have been made in examining, documenting, and incorporating race and gender issues in history courses, but it is time to take a look at the ways in which these curricular and pedagogical changes have impacted the Advanced Placement United States History Examination. This paper focuses on three…
Descriptors: United States History, Advanced Placement, Standardized Tests, Test Bias
Rupp, Andre A. – International Journal of Testing, 2003
Item response theory (IRT) has become one of the most popular scoring frameworks for measurement data. IRT models are used frequently in computerized adaptive testing, cognitively diagnostic assessment, and test equating. This article reviews two of the most popular software packages for IRT model estimation, BILOG-MG (Zimowski, Muraki, Mislevy, &…
Descriptors: Test Items, Adaptive Testing, Item Response Theory, Computer Software
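The Rupp entry above reviews software packages that estimate IRT models such as the three-parameter logistic (3PL) model. For readers unfamiliar with what such packages estimate, the snippet below evaluates the standard 3PL item response function for a single item; the parameter values are illustrative only, not output from any package.

```python
import math

def three_pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic (3PL) item response function.

    a: discrimination, b: difficulty, c: pseudo-guessing lower asymptote,
    D: scaling constant often used to approximate the normal ogive.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Probability of a correct response for one illustrative item
# (a=1.2, b=0.5, c=0.2 are made-up parameters) across a range of abilities.
for theta in [-2, -1, 0, 1, 2]:
    print(f"theta={theta:+d}  P(correct)={three_pl(theta, a=1.2, b=0.5, c=0.2):.3f}")
```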
Schaeffer, Gary A.; Henderson-Montero, Diane; Julian, Marc; Bene, Nancy H. – Educational Assessment, 2002
A number of methods for scoring tests with selected-response (SR) and constructed-response (CR) items are available. The selection of a method depends on the requirements of the program, the particular psychometric model and assumptions employed in the analysis of item and score data, and how scores are to be used. This article compares 3 methods:…
Descriptors: Scoring, Responses, Test Items, Raw Scores
Su, Ya-Hui; Wang, Wen-Chung – Applied Measurement in Education, 2005
Simulations were conducted to investigate factors that influence the Mantel, generalized Mantel-Haenszel (GMH), and logistic discriminant function analysis (LDFA) methods in assessing differential item functioning (DIF) for polytomous items. The results show that the magnitude of DIF contamination in the matching score, as measured by the average…
Descriptors: Discriminant Analysis, Test Bias, Research Methodology, Test Items
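The Su and Wang entry above evaluates Mantel-type and logistic discriminant procedures for DIF in polytomous items. As a simplified dichotomous analogue (not the polytomous GMH or LDFA procedures simulated in the study), the sketch below computes the classical Mantel-Haenszel common odds ratio and the ETS delta-scale D-DIF statistic from made-up data stratified on the matching score.

```python
import math
from collections import defaultdict

def mh_ddif(records):
    """Mantel-Haenszel delta-DIF for one dichotomous item.

    records: iterable of (group, matching_score, item_score) with
    group in {"reference", "focal"} and item_score in {0, 1}.
    Examinees are stratified on the matching score (usually total test score).
    """
    strata = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0, "N": 0})
    for group, score, u in records:
        cell = strata[score]
        if group == "reference":
            cell["A" if u == 1 else "B"] += 1
        else:
            cell["C" if u == 1 else "D"] += 1
        cell["N"] += 1
    num = sum(s["A"] * s["D"] / s["N"] for s in strata.values() if s["N"] > 0)
    den = sum(s["B"] * s["C"] / s["N"] for s in strata.values() if s["N"] > 0)
    alpha_mh = num / den                 # common odds ratio across strata
    return -2.35 * math.log(alpha_mh)    # ETS delta scale; near 0 means little DIF

# Tiny illustrative data set: (group, total-score stratum, item score)
data = (
    [("reference", s, 1) for s in (3, 4, 4, 5, 5, 5)]
    + [("reference", s, 0) for s in (3, 3, 4, 5)]
    + [("focal", s, 1) for s in (3, 4, 5, 5)]
    + [("focal", s, 0) for s in (3, 3, 4, 4, 5, 5)]
)
print(f"MH D-DIF = {mh_ddif(data):.2f}")
```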
Lewis, Kelly M.; Lambert, Michael C. – Assessment, 2006
Studies addressing Black adolescents' social change strategies are nonexistent, which may reflect the absence of social change measures for Black adolescents. In an effort to begin addressing this concern, the 30-item Measure of Social Change for Adolescents (MOSC-A) was designed to measure Black adolescents' first- (i.e., within the…
Descriptors: African Americans, Adolescents, Social Change, Change Strategies
Beretvas, S. Natasha; Williams, Natasha J. – Journal of Educational Measurement, 2004
To assess item dimensionality, the following two approaches are described and compared: hierarchical generalized linear model (HGLM) and multidimensional item response theory (MIRT) model. Two generating models are used to simulate dichotomous responses to a 17-item test: the unidimensional and compensatory two-dimensional (C2D) models. For C2D…
Descriptors: Item Response Theory, Test Items, Mathematics Tests, Reading Ability
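The Beretvas and Williams entry above simulates dichotomous responses under unidimensional and compensatory two-dimensional (C2D) generating models. The sketch below shows one plausible way to generate data under a compensatory two-dimensional logistic model for a 17-item test; the item parameters and sample size are arbitrary assumptions, not the study's generating conditions.

```python
import math
import random

def compensatory_2d_prob(theta, a, d):
    """Compensatory two-dimensional logistic IRT response probability.

    theta: (theta1, theta2) abilities; a: (a1, a2) discriminations; d: intercept.
    The linear combination lets strength on one dimension offset weakness on
    the other, which is the sense in which the model is "compensatory".
    """
    z = a[0] * theta[0] + a[1] * theta[1] + d
    return 1.0 / (1.0 + math.exp(-z))

def simulate_responses(n_persons=500, n_items=17, seed=42):
    """Generate a persons x items matrix of 0/1 responses."""
    rng = random.Random(seed)
    items = [((rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)), rng.uniform(-1, 1))
             for _ in range(n_items)]
    data = []
    for _ in range(n_persons):
        theta = (rng.gauss(0, 1), rng.gauss(0, 1))
        data.append([1 if rng.random() < compensatory_2d_prob(theta, a, d) else 0
                     for a, d in items])
    return data

responses = simulate_responses()
print(len(responses), "persons x", len(responses[0]), "items")
```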
Lin, Jie – Alberta Journal of Educational Research, 2006
The Bookmark standard-setting procedure was developed to address the perceived problems with the most popular method for setting cut-scores: the Angoff procedure (Angoff, 1971). The purposes of this article are to review the Bookmark procedure and evaluate it in terms of Berk's (1986) criteria for evaluating cut-score setting methods. The…
Descriptors: Standard Setting (Scoring), Cutting Scores, Evaluation Criteria, Evaluation Research
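The Lin entry above reviews the Bookmark standard-setting procedure, in which panelists place a bookmark in an ordered item booklet and the cut score is the ability at which the bookmarked item is answered correctly with a chosen response probability (commonly RP67). The sketch below maps a hypothetical bookmark placement to a theta cut score under the 2PL model; the RP67 convention is standard, but the item parameters and page choice are made up.

```python
import math

def rp_theta(a, b, rp=0.67, D=1.7):
    """Ability at which a 2PL item is answered correctly with probability rp
    (the "response probability" criterion, here RP67)."""
    return b + math.log(rp / (1.0 - rp)) / (D * a)

def bookmark_cut(items, bookmark_page, rp=0.67):
    """Map a panelist's bookmark placement to a theta cut score.

    items: list of (a, b) 2PL parameters; pages are ordered by rp_theta to
    form the ordered item booklet; the bookmark sits on page `bookmark_page`
    (1-based), and the cut score is that page's RP location.
    """
    booklet = sorted(items, key=lambda ab: rp_theta(*ab, rp=rp))
    a, b = booklet[bookmark_page - 1]
    return rp_theta(a, b, rp=rp)

# Made-up item parameters for a 10-item ordered booklet
items = [(1.0, -1.5), (0.8, -1.0), (1.2, -0.6), (0.9, -0.2), (1.1, 0.0),
         (1.3, 0.3), (0.7, 0.7), (1.0, 1.0), (1.2, 1.4), (0.9, 1.9)]
print(f"cut score at page 6: theta = {bookmark_cut(items, bookmark_page=6):.2f}")
```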
