Publication Date
| In 2026 | 0 |
| Since 2025 | 197 |
| Since 2022 (last 5 years) | 1067 |
| Since 2017 (last 10 years) | 2577 |
| Since 2007 (last 20 years) | 4938 |
Audience
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location
| Turkey | 225 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 65 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Trace, Jonathan; Brown, James Dean; Janssen, Gerriet; Kozhevnikova, Liudmila – Language Testing, 2017
Cloze tests have been the subject of numerous studies regarding their function and use in both first language and second language contexts (e.g., Jonz & Oller, 1994; Watanabe & Koyama, 2008). From a validity standpoint, one area of investigation has been the extent to which cloze tests measure reading ability beyond the sentence level.…
Descriptors: Cloze Procedure, Language Tests, Test Items, Item Analysis
van der Ven, Sanne H. G.; Klaiber, Jonathan D.; van der Maas, Han L. J. – Educational Psychology, 2017
Writing down spoken number words (transcoding) is an ability that is predictive of math performance and related to working memory ability. We analysed these relationships in a large sample of over 25,000 children, from kindergarten to the end of primary school, who solved transcoding items with a computer adaptive system. Furthermore, we…
Descriptors: Short Term Memory, Foreign Countries, Mathematics, Mathematics Instruction
Gómez-Benito, Juana; Hidalgo, Maria Dolores; Zumbo, Bruno D. – Educational and Psychological Measurement, 2013
The objective of this article was to find an optimal decision rule for identifying polytomous items with large or moderate amounts of differential functioning. The effectiveness of combining statistical tests with effect size measures was assessed using logistic discriminant function analysis and two effect size measures: R² and…
Descriptors: Item Analysis, Test Items, Effect Size, Statistical Analysis
de la Torre, Jimmy; Lee, Young-Sun – Journal of Educational Measurement, 2013
This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…
Descriptors: Statistical Analysis, Test Items, Goodness of Fit, Error of Measurement
Grunert, Megan L.; Raker, Jeffrey R.; Murphy, Kristen L.; Holme, Thomas A. – Journal of Chemical Education, 2013
The concept of assigning partial credit on multiple-choice test items is considered for items from ACS Exams. Because the items on these exams, particularly the quantitative items, use common student errors to define incorrect answers, it is possible to assign partial credits to some of these incorrect responses. To do so, however, it becomes…
Descriptors: Multiple Choice Tests, Scoring, Scoring Rubrics, Science Tests
Dodonova, Yulia A.; Dodonov, Yury S. – Intelligence, 2013
Using more complex items than those commonly employed within the information-processing approach, but still easier than those used in intelligence tests, this study analyzed how the association between processing speed and accuracy level changes as the difficulty of the items increases. The study involved measuring cognitive ability using Raven's…
Descriptors: Difficulty Level, Intelligence Tests, Cognitive Ability, Accuracy
Hua, Jing; Gu, Guixiong; Meng, Wei; Wu, Zhuochun – Research in Developmental Disabilities: A Multidisciplinary Journal, 2013
The aim of this paper was to examine the validity and reliability of age band 1 of the Movement Assessment Battery for Children-Second Edition (MABC-2) in preparation for its standardization in mainland China. Interrater and test-retest reliability of the MABC-2 was estimated using Intraclass Correlation Coefficient (ICC). Cronbach's alpha for…
Descriptors: Factor Analysis, Test Items, Foreign Countries, Psychometrics
Hong, Eunsook; Peng, Yun; O'Neil, Harold F., Jr.; Wu, Junbin – Journal of Creative Behavior, 2013
The study examined the effects of gender and item content of domain-general and domain-specific creative-thinking tests on four subscale scores of creative-thinking (fluency, flexibility, originality, and elaboration). Chinese tenth-grade students (234 males and 244 females) participated in the study. Domain-general creative thinking was measured…
Descriptors: Creative Thinking, Creativity Tests, Gender Differences, Test Items
Luecht, Richard M. – Journal of Applied Testing Technology, 2013
Assessment engineering is a new way to design and implement scalable, sustainable and ideally lower-cost solutions to the complexities of designing and developing tests. It represents a merger of sorts between cognitive task modeling and engineering design principles--a merger that requires some new thinking about the nature of score scales, item…
Descriptors: Engineering, Test Construction, Test Items, Models
Kim, Jihye; Oshima, T. C. – Educational and Psychological Measurement, 2013
In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
Descriptors: Test Bias, Test Items, Statistical Analysis, Error of Measurement
Han, Kyung T. – Applied Psychological Measurement, 2013
Most computerized adaptive testing (CAT) programs do not allow test takers to review and change their responses because it could seriously deteriorate the efficiency of measurement and make tests vulnerable to manipulative test-taking strategies. Several modified testing methods have been developed that provide restricted review options while…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Testing
Store, Davie – ProQuest LLC, 2013
The impact of particular types of context effects on actual scores is less understood although there has been some research carried out regarding certain types of context effects under the nonequivalent anchor test (NEAT) design. In addition, the issue of the impact of item context effects on scores has not been investigated extensively when item…
Descriptors: Test Items, Equated Scores, Accuracy, Item Response Theory
Maydeu-Olivares, Alberto; Montano, Rosa – Psychometrika, 2013
We investigate the performance of three statistics, R₁, R₂ (Glas in "Psychometrika" 53:525-546, 1988), and M₂ (Maydeu-Olivares & Joe in "J. Am. Stat. Assoc." 100:1009-1020, 2005, "Psychometrika" 71:713-732, 2006) to assess the overall fit of a one-parameter logistic model…
Descriptors: Foreign Countries, Item Response Theory, Statistics, Data Analysis
Tijmstra, Jesper; Hessen, David J.; van der Heijden, Peter G. M.; Sijtsma, Klaas – Psychometrika, 2013
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores,…
Descriptors: Item Response Theory, Statistical Inference, Probability, Psychometrics
Koretz, Daniel; Jennings, Jennifer L.; Ng, Hui Leng; Yu, Carol; Braslow, David; Langi, Meredith – Educational Assessment, 2016
Test-based accountability often produces score inflation. Most studies have evaluated inflation by comparing trends on a high-stakes test and a lower stakes audit test. However, Koretz and Beguin (2010) noted weaknesses of audit tests and suggested self-monitoring assessments (SMAs), which incorporate audit items into high-stakes tests. This…
Descriptors: Audits (Verification), Scores, Grade Inflation, Self Evaluation (Individuals)

