Publication Date
| In 2026 | 0 |
| Since 2025 | 197 |
| Since 2022 (last 5 years) | 1067 |
| Since 2017 (last 10 years) | 2577 |
| Since 2007 (last 20 years) | 4938 |
Descriptor
Source
Author
Publication Type
Education Level
Audience
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
| More ▼ | |
Location
| Turkey | 225 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 65 |
| More ▼ | |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Chen, Jinsong; de la Torre, Jimmy – Applied Psychological Measurement, 2013
Polytomous attributes, particularly those defined as part of the test development process, can provide additional diagnostic information. The present research proposes the polytomous generalized deterministic inputs, noisy, "and" gate (pG-DINA) model to accommodate such attributes. The pG-DINA model allows input from substantive experts…
Descriptors: Models, Cognitive Tests, Diagnostic Tests, Computation
Cheong, Yuk Fai; Kamata, Akihito – Applied Measurement in Education, 2013
In this article, we discuss and illustrate two centering and anchoring options available in differential item functioning (DIF) detection studies based on the hierarchical generalized linear and generalized linear mixed modeling frameworks. We compared and contrasted the assumptions of the two options, and examined the properties of their DIF…
Descriptors: Test Bias, Hierarchical Linear Modeling, Comparative Analysis, Test Items
Grand, James A.; Golubovich, Juliya; Ryan, Ann Marie; Schmitt, Neal – Organizational Behavior and Human Decision Processes, 2013
In organizational and educational practices, sensitivity reviews are commonly advocated techniques for reducing test bias and enhancing fairness. In the present paper, results from two studies are reported which investigate how effective individuals are at detecting problematic test content and the influence such content has on important testing…
Descriptors: Test Items, Test Content, Test Bias, Individual Differences
Meyer, Joseph F.; Faust, Kyle A.; Faust, David; Baker, Aaron M.; Cook, Nathan E. – International Journal of Mental Health and Addiction, 2013
Even when relatively infrequent, careless and random responding (C/RR) can have robust effects on individual and group data and thereby distort clinical evaluations and research outcomes. Given such potential adverse impacts and the broad use of self-report measures when appraising addictions and addictive behavior, the detection of C/RR can…
Descriptors: Addictive Behavior, Response Style (Tests), Test Items, Validity
Geerlings, Hanneke; van der Linden, Wim J.; Glas, Cees A. W. – Applied Psychological Measurement, 2013
Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining…
Descriptors: Test Construction, Test Items, Item Banks, Automation
Cheng, Ying; Chen, Peihua; Qian, Jiahe; Chang, Hua-Hua – Applied Psychological Measurement, 2013
Differential item functioning (DIF) analysis is an important step in the data analysis of large-scale testing programs. Nowadays, many such programs endorse matrix sampling designs to reduce the load on examinees, such as the balanced incomplete block (BIB) design. These designs pose challenges to the traditional DIF analysis methods. For example,…
Descriptors: Test Bias, Equated Scores, Test Items, Effect Size
Koen, Joshua D.; Yonelinas, Andrew P. – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2013
Koen and Yonelinas (2010) contrasted the recollection and encoding variability accounts of the finding that old items are associated with more variable memory strength than new items. The study indicated that (a) increasing encoding variability did not lead to increased measures of old item variance, and (b) old item variance was directly related…
Descriptors: Recall (Psychology), Memory, Cognitive Processes, Models
Matlock, Ki Lynn – ProQuest LLC, 2013
When test forms that have equal total test difficulty and number of items vary in difficulty and length within sub-content areas, an examinee's estimated score may vary across equivalent forms, depending on how well his or her true ability in each sub-content area aligns with the difficulty of items and number of items within these areas.…
Descriptors: Test Items, Difficulty Level, Ability, Test Content
Sorensen, Henry L. – ProQuest LLC, 2013
Cut-score setting processes are used to establish the passing standards for all kinds of tests in education and for credentialing. While experts use their best efforts to guide cut-score setting processes to generate valid and reliable results, cut-score participants often have a difficult time understanding the standard at which the cut score is…
Descriptors: Cutting Scores, Standard Setting (Scoring), Comparative Analysis, Difficulty Level
Ke, I-Chung – Language, Culture and Curriculum, 2019
A previous study (Ke, I. 2012. From EFL to English as an international and scientific language: analysing Taiwan's high-school English textbooks in the period 1952-2009. "Language, Culture and Curriculum," 25(2), 173-187) on the trend of English textbooks in Taiwanese high schools showed that the proportion of the lessons embedded in…
Descriptors: English (Second Language), Second Language Learning, Language Tests, Global Approach
Palane, Nelladee McLeod; Howie, Sarah – Perspectives in Education, 2019
In this article, preProgress in Reading Literacy Study (prePIRLS) 2011 data is used to compare the performance of different language of instruction groupings (English, Afrikaans and African languages) in primary schools on the more complex, higher-order reading comprehension items tested in a large-scale international test. PrePIRLS 2011…
Descriptors: Reading Comprehension, Language of Instruction, Models, Elementary School Students
Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas – Educational and Psychological Measurement, 2015
The selection of an appropriate booklet design is an important element of large-scale assessments of student achievement. Two design properties that are typically optimized are the "balance" with respect to the positions the items are presented and with respect to the mutual occurrence of pairs of items in the same booklet. The purpose…
Descriptors: Measurement, Computation, Test Format, Test Items
Nijlen, Daniel Van; Janssen, Rianne – Applied Measurement in Education, 2015
In this study it is investigated to what extent contextualized and non-contextualized mathematics test items have a differential impact on examinee effort. Mixture item response theory (IRT) models are applied to two subsets of items from a national assessment on mathematics in the second grade of the pre-vocational track in secondary education in…
Descriptors: Mathematics Tests, Measurement, Item Response Theory, Test Items
Choi, Kyong Mi; Lee, Young-Sun; Park, Yoon Soo – EURASIA Journal of Mathematics, Science & Technology Education, 2015
International trended assessments have long attempted to provide instructional information to educational researchers and classroom teachers. Studies have shown that traditional methods of item analysis have not provided specific information that can be directly applicable to improve student performance. To this end, cognitive diagnosis models…
Descriptors: International Assessment, Mathematics Tests, Grade 8, Models
Behizadeh, Nadia; Engelhard, George, Jr. – Measurement: Interdisciplinary Research and Perspectives, 2015
In his focus article, Koretz (this issue) argues that accountability has become the primary function of large-scale testing in the United States. He then points out that tests being used for accountability purposes are flawed and that the high-stakes nature of these tests creates a context that encourages score inflation. Koretz is concerned about…
Descriptors: Communities of Practice, High Stakes Tests, Testing, Test Validity

Peer reviewed
Direct link
