Publication Date
In 2025: 20
Since 2024: 48
Since 2021 (last 5 years): 225
Since 2016 (last 10 years): 503
Since 2006 (last 20 years): 826
Descriptor
Difficulty Level: 1030
Test Items: 1030
Foreign Countries: 362
Item Response Theory: 341
Test Construction: 228
Item Analysis: 201
Multiple Choice Tests: 188
Test Reliability: 180
Test Validity: 158
Scores: 149
Comparative Analysis: 133
Author
Bulut, Okan: 7
Guo, Hongwen: 7
Sinharay, Sandip: 6
Baghaei, Purya: 5
DeMars, Christine E.: 5
Dorans, Neil J.: 5
Liu, Ou Lydia: 5
Long, Caroline: 5
Plake, Barbara S.: 5
Retnawati, Heri: 5
Wilson, Mark: 5
Location
Turkey: 44
Indonesia: 29
Germany: 27
Australia: 18
Canada: 16
United States: 14
Nigeria: 13
South Africa: 13
United Kingdom: 13
Iran: 11
China: 10
Laws, Policies, & Programs
No Child Left Behind Act 2001: 2
Rodgers, Emily; D'Agostino, Jerome V.; Berenbon, Rebecca; Johnson, Tracy; Winkler, Christa – Journal of Early Childhood Literacy, 2023
Running Records are thought to be an excellent formative assessment tool because they generate results that educators can use to make their teaching more responsive. Despite the technical nature of scoring Running Records and the kinds of important decisions that are attached to their analysis, few studies have investigated assessor accuracy. We…
Descriptors: Formative Evaluation, Scoring, Accuracy, Difficulty Level
Patrik Havan; Michal Kohút; Peter Halama – International Journal of Testing, 2025
Acquiescence is the tendency of participants to shift their responses toward agreement. Lechner et al. (2019) introduced two mechanisms of acquiescence: social deference and cognitive processing. We added their interaction to a theoretical framework. The sample consists of 557 participants. We found a significant, moderately strong relationship…
Descriptors: Cognitive Processes, Attention, Difficulty Level, Reflection
Aiman Mohammad Freihat; Omar Saleh Bani Yassin – Educational Process: International Journal, 2025
Background/purpose: This study aimed to assess how accurately multiple-choice test item parameters are estimated under item response theory models. Materials/methods: The researchers relied on measurement accuracy indicators, which express the absolute difference between the estimated and actual values of the…
Descriptors: Accuracy, Computation, Multiple Choice Tests, Test Items
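The accuracy indicator described in this entry reduces to simple arithmetic: the absolute difference between estimated and true item parameters. A minimal sketch, assuming a 2PL IRT model and entirely hypothetical true vs. estimated difficulty (b) values (the data and function names are illustrative, not from the study):

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL IRT model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mean_abs_error(estimated, actual):
    """Accuracy indicator: mean absolute difference between
    estimated and true item parameters."""
    return sum(abs(e, ) if False else abs(e - t) for e, t in zip(estimated, actual)) / len(estimated)

# Hypothetical true vs. estimated difficulty (b) parameters for five items
true_b = [-1.2, -0.5, 0.0, 0.6, 1.4]
est_b = [-1.1, -0.6, 0.1, 0.5, 1.5]
print(round(mean_abs_error(est_b, true_b), 3))  # 0.1
```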
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2023
Traditional estimators of reliability such as coefficients alpha, theta, omega, and rho (maximal reliability) are prone to radically underestimate reliability for tests commonly used to measure educational achievement. Such tests are often built from items with widely varying difficulties. This is a typical pattern where the traditional…
Descriptors: Test Reliability, Achievement Tests, Computation, Test Items
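For readers unfamiliar with coefficient alpha, the best known of the estimators this entry says can underestimate reliability, here is a minimal from-scratch sketch; the 0/1 score matrix is hypothetical and chosen only to show items of varying difficulty:

```python
def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items score matrix (list of rows):
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(scores[0])                  # number of items
    def var(xs):                        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = sum(var([row[i] for row in scores]) for i in range(k))
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 0/1 responses: 5 examinees x 4 items of varying difficulty
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))  # 0.8
```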
Tia M. Fechter; Heeyeon Yoon – Language Testing, 2024
This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to seek low-cost modifications to the existing Yes/No Angoff method to increase the validity and reliability of the recommended cut scores using a convergent…
Descriptors: Standard Setting, Language Proficiency, Language Tests, Evaluation Methods
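The Yes/No Angoff method this entry modifies has a simple arithmetic core: each panelist marks, item by item, whether a minimally competent examinee would answer correctly, and the recommended cut score is the mean Yes count across panelists. A hedged sketch with made-up ratings (not data from the study):

```python
def yes_no_angoff_cut(ratings):
    """Recommended cut score under the Yes/No Angoff method: the mean,
    across panelists, of each panelist's count of 'Yes' judgments
    (1 = a minimally competent examinee would answer the item correctly)."""
    per_panelist = [sum(row) for row in ratings]
    return sum(per_panelist) / len(per_panelist)

# Hypothetical ratings: 3 panelists x 5 items
ratings = [
    [1, 1, 0, 1, 0],   # panelist A: 3 Yes
    [1, 0, 1, 1, 0],   # panelist B: 3 Yes
    [1, 1, 1, 1, 0],   # panelist C: 4 Yes
]
print(round(yes_no_angoff_cut(ratings), 2))  # 3.33
```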
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests that contain items (i.e., questions) which equitably measure various types of skills for students at different levels is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent, if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
Kuan-Yu Jin; Thomas Eckes – Educational and Psychological Measurement, 2024
Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable…
Descriptors: Item Response Theory, Test Items, Test Wiseness, Surveys
Wuji Lin; Chenxi Lv; Jiejie Liao; Yuan Hu; Yutong Liu; Jingyuan Lin – npj Science of Learning, 2024
The debate about whether the capacity of working memory (WM) varies with the complexity of memory items continues. This study employed novel experimental materials to investigate the role of complexity in WM capacity. Across seven experiments, we explored the relationship between complexity and WM capacity. The results indicated that the…
Descriptors: Short Term Memory, Difficulty Level, Retention (Psychology), Test Items
Berenbon, Rebecca F.; McHugh, Bridget C. – Educational Measurement: Issues and Practice, 2023
To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ…
Descriptors: Test Items, Multiple Choice Tests, Standards, Difficulty Level
Soysal, Sumeyra; Yilmaz Kogar, Esin – International Journal of Assessment Tools in Education, 2022
A testlet comprises a set of items based on a common stimulus. When testlets are used in a test, the local independence assumption may be violated; in that case, it would not be appropriate to apply traditional item response theory models to tests that include testlets. When the testlet is discussed, one of the most…
Descriptors: Test Items, Test Theory, Models, Sample Size
Gyamfi, Abraham; Acquaye, Rosemary – Acta Educationis Generalis, 2023
Introduction: Item response theory (IRT) has received much attention in the validation of assessment instruments because it allows students' ability to be estimated from any set of items. Item response theory allows the difficulty and discrimination levels of each item on the test to be estimated. In the framework of IRT, item characteristics are…
Descriptors: Item Response Theory, Models, Test Items, Difficulty Level
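This entry contrasts IRT with classical item statistics. As a point of comparison, the classical counterparts of difficulty and discrimination are the p-value (proportion correct) and the point-biserial correlation between an item and the total score; a minimal sketch with hypothetical response vectors:

```python
import math

def item_difficulty(responses):
    """Classical difficulty index: proportion of examinees answering correctly."""
    return sum(responses) / len(responses)

def point_biserial(item, totals):
    """Classical discrimination index: Pearson correlation between a
    0/1 item-score vector and examinees' total test scores."""
    n = len(item)
    mi, mt = sum(item) / n, sum(totals) / n
    cov = sum((x - mi) * (t - mt) for x, t in zip(item, totals)) / n
    si = math.sqrt(sum((x - mi) ** 2 for x in item) / n)
    st = math.sqrt(sum((t - mt) ** 2 for t in totals) / n)
    return cov / (si * st)

# Hypothetical: 4 examinees' scores on one item, and their total scores
print(item_difficulty([0, 0, 1, 1]))                          # 0.5
print(round(point_biserial([0, 0, 1, 1], [1, 2, 3, 4]), 3))   # 0.894
```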
Kam, Chester Chun Seng – Educational and Psychological Measurement, 2023
When constructing measurement scales, regular and reversed items are often used (e.g., "I am satisfied with my job"/"I am not satisfied with my job"). Some methodologists recommend excluding reversed items because they are more difficult to understand and therefore engender a second, artificial factor distinct from the…
Descriptors: Test Items, Difficulty Level, Test Construction, Construct Validity
Nedungadi, Sachin; Rinco Michels, Olga; Kreke, Patricia J.; Raker, Jeffrey R.; Murphy, Kristen L. – Journal of Chemical Education, 2022
Practice examinations developed at the ACS Examinations Institute ask students to self-report mental effort when answering items. This self-reported mental effort together with performance can be represented in the form of a cognitive efficiency graph for each student giving information on the utilization of cognitive resources and content…
Descriptors: Cognitive Processes, Science Tests, Test Items, Difficulty Level
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation