Publication Date
In 2025 | 82 |
Since 2024 | 330 |
Since 2021 (last 5 years) | 1282 |
Since 2016 (last 10 years) | 2746 |
Since 2006 (last 20 years) | 4973 |
Descriptor
Test Items | 9400 |
Test Construction | 2673 |
Foreign Countries | 2122 |
Item Response Theory | 1843 |
Difficulty Level | 1597 |
Item Analysis | 1480 |
Test Validity | 1375 |
Test Reliability | 1152 |
Multiple Choice Tests | 1134 |
Scores | 1122 |
Computer Assisted Testing | 1040 |
More ▼ |
Source
Author
Publication Type
Education Level
Audience
Practitioners | 653 |
Teachers | 560 |
Researchers | 249 |
Students | 201 |
Administrators | 79 |
Policymakers | 21 |
Parents | 17 |
Counselors | 8 |
Community | 7 |
Support Staff | 3 |
Media Staff | 1 |
More ▼ |
Location
Canada | 223 |
Turkey | 221 |
Australia | 155 |
Germany | 114 |
United States | 97 |
Florida | 86 |
China | 84 |
Taiwan | 75 |
Indonesia | 73 |
United Kingdom | 70 |
Netherlands | 64 |
More ▼ |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 4 |
Meets WWC Standards with or without Reservations | 4 |
Does not meet standards | 1 |
Katrin Schuessler; Vanessa Fischer; Maik Walpuski – Instructional Science: An International Journal of the Learning Sciences, 2025
Cognitive load studies are mostly centered on information on perceived cognitive load. Single-item subjective rating scales are the dominant measurement practice to investigate overall cognitive load. Usually, either invested mental effort or perceived task difficulty is used as an overall cognitive load measure. However, the extent to which the…
Descriptors: Cognitive Processes, Difficulty Level, Rating Scales, Construct Validity
Santi Lestari – Research Matters, 2025
The ability to draw visual representations such as diagrams and graphs is considered fundamental to science learning. Science exams therefore often include questions which require students to draw a visual representation, or to augment a partially provided one. The design features of such questions (e.g., layout of diagrams, amount of answer…
Descriptors: Science Education, Secondary Education, Visual Aids, Foreign Countries
Xueliang Chen; Vahid Aryadoust; Wenxin Zhang – Language Testing, 2025
The growing diversity among test takers in second or foreign language (L2) assessments makes the importance of fairness front and center. This systematic review aimed to examine how fairness in L2 assessments was evaluated through differential item functioning (DIF) analysis. A total of 83 articles from 27 journals were included in a systematic…
Descriptors: Second Language Learning, Language Tests, Test Items, Item Analysis
Jerin Kim; Kent McIntosh – Journal of Positive Behavior Interventions, 2025
We aimed to identify empirically valid cut scores on the positive behavioral interventions and supports (PBIS) Tiered Fidelity Inventory (TFI) through an expert panel process known as bookmarking. The TFI is a measurement tool to evaluate the fidelity of implementation of PBIS. In the bookmark method, experts reviewed all TFI items and item scores…
Descriptors: Positive Behavior Supports, Cutting Scores, Fidelity, Program Evaluation
Roozenbeek, Jon; Maertens, Rakoen; McClanahan, William; van der Linden, Sander – Educational and Psychological Measurement, 2021
Online misinformation is a pervasive global problem. In response, psychologists have recently explored the theory of psychological inoculation: If people are preemptively exposed to a weakened version of a misinformation technique, they can build up cognitive resistance. This study addresses two unanswered methodological questions about a widely…
Descriptors: Games, Intervention, Scores, Pretests Posttests
Semere Kiros Bitew; Amir Hadifar; Lucas Sterckx; Johannes Deleu; Chris Develder; Thomas Demeester – IEEE Transactions on Learning Technologies, 2024
Multiple-choice questions (MCQs) are widely used in digital learning systems, as they allow for automating the assessment process. However, owing to the increased digital literacy of students and the advent of social media platforms, MCQ tests are widely shared online, and teachers are continuously challenged to create new questions, which is an…
Descriptors: Multiple Choice Tests, Computer Assisted Testing, Test Construction, Test Items
Aditya Shah; Ajay Devmane; Mehul Ranka; Prathamesh Churi – Education and Information Technologies, 2024
Online learning has grown due to the advancement of technology and flexibility. Online examinations measure students' knowledge and skills. Traditional question papers include inconsistent difficulty levels, arbitrary question allocations, and poor grading. The suggested model calibrates question paper difficulty based on student performance to…
Descriptors: Computer Assisted Testing, Difficulty Level, Grading, Test Construction
Theodore E. G. Alivio; Claire E. Galloway; Blain Mamiya; Vickie M. Williamson – Journal of Science Education and Technology, 2024
The link between a student's math fluency and their success in general chemistry has been thoroughly documented in the literature. One diagnostic instrument that can be used to assess a student's arithmetic skills is the Math-Up Skills Test (MUST), a 20-question, free-response math test completed in 15 min. The MUST instrument assesses the…
Descriptors: Mathematics Tests, Test Items, Item Analysis, Early Intervention
Jing Ma – ProQuest LLC, 2024
This study investigated the impact of scoring polytomous items later on measurement precision, classification accuracy, and test security in mixed-format adaptive testing. Utilizing the shadow test approach, a simulation study was conducted across various test designs, lengths, number and location of polytomous item. Results showed that while…
Descriptors: Scoring, Adaptive Testing, Test Items, Classification
Mostafa Hosseinzadeh; Ki Lynn Matlock Cole – Educational and Psychological Measurement, 2024
In real-world situations, multidimensional data may appear on large-scale tests or psychological surveys. The purpose of this study was to investigate the effects of the quantity and magnitude of cross-loadings and model specification on item parameter recovery in multidimensional Item Response Theory (MIRT) models, especially when the model was…
Descriptors: Item Response Theory, Models, Maximum Likelihood Statistics, Algorithms
Melissa Whatley; Dominique Foster; Stephen Paul – Journal of Studies in International Education, 2024
The purpose of this study was to develop a measurement instrument that scholars and practitioners in international education can use as a means of exploring whether and how individuals who come into contact with international education programs develop a greater sense of cultural humility. Specifically, the study described here outlines the four…
Descriptors: Foreign Students, Cultural Awareness, Consciousness Raising, Test Construction
David L. Westling; Karena Cooper-Duffy; A. Corinne Huggins-Manley – Teacher Education and Special Education, 2024
This study was conducted to collect validity evidence to support the use of an observation instrument to evaluate the performance of special education teachers (SETs) of students with significant disabilities (SWSD). In the study, a purposive sample of 49 SETs of SWSD, who were appropriately credentialed and experienced, evaluated the content of…
Descriptors: Students with Disabilities, Special Education Teachers, Severe Disabilities, Teacher Evaluation
Paula Elosua – Language Assessment Quarterly, 2024
In sociolinguistic contexts where standardized languages coexist with regional dialects, the study of differential item functioning is a valuable tool for examining certain linguistic uses or varieties as threats to score validity. From an ecological perspective, this paper describes three stages in the study of differential item functioning…
Descriptors: Reading Tests, Reading Comprehension, Scores, Test Validity
Emily A. Holt; Jessica Duke; Ryan Dunk; Krystal Hinerman – Environmental Education Research, 2024
Student understanding of climate change is an active and growing area of research, but little research has documented undergraduate students' knowledge about the biotic impacts of climate change. Here, we address this literature gap by presenting the Inventory of Biotic Climate Literacy (IBCL), a concept inventory developed to assess undergraduate…
Descriptors: Climate, Undergraduate Students, Knowledge Level, Test Construction
Lahza, Hatim; Smith, Tammy G.; Khosravi, Hassan – British Journal of Educational Technology, 2023
Traditional item analyses such as classical test theory (CTT) use exam-taker responses to assessment items to approximate their difficulty and discrimination. The increased adoption by educational institutions of electronic assessment platforms (EAPs) provides new avenues for assessment analytics by capturing detailed logs of an exam-taker's…
Descriptors: Medical Students, Evaluation, Computer Assisted Testing, Time Factors (Learning)