Publication Date
| In 2026 | 0 |
| Since 2025 | 20 |
| Since 2022 (last 5 years) | 84 |
| Since 2017 (last 10 years) | 235 |
| Since 2007 (last 20 years) | 369 |
Descriptor
| Difficulty Level | 369 |
| Test Reliability | 253 |
| Test Items | 212 |
| Foreign Countries | 181 |
| Test Validity | 149 |
| Test Construction | 103 |
| Item Response Theory | 86 |
| Reliability | 83 |
| Psychometrics | 71 |
| Scores | 62 |
| Multiple Choice Tests | 61 |
Author
| Schoen, Robert C. | 6 |
| Yang, Xiaotong | 4 |
| Al-Jarf, Reima | 3 |
| Alonzo, Julie | 3 |
| Anderson, Daniel | 3 |
| Paek, Insu | 3 |
| Prather, Edward E. | 3 |
| Tindal, Gerald | 3 |
| Alexander, Patricia A. | 2 |
| Atalmis, Erkan Hasan | 2 |
| Barniol, Pablo | 2 |
Audience
| Administrators | 1 |
| Community | 1 |
| Counselors | 1 |
| Parents | 1 |
| Teachers | 1 |
Location
| Indonesia | 27 |
| Turkey | 24 |
| Germany | 14 |
| Florida | 9 |
| Nigeria | 8 |
| United Kingdom | 8 |
| United States | 8 |
| Canada | 6 |
| Iran | 6 |
| Japan | 6 |
| Jordan | 6 |
Laws, Policies, & Programs
| No Child Left Behind Act 2001 | 1 |
| Pell Grant Program | 1 |
What Works Clearinghouse Rating
| Does not meet standards | 1 |
Janika Saretzki; Rosalie Andrae; Boris Forthmann; Mathias Benedek – Journal of Creative Behavior, 2025
Divergent thinking (DT) ability is widely regarded as a central cognitive capacity underlying creativity, but its assessment is challenged by the fact that DT tasks yield a variable number of responses. Various approaches for the scoring of DT tasks have been proposed, which differ in how responses are evaluated and aggregated within a task. The…
Descriptors: Creative Thinking, Creativity Tests, Scoring, Metacognition
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Hanshu Zhang; Ran Zhou; Cheng-You Cheng; Sheng-Hsu Huang; Ming-Hui Cheng; Cheng-Ta Yang – Cognitive Research: Principles and Implications, 2025
Although it is commonly believed that automation aids human decision-making, conflicting evidence raises questions about whether individuals would gain greater advantages from automation in difficult tasks. Our study examines the combined influence of task difficulty and automation reliability on aided decision-making. We assessed decision…
Descriptors: Task Analysis, Difficulty Level, Decision Making, Automation
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2023
Traditional estimators of reliability, such as coefficients alpha, theta, omega, and rho (maximal reliability), are prone to radically underestimate reliability for the kinds of tests commonly used to measure educational achievement. These tests are often structured by widely deviating item difficulties. This is a typical pattern where the traditional…
Descriptors: Test Reliability, Achievement Tests, Computation, Test Items
Weingarden, Merav; Heyd-Metzuyanim, Einat – Journal of Mathematics Teacher Education, 2023
In this study, we examine "what went wrong" in our professional development program for encouraging cognitively demanding instruction, focusing on the difficulties we encountered in using an observational tool for evaluating this type of instruction and reaching inter-rater reliability. We do so through the lens of a discursive theory of…
Descriptors: Mathematics Instruction, Interrater Reliability, Cognitive Processes, Difficulty Level
Chia-Ying Chu; Pei-Hua Chen; Yi-Shin Tsai; Chieh-An Chen; Yi-Chih Chan; Yan-Jhe Ciou – Journal of Deaf Studies and Deaf Education, 2024
This study investigated the impact of language sample length on mean length of utterance (MLU) and aimed to determine the minimum number of utterances required for a reliable MLU. Conversations were collected from Mandarin-speaking, hard-of-hearing and typical-hearing children aged 16-81 months. The MLUs were calculated using sample sizes ranging…
Descriptors: Foreign Countries, Mandarin Chinese, Young Children, Language Acquisition
Rivka Gadot; Dina Tsybulsky – Smart Learning Environments, 2025
Critical thinking (CT) consists of a deliberate and reflective process that can lead to informed decisions. It involves scrutinizing the trustworthiness and consistency of underlying assumptions, the sources of data, and the validity of other information. CT embodies deliberate, self-regulated judgment incorporating cognitive abilities such as…
Descriptors: Critical Thinking, Data Collection, Information Management, Decision Making Skills
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
E. B. Merki; S. I. Hofer; A. Vaterlaus; A. Lichtenberger – Physical Review Physics Education Research, 2025
When describing motion in physics, the selection of a frame of reference is crucial. The graph of a moving object can look quite different based on the frame of reference. In recent years, various tests have been developed to assess the interpretation of kinematic graphs, but none of these tests have specifically addressed differences in reference…
Descriptors: Graphs, Motion, Physics, Secondary School Students
Krieglstein, Felix; Beege, Maik; Rey, Günter Daniel; Ginns, Paul; Krell, Moritz; Schneider, Sascha – Educational Psychology Review, 2022
For more than three decades, cognitive load theory has been addressing learning from a cognitive perspective. Based on this instructional theory, design recommendations and principles have been derived to manage the load on working memory while learning. The increasing attention paid to cognitive load theory in educational science quickly…
Descriptors: Cognitive Processes, Difficulty Level, Learning Theories, Test Reliability
Aditya Shah; Ajay Devmane; Mehul Ranka; Prathamesh Churi – Education and Information Technologies, 2024
Online learning has grown due to the advancement of technology and flexibility. Online examinations measure students' knowledge and skills. Traditional question papers include inconsistent difficulty levels, arbitrary question allocations, and poor grading. The suggested model calibrates question paper difficulty based on student performance to…
Descriptors: Computer Assisted Testing, Difficulty Level, Grading, Test Construction
Monika Lohani; Joel M. Cooper; Amy S. McDonnell; Gus G. Erickson; Trent G. Simmons; Amanda E. Carriero; Kaedyn W. Crabtree; David L. Strayer – Cognitive Research: Principles and Implications, 2024
The reliability of cognitive demand measures in controlled laboratory settings is well-documented; however, limited research has directly established their stability under real-life and high-stakes conditions, such as operating automated technology on actual highways. Partially automated vehicles have advanced to become an everyday mode of…
Descriptors: Cognitive Processes, Difficulty Level, Automation, Psychophysiology
Martin Steinbach; Carolin Eitemüller; Marc Rodemer; Maik Walpuski – International Journal of Science Education, 2025
The intricate relationship between representational competence and content knowledge in organic chemistry has been widely debated, and the ways in which representations contribute to task difficulty, particularly in assessment, remain unclear. This paper presents a multiple-choice test instrument for assessing individuals' knowledge of fundamental…
Descriptors: Organic Chemistry, Difficulty Level, Multiple Choice Tests, Fundamental Concepts

