Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2023
Traditional estimators of reliability such as coefficients alpha, theta, omega, and rho (maximal reliability) are prone to give radical underestimates of reliability for the tests common when testing educational achievement. These tests are often structured by widely deviating item difficulties. This is a typical pattern where the traditional…
Descriptors: Test Reliability, Achievement Tests, Computation, Test Items
Chia-Ying Chu; Pei-Hua Chen; Yi-Shin Tsai; Chieh-An Chen; Yi-Chih Chan; Yan-Jhe Ciou – Journal of Deaf Studies and Deaf Education, 2024
This study investigated the impact of language sample length on mean length of utterance (MLU) and aimed to determine the minimum number of utterances required for a reliable MLU. Conversations were collected from Mandarin-speaking, hard-of-hearing and typical-hearing children aged 16-81 months. The MLUs were calculated using sample sizes ranging…
Descriptors: Foreign Countries, Mandarin Chinese, Young Children, Language Acquisition
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
Krieglstein, Felix; Beege, Maik; Rey, Günter Daniel; Ginns, Paul; Krell, Moritz; Schneider, Sascha – Educational Psychology Review, 2022
For more than three decades, cognitive load theory has been addressing learning from a cognitive perspective. Based on this instructional theory, design recommendations and principles have been derived to manage the load on working memory while learning. The increasing attention paid to cognitive load theory in educational science quickly…
Descriptors: Cognitive Processes, Difficulty Level, Learning Theories, Test Reliability
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
Aditya Shah; Ajay Devmane; Mehul Ranka; Prathamesh Churi – Education and Information Technologies, 2024
Online learning has grown due to the advancement of technology and flexibility. Online examinations measure students' knowledge and skills. Traditional question papers include inconsistent difficulty levels, arbitrary question allocations, and poor grading. The suggested model calibrates question paper difficulty based on student performance to…
Descriptors: Computer Assisted Testing, Difficulty Level, Grading, Test Construction
Miller, Dan J.; Noble, Prisca; Medlen, Sue; Jones, Karina; Munns, Suzanne L. – Journal of Experimental Education, 2023
The cognitive load imposed by instruction is an important consideration for instructional designers. Theoretical models have traditionally divided total cognitive load into intrinsic, extrinsic, and germane load. The 10-item Cognitive Load Inventory (CLI-10) is designed to measure these three types of cognitive load. It is typically administered…
Descriptors: Psychometrics, Cognitive Processes, Difficulty Level, Factor Analysis
Chakrabartty, Satyendra Nath – International Journal of Psychology and Educational Studies, 2021
The paper proposes new measures of difficulty and discriminating values for binary items and for tests consisting of such items, and finds their relationships, including estimation of test error variance and thereby test reliability, as per definition, using cosine similarities. The measures use the entire data. Difficulty value of test and item is defined…
Descriptors: Test Items, Difficulty Level, Scores, Test Reliability
Xin Lin; Sarah R. Powell – Assessment for Effective Intervention, 2024
Developing mathematics proficiency requires an understanding of mathematics vocabulary. Although previous research has developed several measures of mathematics vocabulary at different grade levels, no study focused solely on fraction vocabularies. We developed and tested a measure of fraction vocabulary for students in Grade 4 to determine the…
Descriptors: Mathematics Education, Mathematics Skills, Fractions, Vocabulary
Suwita Suwita; Sulistyo Saputro; Sajidan Sajidan; Sutarno Sutarno – Journal of Baltic Science Education, 2024
The current study uses the Rasch Model to measure lower-secondary school students' critical thinking skills on photosynthesis topics. Critical thinking skills are considered essential in science education, but few valid and practical measurement instruments remain. The current study fills the gap by adapting the instrument from the Watson-Glaser…
Descriptors: Secondary School Students, Critical Thinking, Thinking Skills, Botany
Douglas-Morris, Jan; Ritchie, Helen; Willis, Catherine; Reed, Darren – Anatomical Sciences Education, 2021
Multiple-choice (MC) anatomy "spot-tests" (identification-based assessments on tagged cadaveric specimens) offer a practical alternative to traditional free-response (FR) spot-tests. Conversion of the two spot-tests in an upper limb musculoskeletal anatomy unit of study from FR to a novel MC format, where one of five tagged structures on…
Descriptors: Multiple Choice Tests, Anatomy, Test Reliability, Difficulty Level
Thompson, Kathryn N. – ProQuest LLC, 2023
It is imperative to collect validity evidence prior to interpreting and using test scores. During the process of collecting validity evidence, test developers should consider whether test scores are contaminated by sources of extraneous information. This is referred to as construct irrelevant variance, or the "degree to which test scores are…
Descriptors: Test Wiseness, Test Items, Item Response Theory, Scores
Ruying Li; Gaofeng Li – International Journal of Science and Mathematics Education, 2025
Systems thinking (ST) is an essential competence for future life and biology learning. Appropriate assessment is critical for collecting sufficient information to develop ST in biology education. This research offers an ST framework based on a comprehensive understanding of biological systems, encompassing four skills across three complexity…
Descriptors: Test Construction, Test Validity, Science Tests, Cognitive Tests
Y. Yokhebed; Rexy Maulana Dwi Karmadi; Luvia Ranggi Nastiti – Journal of Biological Education Indonesia (Jurnal Pendidikan Biologi Indonesia), 2025
Although self-assessment in critical thinking is thought to help students recognise their strengths and weaknesses, the reliability and validity of the assessment tool are still questionable, so a more objective evaluation is needed. The objective of this investigation is to assess the self-assessment tools in evaluating students' critical thinking…
Descriptors: Self Evaluation (Individuals), Critical Thinking, Science and Society, Test Validity