Publication Date
In 2025 | 8 |
Since 2024 | 25 |
Since 2021 (last 5 years) | 84 |
Since 2016 (last 10 years) | 186 |
Since 2006 (last 20 years) | 299 |
Descriptor
Difficulty Level | 581 |
Test Construction | 581 |
Test Items | 423 |
Test Validity | 152 |
Item Analysis | 148 |
Test Reliability | 148 |
Foreign Countries | 144 |
Multiple Choice Tests | 131 |
Item Response Theory | 109 |
Higher Education | 78 |
Test Format | 74 |
Author
Tindal, Gerald | 17 |
Alonzo, Julie | 13 |
Anderson, Daniel | 8 |
Park, Bitnara Jasmine | 8 |
Huntley, Renee M. | 6 |
Irvin, P. Shawn | 6 |
Liu, Kimy | 6 |
Roid, Gale | 6 |
Saven, Jessica L. | 6 |
Bejar, Isaac I. | 5 |
Reckase, Mark D. | 5 |
Audience
Researchers | 21 |
Teachers | 10 |
Practitioners | 9 |
Policymakers | 5 |
Administrators | 4 |
Location
Indonesia | 17 |
Turkey | 11 |
Australia | 9 |
China | 8 |
Florida | 8 |
Germany | 7 |
Japan | 7 |
Nigeria | 7 |
Canada | 6 |
United Kingdom (England) | 6 |
Mexico | 5 |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
No Child Left Behind Act 2001 | 1 |
Onur Dönmez; Yavuz Akbulut; Gözde Zabzun; Berrin Köseoglu – Applied Cognitive Psychology, 2025
This study investigates the effect of survey order in measuring self-reported cognitive load. Understanding how survey order influences responses is crucial, but it has been largely overlooked in the context of cognitive load. Using a 2 × 2 experimental design with 319 high school students, the study manipulated intrinsic cognitive load (ICL)…
Descriptors: Surveys, Test Construction, Measurement, Cognitive Processes
Camilo Vieira; Andrea Vásquez; Federico Meza; Roxana Quintero-Manes; Pedro Godoy – ACM Transactions on Computing Education, 2024
Currently, there is little evidence about how non-English-speaking students learn computer programming. For example, there are few validated assessment instruments to measure the development of programming skills, especially for the Spanish-speaking population. Having valid assessment instruments is essential to identify the difficulties of the…
Descriptors: Programming, Spanish Speaking, Translation, Test Validity
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests that contain items (i.e. questions) which measure various types of skills for different levels of students equitably is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent, if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
Krieglstein, Felix; Beege, Maik; Rey, Günter Daniel; Sanchez-Stockhammer, Christina; Schneider, Sascha – Educational Psychology Review, 2023
According to cognitive load theory, learning can only be successful when instructional materials and procedures are designed in accordance with human cognitive architecture. In this context, one of the biggest challenges is the accurate measurement of the different cognitive load types as these are associated with various activities during…
Descriptors: Test Construction, Test Validity, Questionnaires, Cognitive Processes
Yue Rong – International Journal of Web-Based Learning and Teaching Technologies, 2024
Mental health education in colleges and universities has made considerable progress, but the existing assessment model still faces challenges in terms of time overhead and rank indicators. In response, this paper proposes a new psychological education assessment model for colleges and universities, based on multimedia feature extraction…
Descriptors: Multimedia Instruction, Test Construction, Psychological Evaluation, Mental Health
Kam, Chester Chun Seng – Educational and Psychological Measurement, 2023
When constructing measurement scales, regular and reversed items are often used (e.g., "I am satisfied with my job"/"I am not satisfied with my job"). Some methodologists recommend excluding reversed items because they are more difficult to understand and therefore engender a second, artificial factor distinct from the…
Descriptors: Test Items, Difficulty Level, Test Construction, Construct Validity
Aditya Shah; Ajay Devmane; Mehul Ranka; Prathamesh Churi – Education and Information Technologies, 2024
Online learning has grown due to the advancement of technology and flexibility. Online examinations measure students' knowledge and skills. Traditional question papers include inconsistent difficulty levels, arbitrary question allocations, and poor grading. The suggested model calibrates question paper difficulty based on student performance to…
Descriptors: Computer Assisted Testing, Difficulty Level, Grading, Test Construction
Ober, Teresa M.; Lu, Yikai; Blacklock, Chessley B.; Liu, Cheng; Cheng, Ying – Journal of Psychoeducational Assessment, 2023
We develop and validate a self-report measure of intrinsic and extrinsic cognitive load suitable for measuring the constructs in a variety of learning contexts. Data were collected from three independent samples of college students in the U.S. (N_total = 513; M_age = 21.13 years). Kane's (2013) framework was used to validate…
Descriptors: Test Construction, Test Validity, Cognitive Processes, Difficulty Level
Douglas-Morris, Jan; Ritchie, Helen; Willis, Catherine; Reed, Darren – Anatomical Sciences Education, 2021
Multiple-choice (MC) anatomy "spot-tests" (identification-based assessments on tagged cadaveric specimens) offer a practical alternative to traditional free-response (FR) spot-tests. Conversion of the two spot-tests in an upper limb musculoskeletal anatomy unit of study from FR to a novel MC format, where one of five tagged structures on…
Descriptors: Multiple Choice Tests, Anatomy, Test Reliability, Difficulty Level
Thompson, Kathryn N. – ProQuest LLC, 2023
It is imperative to collect validity evidence prior to interpreting and using test scores. During the process of collecting validity evidence, test developers should consider whether test scores are contaminated by sources of extraneous information. This is referred to as construct irrelevant variance, or the "degree to which test scores are…
Descriptors: Test Wiseness, Test Items, Item Response Theory, Scores
Tino Endres; Lisa Bender; Stoo Sepp; Shirong Zhang; Louise David; Melanie Trypke; Dwayne Lieck; Juliette C. Désiron; Johanna Bohm; Sophia Weissgerber; Juan Cristobal Castro-Alonso; Fred Paas – Educational Psychology Review, 2025
Assessing cognitive demand is crucial for research on self-regulated learning; however, discrepancies in translating essential concepts across languages can hinder the comparison of research findings. Different languages often emphasize various components and interpret certain constructs differently. This paper aims to develop a translingual set…
Descriptors: Cognitive Processes, Difficulty Level, Metacognition, Translation
Ruying Li; Gaofeng Li – International Journal of Science and Mathematics Education, 2025
Systems thinking (ST) is an essential competence for future life and biology learning. Appropriate assessment is critical for collecting sufficient information to develop ST in biology education. This research offers an ST framework based on a comprehensive understanding of biological systems, encompassing four skills across three complexity…
Descriptors: Test Construction, Test Validity, Science Tests, Cognitive Tests
Alan Shaw – PASAA: Journal of Language Teaching and Learning in Thailand, 2023
Although the TOEFL iBT Listening test is sometimes used for other purposes, it was designed primarily for use as a college entrance examination. Item difficulty in TOEFL iBT Listening tests is the product of interactions between two sets of complex relationships: 1) relationships among numerous item characteristics themselves, and 2) relationships…
Descriptors: English (Second Language), Second Language Instruction, Listening Skills, Language Tests
Rushton, Nicky; Vitello, Sylvia; Suto, Irenka – Research Matters, 2021
It is important to define what an error in a question paper is, so that there is a common understanding and so that people's own conceptions do not affect the way in which they write or check question papers. We carried out an interview study to investigate our colleagues' definitions of error. We found that there is no single accepted definition…
Descriptors: Definitions, Tests, Foreign Countries, Problems
Jenna M. T. Vest – ProQuest LLC, 2024
This study focuses on creating a reliable and valid instrument to measure high school students' perceptions of academic challenge. The research is divided into four phases: qualitative analysis, item development, exploratory factor analysis (EFA), and validation. Initial data from college students' retrospective views and high school students'…
Descriptors: Test Construction, Test Validity, Student Attitudes, Academic Achievement