Publication Date
In 2025 | 6 |
Since 2024 | 11 |
Since 2021 (last 5 years) | 52 |
Since 2016 (last 10 years) | 114 |
Since 2006 (last 20 years) | 167 |
Descriptor
Item Response Theory | 193 |
Test Items | 193 |
Test Validity | 193 |
Test Reliability | 109 |
Foreign Countries | 72 |
Test Construction | 64 |
Psychometrics | 56 |
Difficulty Level | 55 |
Item Analysis | 35 |
Scores | 35 |
Mathematics Tests | 27 |
More ▼ |
Source
Author
Bejar, Isaac I. | 4 |
Paek, Insu | 3 |
Petscher, Yaacov | 3 |
Schoen, Robert C. | 3 |
Yang, Xiaotong | 3 |
Ackerman, Terry A. | 2 |
Baghaei, Purya | 2 |
Bichi, Ado Abdu | 2 |
Brown, Ted | 2 |
Bulut, Okan | 2 |
Hartig, Johannes | 2 |
More ▼ |
Publication Type
Education Level
Audience
Location
Indonesia | 10 |
Turkey | 8 |
Iran | 6 |
California | 5 |
Germany | 5 |
Australia | 4 |
Nigeria | 4 |
Alabama | 3 |
Canada | 3 |
Florida | 3 |
Idaho | 3 |
More ▼ |
Laws, Policies, & Programs
Individuals with Disabilities… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
Xueliang Chen; Vahid Aryadoust; Wenxin Zhang – Language Testing, 2025
The growing diversity among test takers in second or foreign language (L2) assessments makes the importance of fairness front and center. This systematic review aimed to examine how fairness in L2 assessments was evaluated through differential item functioning (DIF) analysis. A total of 83 articles from 27 journals were included in a systematic…
Descriptors: Second Language Learning, Language Tests, Test Items, Item Analysis
Cornelia Eva Neuert – Sociological Methods & Research, 2024
The quality of data in surveys is affected by response burden and questionnaire length. With an increasing number of questions, respondents can become bored, tired, and annoyed and may take shortcuts to reduce the effort needed to complete the survey. In this article, direct evidence is presented on how the position of items within a web…
Descriptors: Online Surveys, Test Items, Test Format, Test Construction
Mahdi Ghorbankhani; Keyvan Salehi – SAGE Open, 2025
Academic procrastination, the tendency to delay academic tasks without reasonable justification, has significant implications for students' academic performance and overall well-being. To measure this construct, numerous scales have been developed, among which the Academic Procrastination Scale (APS) has shown promise in assessing academic…
Descriptors: Psychometrics, Measures (Individuals), Time Management, Foreign Countries
Bruno D. Zumbo – International Journal of Assessment Tools in Education, 2023
In line with the journal volume's theme, this essay considers lessons from the past and visions for the future of test validity. In the first part of the essay, a description of historical trends in test validity since the early 1900s leads to the natural question of whether the discipline has progressed in its definition and description of test…
Descriptors: Test Theory, Test Validity, True Scores, Definitions
Grace C. Tetschner; Sachin Nedungadi – Chemistry Education Research and Practice, 2025
Many undergraduate chemistry students hold alternate conceptions related to resonance--an important and fundamental topic of organic chemistry. To help address these alternate conceptions, an organic chemistry instructor could administer the resonance concept inventory (RCI), which is a multiple-choice assessment that was designed to identify…
Descriptors: Scientific Concepts, Concept Formation, Item Response Theory, Scores
Oluwaseyi Aina Gbolade Opesemowo – Research in Social Sciences and Technology, 2023
Local Item Dependence (LID) is a desecration of Local Item Independence (LII) which can lead to overestimating or underestimating a candidate's ability in mathematics items and create validity problems. The study investigated the intra and inter-LID of mathematics items. The study made use of ex-post facto research. The population encompassed all…
Descriptors: Foreign Countries, Secondary School Students, Item Response Theory, Test Items
Balbuena, Sherwin – International Journal of Assessment Tools in Education, 2023
Depression is a latent characteristic that is measured through self-reported or clinician-mediated instruments such as scales and inventories. The precision of depression estimates largely depends on the validity of the items used and on the truthfulness of people responding to these items. The existing methodology in instrumentation based on a…
Descriptors: Depression (Psychology), Test Items, Test Validity, Test Reliability
Y. Yokhebed; Rexy Maulana Dwi Karmadi; Luvia Ranggi Nastiti – Journal of Biological Education Indonesia (Jurnal Pendidikan Biologi Indonesia), 2025
Although self-assessment in critical thinking is thought to help students recognise their strengths and weaknesses, the reliability and validity of the assessment tool is still questionable, so a more objective evaluation is needed. Objective of this investigation is to assess the self-assessment tools in evaluating students' critical thinking…
Descriptors: Self Evaluation (Individuals), Critical Thinking, Science and Society, Test Validity
Fergadiotis, Gerasimos; Casilio, Marianne; Dickey, Michael Walsh; Steel, Stacey; Nicholson, Hannele; Fleegle, Mikala; Swiderski, Alexander; Hula, William D. – Journal of Speech, Language, and Hearing Research, 2023
Purpose: Item response theory (IRT) is a modern psychometric framework with several advantageous properties as compared with classical test theory. IRT has been successfully used to model performance on anomia tests in individuals with aphasia; however, all efforts to date have focused on noun production accuracy. The purpose of this study is to…
Descriptors: Item Response Theory, Psychometrics, Verbs, Naming
Basman, Munevver – International Journal of Assessment Tools in Education, 2023
To ensure the validity of the tests is to check that all items have similar results across different groups of individuals. However, differential item functioning (DIF) occurs when the results of individuals with equal ability levels from different groups differ from each other on the same test item. Based on Item Response Theory and Classic Test…
Descriptors: Test Bias, Test Items, Test Validity, Item Response Theory
Chen, Yunxiao; Lee, Yi-Hsuan; Li, Xiaoou – Journal of Educational and Behavioral Statistics, 2022
In standardized educational testing, test items are reused in multiple test administrations. To ensure the validity of test scores, the psychometric properties of items should remain unchanged over time. In this article, we consider the sequential monitoring of test items, in particular, the detection of abrupt changes to their psychometric…
Descriptors: Standardized Tests, Test Items, Test Validity, Scores
Che Lah, Noor Hidayah; Tasir, Zaidatun; Jumaat, Nurul Farhana – Educational Studies, 2023
The aim of the study was to evaluate the extended version of the Problem-Solving Inventory (PSI) via an online learning setting known as the Online Problem-Solving Inventory (OPSI) through the lens of Rasch Model analysis. To date, there is no extended version of the PSI for online settings even though many researchers have used it; thus, this…
Descriptors: Problem Solving, Measures (Individuals), Electronic Learning, Item Response Theory
Purwo Susongko; Chockchai Yuenyong; Mobinta Kusuma; Yuni Arfiani – Pegem Journal of Education and Instruction, 2024
This study aims to develop Nature of Science (NOS) measurement instrument for high school students in the Mathematics and Natural Sciences program using COVID-19 cases that had occurred since the end of 2019. This study also attempts to validate test items with the Rasch model approach. The participants in this study are 194 high school students…
Descriptors: Test Construction, Science Tests, Scientific Principles, COVID-19
Karadavut, Tugba – Applied Measurement in Education, 2021
Mixture IRT models address the heterogeneity in a population by extracting latent classes and allowing item parameters to vary between latent classes. Once the latent classes are extracted, they need to be further examined to be characterized. Some approaches have been adopted in the literature for this purpose. These approaches examine either the…
Descriptors: Item Response Theory, Models, Test Items, Maximum Likelihood Statistics