Publication Date
  In 2025: 1
  Since 2024: 2
  Since 2021 (last 5 years): 30
  Since 2016 (last 10 years): 78
  Since 2006 (last 20 years): 145
Descriptor
  Comparative Analysis: 214
  Difficulty Level: 214
  Test Items: 214
  Item Response Theory: 71
  Item Analysis: 64
  Foreign Countries: 54
  Correlation: 38
  Test Format: 37
  Scores: 36
  Test Construction: 36
  Multiple Choice Tests: 32
Author
  DeBoer, George E.: 3
  Herrmann-Abell, Cari F.: 3
  Hsu, Tse-Chi: 3
  Kim, Sooyeon: 3
  Sinharay, Sandip: 3
  Benson, Jeri: 2
  Benton, Tom: 2
  Beretvas, S. Natasha: 2
  Brutten, Sheila R.: 2
  Cai, Li: 2
  Cohen, Allan S.: 2
Audience
  Researchers: 3
Location
  Australia: 5
  Germany: 5
  Indonesia: 5
  United States: 5
  South Korea: 4
  Turkey: 4
  Japan: 3
  Nigeria: 3
  United Kingdom (England): 3
  Belgium: 2
  District of Columbia: 2
Gyamfi, Abraham; Acquaye, Rosemary – Acta Educationis Generalis, 2023
Introduction: Item response theory (IRT) has received much attention in the validation of assessment instruments because it allows students' ability to be estimated from any set of items. IRT also allows the difficulty and discrimination of each item on the test to be estimated. In the framework of IRT, item characteristics are…
Descriptors: Item Response Theory, Models, Test Items, Difficulty Level
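The abstract above rests on IRT's core idea: each item carries a difficulty and a discrimination parameter. As a minimal sketch (my own illustration, not the authors' model or code), the two-parameter logistic (2PL) item response function in Python:

```python
import math

def p_correct(theta: float, b: float, a: float = 1.0) -> float:
    """2PL item response function: probability that a person with
    ability theta answers correctly an item with difficulty b and
    discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5.
print(p_correct(theta=0.0, b=0.0))  # 0.5
# A more discriminating item separates abilities more sharply around b.
print(p_correct(1.0, 0.0, a=2.0) > p_correct(1.0, 0.0, a=0.5))  # True
```

In practice the parameters are estimated from response data (e.g., by marginal maximum likelihood in packages such as R's mirt); the function above only evaluates a fitted curve.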
Harrison, Scott; Kroehne, Ulf; Goldhammer, Frank; Lüdtke, Oliver; Robitzsch, Alexander – Large-scale Assessments in Education, 2023
Background: Mode effects, the variations in item and scale properties attributed to the mode of test administration (paper vs. computer), have stimulated research on test equivalence and trend estimation in PISA. The PISA assessment framework provides the backbone for interpreting PISA test scores. However, an…
Descriptors: Scoring, Test Items, Difficulty Level, Foreign Countries
Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025
The current paper uses the Many-Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…
Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction
Liu, Jinghua; Becker, Kirk – Journal of Educational Measurement, 2022
For any testing programs that administer multiple forms across multiple years, maintaining score comparability via equating is essential. With continuous testing and high-stakes results, especially with less secure online administrations, testing programs must consider the potential for cheating on their exams. This study used empirical and…
Descriptors: Cheating, Item Response Theory, Scores, High Stakes Tests
Benton, Tom – Research Matters, 2020
This article reviews the evidence on the extent to which experts' perceptions of item difficulties, captured using comparative judgement, can predict empirical item difficulties. This evidence is drawn from existing published studies on this topic and also from statistical analysis of data held by Cambridge Assessment. Having reviewed the…
Descriptors: Test Items, Difficulty Level, Expertise, Comparative Analysis
Hayat, Bahrul – Cogent Education, 2022
This study aims to (1) calibrate the Basic Statistics Test for Indonesian undergraduate psychology students using the Rasch model, (2) test the impact of adjusting for guessing on item parameters, person parameters, test reliability, and the distribution of item difficulty and person ability, and (3) compare person scores…
Descriptors: Guessing (Tests), Statistics Education, Undergraduate Students, Psychology
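The entry above compares Rasch calibration with and without an adjustment for guessing. A hypothetical sketch of the two response curves (the fixed pseudo-guessing floor c is my assumption for illustration, not the paper's correction method):

```python
import math

def rasch(theta: float, b: float) -> float:
    """Rasch (one-parameter logistic) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_with_guessing(theta: float, b: float, c: float = 0.25) -> float:
    """Rasch probability with a fixed pseudo-guessing floor c,
    e.g. c = 0.25 for four-option multiple-choice items."""
    return c + (1.0 - c) * rasch(theta, b)

# Guessing mainly lifts the curve for low-ability examinees:
print(round(rasch(-3.0, 0.0), 3), round(rasch_with_guessing(-3.0, 0.0), 3))
# 0.047 0.286
```

Ignoring a real guessing floor inflates apparent ability at the low end, which is why the study examines its impact on person and item parameters.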
Kim, Sooyeon; Walker, Michael – ETS Research Report Series, 2021
In this investigation, we used real data to assess potential differential effects associated with taking a test in a test center (TC) versus testing at home using remote proctoring (RP). We used a pseudo-equivalent groups (PEG) approach to examine group equivalence at the item level and the total score level. If our assumption holds that the PEG…
Descriptors: Testing, Distance Education, Comparative Analysis, Test Items
Herrmann-Abell, Cari F.; Hardcastle, Joseph; DeBoer, George E. – Grantee Submission, 2022
As implementation of the "Next Generation Science Standards" moves forward, there is a need for new assessments that can measure students' integrated three-dimensional science learning. The National Research Council has suggested that these assessments be multicomponent tasks that utilize a combination of item formats including…
Descriptors: Multiple Choice Tests, Conditioning, Test Items, Item Response Theory
Yoo Jeong Jang – ProQuest LLC, 2022
Despite the increasing demand for diagnostic information, observed subscores have often been reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on CTT and IRT frameworks have been proposed to improve the quality of subscores. More recently, DCM has…
Descriptors: Classification, Accuracy, Item Response Theory, Correlation
Bacon, Terrence E. – ProQuest LLC, 2023
The purpose of this study was to investigate developmental music aptitude with a broader sample in order to propose national norms. Research questions were: 1) To what extent are published Primary Measures of Music Aptitude (PMMA) norms different from those established using a current sample? 2) Are there comparative differences in PMMA item…
Descriptors: Psychometrics, Music, Aptitude Tests, Test Items
Kárász, Judit T.; Széll, Krisztián; Takács, Szabolcs – Quality Assurance in Education: An International Perspective, 2023
Purpose: Based on the general formula, which depends on the length and difficulty of the test, the number of respondents, and the number of ability levels, this study aims to provide a closed formula for adaptive tests of medium difficulty (probability of solution p = 1/2) to determine the accuracy of the parameters for each item and in…
Descriptors: Test Length, Probability, Comparative Analysis, Difficulty Level
Thacker, Nathan L. – ProQuest LLC, 2023
Organic chemistry is a class well known to be difficult and necessary for many careers in the sciences, and as a result, has garnered interest in researching ways to improve student learning and comprehension. One potential way involves using eye tracking techniques to understand how students visually examine questions. Organic chemistry involves…
Descriptors: Science Instruction, Multiple Choice Tests, Organic Chemistry, Science Tests
Musa Adekunle Ayanwale – Discover Education, 2023
Examination scores obtained by students from the West African Examinations Council (WAEC), and National Business and Technical Examinations Board (NABTEB) may not be directly comparable due to differences in examination administration, item characteristics of the subject in question, and student abilities. For more accurate comparisons, scores…
Descriptors: Equated Scores, Mathematics Tests, Test Items, Test Format
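Score equating of the kind this entry describes can take several forms; the simplest is linear (mean-sigma) equating, sketched below under a randomly-equivalent-groups assumption (the score data are made up for illustration, not from WAEC or NABTEB):

```python
import statistics

def mean_sigma_equate(x_scores, y_scores):
    """Linear (mean-sigma) equating: map raw scores on form X onto
    the scale of form Y by matching means and standard deviations."""
    mx, sx = statistics.mean(x_scores), statistics.pstdev(x_scores)
    my, sy = statistics.mean(y_scores), statistics.pstdev(y_scores)
    slope = sy / sx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

form_x = [10, 12, 14, 16, 18]  # hypothetical raw scores on form X
form_y = [20, 24, 28, 32, 36]  # hypothetical raw scores on form Y
to_y_scale = mean_sigma_equate(form_x, form_y)
print(to_y_scale(14))  # the form X mean maps onto the form Y mean: 28.0
```

IRT-based equating, as studied in several of the entries here, instead links the item parameter scales of the two forms, but the goal is the same: scores that are comparable across forms and administrations.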
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Kuang, Huan; Sahin, Fusun – Large-scale Assessments in Education, 2023
Background: Examinees may not make enough effort when responding to test items if the assessment has no consequence for them. These disengaged responses can be problematic in low-stakes, large-scale assessments because they can bias item parameter estimates. However, the amount of bias, and whether this bias is similar across administrations, is…
Descriptors: Test Items, Comparative Analysis, Mathematics Tests, Reaction Time