Xueliang Chen; Vahid Aryadoust; Wenxin Zhang – Language Testing, 2025
The growing diversity among test takers in second or foreign language (L2) assessments makes the importance of fairness front and center. This systematic review aimed to examine how fairness in L2 assessments was evaluated through differential item functioning (DIF) analysis. A total of 83 articles from 27 journals were included in a systematic…
Descriptors: Second Language Learning, Language Tests, Test Items, Item Analysis
David Bell; Vikki O'Neill; Vivienne Crawford – Practitioner Research in Higher Education, 2023
We compared the influence of an open-book, extended-duration format versus a closed-book, time-limited format on the reliability and validity of written assessments of pharmacology learning outcomes within our medical and dental courses. Our dental cohort undertake a mid-year test (30 x free-response short answer to a question, SAQ) and end-of-year paper (4 x SAQ,…
Descriptors: Undergraduate Students, Pharmacology, Pharmaceutical Education, Test Format
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Arce-Ferrer, Alvaro J.; Bulut, Okan – Journal of Experimental Education, 2019
This study investigated the performance of four widely used data-collection designs in detecting test-mode effects (i.e., computer-based versus paper-based testing). The experimental conditions included four data-collection designs, two test-administration modes, and the availability of an anchor assessment. The test-level and item-level results…
Descriptors: Data Collection, Test Construction, Test Format, Computer Assisted Testing
Masrai, Ahmed – SAGE Open, 2022
Vocabulary size measures serve important functions, not only with respect to placing learners at appropriate levels on language courses but also with a view to examining the progress of learners. One of the widely reported formats suitable for these purposes is the Yes/No vocabulary test. The primary aim of this study was to introduce and provide…
Descriptors: Vocabulary Development, Language Tests, English (Second Language), Second Language Learning
Ali, Syed Haris; Carr, Patrick A.; Ruit, Kenneth G. – Journal of the Scholarship of Teaching and Learning, 2016
Plausible distractors are important for accurate measurement of knowledge via multiple-choice questions (MCQs). This study demonstrates the impact of higher distractor functioning on validity and reliability of scores obtained on MCQs. Free-response (FR) and MCQ versions of a neurohistology practice exam were given to four cohorts of Year 1 medical…
Descriptors: Scores, Multiple Choice Tests, Test Reliability, Test Validity
Öztürk-Gübes, Nese; Kelecioglu, Hülya – Educational Sciences: Theory and Practice, 2016
The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…
Descriptors: Test Format, Item Response Theory, True Scores, Equated Scores
Zhang, Xijuan; Savalei, Victoria – Educational and Psychological Measurement, 2016
Many psychological scales written in the Likert format include reverse worded (RW) items in order to control acquiescence bias. However, studies have shown that RW items often contaminate the factor structure of the scale by creating one or more method factors. The present study examines an alternative scale format, called the Expanded format,…
Descriptors: Factor Structure, Psychological Testing, Alternative Assessment, Test Items
Frey, Bruce B.; Ellis, James D.; Bulgreen, Janis A.; Hare, Jana Craig; Ault, Marilyn – Electronic Journal of Science Education, 2015
"Scientific argumentation," defined as the ability to develop and analyze scientific claims, support claims with evidence from investigations of the natural world, and explain and evaluate the reasoning that connects the evidence to the claim, is a critical component of current science standards and is consistent with "Common Core…
Descriptors: Test Construction, Science Tests, Persuasive Discourse, Science Process Skills
McLean, Stuart; Kramer, Brandon; Beglar, David – Language Teaching Research, 2015
An important gap in the field of second language vocabulary assessment concerns the lack of validated tests measuring aural vocabulary knowledge. The primary purpose of this study is to introduce and provide preliminary validity evidence for the Listening Vocabulary Levels Test (LVLT), which has been designed as a diagnostic tool to measure…
Descriptors: Test Construction, Test Validity, English (Second Language), Second Language Learning
Alweis, Richard L.; Fitzpatrick, Caroline; Donato, Anthony A. – Journal of Education and Training Studies, 2015
Introduction: The Multiple Mini-Interview (MMI) format appears to mitigate individual rater biases. However, the format itself may introduce structural systematic bias, favoring extroverted personality types. This study aimed to gain a better understanding of these biases from the perspective of the interviewer. Methods: A sample of MMI…
Descriptors: Interviews, Interrater Reliability, Qualitative Research, Semi Structured Interviews
New York State Education Department, 2015
This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes utilized to develop and implement the NYSAA program, and Stakeholder involvement in those processes. By comparing the intent of the NYSAA with its process and design, the validity of the…
Descriptors: Alternative Assessment, Grade 3, Grade 4, Grade 5
Kalaycioglu, Dilara Bakan; Berberoglu, Giray – Journal of Psychoeducational Assessment, 2011
This study aimed to detect differential item functioning (DIF) items across gender groups, analyze item content for the possible sources of DIF, and eventually investigate the effect of DIF items on the criterion-related validity of the test scores in the quantitative section of the university entrance examination (UEE) in Turkey. The reason…
Descriptors: Test Bias, College Entrance Examinations, Item Analysis, Test Items
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
New York State Education Department, 2014
This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes utilized to develop and implement the NYSAA program, and Stakeholder involvement in those processes. The purpose of this report is to document the technical aspects of the 2013-14 NYSAA.…
Descriptors: Alternative Assessment, Educational Assessment, State Departments of Education, Student Evaluation