Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 4 |
| Since 2017 (last 10 years) | 17 |
| Since 2007 (last 20 years) | 35 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Comparative Analysis | 71 |
| Test Format | 71 |
| Test Reliability | 57 |
| Test Validity | 25 |
| Test Items | 24 |
| Computer Assisted Testing | 20 |
| Multiple Choice Tests | 16 |
| Foreign Countries | 15 |
| Higher Education | 15 |
| Language Tests | 15 |
| Test Construction | 15 |
Author
| Author | Records |
| --- | --- |
| Federico, Pat-Anthony | 2 |
| Lee, Won-Chan | 2 |
| Mott, Michael S. | 2 |
| Acar, Selcuk | 1 |
| Ahmadi, Alireza | 1 |
| Alemi, Minoo | 1 |
| Algozzine, Bob | 1 |
| Algozzine, Kate | 1 |
| Allison, Donald E. | 1 |
| Alpayar, Cagla | 1 |
| Anguiano-Carrsaco, Cristina | 1 |
Education Level
| Education level | Records |
| --- | --- |
| Higher Education | 14 |
| Postsecondary Education | 10 |
| High Schools | 3 |
| Elementary Secondary Education | 2 |
| Secondary Education | 2 |
| Early Childhood Education | 1 |
| Elementary Education | 1 |
| Grade 8 | 1 |
| Middle Schools | 1 |
| Preschool Education | 1 |
Audience
| Audience | Records |
| --- | --- |
| Practitioners | 1 |
| Researchers | 1 |
| Teachers | 1 |
Location
| Location | Records |
| --- | --- |
| Iran | 2 |
| Japan | 2 |
| California | 1 |
| Finland | 1 |
| France | 1 |
| Hong Kong | 1 |
| Israel | 1 |
| Maryland | 1 |
| Missouri | 1 |
| Taiwan | 1 |
| Turkey | 1 |
Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025
This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…
Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
Grajzel, Katalin; Dumas, Denis; Acar, Selcuk – Journal of Creative Behavior, 2022
One of the best-known and most frequently used measures of creative idea generation is the Torrance Test of Creative Thinking (TTCT). The TTCT Verbal, assessing verbal ideation, contains two forms created to be used interchangeably by researchers and practitioners. However, the parallel forms reliability of the two versions of the TTCT Verbal has…
Descriptors: Test Reliability, Creative Thinking, Creativity Tests, Verbal Ability
Benton, Tom – Research Matters, 2021
Computer adaptive testing is intended to make assessment more reliable by tailoring the difficulty of the questions a student has to answer to their level of ability. Most commonly, this benefit is used to justify the length of tests being shortened whilst retaining the reliability of a longer, non-adaptive test. Improvements due to adaptive…
Descriptors: Risk, Item Response Theory, Computer Assisted Testing, Difficulty Level
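The tailoring that Benton describes — matching question difficulty to a student's current ability estimate — is commonly implemented as maximum-information item selection. A minimal sketch under a simple Rasch model is below; the item bank, difficulty values, and function names are illustrative assumptions, not taken from the study.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def pick_next_item(theta, item_bank, administered):
    """Choose the unadministered item with maximum information at theta."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, item_bank[i]))

# Hypothetical item bank of difficulty parameters (logits).
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(pick_next_item(0.3, bank, administered={2}))  # → 3
```

Because information peaks where difficulty matches ability, the selected item (b = 1.0) is the closest available to theta = 0.3 once item 2 has been used — which is why an adaptive test can match a longer fixed test's reliability with fewer items.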
David Bell; Vikki O'Neill; Vivienne Crawford – Practitioner Research in Higher Education, 2023
We compared the influence of open-book, extended-duration versus closed-book, time-limited formats on the reliability and validity of written assessments of pharmacology learning outcomes within our medical and dental courses. Our dental cohort undertake a mid-year test (30 free-response short answers to a question, SAQ) and an end-of-year paper (4 SAQ,…
Descriptors: Undergraduate Students, Pharmacology, Pharmaceutical Education, Test Format
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
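Lee et al. work within item response theory, but the quantities they evaluate have simple classical-test-theory analogues that illustrate the idea: a composite raw score over mixed item types, its reliability, and an overall standard error of measurement. The sketch below uses Cronbach's alpha as the reliability estimate; the score matrix is invented for illustration and is not from the article.

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha from rows of per-item scores (one row per examinee)."""
    n_items = len(item_scores[0])
    item_vars = [statistics.pvariance([row[i] for row in item_scores])
                 for i in range(n_items)]
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

def overall_sem(item_scores):
    """Overall standard error of measurement: SD * sqrt(1 - reliability)."""
    sd = statistics.pstdev([sum(row) for row in item_scores])
    return sd * (1 - cronbach_alpha(item_scores)) ** 0.5

# Hypothetical mixed-format data: first two columns are multiple-choice
# items (0/1), the last a free-response item scored 0-3.
scores = [
    [1, 1, 3],
    [1, 0, 2],
    [0, 1, 2],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(cronbach_alpha(scores), 3))  # → 0.609
print(round(overall_sem(scores), 3))     # → 1.0
```

The IRT-based conditional standard errors in the article vary with ability level, which this single overall figure cannot show; the sketch only fixes the vocabulary.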
Olsen, Jacob; Preston, Angela I.; Algozzine, Bob; Algozzine, Kate; Cusumano, Dale – Clearing House: A Journal of Educational Strategies, Issues and Ideas, 2018
Although it is widely agreed that there is no universally accepted definition for school climate, most professionals ground it in shared beliefs, values, and attitudes reflecting the quality and character of life in schools. In this article, we review and analyze measures accessible to school personnel charged with documenting and monitoring…
Descriptors: Educational Environment, Measures (Individuals), School Personnel, Test Format
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Martin-Raugh, Michelle P.; Anguiano-Carrsaco, Cristina; Jackson, Teresa; Brenneman, Meghan W.; Carney, Lauren; Barnwell, Patrick; Kochert, Jonathan – International Journal of Testing, 2018
Single-response situational judgment tests (SRSJTs) differ from multiple-response SJTs (MRSJTs) in that they present test takers with edited critical incidents and simply ask test takers to read over the action described and evaluate it according to its effectiveness. Research comparing the reliability and validity of SRSJTs and MRSJTs is thus far…
Descriptors: Test Format, Test Reliability, Test Validity, Predictive Validity
Storme, Martin; Myszkowski, Nils; Baron, Simon; Bernard, David – Journal of Intelligence, 2019
Assessing job applicants' general mental ability online poses psychometric challenges due to the necessity of having brief but accurate tests. Recent research (Myszkowski & Storme, 2018) suggests that recovering distractor information through Nested Logit Models (NLM; Suh & Bolt, 2010) increases the reliability of ability estimates in…
Descriptors: Intelligence Tests, Item Response Theory, Comparative Analysis, Test Reliability
Isbell, Dan; Winke, Paula – Language Testing, 2019
The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…
Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning
Ford, Jeremy W.; Conoyer, Sarah J.; Lembke, Erica S.; Smith, R. Alex; Hosp, John L. – Assessment for Effective Intervention, 2018
In the present study, two types of curriculum-based measurement (CBM) tools in science, Vocabulary Matching (VM) and Statement Verification for Science (SV-S), a modified Sentence Verification Technique, were compared. Specifically, this study aimed to determine whether the format of information presented (i.e., SV-S vs. VM) produces differences…
Descriptors: Curriculum Based Assessment, Evaluation Methods, Measurement Techniques, Comparative Analysis
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
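Two of the candidate scoring algorithms for multiple true-false items that studies like this one typically compare are all-or-nothing (dichotomous) scoring and a partial-credit count method. A minimal sketch of the contrast, with a hypothetical four-statement item (not drawn from the study):

```python
def dichotomous_score(responses, key):
    """All-or-nothing: 1 point only if every true/false judgment is correct."""
    return 1.0 if responses == key else 0.0

def partial_credit_score(responses, key):
    """Count method: fraction of true/false judgments that are correct."""
    return sum(r == k for r, k in zip(responses, key)) / len(key)

key = [True, False, True, True]      # hypothetical 4-statement MTF item
resp = [True, False, False, True]    # one judgment wrong
print(dichotomous_score(resp, key))      # → 0.0
print(partial_credit_score(resp, key))   # → 0.75
```

A single wrong judgment costs the whole point under dichotomous scoring but only a quarter point under the count method, which is exactly the kind of difference that drives the conflicting reliability results the abstract mentions.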
Bush, Martin – Assessment & Evaluation in Higher Education, 2015
The humble multiple-choice test is very widely used within education at all levels, but its susceptibility to guesswork makes it a suboptimal assessment tool. The reliability of a multiple-choice test is partly governed by the number of items it contains; however, longer tests are more time consuming to take, and for some subject areas, it can be…
Descriptors: Guessing (Tests), Multiple Choice Tests, Test Format, Test Reliability
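The guesswork problem Bush discusses can be made concrete with the classical formula-scoring correction, which deducts a fraction of wrong answers so that blind guessing has an expected gain of zero. The numbers below are illustrative assumptions, not from the article.

```python
def corrected_score(right, wrong, n_options):
    """Formula score: R - W / (k - 1) offsets the expected gain from guessing."""
    return right - wrong / (n_options - 1)

def expected_chance_score(n_items, n_options):
    """Expected raw score for a test taker who guesses blindly on every item."""
    return n_items / n_options

# Hypothetical: 40 four-option items; 25 right, 12 wrong, 3 omitted.
print(corrected_score(25, 12, 4))    # → 21.0 (25 - 12/3)
print(expected_chance_score(40, 4))  # → 10.0
```

A pure guesser on this test expects 10 raw points but a corrected score of about zero, which is why chance success inflates raw scores and erodes the reliability of short multiple-choice tests.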
Hao, Tao; Wang, Zhe; Ardasheva, Yuliya – Journal of Research on Educational Effectiveness, 2021
This meta-analysis reviewed research between 2012 and 2018 focused on technology-assisted second language (L2) vocabulary learning for English as a foreign language (EFL) learners. A total of 45 studies of 2,374 preschool-to-college EFL students contributed effect sizes to this meta-analysis. Compared with traditional instructional methods, the…
Descriptors: Vocabulary Development, Second Language Learning, Second Language Instruction, English (Second Language)
