Publication Date
In 2025 | 2 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 3 |
Since 2016 (last 10 years) | 13 |
Since 2006 (last 20 years) | 30 |
Descriptor
Comparative Analysis | 61 |
Test Reliability | 61 |
Scoring | 47 |
Test Validity | 32 |
Test Construction | 15 |
Test Items | 13 |
Scoring Formulas | 12 |
Multiple Choice Tests | 11 |
Testing | 11 |
Foreign Countries | 10 |
Item Analysis | 10 |
Author
Bauer, Daniel | 2 |
Fischer, Martin R. | 2 |
Hakstian, A. Ralph | 2 |
Kansup, Wanlop | 2 |
Alligood, Leon | 1 |
Alqarni, Abdulelah Mohammed | 1 |
Attali, Yigal | 1 |
August, Diane | 1 |
Bailey, Dallin J. | 1 |
Balogh, Jennifer | 1 |
Beach, Tyson A. C. | 1 |
Publication Type
Reports - Research | 39 |
Journal Articles | 31 |
Reports - Evaluative | 8 |
Speeches/Meeting Papers | 5 |
Reports - Descriptive | 3 |
Books | 2 |
Guides - Non-Classroom | 2 |
Tests/Questionnaires | 2 |
Collected Works - General | 1 |
Guides - General | 1 |
Education Level
Higher Education | 5 |
Postsecondary Education | 5 |
Secondary Education | 4 |
Elementary Education | 3 |
Elementary Secondary Education | 3 |
High Schools | 3 |
Kindergarten | 2 |
Adult Education | 1 |
Grade 1 | 1 |
Grade 2 | 1 |
Grade 4 | 1 |
Audience
Practitioners | 2 |
Teachers | 1 |
Location
Japan | 2 |
Taiwan | 2 |
Australia | 1 |
Europe | 1 |
Florida | 1 |
Germany | 1 |
Iran | 1 |
Maryland | 1 |
New York (New York) | 1 |
Switzerland (Geneva) | 1 |
Tennessee | 1 |
Jeff Allen; Ty Cruce – ACT Education Corp., 2025
This report summarizes some of the evidence supporting interpretations of scores from the enhanced ACT, focusing on reliability, concurrent validity, predictive validity, and score comparability. The authors argue that the evidence presented in this report supports the interpretation of scores from the enhanced ACT as measures of high school…
Descriptors: College Entrance Examinations, Testing, Change, Scores
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
Bailey, Dallin J.; Bunker, Lisa; Mauszycki, Shannon; Wambaugh, Julie L. – International Journal of Language & Communication Disorders, 2019
Background: Acquired apraxia of speech (AOS) involves speech-production deficits on both the segmental and suprasegmental levels. Recent research has identified a non-linear interaction between the metrical structure of bisyllabic words and word-production accuracy in German speakers with AOS, with trochaic words (strong-weak stress) being…
Descriptors: Accuracy, Suprasegmentals, Phonology, German
Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019
This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…
Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring
Severo, Milton; Gaio, A. Rita; Povo, Ana; Silva-Pereira, Fernanda; Ferreira, Maria Amélia – Anatomical Sciences Education, 2015
In theory the formula scoring methods increase the reliability of multiple-choice tests in comparison with number-right scoring. This study aimed to evaluate the impact of the formula scoring method in clinical anatomy multiple-choice examinations, and to compare it with that from the number-right scoring method, hoping to achieve an…
Descriptors: Anatomy, Multiple Choice Tests, Scoring, Decision Making
Xu, Jing; Jones, Edmund; Laxton, Victoria; Galaczi, Evelina – Assessment in Education: Principles, Policy & Practice, 2021
Recent advances in machine learning have made automated scoring of learner speech widespread, and yet validation research that provides support for applying automated scoring technology to assessment is still in its infancy. Both the educational measurement and language assessment communities have called for greater transparency in describing…
Descriptors: Second Language Learning, Second Language Instruction, English (Second Language), Computer Software
Kelleher, Leila K.; Beach, Tyson A. C.; Frost, David M.; Johnson, Andrew M.; Dickey, James P. – Measurement in Physical Education and Exercise Science, 2018
The scoring scheme for the functional movement screen implicitly assumes that the factor structure is consistent, stable, and congruent across different populations. To determine if this is the case, we compared principal components analyses of three samples: a healthy, general population (n = 100), a group of varsity athletes (n = 101), and a…
Descriptors: Factor Structure, Test Reliability, Screening Tests, Motion
Beltrán, Jorge – Working Papers in TESOL & Applied Linguistics, 2016
In the assessment of aural skills of second language learners, the study of the inclusion of visual stimuli has almost exclusively been conducted in the context of listening assessment. While the inclusion of contextual information in test input has been advocated for by numerous researchers (Ockey, 2010), little has been said regarding the…
Descriptors: Achievement Tests, Speech Skills, Speech Tests, Second Language Learning
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick; Schmitt, Neal – ETS Research Report Series, 2016
In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), agreement among various…
Descriptors: Scoring, Test Reliability, Statistical Analysis, Psychometrics
Kloser, Matthew; Borko, Hilda; Martinez, Jose Felipe; Stecher, Brian; Luskin, Rebecca – Science Education, 2017
Assessments are powerful tools for informing teachers and students about where student thinking stands with relation to a learning goal. Yet, few studies provide qualitative analyses of assessment practice across a unit. This study uses a framework of nine dimensions of effective assessment practice in science classrooms to compare more and less…
Descriptors: Secondary School Science, Evidence, Portfolio Assessment, Middle School Teachers
Farwell, Tricia M.; Alligood, Leon; Fitzgerald, Sharon; Blake, Ken – Journalism and Mass Communication Educator, 2016
This article introduces an objective grammar and math assessment and evaluates the assessment's outcome and reliability when fielded among eighty-one students in media writing courses. In addition, the article proposes a rubric for grading straight news leads and compares the rubric's reliability with the reliability of rating straight news leads…
Descriptors: Journalism, Journalism Education, Introductory Courses, Reliability
Balogh, Jennifer; Bernstein, Jared; Cheng, Jian; Van Moere, Alistair; Townshend, Brent; Suzuki, Masanori – Educational and Psychological Measurement, 2012
A two-part experiment is presented that validates a new measurement tool for scoring oral reading ability. Data collected by the U.S. government in a large-scale literacy assessment of adults were analyzed by a system called VersaReader that uses automatic speech recognition and speech processing technologies to score oral reading fluency. In the…
Descriptors: Reading Fluency, Measures (Individuals), Scoring, Reading Ability
Wagemaker, Hans, Ed. – International Association for the Evaluation of Educational Achievement, 2020
Although International Association for the Evaluation of Educational Achievement-pioneered international large-scale assessment (ILSA) of education is now a well-established science, non-practitioners and many users often substantially misunderstand how large-scale assessments are conducted, what questions and challenges they are designed to…
Descriptors: International Assessment, Achievement Tests, Educational Assessment, Comparative Analysis
Mitchell, Alison M.; Truckenmiller, Adrea; Petscher, Yaacov – Communique, 2015
As part of the Race to the Top initiative, the United States Department of Education made nearly 1 billion dollars available in State Educational Technology grants with the goal of ramping up school technology. One result of this effort is that states, districts, and schools across the country are using computerized assessments to measure their…
Descriptors: Computer Assisted Testing, Educational Technology, Testing, Efficiency