Publication Date
In 2025 | 1 |
Since 2024 | 3 |
Since 2021 (last 5 years) | 6 |
Since 2016 (last 10 years) | 10 |
Since 2006 (last 20 years) | 16 |
Descriptor
Scores | 70 |
Test Construction | 70 |
Test Use | 70 |
Test Validity | 24 |
Test Interpretation | 15 |
Test Reliability | 15 |
Elementary Secondary Education | 13 |
Test Results | 12 |
Achievement Tests | 11 |
Higher Education | 11 |
Test Items | 11 |
More ▼ |
Source
Author
Thompson, Bruce | 3 |
Hambleton, Ronald K. | 2 |
Messick, Samuel | 2 |
Saurino, Dan R. | 2 |
Albert Weideman | 1 |
Amery D. Wu | 1 |
Andrich, David | 1 |
Armstrong, Anne-Marie | 1 |
Arter, Judith A. | 1 |
Ashmore, Robert J. | 1 |
Bart Deygers | 1 |
More ▼ |
Publication Type
Education Level
Elementary Secondary Education | 5 |
Higher Education | 4 |
Postsecondary Education | 4 |
Elementary Education | 2 |
Secondary Education | 2 |
Adult Basic Education | 1 |
Adult Education | 1 |
Junior High Schools | 1 |
Middle Schools | 1 |
Location
Hong Kong | 2 |
Indiana | 2 |
Ohio | 2 |
Alabama | 1 |
Colorado | 1 |
Delaware | 1 |
Europe | 1 |
Indonesia | 1 |
Kansas | 1 |
Massachusetts | 1 |
Michigan | 1 |
More ▼ |
Laws, Policies, & Programs
Comprehensive Education… | 1 |
Every Student Succeeds Act… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Laura Schildt; Bart Deygers; Albert Weideman – Language Testing, 2024
In the context of policy-driven language testing for citizenship, a growing body of research examines the political justifications and ethical implications of language requirements and test use. However, virtually no studies have looked at the role that language testers play in the evolution of language requirements. Critical gaps remain in our…
Descriptors: Language Tests, Citizenship, Educational Policy, Assessment Literacy
Schmitt, Norbert; Nation, Paul; Kremmel, Benjamin – Language Teaching, 2020
Recently, a large number of vocabulary tests have been made available to language teachers, testers, and researchers. Unfortunately, most of them have been launched with inadequate validation evidence. The field of language testing has become increasingly more rigorous in the area of test validation, but developers of vocabulary tests have…
Descriptors: Test Construction, Test Validity, Language Tests, Test Use
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Knoch, Ute; Deygers, Bart; Khamboonruang, Apichat – Language Testing, 2021
Rating scale development in the field of language assessment is often considered in dichotomous ways: It is assumed to be guided either by expert intuition or by drawing on performance data. Even though quite a few authors have argued that rating scale development is rarely so easily classifiable, this dyadic view has dominated language testing…
Descriptors: Rating Scales, Test Construction, Language Tests, Test Use
Tatiana Chaiban; Zeinab Nahle; Ghaith Assi; Michelle Cherfane – Discover Education, 2024
Background: Since it was first launched, ChatGPT, a Large Language Model (LLM), has been widely used across different disciplines, particularly the medical field. Objective: The main aim of this review is to thoroughly assess the performance of the distinct version of ChatGPT in subspecialty written medical proficiency exams and the factors that…
Descriptors: Medical Education, Accuracy, Artificial Intelligence, Computer Software
Yulianto, Ahmad; Pudjitriherwanti, Anastasia; Kusumah, Chevy; Oktavia, Dies – International Journal of Language Testing, 2023
The increasing use of computer-based mode in language testing raises concern over its similarities with and differences from paper-based format. The present study aimed to delineate discrepancies between TOEFL PBT and CBT. For that objective, a quantitative method was employed to probe into scores equivalence, the performance of male-female…
Descriptors: Computer Assisted Testing, Test Format, Comparative Analysis, Scores
Schmidgall, Jonathan; Cid, Jaime; Carter Grissom, Elizabeth; Li, Lucy – ETS Research Report Series, 2021
The redesigned "TOEIC Bridge"® tests were designed to evaluate test takers' English listening, reading, speaking, and writing skills in the context of everyday adult life. In this paper, we summarize the initial validity argument that supports the use of test scores for the purpose of selection, placement, and evaluation of a test…
Descriptors: Language Tests, Second Language Learning, English (Second Language), Language Proficiency
New Meridian Corporation, 2020
New Meridian Corporation has developed the "Quality Testing Standards and Criteria for Comparability Claims" (QTS) to provide guidance to states that are interested in including New Meridian content and would like to either keep reporting scores on the New Meridian Scale or use the New Meridian performance levels; that is, the state…
Descriptors: Testing, Standards, Comparative Analysis, Test Content
Sanders, Sara – National Technical Assistance Center for the Education of Neglected or Delinquent Children and Youth (NDTAC), 2019
This guide is designed to assist States, agencies, and/or facilities who work with youth who are neglected, delinquent, or at-risk (N or D). The information in the guide will benefit those who are (a) interested in implementing pre-posttests, (b) in the process of identifying an appropriate pre-posttest, or (c) ready to evaluate current testing…
Descriptors: At Risk Students, Delinquency, Pretests Posttests, Testing
Traynor, Anne – Educational Assessment, 2017
Variation in test performance among examinees from different regions or national jurisdictions is often partially attributed to differences in the degree of content correspondence between local school or training program curricula, and the test of interest. This posited relationship between test-curriculum correspondence, or "alignment,"…
Descriptors: Test Items, Test Construction, Alignment (Education), Curriculum
Mann, Wolfgang; Marshall, Chloe R. – International Journal of Bilingual Education and Bilingualism, 2010
In this article, we adapt a concept designed to structure language testing more effectively, the "Assessment Use Argument" ("AUA"), as a framework for the development and/or use of sign language assessments for deaf children who are taught in a sign bilingual education setting. By drawing on data from a recent investigation of…
Descriptors: Sign Language, Bilingual Education, Deafness, Language Tests
Nichols, Paul D.; Williams, Natasha – Educational Measurement: Issues and Practice, 2009
This article has three goals. The first goal is to clarify the role that the consequences of test score use play in validity judgments by reviewing the role that modern writers on validity have ascribed for consequences in supporting validity judgments. The second goal is to summarize current views on who is responsible for collecting evidence of…
Descriptors: Tests, Test Validity, Scores, Data Collection
Jia, Yujie – ProQuest LLC, 2013
This study employed Bachman and Palmer's (2010) Assessment Use Argument framework to investigate to what extent the use of a second language oral test as an exit test in a Hong Kong university can be justified. It also aimed to help test developers of this oral test identify the most critical areas in the current test design that might need…
Descriptors: Test Use, Language Tests, Oral Language, Second Language Learning

Burney, DeAnna McKinnie; Kromrey, Jeffrey – Educational and Psychological Measurement, 2001
Studied the construct validity of scores on the Adolescent Anger Rating Scale (AARS) developed to measure instrumental and reactive anger. Results for 792 12- to 19-year-olds indicate that AARS scores are internally consistent and stable when anger subtypes are measured. (SLD)
Descriptors: Adolescents, Anger, Scores, Test Construction
Bovaird, James A., Ed.; Geisinger, Kurt F., Ed.; Buckendahl, Chad W., Ed. – APA Books, 2011
Educational assessment and, more broadly, educational research in the United States have entered into an era characterized by a dramatic increase in the prevalence and importance of test score use in accountability systems. This volume covers a selection of contemporary issues about testing science and practice that impact the nation's public…
Descriptors: Graduate Students, Test Use, Student Placement, Educational Research