Suk, Youmi – Asia Pacific Education Review, 2024
Regression discontinuity (RD) designs have gained significant popularity as a quasi-experimental device for evaluating education programs and policies. In this paper, we present a comprehensive review of RD designs, focusing on the continuity-based framework, the most widely adopted RD framework. We first review the fundamental aspects of RD…
Descriptors: Educational Research, Preschool Education, Regression (Statistics), Test Validity
Student, Sanford R.; Gong, Brian – Educational Measurement: Issues and Practice, 2022
We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from…
Descriptors: Science Tests, Test Validity, Test Items, Test Construction
Chen, Yunxiao; Lee, Yi-Hsuan; Li, Xiaoou – Journal of Educational and Behavioral Statistics, 2022
In standardized educational testing, test items are reused in multiple test administrations. To ensure the validity of test scores, the psychometric properties of items should remain unchanged over time. In this article, we consider the sequential monitoring of test items, in particular, the detection of abrupt changes to their psychometric…
Descriptors: Standardized Tests, Test Items, Test Validity, Scores
Xu, Ying; Li, Xiaodong; Chen, Jin – Language Testing, 2025
This article provides a detailed review of the Computer-based English Listening Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…
Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests
Sengül Avsar, Asiye – Measurement: Interdisciplinary Research and Perspectives, 2020
To obtain valid and reliable test scores, various test theories have been developed, one of which is nonparametric item response theory (NIRT). Mokken models are the most widely known NIRT models and are useful for small samples and short tests. The Mokken package supports Mokken scale analysis. An important issue about validity is…
Descriptors: Response Style (Tests), Nonparametric Statistics, Item Response Theory, Test Validity
Dadey, Nathan; Keng, Leslie; Boyer, Michelle; Marion, Scott – National Center for the Improvement of Educational Assessment, 2021
State summative educational assessment is about to begin in earnest. Rightfully, many are raising questions about the quality, meaning, and appropriate use of the assessment results. This document was written to support state educational agencies (SEAs) and their assessment providers in devising effective and efficient analysis plans. This…
Descriptors: Educational Assessment, Summative Evaluation, Student Evaluation, Test Use
Hoeve, Karen B. – Language Testing in Asia, 2022
High stakes test-based accountability systems primarily rely on aggregates and derivatives of scores from tests that were originally developed to measure individual student proficiency in subject areas such as math, reading/language arts, and now English language proficiency. Current validity models do not explicitly address this use of aggregate…
Descriptors: High Stakes Tests, Language Tests, Accountability, Educational Assessment
Lenz, A. Stephen; Ault, Haley; Balkin, Richard S.; Barrio Minton, Casey; Erford, Bradley T.; Hays, Danica G.; Kim, Bryan S. K.; Li, Chi – Measurement and Evaluation in Counseling and Development, 2022
In April 2021, the Association for Assessment and Research in Counseling Executive Council commissioned a time-referenced task group to revise the Responsibilities of Users of Standardized Tests (RUST) Statement (3rd edition) published by the Association for Assessment in Counseling (AAC) in 2003. The task group developed a work plan to implement…
Descriptors: Responsibility, Standardized Tests, Counselor Training, Ethics
Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019
Validity is broadly defined as how well something measures what it's supposed to measure. The reliability and validity of scores from assessments are two concepts that are closely knit together and feed into each other.
Descriptors: Screening Tests, Scores, Test Validity, Test Reliability
Wise, Steven L. – Education Inquiry, 2019
A decision of whether to move from paper-and-pencil to computer-based tests is based largely on a careful weighing of the potential benefits of a change against its costs, disadvantages, and challenges. This paper briefly discusses the trade-offs involved in making such a transition, and then focuses on a relatively unexplored benefit of…
Descriptors: Computer Assisted Testing, Cheating, Test Wiseness, Scores
Center on Standards and Assessments Implementation, 2018
Reliability is a measure of consistency. It is the degree to which student results are the same when they take the same test on different occasions, when different scorers score the same item or task, and when different but equivalent tests are taken at the same time or at different times. Reliability is about making sure that different test forms…
Descriptors: Test Reliability, Test Validity, Student Evaluation, Test Bias
Walton, Kate E.; Allen, Jeff; Burrus, Jeremy; Murano, Dana – ACT, Inc., 2022
Social and emotional (SE) skills are known to be linked with many important life outcomes, including academic performance, performance on standardized college entrance exams, and college enrollment, although research on enrollment is limited. In this report, the authors present data points from a study that evaluated test-criterion validity…
Descriptors: Social Development, Emotional Development, College Attendance, Postsecondary Education
Fitzgerald, Jill; Shanahan, Timothy E. – International Literacy Association, 2020
Reading scores exist for a continuum of purposes, from informal assessment to formal standardized tests. This brief aims to answer the question: What matters most for elementary-grade teachers when thinking about reading scores, and what could policymakers do to help teachers? Three positions worth pursuing in this regard are shared: (1) every…
Descriptors: Reading Achievement, Scores, Elementary School Students, Elementary School Teachers
Mattern, Krista – ACT, Inc., 2019
A great deal has been written on the topic of test validity. Guiding our work at ACT is "The Standards for Educational and Psychological Testing" (2014), which outlines best practices in test development and validation. As ACT transitions from an assessment company to a learning, measurement, and navigation organization, a framework for…
Descriptors: Test Validity, Measurement Techniques, Evidence, Test Content
Lynch, Sarah – Practical Assessment, Research & Evaluation, 2022
In today's digital age, tests are increasingly being delivered on computers. Many of these computer-based tests (CBTs) have been adapted from paper-based tests (PBTs). However, this change in mode of test administration has the potential to introduce construct-irrelevant variance, affecting the validity of score interpretations. Because of this,…
Descriptors: Computer Assisted Testing, Tests, Scores, Scoring