Showing 1 to 15 of 155 results
Peer reviewed
Direct link
Ian Jones; Ben Davies – International Journal of Research & Method in Education, 2024
Educational researchers often need to construct precise and reliable measurement scales of complex and varied representations such as participants' written work, videoed lesson segments and policy documents. Developing such scales can be resource-intensive and time-consuming, and the outcomes are not always reliable. Here we present…
Descriptors: Educational Research, Comparative Analysis, Educational Researchers, Measurement
Peer reviewed
Direct link
Gregory Chernov – Evaluation Review, 2025
Most existing solutions to the current replication crisis in science address only the factors stemming from specific poor research practices. We introduce a novel mechanism that leverages the experts' predictive abilities to analyze the root causes of replication failures. It is backed by the principle that the most accurate predictor is the most…
Descriptors: Replication (Evaluation), Prediction, Scientific Research, Failure
Peer reviewed
Direct link
Youmi Suk – Asia Pacific Education Review, 2024
Regression discontinuity (RD) designs have gained significant popularity as a quasi-experimental device for evaluating education programs and policies. In this paper, we present a comprehensive review of RD designs, focusing on the continuity-based framework, the most widely adopted RD framework. We first review the fundamental aspects of RD…
Descriptors: Educational Research, Preschool Education, Regression (Statistics), Test Validity
Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019
Validity is broadly defined as how well something measures what it is supposed to measure. The reliability and validity of scores from assessments are two closely related concepts that feed into each other.
Descriptors: Screening Tests, Scores, Test Validity, Test Reliability
Peer reviewed
Direct link
Student, Sanford R.; Gong, Brian – Educational Measurement: Issues and Practice, 2022
We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from…
Descriptors: Science Tests, Test Validity, Test Items, Test Construction
Peer reviewed
Direct link
Chen, Yunxiao; Lee, Yi-Hsuan; Li, Xiaoou – Journal of Educational and Behavioral Statistics, 2022
In standardized educational testing, test items are reused in multiple test administrations. To ensure the validity of test scores, the psychometric properties of items should remain unchanged over time. In this article, we consider the sequential monitoring of test items, in particular, the detection of abrupt changes to their psychometric…
Descriptors: Standardized Tests, Test Items, Test Validity, Scores
Peer reviewed
Direct link
Ying Xu; Xiaodong Li; Jin Chen – Language Testing, 2025
This article provides a detailed review of the Computer-based English Listening Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…
Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests
Peer reviewed
Direct link
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
Peer reviewed
Direct link
Ferrando, Pere Joan; Lorenzo-Seva, Urbano – Educational and Psychological Measurement, 2019
Many psychometric measures yield data that are compatible with (a) an essentially unidimensional factor analysis solution and (b) a correlated-factor solution. Deciding which of these structures is the most appropriate and useful is of considerable importance, and various procedures have been proposed to help in this decision. The only fully…
Descriptors: Validity, Models, Correlation, Factor Analysis
Peer reviewed
Direct link
Sengül Avsar, Asiye – Measurement: Interdisciplinary Research and Perspectives, 2020
In order to reach valid and reliable test scores, various test theories have been developed, and one of them is nonparametric item response theory (NIRT). Mokken Models are the most widely known NIRT models which are useful for small samples and short tests. Mokken Package is useful for Mokken Scale Analysis. An important issue about validity is…
Descriptors: Response Style (Tests), Nonparametric Statistics, Item Response Theory, Test Validity
Dadey, Nathan; Keng, Leslie; Boyer, Michelle; Marion, Scott – National Center for the Improvement of Educational Assessment, 2021
State summative educational assessment is about to begin in earnest. Rightfully, many are raising questions about the quality, meaning, and appropriate use of the assessment results. This document was written to support state educational agencies (SEAs) and their assessment providers in devising effective and efficient analysis plans. This…
Descriptors: Educational Assessment, Summative Evaluation, Student Evaluation, Test Use
Peer reviewed
Direct link
Hoeve, Karen B. – Language Testing in Asia, 2022
High stakes test-based accountability systems primarily rely on aggregates and derivatives of scores from tests that were originally developed to measure individual student proficiency in subject areas such as math, reading/language arts, and now English language proficiency. Current validity models do not explicitly address this use of aggregate…
Descriptors: High Stakes Tests, Language Tests, Accountability, Educational Assessment
Peer reviewed
Direct link
Lenz, A. Stephen; Ault, Haley; Balkin, Richard S.; Barrio Minton, Casey; Erford, Bradley T.; Hays, Danica G.; Kim, Bryan S. K.; Li, Chi – Measurement and Evaluation in Counseling and Development, 2022
In April 2021, The Association for Assessment and Research in Counseling Executive Council commissioned a time-referenced task group to revise the Responsibilities of Users of Standardized Tests (RUST) Statement (3rd edition) published by the Association for Assessment in Counseling (AAC) in 2003. The task group developed a work plan to implement…
Descriptors: Responsibility, Standardized Tests, Counselor Training, Ethics
Peer reviewed
Direct link
Montgomery, Alyssa; Dumont, Ron; Willis, John O. – Journal of Psychoeducational Assessment, 2017
The articles presented in this Special Issue provide evidence for many statistically significant relationships among error scores obtained from the Kaufman Test of Educational Achievement, Third Edition (KTEA)-3 between various groups of students with and without disabilities. The data reinforce the importance of examiners looking beyond the…
Descriptors: Evidence, Validity, Predictive Validity, Error Patterns
Peer reviewed
Direct link
Wise, Steven L. – Education Inquiry, 2019
A decision of whether to move from paper-and-pencil to computer-based tests is based largely on a careful weighing of the potential benefits of a change against its costs, disadvantages, and challenges. This paper briefly discusses the trade-offs involved in making such a transition, and then focuses on a relatively unexplored benefit of…
Descriptors: Computer Assisted Testing, Cheating, Test Wiseness, Scores