Publication Date
In 2025 | 3 |
Since 2024 | 6 |
Since 2021 (last 5 years) | 34 |
Since 2016 (last 10 years) | 87 |
Since 2006 (last 20 years) | 159 |
Descriptor
Scores | 650 |
Test Interpretation | 650 |
Test Validity | 156 |
Test Results | 149 |
Elementary Secondary Education | 141 |
Achievement Tests | 135 |
Standardized Tests | 121 |
Test Use | 98 |
Test Construction | 97 |
Test Reliability | 91 |
Academic Achievement | 87 |
Audience
Practitioners | 48 |
Researchers | 23 |
Teachers | 16 |
Administrators | 15 |
Parents | 8 |
Policymakers | 6 |
Community | 4 |
Students | 4 |
Counselors | 2 |
Location
Canada | 8 |
Pennsylvania | 7 |
Alaska | 6 |
Japan | 6 |
New Jersey | 6 |
United Kingdom | 5 |
California | 4 |
Delaware | 4 |
Michigan | 4 |
United States | 4 |
Australia | 3 |
Laws, Policies, & Programs
Elementary and Secondary… | 8 |
No Child Left Behind Act 2001 | 6 |
Americans with Disabilities… | 1 |
Every Student Succeeds Act… | 1 |
Individuals with Disabilities… | 1 |
Individuals with Disabilities… | 1 |
Folger, Timothy D.; Bostic, Jonathan; Krupa, Erin E. – Educational Measurement: Issues and Practice, 2023
Validity is a fundamental consideration of test development and test evaluation. The purpose of this study is to define and reify three key aspects of validity and validation, namely test-score interpretation, test-score use, and the claims supporting interpretation and use. This study employed a Delphi methodology to explore how experts in…
Descriptors: Test Interpretation, Scores, Test Use, Test Validity
Ching-Ni Hsieh – ETS Research Report Series, 2024
The TOEFL Junior® tests are designed to evaluate young language students' English reading, listening, speaking, and writing skills in an English-medium secondary instructional context. This paper articulates a validity argument constructed to support the use and interpretation of the TOEFL Junior test scores for the purpose of placement, progress…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Scores
Uto, Masaki; Aomi, Itsuki; Tsutsumi, Emiko; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2023
In automated essay scoring (AES), essays are automatically graded without human raters. Many AES models based on various manually designed features or various architectures of deep neural networks (DNNs) have been proposed over the past few decades. Each AES model has unique advantages and characteristics. Therefore, rather than using a single-AES…
Descriptors: Prediction, Scores, Computer Assisted Testing, Scoring
Yaneva, Victoria; Clauser, Brian E.; Morales, Amy; Paniagua, Miguel – Advances in Health Sciences Education, 2022
Understanding the response process used by test takers when responding to multiple-choice questions (MCQs) is particularly important in evaluating the validity of score interpretations. Previous authors have recommended eye-tracking technology as a useful approach for collecting data on the processes test takers use to respond to test questions.…
Descriptors: Eye Movements, Artificial Intelligence, Scores, Test Interpretation
Rios, Joseph A.; Miranda, Alejandra A. – Educational Measurement: Issues and Practice, 2021
Subscore added value analyses assume invariance across test taking populations; however, this assumption may be untenable in practice as differential subdomain relationships may be present among subgroups. The purpose of this simulation study was to understand the conditions associated with subscore added value noninvariance when manipulating: (1)…
Descriptors: Scores, Test Length, Ability, Correlation
Kuhn, Melissa Gayle – ProQuest LLC, 2022
Validity in psychometrics refers to the degree to which evidence and theory support the interpretations drawn from a test, and Messick's Contemporary Validity Theory (1994) includes several facets with well-established evidence collection methods. However, there is a lack of consensus on appropriate methods of evaluating the facet of…
Descriptors: Test Validity, Psychometrics, Test Interpretation, Scores
Frank Goldhammer; Ulf Kroehne; Carolin Hahnel; Johannes Naumann; Paul De Boeck – Journal of Educational Measurement, 2024
The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting only effective ability or effective speed as efficiency may be challenging because of the within-person dependency between both variables (speed-ability tradeoff, SAT). The present study measures efficiency as effective ability…
Descriptors: Timed Tests, Efficiency, Scores, Test Interpretation
Puttaswamy, Ash; Barone, Anjelica; Viezel, Kathleen D.; Willis, John O.; Dumont, Ron – Journal of Psychoeducational Assessment, 2020
An area of particular importance when examining index scores on the Wechsler Intelligence Scale for Children--Fifth Edition (WISC-V) is the utilization and interpretation of critical values and base rates associated with differences between an individual's subtest scaled score and the individual's mean scaled score for an index. For the WISC-V,…
Descriptors: Children, Intelligence Tests, Scores, Differences
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Farmer, Ryan L.; Kim, Samuel Y. – Psychology in the Schools, 2020
Many prominent intelligence tests (e.g., Wechsler Intelligence Scale for Children, Fifth Edition [WISC-V] and Reynolds Intellectual Abilities Scale, Second Edition [RIAS-2]) offer methods for computing subtest- and composite-level difference scores. This study uses data provided in the technical manual of the WISC-V and RIAS-2 to calculate…
Descriptors: Children, Intelligence Tests, Scores, Test Reliability
Gorney, Kylie – ProQuest LLC, 2023
Aberrant behavior refers to any type of unusual behavior that would not be expected under normal circumstances. In educational and psychological testing, such behaviors have the potential to severely bias the aberrant examinee's test score while also jeopardizing the test scores of countless others. It is therefore crucial that aberrant examinees…
Descriptors: Behavior Problems, Educational Testing, Psychological Testing, Test Bias
Hannah E. Luce – ProQuest LLC, 2023
Young children are assessed to meet federal mandates and inform policy decisions, provide teachers with useful information to make instructional decisions and set reasonable learning goals, and facilitate communication with families. While young children are frequently assessed using whole-child assessments which often yield criterion-referenced…
Descriptors: Scores, Norm Referenced Tests, Test Interpretation, Student Evaluation
Marta Godoy-Giménez; Ángel García-Pérez; Fernando Cañadas; Angeles F. Estévez; Pablo Sayans-Jiménez – Autism: The International Journal of Research and Practice, 2024
The broad autism phenotype is the phenotypic expression of the primary characteristics of autism. However, currently available tests do not agree with the two-domain operationalization of broad autism phenotype or autism, and their internal structure has shown instability across applications. This study presents the Broad Autism…
Descriptors: Autism Spectrum Disorders, Genetics, Diagnostic Tests, Foreign Countries
Lyrica Lucas; Anum Khushal; Robert Mayes; Brian A. Couch; Joseph Dauer – International Journal of Science Education, 2025
Educational reform priorities such as emphasis on quantitative modelling (QM) have positioned undergraduate biology instructors as designers of QM experiences to engage students in authentic science practices that support the development of data-driven and evidence-based reasoning. Yet, little is known about how biology instructors adapt to the…
Descriptors: Undergraduate Students, College Science, Biology, Classroom Observation Techniques
Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format