Publication Date
In 2025 | 4 |
Since 2024 | 5 |
Since 2021 (last 5 years) | 11 |
Since 2016 (last 10 years) | 32 |
Since 2006 (last 20 years) | 45 |
Descriptor
Scores | 76 |
Scoring | 76 |
Test Reliability | 76 |
Test Validity | 41 |
Test Items | 22 |
Test Construction | 21 |
Testing | 17 |
Correlation | 16 |
Item Response Theory | 16 |
Test Bias | 16 |
Item Analysis | 12 |
More ▼ |
Source
Author
Publication Type
Education Level
Higher Education | 11 |
Postsecondary Education | 10 |
Secondary Education | 9 |
Elementary Education | 8 |
High Schools | 7 |
Middle Schools | 7 |
Junior High Schools | 6 |
Grade 7 | 5 |
Early Childhood Education | 4 |
Grade 3 | 4 |
Grade 4 | 4 |
More ▼ |
Audience
Researchers | 3 |
Practitioners | 2 |
Parents | 1 |
Location
Vermont | 4 |
California | 2 |
Turkey | 2 |
Alabama | 1 |
Idaho | 1 |
Iran | 1 |
Israel | 1 |
Nebraska | 1 |
New Mexico | 1 |
New York | 1 |
North Dakota | 1 |
More ▼ |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 1 |
Meets WWC Standards with or without Reservations | 1 |
M. Arda Atakaya; Ugur Sak; M. Bahadir Ayas – Creativity Research Journal, 2024
Scoring in creativity research has been a central problem since creativity became an important issue in psychology and education in the 1950s. The current study examined the psychometric properties of 27 creativity indices derived from summed and averaged scores using 15 scoring methods. Participants included 2802 middle-school students. Data…
Descriptors: Psychometrics, Creativity, Creativity Tests, Scoring
Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…
Descriptors: Value Added Models, Tests, Testing, Scoring
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Jeff Allen; Ty Cruce – ACT Education Corp., 2025
This report summarizes some of the evidence supporting interpretations of scores from the enhanced ACT, focusing on reliability, concurrent validity, predictive validity, and score comparability. The authors argue that the evidence presented in this report supports the interpretation of scores from the enhanced ACT as measures of high school…
Descriptors: College Entrance Examinations, Testing, Change, Scores
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-21 reliability and its extension (coefficient [alpha]) gives the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
Forthmann, Boris; Paek, Sue Hyeon; Dumas, Denis; Barbot, Baptiste; Holling, Heinz – British Journal of Educational Psychology, 2020
Background: The originality of divergent thinking (DT) production is one of the most critical indicators of creative potential. It is commonly scored using the statistical infrequency of responses relative to all responses provided in a given sample. Aims: Response frequency estimates vary in terms of measurement precision. This issue has been…
Descriptors: Creative Thinking, Creativity Tests, Item Response Theory, Scores
Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019
Reliability is the consistency of a set of scores that are designed to measure the same thing. Reliability is a statistical property of scores that must be demonstrated rather than assumed.
Descriptors: Scores, Measurement, Test Reliability, Error Patterns
Ward, Brodie; Thornton, Ashleigh; Lay, Brendan; Chen, Nigel; Rosenberg, Michael – Research Quarterly for Exercise and Sport, 2020
Purpose: Fundamental movement skill (FMS) assessors in education environments rely upon real-time FMS assessment; however, the recognition of individual proficiency criteria during real-time process-oriented FMS assessment may be problematic. Few studies consider the accuracy of identifying individual proficiency criteria in process-oriented FMS…
Descriptors: Physical Education, Psychomotor Skills, Motor Development, Performance Tests
LaFlair, Geoffrey T.; Langenfeld, Thomas; Baig, Basim; Horie, André Kenji; Attali, Yigal; von Davier, Alina A. – Journal of Computer Assisted Learning, 2022
Background: Digital-first assessments leverage the affordances of technology in all elements of the assessment process--from design and development to score reporting and evaluation to create test taker-centric assessments. Objectives: The goal of this paper is to describe the engineering, machine learning, and psychometric processes and…
Descriptors: Computer Assisted Testing, Affordances, Scoring, Engineering
Lenz, A. Stephen; Ault, Haley; Balkin, Richard S.; Barrio Minton, Casey; Erford, Bradley T.; Hays, Danica G.; Kim, Bryan S. K.; Li, Chi – Measurement and Evaluation in Counseling and Development, 2022
In April 2021, The Association for Assessment and Research in Counseling Executive Council commissioned a time-referenced task group to revise the Responsibilities of Users of Standardized Tests (RUST) Statement (3rd edition) published by the Association for Assessment in Counseling (AAC) in 2003. The task group developed a work plan to implement…
Descriptors: Responsibility, Standardized Tests, Counselor Training, Ethics
Myszkowski, Nils – Journal of Intelligence, 2020
Raven's Standard Progressive Matrices (Raven 1941) is a widely used 60-item long measure of general mental ability. It was recently suggested that, for situations where taking this test is too time consuming, a shorter version, comprised of only the last series of the Standard Progressive Matrices (Myszkowski and Storme 2018) could be used, while…
Descriptors: Intelligence Tests, Psychometrics, Nonparametric Statistics, Item Response Theory
Slepkov, Aaron D.; Godfrey, Alan T. K. – Applied Measurement in Education, 2019
The answer-until-correct (AUC) method of multiple-choice (MC) testing involves test respondents making selections until the keyed answer is identified. Despite attendant benefits that include improved learning, broad student adoption, and facile administration of partial credit, the use of AUC methods for classroom testing has been extremely…
Descriptors: Multiple Choice Tests, Test Items, Test Reliability, Scores
Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019
This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…
Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring
Safak, Pinar; Cakmak, Salih; Karakoc, Tamer; Aydin O'Dwyer, Pinar – European Journal of Educational Research, 2021
This study aimed to develop a valid and reliable instrument that measures the functional vision of students with low vision. Thus, an assessment tool and performance activities were developed for three vision skill groups (near vision skills, distance vision skills, and visual field) that include functional vision skills. The universe was 1485…
Descriptors: Foreign Countries, Vision Tests, Diagnostic Tests, Vision