Publication Date
In 2025 | 2 |
Since 2024 | 3 |
Since 2021 (last 5 years) | 6 |
Since 2016 (last 10 years) | 26 |
Since 2006 (last 20 years) | 64 |
Descriptor
Evaluation Methods | 99 |
Scores | 99 |
Test Reliability | 99 |
Test Validity | 65 |
Student Evaluation | 37 |
Test Construction | 23 |
Foreign Countries | 20 |
Psychometrics | 16 |
Standardized Tests | 15 |
Statistical Analysis | 14 |
Elementary Secondary Education | 13 |
Author
Erford, Bradley T. | 3 |
Booker, Kevin | 2 |
Bruch, Julie | 2 |
Gill, Brian | 2 |
Koretz, Daniel | 2 |
Abedi, Jamal | 1 |
Abu-Hamour, Bashir | 1 |
Algozzine, Bob | 1 |
Algozzine, Kate | 1 |
Allen, Abigail | 1 |
Allen, Patricia J. | 1 |
Audience
Practitioners | 3 |
Researchers | 3 |
Teachers | 2 |
Administrators | 1 |
Location
United Kingdom | 4 |
United States | 4 |
Australia | 2 |
China | 2 |
Florida | 2 |
Germany | 2 |
Illinois | 2 |
Kenya | 2 |
Minnesota | 2 |
Netherlands | 2 |
Norway | 2 |
Laws, Policies, & Programs
Elementary and Secondary… | 2 |
Elementary and Secondary… | 1 |
Every Student Succeeds Act… | 1 |
Lee, Yi-Hsuan; Haberman, Shelby J. – Journal of Educational Measurement, 2021
For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some of which are expected, while others may be…
Descriptors: Scores, Regression (Statistics), Demography, Data
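Equating is the step the abstract above takes for granted. As a minimal sketch, assuming a random-groups design in which comparable groups take form X and form Y, linear equating places a form-X raw score on the form-Y scale by matching means and standard deviations. This is a generic textbook method, not the regression-based procedure Lee and Haberman study; the function name and score lists are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, form_x_scores, form_y_scores):
    """Place raw score x from form X on the form-Y scale (linear equating).

    Assumes a random-groups design: comparable groups took form X and form Y.
    """
    mu_x, sd_x = mean(form_x_scores), pstdev(form_x_scores)
    mu_y, sd_y = mean(form_y_scores), pstdev(form_y_scores)
    # Equal standardized positions: (x - mu_x) / sd_x == (y - mu_y) / sd_y
    return mu_y + (sd_y / sd_x) * (x - mu_x)


form_x = [10, 12, 14, 16, 18]  # illustrative scores, not real data
form_y = [12, 15, 18, 21, 24]
equated = linear_equate(16, form_x, form_y)  # approximately 21.0
```

Because form Y is both harder to score low on and more spread out in this toy data, a score one unit of spread above the form-X mean maps to one unit of spread above the form-Y mean.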
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
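The idea that the same responses can be scored to favor different performance metrics can be illustrated with threshold selection. A minimal sketch, assuming some model (not shown, hypothetical) has already produced a probability for one trait per response: the same probabilities are binarized at whichever cut-off maximizes the chosen metric. This is generic threshold tuning, not the MNN training procedure described in the paper.

```python
def accuracy(y_true, y_pred):
    # Share of responses whose 0/1 trait label is predicted correctly
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    # Share of truly positive responses that the scoring rule flags as positive
    positives = sum(y_true)
    if positives == 0:
        return 0.0
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives


def best_threshold(probs, y_true, metric, candidates):
    """Return (cut-off, score) maximizing `metric` for one trait's probabilities."""
    best_t, best_score = None, float("-inf")
    for t in candidates:
        y_pred = [1 if p >= t else 0 for p in probs]  # same probabilities, new cut-off
        score = metric(y_true, y_pred)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical model output for one trait and the matching true labels
probs = [0.9, 0.8, 0.6, 0.5, 0.3]
labels = [1, 1, 0, 0, 1]
print(best_threshold(probs, labels, accuracy, [0.2, 0.7]))  # (0.7, 0.8)
print(best_threshold(probs, labels, recall, [0.2, 0.7]))    # (0.2, 1.0)
```

For a high-dimensional assessment the same call would be repeated for each trait; here only the target metric changes, yet the resulting scores differ.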
Emre Zengin; Yasemin Karal – International Journal of Assessment Tools in Education, 2024
This study was carried out to develop a test to assess algorithmic thinking skills. To this end, the twelve steps suggested by Downing (2006) were adopted. Throughout test development, 24 sixth-grade middle school students and eight experts from different fields took part in project tasks as needed. The test was given to 252 students…
Descriptors: Grade 6, Algorithms, Thinking Skills, Evaluation Methods
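The truncated abstract does not say which statistics were computed on the 252 responses. As a hedged illustration of the classical item analysis commonly run at this stage of test development, the sketch below computes item difficulty (proportion correct) and an upper-lower discrimination index from 0/1 scored responses. The 27% group size is a common convention assumed here, and nothing in the code is taken from Zengin and Karal's study.

```python
def item_difficulty(responses, item):
    # Proportion of examinees answering the item correctly (classical p-value)
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=0.27):
    """Upper-lower index: p(correct | top group) - p(correct | bottom group).

    Groups are formed by ranking examinees on their total score.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    return item_difficulty(upper, item) - item_difficulty(lower, item)


# Hypothetical 0/1 responses: one row per student, one column per item
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(responses, 0))                    # 0.75
print(item_discrimination(responses, 0, fraction=0.5))  # 0.5
```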
Baraldi Cunha, Andrea; Babik, Iryna; Koziol, Natalie A.; Hsu, Lin-Ya; Nord, Jayden; Harbourne, Regina T.; Westcott-McCoy, Sarah; Dusing, Stacey C.; Bovaird, James A.; Lobo, Michele A. – Grantee Submission, 2021
Purpose: To evaluate the validity, reliability, and sensitivity of the novel Means-End Problem-Solving Assessment Tool (MEPSAT). Methods: Children with typical development and those with motor delay were assessed throughout the first 2 years of life using the MEPSAT. MEPSAT scores were validated against the cognitive and motor subscales of the…
Descriptors: Problem Solving, Early Intervention, Evaluation Methods, Motor Development
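Validating a new tool's scores against an established measure's subscales typically involves correlating the two sets of scores for the same children. A minimal sketch, assuming paired scores on the new tool and on one criterion subscale (the variable names and values are hypothetical): it computes the Pearson correlation, a generic convergent-validity statistic, not necessarily the analysis the MEPSAT authors report.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired score lists x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores for the same children
new_tool = [3, 5, 4, 7, 6]
criterion_subscale = [10, 14, 12, 19, 15]
r = pearson_r(new_tool, criterion_subscale)
```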
Little, Todd D.; Chang, Rong; Gorrall, Britt K.; Waggenspack, Luke; Fukuda, Eriko; Allen, Patricia J.; Noam, Gil G. – International Journal of Behavioral Development, 2020
We revisit the merits of the retrospective pretest-posttest (RPP) design for repeated-measures research. The underutilized RPP method asks respondents to rate survey items twice during the same posttest measurement occasion from two specific frames of reference: "now" and "then." Individuals first report their current attitudes…
Descriptors: Pretesting, Alternative Assessment, Program Evaluation, Evaluation Methods
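In the RPP design each respondent supplies both a "then" and a "now" rating at the posttest occasion, so change is a within-person difference. A minimal sketch, assuming numeric ratings for one item or scale score and at least two respondents whose differences are not all identical: it returns the mean "now minus then" change and a paired t statistic. The paired t test is a generic choice for illustration; Little et al. may analyze RPP data differently.

```python
from math import sqrt
from statistics import mean, stdev


def rpp_change(then_ratings, now_ratings):
    """Mean 'now minus then' change and paired t statistic for RPP ratings.

    Both ratings come from the same posttest occasion, so each respondent
    contributes one within-person difference.
    """
    diffs = [now - then for then, now in zip(then_ratings, now_ratings)]
    n = len(diffs)
    mean_change = mean(diffs)
    t_stat = mean_change / (stdev(diffs) / sqrt(n))  # paired t with n - 1 df
    return mean_change, t_stat


# Hypothetical ratings from four respondents on a 1-5 scale
then_ratings = [2, 3, 2, 4]
now_ratings = [4, 4, 3, 5]
change, t = rpp_change(then_ratings, now_ratings)
```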
Fu, Jianbin; Qu, Yanxuan – ETS Research Report Series, 2018
Various subscore estimation methods that use auxiliary information to improve subscore accuracy and stability have been developed. This report provides a review of various subscore estimation methods described in the literature. The methodology of each method is described, then research studies on these subscore estimation methods are summarized.…
Descriptors: Scores, Evaluation Methods, Item Response Theory, Test Items
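Kelley's regressed score is one classic way of using auxiliary information (here, the group mean) to stabilize a subscore; whether and how Fu and Qu's review treats it is not visible in the truncated abstract, so treat this as a generic example. A minimal sketch, assuming the subscore's reliability and the group mean are already known; the function name and examinee scores are hypothetical.

```python
def kelley_subscore(observed, reliability, group_mean):
    """Kelley's regressed estimate of a true subscore.

    The observed subscore is pulled toward the group mean; the less reliable
    the subscore, the stronger the pull.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return reliability * observed + (1.0 - reliability) * group_mean


observed_subscores = [20, 9, 15]  # hypothetical examinees
estimates = [kelley_subscore(x, 0.6, 15.0) for x in observed_subscores]
```

With perfect reliability the observed subscore is returned unchanged; with zero reliability every examinee receives the group mean.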
Wang, Xiaolin; Svetina, Dubravka; Dai, Shenghai – Journal of Experimental Education, 2019
Recently, interest in test subscore reporting for diagnosis purposes has been growing rapidly. The two simulation studies here examined factors (sample size, number of subscales, correlation between subscales, and three factors affecting subscore reliability: number of items per subscale, item parameter distribution, and data generating model)…
Descriptors: Value Added Models, Scores, Sample Size, Correlation
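The number of items per subscale, one of the factors listed above, affects subscore reliability in a way the Spearman-Brown prophecy formula makes concrete. A minimal sketch, assuming the added or removed items are parallel to the existing ones (an idealization that simulated item parameter distributions may not satisfy); this formula is not taken from Wang, Svetina, and Dai's design.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability after changing test length by `length_factor`.

    length_factor = new number of items / current number of items. Assumes the
    added or removed items are parallel to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# A 5-item subscale with reliability 0.50, doubled to 10 items
print(spearman_brown(0.50, 10 / 5))  # about 0.67
```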
Morphew, Jason W.; Mestre, Jose P.; Kang, Hyeon-Ah; Chang, Hua-Hua; Fabry, Gregory – Physical Review Physics Education Research, 2018
Prior research has established that students often underprepare for midterm examinations yet remain overconfident in their proficiency. Research concerning the testing effect has demonstrated that utilizing testing as a study strategy leads to higher performance and more accurate confidence compared to more common study strategies such as…
Descriptors: Computer Assisted Testing, Physics, Science Instruction, Introductory Courses
Wakabayashi, Tomoko; Claxton, Jill; Smith, Everett V., Jr. – Journal of Psychoeducational Assessment, 2019
The Child Observation Record (COR), initially developed in 1993 by HighScope Educational Research Foundation, is an observation-based instrument that provides systematic assessment of young children's knowledge and abilities in all major areas of development. Teachers or caregivers spend a few minutes each day writing brief notes or…
Descriptors: Observation, Evaluation Methods, Early Childhood Education, Kindergarten
Daniel Rodriguez-Segura; Beth E. Schueler – Annenberg Institute for School Reform at Brown University, 2022
School closures induced by COVID-19 placed heightened emphasis on alternative ways to measure student learning besides in-person exams. We leverage the administration of phone-based assessments (PBAs) measuring numeracy and literacy for primary school children in Kenya, along with in-person standardized tests administered to the same students…
Descriptors: Foreign Countries, School Closing, COVID-19, Pandemics
Pennell, Adam – ProQuest LLC, 2019
This dissertation consists of three studies which examined multidimensional balance in youth (≤ 21 years; Individuals with Disabilities Education Act, 2004) with visual impairments (VIs) using the Brief-Balance Evaluation Systems Test (Brief-BESTest). These studies have the potential to inform (adapted) physical education curricula and…
Descriptors: Psychomotor Skills, Youth, Visual Impairments, Human Posture
Xiao, Yang; Fritchman, Joseph C.; Bao, Jacqueline Y.; Nie, Ying; Han, Jing; Xiong, Jianwen; Xiao, Hua; Bao, Lei – Physical Review Physics Education Research, 2019
In physics education research (PER), concept inventories (CIs) have become standard instruments for assessing students' learning throughout instruction. To promote widespread use of concept inventories, previous studies have developed an approach to split a full length CI into short versions of CIs. This research extends the existing method to…
Descriptors: Physics, Science Instruction, Energy, Magnets
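When a full-length concept inventory is split into short versions, one routine check is how much internal consistency each short form keeps. A minimal sketch, assuming scored item data with at least two items and nonzero total-score variance: Cronbach's alpha computed on the full item set and on a chosen subset of columns. The `short_form` helper and its item selection are hypothetical; the splitting method used in the paper is not described in the truncated abstract.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix with one row per examinee, one column per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def short_form(item_scores, keep):
    # Keep only the item columns listed in `keep` (a hypothetical short-form selection)
    return [[row[i] for i in keep] for row in item_scores]
```

Comparing `cronbach_alpha(data)` with `cronbach_alpha(short_form(data, [0, 2, 4]))` shows how much consistency a candidate short version gives up relative to the full inventory.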
Dumas, Denis G.; McNeish, Daniel M. – Educational Researcher, 2018
Dynamic measurement modeling (DMM) has been shown to improve the consequential validity of longitudinal mathematics assessment in the Early Childhood Longitudinal Study-Kindergarten (ECLS-K) database. Here, the authors demonstrate the capability of DMM to similarly improve the consequential validity of ECLS-K reading assessment through the…
Descriptors: Measurement Techniques, Student Evaluation, Alternative Assessment, Evaluation Methods
Badger, Julia R.; Mellanby, Jane – British Journal of Educational Psychology, 2018
Background: School attainment tests and Cognitive Abilities Tests are used in the United Kingdom to set targets for educational outcome. Whilst these are good predictors, they depend not only on basic ability but also on learnt knowledge and skills, such as reading. Method and Aims: VESPARCH is an online group test of verbal and spatial reasoning,…
Descriptors: Foreign Countries, Intelligence Tests, Verbal Ability, Spatial Ability