ERIC - Search Results

Showing 1 to 15 of 100 results

Evaluating the Consistency and Reliability of Attribution Methods in Automated Short Answer Grading (ASAG) Systems: Toward an Explainable Scoring System

Peer reviewed

Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025

In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…

Descriptors: Automation, Grading, Computer Assisted Testing, Scoring

Studying Score Stability with a Harmonic Regression Family: A Comparison of Three Approaches to Adjustment of Examinee-Specific Demographic Data

Peer reviewed

Lee, Yi-Hsuan; Haberman, Shelby J. – Journal of Educational Measurement, 2021

For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes; some are expected while others may be…

Descriptors: Scores, Regression (Statistics), Demography, Data

A Note on the Use of Categorical Subscores

Peer reviewed

Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025

Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…

Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment

Using Multilabel Neural Network to Score High-Dimensional Assessments for Different Use Foci: An Example with College Major Preference Assessment

Peer reviewed

Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025

Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…

Descriptors: Tests, Testing, Scores, Test Construction

Developing a Game-Based Test to Assess Middle School Sixth-Grade Students' Algorithmic Thinking Skills

Peer reviewed

Emre Zengin; Yasemin Karal – International Journal of Assessment Tools in Education, 2024

This study was carried out to develop a test to assess algorithmic thinking skills. To this end, the twelve steps suggested by Downing (2006) were adopted. Throughout the test development, 24 middle school sixth-grade students and eight experts in different areas took part as needed in the tasks on the project. The test was given to 252 students…

Descriptors: Grade 6, Algorithms, Thinking Skills, Evaluation Methods

A Novel Means-End Problem-Solving Assessment Tool for Early Intervention: Evaluation of Validity, Reliability, and Sensitivity

Peer reviewed

Baraldi Cunha, Andrea; Babik, Iryna; Koziol, Natalie A.; Hsu, Lin-Ya; Nord, Jayden; Harbourne, Regina T.; Westcott-McCoy, Sarah; Dusing, Stacey C.; Bovaird, James A.; Lobo, Michele A. – Grantee Submission, 2021

Purpose: To evaluate the validity, reliability, and sensitivity of the novel Means-End Problem-Solving Assessment Tool (MEPSAT). Methods: Children with typical development and those with motor delay were assessed throughout the first 2 years of life using the MEPSAT. MEPSAT scores were validated against the cognitive and motor subscales of the…

Descriptors: Problem Solving, Early Intervention, Evaluation Methods, Motor Development

The Retrospective Pretest-Posttest Design Redux: On Its Validity as an Alternative to Traditional Pretest-Posttest Measurement

Peer reviewed

Little, Todd D.; Chang, Rong; Gorrall, Britt K.; Waggenspack, Luke; Fukuda, Eriko; Allen, Patricia J.; Noam, Gil G. – International Journal of Behavioral Development, 2020

We revisit the merits of the retrospective pretest-posttest (RPP) design for repeated-measures research. The underutilized RPP method asks respondents to rate survey items twice during the same posttest measurement occasion from two specific frames of reference: "now" and "then." Individuals first report their current attitudes…

Descriptors: Pretesting, Alternative Assessment, Program Evaluation, Evaluation Methods

A Review of Subscore Estimation Methods. ETS RR-18-17

Peer reviewed

Fu, Jianbin; Qu, Yanxuan – ETS Research Report Series, 2018

Various subscore estimation methods that use auxiliary information to improve subscore accuracy and stability have been developed. This report provides a review of various subscore estimation methods described in the literature. The methodology of each method is described, then research studies on these subscore estimation methods are summarized.…

Descriptors: Scores, Evaluation Methods, Item Response Theory, Test Items

Exploration of Factors Affecting the Added Value of Test Subscores

Peer reviewed

Wang, Xiaolin; Svetina, Dubravka; Dai, Shenghai – Journal of Experimental Education, 2019

Recently, interest in test subscore reporting for diagnosis purposes has been growing rapidly. The two simulation studies here examined factors (sample size, number of subscales, correlation between subscales, and three factors affecting subscore reliability: number of items per subscale, item parameter distribution, and data generating model)…

Descriptors: Value Added Models, Scores, Sample Size, Correlation

Using Computer Adaptive Testing to Assess Physics Proficiency and Improve Exam Performance in an Introductory Physics Course

Peer reviewed

Morphew, Jason W.; Mestre, Jose P.; Kang, Hyeon-Ah; Chang, Hua-Hua; Fabry, Gregory – Physical Review Physics Education Research, 2018

Prior research has established that students often underprepare for midterm examinations yet remain overconfident in their proficiency. Research concerning the testing effect has demonstrated that utilizing testing as a study strategy leads to higher performance and more accurate confidence compared to more common study strategies such as…

Descriptors: Computer Assisted Testing, Physics, Science Instruction, Introductory Courses

Validation of a Revised Observation-Based Assessment Tool for Children Birth through Kindergarten: The COR Advantage

Peer reviewed

Wakabayashi, Tomoko; Claxton, Jill; Smith, Everett V., Jr. – Journal of Psychoeducational Assessment, 2019

The Child Observation Record (COR), initially developed in 1993 by HighScope Educational Research Foundation, is an observation-based instrument that provides systematic assessment of young children's knowledge and abilities in all major areas of development. Teachers or caregivers spend a few minutes each day writing brief notes or…

Descriptors: Observation, Evaluation Methods, Early Childhood Education, Kindergarten

Can Learning Be Measured by Phone? Evidence from Kenya. EdWorkingPaper No. 22-517

Daniel Rodriguez-Segura; Beth E. Schueler – Annenberg Institute for School Reform at Brown University, 2022

School closures induced by COVID-19 placed heightened emphasis on alternative ways to measure student learning besides in-person exams. We leverage the administration of phone-based assessments (PBAs) measuring numeracy and literacy for primary school children in Kenya, along with in-person standardized tests administered to the same students…

Descriptors: Foreign Countries, School Closing, COVID-19, Pandemics

Multidimensional Balance in Youth with Visual Impairments

Pennell, Adam – ProQuest LLC, 2019

This dissertation consists of three studies which examined multidimensional balance in youth (≤ 21 years; Individuals with Disabilities Education Act, 2004) with visual impairments (VIs) using the Brief-Balance Evaluation Systems Test (Brief-BESTest). These studies have the potential to inform (adapted) physical education curricula and…

Descriptors: Psychomotor Skills, Youth, Visual Impairments, Human Posture

Linking and Comparing Short and Full-Length Concept Inventories of Electricity and Magnetism Using Item Response Theory

Peer reviewed

Xiao, Yang; Fritchman, Joseph C.; Bao, Jacqueline Y.; Nie, Ying; Han, Jing; Xiong, Jianwen; Xiao, Hua; Bao, Lei – Physical Review Physics Education Research, 2019

In physics education research (PER), concept inventories (CIs) have become standard instruments for assessing students' learning throughout instruction. To promote widespread use of concept inventories, previous studies have developed an approach to split a full length CI into short versions of CIs. This research extends the existing method to…

Descriptors: Physics, Science Instruction, Energy, Magnets

Increasing the Consequential Validity of Reading Assessment Using Dynamic Measurement Modeling: A Comment on Dumas and McNeish (2017)

Peer reviewed

Dumas, Denis G.; McNeish, Daniel M. – Educational Researcher, 2018

Dynamic measurement modeling (DMM) has been shown to improve the consequential validity of longitudinal mathematics assessment in the Early Childhood Longitudinal Study-Kindergarten (ECLS-K) database. Here, the authors demonstrate the capability of DMM to similarly improve the consequential validity of ECLS-K reading assessment through the…

Descriptors: Measurement Techniques, Student Evaluation, Alternative Assessment, Evaluation Methods
