Bouwer, Renske; Koster, Monica; van den Bergh, Huub – Assessment in Education: Principles, Policy & Practice, 2023
Assessing students' writing performance is essential to adequately monitor and promote individual writing development, but it is also a challenge. The present research investigates a benchmark rating procedure for assessing texts written by upper-elementary students. In two studies we examined whether a benchmark rating procedure (1) leads to…
Descriptors: Benchmarking, Writing Evaluation, Evaluation Methods, Elementary School Students
Baraldi Cunha, Andrea; Babik, Iryna; Koziol, Natalie A.; Hsu, Lin-Ya; Nord, Jayden; Harbourne, Regina T.; Westcott-McCoy, Sarah; Dusing, Stacey C.; Bovaird, James A.; Lobo, Michele A. – Grantee Submission, 2021
Purpose: To evaluate the validity, reliability, and sensitivity of the novel Means-End Problem-Solving Assessment Tool (MEPSAT). Methods: Children with typical development and those with motor delay were assessed throughout the first 2 years of life using the MEPSAT. MEPSAT scores were validated against the cognitive and motor subscales of the…
Descriptors: Problem Solving, Early Intervention, Evaluation Methods, Motor Development
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Lee, Yi-Hsuan; Haberman, Shelby J. – Journal of Educational Measurement, 2021
For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some are expected while others may be…
Descriptors: Scores, Regression (Statistics), Demography, Data
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer Overmarking and Insufficient Diagnosticity: The Impact of the Rating Method for Peer Assessment
Van Meenen, Florence; Coertjens, Liesje; Van Nes, Marie-Claire; Verschuren, Franck – Advances in Health Sciences Education, 2022
The present study explores two rating methods for peer assessment (analytical rating using criteria and comparative judgement) in light of concurrent validity, reliability and insufficient diagnosticity (i.e. the degree to which substandard work is recognised by the peer raters). During a second-year undergraduate course, students wrote a one-page…
Descriptors: Evaluation Methods, Peer Evaluation, Accuracy, Evaluation Criteria
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Arielle Boguslav; Julie Cohen – Annenberg Institute for School Reform at Brown University, 2023
Teacher preparation programs are increasingly expected to use data on pre-service teacher (PST) skills to drive program improvement and provide targeted supports. Observational ratings are especially vital, but also prone to measurement issues. Scores may be influenced by factors unrelated to PSTs' instructional skills, including rater standards…
Descriptors: Preservice Teachers, Student Evaluation, Evaluation Methods, Preservice Teacher Education
Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
Pere J. Ferrando; David Navarro-González; Fabia Morales-Vives – Educational and Psychological Measurement, 2025
The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the…
Descriptors: Scores, Accuracy, Evaluation Methods, Factor Analysis
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Abdel Azim Zumrawi; Leah P. Macfadyen – Cogent Education, 2023
Student Evaluations of Teaching (SETs) gather crucial feedback on student experiences of teaching and learning and have been used for decades to evaluate the quality of teaching and student experience of instruction. In this paper, we make the case for an important improvement to the analysis of SET data that can further refine its interpretation.…
Descriptors: Likert Scales, Student Evaluation of Teacher Performance, Student Attitudes, Reliability
Emre Zengin; Yasemin Karal – International Journal of Assessment Tools in Education, 2024
This study was carried out to develop a test to assess algorithmic thinking skills. To this end, the twelve steps suggested by Downing (2006) were adopted. Throughout the test development, 24 middle school sixth-grade students and eight experts in different areas took part as needed in the tasks on the project. The test was given to 252 students…
Descriptors: Grade 6, Algorithms, Thinking Skills, Evaluation Methods
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability