Publication Date
In 2025: 14
Since 2024: 34
Since 2021 (last 5 years): 119
Since 2016 (last 10 years): 285
Since 2006 (last 20 years): 575
Showing 1 to 15 of 575 results
Peer reviewed
Direct link
Jonas Flodén – British Educational Research Journal, 2025
This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams relative to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Peer reviewed
Direct link
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Peer reviewed
Direct link
Bolton, Tiffany; Stevenson, Brittney; Janes, William – Journal of Occupational Therapy, Schools & Early Intervention, 2023
Researchers utilized a cross-sectional secondary analysis of data within an ongoing non-randomized controlled trial study design to establish the reliability and internal consistency of a novel handwriting assessment for preschoolers, the Just Write! (JW), written by the authors. Seventy-eight children from an area preschool participated in the…
Descriptors: Handwriting, Writing Skills, Writing Evaluation, Preschool Children
Peer reviewed
PDF on ERIC: Download full text
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Peer reviewed
Direct link
Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
Peer reviewed
Direct link
Alexandra Jackson; Cheryl Bodnar; Elise Barrella; Juan Cruz; Krista Kecskemety – Journal of STEM Education: Innovations and Research, 2025
Recent curricular interventions in engineering education have focused on encouraging students to develop an entrepreneurial mindset (EM) to equip them with the skills needed to generate innovative ideas and address complex global problems upon entering the workforce. Methods to evaluate these interventions have been inconsistent due to the lack of…
Descriptors: Engineering Education, Entrepreneurship, Concept Mapping, Student Evaluation
Peer reviewed
Direct link
Stefanie A. Wind; Yuan Ge – Measurement: Interdisciplinary Research and Perspectives, 2024
Mixed-format assessments made up of multiple-choice (MC) items and constructed response (CR) items that are scored using rater judgments include unique psychometric considerations. When these item types are combined to estimate examinee achievement, information about the psychometric quality of each component can depend on that of the other. For…
Descriptors: Interrater Reliability, Test Bias, Multiple Choice Tests, Responses
Peer reviewed
PDF on ERIC: Download full text
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Peer reviewed
Direct link
Abbas, Mohsin; van Rosmalen, Peter; Kalz, Marco – IEEE Transactions on Learning Technologies, 2023
For predicting and improving the quality of essays, text analytic metrics (surface, syntactic, morphological, and semantic features) can be used to provide formative feedback to the students in higher education. In this study, the goal was to identify a sufficient number of features that exhibit a fair proxy of the scores given by the human raters…
Descriptors: Feedback (Response), Automation, Essays, Scoring
Peer reviewed
PDF on ERIC: Download full text
Jones, Nathan; Bell, Courtney; Qi, Yi; Lewis, Jennifer; Kirui, David; Stickler, Leslie; Redash, Amanda – ETS Research Report Series, 2021
The observation systems being used in all 50 states require administrators to learn to accurately and reliably score their teachers' instruction using standardized observation systems. Although the literature on observation systems is growing, relatively few studies have examined the outcomes of trainings focused on developing administrators'…
Descriptors: Observation, Standardized Tests, Teacher Evaluation, Test Reliability
Peer reviewed
Direct link
Mark White; Matt Ronfeldt – Educational Assessment, 2024
Standardized observation systems seek to reliably measure a specific conceptualization of teaching quality, managing rater error through mechanisms such as certification, calibration, validation, and double-scoring. These mechanisms both support high quality scoring and generate the empirical evidence used to support the scoring inference (i.e.,…
Descriptors: Interrater Reliability, Quality Control, Teacher Effectiveness, Error Patterns
Peer reviewed
Direct link
M. Arda Atakaya; Ugur Sak; M. Bahadir Ayas – Creativity Research Journal, 2024
Scoring in creativity research has been a central problem since creativity became an important issue in psychology and education in the 1950s. The current study examined the psychometric properties of 27 creativity indices derived from summed and averaged scores using 15 scoring methods. Participants included 2802 middle-school students. Data…
Descriptors: Psychometrics, Creativity, Creativity Tests, Scoring
Peer reviewed
Direct link
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Peer reviewed
Direct link
Ngoc My Bui; Jessie S. Barrot – Education and Information Technologies, 2025
With generative artificial intelligence (AI) tools' remarkable capabilities in understanding and generating meaningful content, intriguing questions have been raised about their potential as automated essay scoring (AES) systems. One such tool is ChatGPT, which is capable of scoring any written work based on predefined criteria. However,…
Descriptors: Artificial Intelligence, Natural Language Processing, Technology Uses in Education, Automation
Lynsey Joohyun Lee – ProQuest LLC, 2021
Reliability and validity are two important topics that have been studied for many decades in the educational measurement field, including discussions of Writing Studies' subfield of writing assessment, since the establishment of the College Entrance Exam Board [CEEB] in 1899 (Huot et al., 2010). In recent years, scholarly conversations of fairness…
Descriptors: Writing Evaluation, Test Validity, Test Reliability, Case Studies