ERIC - Search Results

Showing 1 to 15 of 90 results

Technical Adequacy-Reliability

Peer reviewed

Susan K. Johnsen – Gifted Child Today, 2025

The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…

Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Evidence-Based Evaluation of Student and Marker Performances in Assessment and Examination

Peer reviewed

Ole J. Kemi – Advances in Physiology Education, 2025

Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…

Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards

Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients

Peer reviewed
PDF on ERIC

Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022

The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…

Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory

An R Package for Optimizing the Composite Reliability in Multivariate Nested Designs

Peer reviewed
PDF on ERIC

Joyce M. W. Moonen-van Loon; Jeroen Donkers – Practical Assessment, Research & Evaluation, 2025

The reliability of assessment tools is critical for accurately monitoring student performance in various educational contexts. When multiple assessments are combined to form an overall evaluation, each assessment serves as a data point contributing to the student's performance within a broader educational framework. Determining composite…

Descriptors: Programming Languages, Reliability, Evaluation Methods, Student Evaluation

Inter-Rater Reliability Methods in Qualitative Case Study Research

Peer reviewed

Rosanna Cole – Sociological Methods & Research, 2024

The use of inter-rater reliability (IRR) methods may provide an opportunity to improve the transparency and consistency of qualitative case study data analysis in terms of the rigor of how codes and constructs have been developed from the raw data. Few articles on qualitative research methods in the literature conduct IRR assessments or neglect to…

Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Research Methodology

Exploring Rating Quality in the Context of High-Stakes Rater-Mediated Educational Assessments

Wenjing Guo – ProQuest LLC, 2021

Constructed response (CR) items are widely used in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district and state-level assessments in the United States. One unique feature of CR items is that they depend on human raters to assess the quality of examinees' work. The judgment of human…

Descriptors: National Competency Tests, Responses, Interrater Reliability, Error of Measurement

Enhancing Model Fit Evaluation in SEM: Practical Tips for Optimizing Chi-Square Tests

Peer reviewed

Bang Quan Zheng; Peter M. Bentler – Structural Equation Modeling: A Multidisciplinary Journal, 2025

This paper aims to advocate for a balanced approach to model fit evaluation in structural equation modeling (SEM). The ongoing debate surrounding chi-square test statistics and fit indices has been characterized by ambiguity and controversy. Despite the acknowledged limitations of relying solely on the chi-square test, its careful application can…

Descriptors: Monte Carlo Methods, Structural Equation Models, Goodness of Fit, Robustness (Statistics)

Quality-of-Life Measurement in Randomised Controlled Trials of Mental Health Interventions for Autistic Adults: A Systematic Review

Peer reviewed

Amanda Timmerman; Vasiliki Totsika; Valerie Lye; Laura Crane; Audrey Linden; Elizabeth Pellicano – Autism: The International Journal of Research and Practice, 2025

Autistic people are more likely to have co-occurring mental health conditions compared to the general population, and mental health interventions have been identified as a top research priority by autistic people and the wider autism community. Autistic adults have also communicated that quality of life is the outcome that matters most to them in…

Descriptors: Adults, Autism Spectrum Disorders, Quality of Life, Randomized Controlled Trials

Challenges in Using Parent-Reported Bed and Wake Times for Actigraphy Scoring in Rett-Related Syndromes

Peer reviewed

Breanne J. Byiers; Alyssa M. Merbler; Chantel C. Burkitt; Frank J. Symons – American Journal on Intellectual and Developmental Disabilities, 2025

Sleep problems are common in Rett syndrome and other neurogenetic syndromes. Actigraphy is a cost-effective, objective method for measuring sleep. Current guidelines require caregiver-reported bed and wake times to facilitate actigraphy data scoring. The current study examined missingness and consistency of caregiver-reported bed and wake times…

Descriptors: Sleep, Neurodevelopmental Disorders, Psychomotor Skills, Genetic Disorders

Stabilizing School Performance Indicators in New Jersey to Reduce the Effect of Random Error. Appendixes. REL 2025-009

Peer reviewed
PDF on ERIC

Regional Educational Laboratory Mid-Atlantic, 2024

These are the appendixes for the report, "Stabilizing School Performance Indicators in New Jersey to Reduce the Effect of Random Error." This study applied a stabilization model called Bayesian hierarchical modeling to group-level data (with groups assigned according to demographic designations) within schools in New Jersey with the aim…

Descriptors: Institutional Evaluation, Elementary Secondary Education, Bayesian Statistics, Test Reliability

Resolving and Re-Scoring Constructed Response Items in Mixed-Format Assessments: An Exploration of Three Approaches

Peer reviewed

Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024

We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…

Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners

Linear and Nonlinear Indices of Score Accuracy and Item Effectiveness for Measures That Contain Locally Dependent Items

Peer reviewed

Pere J. Ferrando; David Navarro-González; Fabia Morales-Vives – Educational and Psychological Measurement, 2025

The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the…

Descriptors: Scores, Accuracy, Evaluation Methods, Factor Analysis

Controlling for Measurement Error in Evaluations When Treatment Group Assignment Is Based on Noisy Measures

Peer reviewed

Robert Meyer; Sara Hu; Michael Christian – Society for Research on Educational Effectiveness, 2023

Background: This paper develops a new method to estimate quasi-experimental evaluation models when it is necessary to control for measurement error in predictors and individual assignment to the treatment group is based on these same fallible variables. A major methodological finding of the study is that standard methods of estimating models that…

Descriptors: Error of Measurement, Measurement Techniques, Elementary Secondary Education, Report Cards

Signal-to-Noise Ratio in Estimating and Testing the Mediation Effect: Structural Equation Modeling versus Path Analysis with Weighted Composites

Peer reviewed

Ke-Hai Yuan; Zhiyong Zhang; Lijuan Wang – Grantee Submission, 2024

Mediation analysis plays an important role in understanding causal processes in social and behavioral sciences. While path analysis with composite scores was criticized to yield biased parameter estimates when variables contain measurement errors, recent literature has pointed out that the population values of parameters of latent-variable models…

Descriptors: Structural Equation Models, Path Analysis, Weighted Scores, Comparative Testing
