ERIC - Search Results

Publication Date

In 2026	0
Since 2025	3
Since 2022 (last 5 years)	10
Since 2017 (last 10 years)	35
Since 2007 (last 20 years)	84

Descriptor

Comparative Analysis	130
Scoring	130
Test Reliability	47
Reliability	46
Interrater Reliability	41
Foreign Countries	36
Correlation	32
Test Validity	27
English (Second Language)	23
Validity	23
Scores	22
Computer Assisted Testing	21
Test Items	21
Language Tests	19
Writing Evaluation	19
Evaluators	18
Second Language Learning	17
Statistical Analysis	17
Test Construction	17
Essays	15
Item Response Theory	15
Student Evaluation	15
Evaluation Methods	14
Item Analysis	13
Computer Software	12
More ▼

Publication Type

Reports - Research	83
Journal Articles	81
Reports - Evaluative	27
Speeches/Meeting Papers	13
Dissertations/Theses -…	7
Tests/Questionnaires	5
Collected Works - General	4
Numerical/Quantitative Data	4
Reports - Descriptive	4
Books	3
Guides - Non-Classroom	2
Information Analyses	2
Guides - General	1
More ▼

Education Level

Higher Education	18
Postsecondary Education	16
Secondary Education	16
Elementary Education	11
High Schools	9
Elementary Secondary Education	7
Grade 11	3
Grade 2	3
Middle Schools	3
Grade 1	2
Grade 10	2
Grade 12	2
Grade 3	2
Grade 4	2
Grade 5	2
Grade 6	2
Grade 7	2
Grade 8	2
Intermediate Grades	2
Junior High Schools	2
Kindergarten	2
Preschool Education	2
Adult Education	1
Early Childhood Education	1
Grade 9	1
More ▼

Audience

Practitioners	2
Teachers	1

Location

Australia	5
United Kingdom (England)	4
China	3
New York	3
Taiwan	3
Vermont	3
Connecticut	2
Germany	2
Hong Kong	2
Iran	2
Japan	2
New Hampshire	2
Rhode Island	2
Singapore	2
Turkey	2
Arizona	1
Austria	1
Colorado	1
Europe	1
Florida	1
Georgia	1
India	1
Jordan	1
Maryland	1
Netherlands	1
More ▼

Laws, Policies, & Programs

Every Student Succeeds Act…

What Works Clearinghouse Rating

Showing 1 to 15 of 130 results Save | Export

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Direct link

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Coherence-Based Automatic Short Answer Scoring Using Sentence Embedding

Peer reviewed

Direct link

Dadi Ramesh; Suresh Kumar Sanampudi – European Journal of Education, 2024

Automatic essay scoring (AES) is an essential educational application in natural language processing. This automated process will alleviate the burden by increasing the reliability and consistency of the assessment. With the advances in text embedding libraries and neural network models, AES systems achieved good results in terms of accuracy.…

Descriptors: Scoring, Essays, Writing Evaluation, Memory

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Direct link

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Initial Evidence Supporting Interpretations of Scores from the Enhanced ACT Test. ACT Research. Research Report. R2425

Download full text

Jeff Allen; Ty Cruce – ACT Education Corp., 2025

This report summarizes some of the evidence supporting interpretations of scores from the enhanced ACT, focusing on reliability, concurrent validity, predictive validity, and score comparability. The authors argue that the evidence presented in this report supports the interpretation of scores from the enhanced ACT as measures of high school…

Descriptors: College Entrance Examinations, Testing, Change, Scores

Comparative Judgement for Evaluating Young Learners' EFL Writing Performances: Reliability and Teacher Perceptions of Holistic and Dimension-Based Judgements

Peer reviewed

Direct link

Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025

Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Direct link

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool scores each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

A New Scoring Method for Item Response Theory Analysis of C-Tests

Peer reviewed

Direct link

Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025

This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…

Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Direct link

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed
PDF on ERIC

Download full text

Direct link

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Investigating a New Method for Standardising Essay Marking Using Levels-Based Mark Schemes

Peer reviewed
PDF on ERIC

Download full text

Greatorex, Jackie; Sutch, Tom; Werno, Magda; Bowyer, Jess; Dunn, Karen – International Journal of Assessment Tools in Education, 2019

Standardisation is a procedure used by Awarding Organisations to maximise marking reliability, by teaching examiners to consistently judge scripts using a mark scheme. However, research shows that people are better at comparing two objects than judging each object individually. Consequently, Oxford, Cambridge and RSA (OCR, a UK awarding…

Descriptors: Reliability, Achievement Rating, Standards, Scoring

A Comparative Analysis of the "Early Childhood Environment Rating Scale--Revised" and "Early Childhood Environment Rating Scale, Third Edition"

Peer reviewed
PDF on ERIC

Download full text

Direct link

Neitzel, Jennifer; Early, Diane; Sideris, John; LaForrett, Doré; Abel, Michael B.; Soli, Margaret; Davidson, Dawn L.; Haboush-Deloye, Amanda; Hestenes, Linda L.; Jenson, Denise; Johnson, Cindy; Kalas, Jennifer; Mamrak, Angela; Masterson, Marie L.; Mims, Sharon U.; Oya, Patti; Philson, Bobbi; Showalter, Megan; Warner-Richter, Mallory; Kortright Wood, Jill – Journal of Early Childhood Research, 2019

The Early Childhood Environment Rating Scales, including the "Early Childhood Environment Rating Scale--Revised" (Harms et al., 2005) and the "Early Childhood Environment Rating Scale, Third Edition" (Harms et al., 2015) are the most widely used observational assessments in early childhood learning environments. The most recent…

Descriptors: Rating Scales, Early Childhood Education, Educational Quality, Scoring

Applying Generalizability Theory in Language Testing: Comparing Nested and Crossed Scoring Designs in the Assessment of Speaking Skills

Peer reviewed
PDF on ERIC

Download full text

Polat, Murat; Turhan, Nihan Sölpük – International Journal of Curriculum and Instruction, 2021

Scoring language learners' speaking skills is open to a number of measurement errors since raters' personal judgements could involve in the process. Different grading designs in which raters score a student's whole speaking skills or a specific dimension of the speaking performance could be settled to control and minimize the amount of the error…

Descriptors: Language Tests, Scoring, Speech Communication, State Universities

Reliability and Stability of the Metrical Stress Effect on Segmental Production Accuracy in Persons with Apraxia of Speech

Peer reviewed

Direct link

Bailey, Dallin J.; Bunker, Lisa; Mauszycki, Shannon; Wambaugh, Julie L. – International Journal of Language & Communication Disorders, 2019

Background: Acquired apraxia of speech (AOS) involves speech-production deficits on both the segmental and suprasegmental levels. Recent research has identified a non-linear interaction between the metrical structure of bisyllabic words and word-production accuracy in German speakers with AOS, with trochaic words (strong-weak stress) being…

Descriptors: Accuracy, Suprasegmentals, Phonology, German

Monitoring the Performance of Human and Automated Scores for Spoken Responses

Peer reviewed

Direct link

Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018

As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…

Descriptors: Automation, Scoring, Speech Tests, Language Tests

Students' Use of Formalisations for Improved Logical Reasoning

Peer reviewed

Direct link

Bronkhorst, Hugo; Roorda, Gerrit; Suhre, Cor; Goedhart, Martin – Research in Mathematics Education, 2022

Logical reasoning as part of critical thinking is becoming more and more important to prepare students for their future life in society, work, and study. This article presents the results of a quasi-experimental study with a pre-test-post-test control group design focusing on the effective use of formalisations to support logical reasoning. The…

Descriptors: Mathematics Instruction, Teaching Methods, Logical Thinking, Critical Thinking

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

ProQuest LLC	7
ETS Research Report Series	6
Educational and Psychological…	5
Language Testing	5
Applied Measurement in…	4
Journal of Speech, Language,…	4
Assessment in Education:…	3
Journal of Applied Testing…	2
Journal of Psychoeducational…	2
ACT Education Corp.	1
Action in Teacher Education	1
Advances in Health Sciences…	1
Advances in Physiology…	1
Anatomical Sciences Education	1
Applied Linguistics	1
Applied Psychological…	1
Asia Pacific Education Review	1
Assessment	1
Assessment for Effective…	1
Australian Journal of…	1
Brookes Publishing Company	1
Bulletin of Faculty of…	1
CALICO Journal	1
Canadian Journal of School…	1
Child Language Teaching and…	1
More ▼

Attali, Yigal	3
Crehan, Kevin D.	2
Darling-Hammond, Linda	2
Lunz, Mary E.	2
Martin, Michael O., Ed.	2
O'Neill, Thomas R.	2
Abdul Gafoor, K.	1
Abel, Michael B.	1
Al-Shbeil, Abeer	1
Alkahtani, Saif F.	1
Allan S. Cohen	1
Allen, Abigail	1
Alqarni, Abdulelah Mohammed	1
Alt, Mary	1
Amanda Huee-Ping Wong	1
Apple, Kristen	1
Arneson, Brian Todd	1
August, Diane	1
Bailey, Dallin J.	1
Baldwin, Peter	1
Balogh, Jennifer	1
Barkaoui, Khaled	1
Barrueco, Sandra	1
Bauer, Daniel	1
More ▼

Test of English as a Foreign…	5
National Assessment of…	4
Trends in International…	4
Peabody Picture Vocabulary…	3
New York State Regents…	2
SAT (College Admission Test)	2
Woodcock Johnson Tests of…	2
ACT Assessment	1
Draw a Person Test	1
Early Childhood Environment…	1
Early Childhood Longitudinal…	1
Expressive One Word Picture…	1
Goodenough Harris Drawing Test	1
Graduate Management Admission…	1
Graduate Record Examinations	1
International Association for…	1
Kaufman Assessment Battery…	1
McCarthy Scales of Childrens…	1
Mean Length of Utterance	1
Michigan Test of English…	1
Neale Analysis of Reading…	1
Praxis Series	1
Progress in International…	1
Strong Campbell Interest…	1
Test of Language Development	1
More ▼