ERIC - Search Results

Showing 1 to 15 of 26 results

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

Properties of a Combined Measure of Reading and Writing: The Assessment of Writing, Self-Monitoring, and Reading (AWSM Reader)

Peer reviewed

Gioia, Anthony R.; Ahmed, Yusra; Woods, Steven P.; Cirino, Paul T. – Reading and Writing: An Interdisciplinary Journal, 2023

There is significant overlap between reading and writing, but no known standardized measure assesses these jointly. The goal of the present study is to evaluate the properties of a novel measure, the Assessment of Writing, Self-Monitoring, and Reading (AWSM Reader), that simultaneously evaluates both reading comprehension and writing. In doing so,…

Descriptors: Reading Writing Relationship, Writing Evaluation, Self Evaluation (Individuals), Executive Function

Development and Validation of the Written Communication Assessment of the "HEIghten"® Outcomes Assessment Suite. Research Report. ETS RR-17-53

Peer reviewed

Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017

Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…

Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators

Reexamining the Writing Apprehension Measure

Peer reviewed

Autman, Hamlet; Kelly, Stephanie – Business and Professional Communication Quarterly, 2017

This article contains two measurement development studies on writing apprehension. Study 1 reexamines the validity of the writing apprehension measure based on the finding from prior research that a second false factor was embedded. The findings from Study 1 support the validity of a reduced measure with 6 items versus the original 20-item…

Descriptors: Writing Apprehension, Writing Tests, Test Validity, Test Reliability

Does Indirect Writing Assessment Have Any Relevance to Direct Writing Assessment? Focus on Validity and Reliability

Peer reviewed

Kural, Faruk – Journal of Language and Linguistic Studies, 2018

The present paper, which is a study based on midterm exam results of 53 University English prep-school students, examines correlation between a direct writing test, measured holistically by multiple-trait scoring, and two indirect writing tests used in a competence exam, one of which is a multiple-choice cloze test and the other a rewrite test…

Descriptors: Writing Evaluation, Cloze Procedure, Comparative Analysis, Essays

Evaluation of "e-rater"® for the "Praxis I"® Writing Test. Research Report. ETS RR-15-03

Peer reviewed

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015

Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…

Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
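The abstract above names quadratic weighted kappa among its evaluation statistics. As a minimal sketch of the standard textbook definition (not the report's or e-rater's actual implementation), the statistic can be computed from two lists of integer scores as follows. The function name, argument names, and the sample scores are hypothetical and chosen only for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa for two equal-length lists of integer scores.

    Assumption: every score lies in [min_score, max_score]; scores outside
    that range are silently ignored by the loops below.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and the same length")
    span = max_score - min_score
    if span <= 0:
        raise ValueError("max_score must be greater than min_score")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Disagreement weight grows with the squared score distance.
            weight = ((i - j) / span) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            # Expected proportion if the two raters scored independently.
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters gave one identical score to every essay.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    human = [1, 2, 3, 4]      # hypothetical human scores
    machine = [1, 2, 3, 4]    # hypothetical automated scores
    print(quadratic_weighted_kappa(human, machine, 1, 4))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. The quadratic weights penalize a two-point discrepancy four times as heavily as a one-point discrepancy.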

Equating a Large-Scale Writing Assessment Using Pairwise Comparisons of Performances

Peer reviewed

Humphry, Stephen M.; McGrane, Joshua A. – Australian Educational Researcher, 2015

This paper presents a method for equating writing assessments using pairwise comparisons which does not depend upon conventional common-person or common-item equating designs. Pairwise comparisons have been successfully applied in the assessment of open-ended tasks in English and other areas such as visual art and philosophy. In this paper,…

Descriptors: Writing Evaluation, Evaluation Methods, Comparative Analysis, Writing Tests

Use of e-rater® in Scoring of the TOEFL iBT® Writing Test. Research Report. ETS RR-11-25

Haberman, Shelby J. – Educational Testing Service, 2011

Alternative approaches are discussed for use of e-rater® to score the TOEFL iBT® Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…

Descriptors: Writing Tests, Scoring, Essays, Language Tests

Performance Assessment of High and Low Income Families through "Online RAW Achievement Battery Test" of Primary Grade Students

Peer reviewed

Ahmed, Tamim; Hanif, Maria – Journal of Education and Practice, 2016

This study is intended to investigate student's achievement capability among two families i.e. Low and High income families and designed for primary level learners. A Reading, Arithmetic and Writing (RAW) Achievement test that was developed as a part of another research study (Tamim Ahmed Khan, 2015) was adopted for this study. Both English medium…

Descriptors: Low Income, Performance Based Assessment, Elementary School Students, Achievement Tests

Measuring Essay Assessment: Intra-Rater and Inter-Rater Reliability

Peer reviewed

Kayapinar, Ulas – Eurasian Journal of Educational Research, 2014

Problem Statement: There have been many attempts to research the effective assessment of writing ability, and many proposals for how this might be done. In this sense, rater reliability plays a crucial role for making vital decisions about testees in different turning points of both educational and professional life. Intra-rater and inter-rater…

Descriptors: Interrater Reliability, Essay Tests, Writing Tests, Grading

Automated Trait Scores for "GRE"® Writing Tasks. Research Report. ETS RR-15-15

Peer reviewed

Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015

The "e-rater"® automated essay scoring system is used operationally in the scoring of the argument and issue tasks that form the Analytical Writing measure of the "GRE"® General Test. For each of these tasks, this study explored the value added of reporting 4 trait scores for each of these 2 tasks over the total e-rater score.…

Descriptors: Scores, Computer Assisted Testing, Computer Software, Grammar

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests

Toward Automated Multi-Trait Scoring of Essays: Investigating Links among Holistic, Analytic, and Text Feature Scores

Peer reviewed

Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – Applied Linguistics, 2010

The main purpose of the study was to investigate the distinctness and reliability of analytic (or multi-trait) rating dimensions and their relationships to holistic scores and "e-rater"® essay feature variables in the context of the TOEFL® computer-based test (TOEFL CBT) writing assessment. Data analyzed in the study were holistic…

Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays

Using the Method of Pairwise Comparison to Obtain Reliable Teacher Assessments

Peer reviewed

Heldsinger, Sandra; Humphry, Stephen – Australian Educational Researcher, 2010

Demands for accountability have seen the implementation of large scale testing programs in Australia and internationally. There is, however, a growing body of evidence to show that externally imposed testing programs do not have a sustained impact on student achievement. It has been argued that teacher assessment is more effective in raising…

Descriptors: Testing Programs, Testing, Academic Achievement, Measures (Individuals)
