ERIC - Search Results

Showing all 13 results

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers began investigating automatic scoring systems in writing assessments, they have examined relationships between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Effects of Analytical and Holistic Scoring Patterns on Scorer Reliability in Biology Essay Tests

Peer reviewed

Ebuoh, Casmir N. – World Journal of Education, 2018

The literature reveals that the patterns/methods of scoring essay tests have been criticized as unreliable, and that this unreliability is likely to be greater in internal examinations than in external examinations. The purpose of this study is to determine the effects of analytical and holistic scoring patterns on scorer reliability in…

Descriptors: Holistic Approach, Scoring, Essay Tests, Biology

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators

Task and Rater Effects in L2 Speaking and Writing: A Synthesis of Generalizability Studies

Peer reviewed

In'nami, Yo; Koizumi, Rie – Language Testing, 2016

We addressed Deville and Chalhoub-Deville's (2006), Schoonen's (2012), and Xi and Mollaun's (2006) call for research into the contextual features that are considered related to person-by-task interactions in the framework of generalizability theory in two ways. First, we quantitatively synthesized the generalizability studies to determine the…

Descriptors: Evaluators, Second Language Learning, Writing Skills, Oral Language

Comparing Human and Automated Essay Scoring for Prospective Graduate Students with Learning Disabilities and/or ADHD

Peer reviewed

Buzick, Heather; Oliveri, Maria Elena; Attali, Yigal; Flor, Michael – Applied Measurement in Education, 2016

Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K-12 large-scale assessment. In this…

Descriptors: Essays, Learning Disabilities, Attention Deficit Hyperactivity Disorder, Scoring

Examining the Impact of Scoring Methods on the Institutional EFL Writing Assessment: A Turkish Perspective

Peer reviewed

Han, Turgay; Huang, Jinyan – PASAA: Journal of Language Teaching and Learning in Thailand, 2017

Using generalizability (G-) theory and rater interviews as both quantitative and qualitative approaches, this study examined the impact of scoring methods (i.e., holistic versus analytic scoring) on the scoring variability and reliability of an EFL institutional writing assessment at a Turkish university. Ten raters were invited to rate 36…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring

Developing Conceptual Understanding of Sarcasm in L2 English through Explicit Instruction

Peer reviewed

Kim, Jiyun; Lantolf, James P. – Language Teaching Research, 2018

This article reports on a pedagogical project aimed at helping second language (L2) learners of English develop the ability to detect and appropriately interpret spoken sarcasm. The study used a pre- and posttest procedure to assess the development of learners' ability to both detect sarcasm and impute appropriate speaker intentions and attitudes…

Descriptors: Concept Formation, Native Language, Korean, Language Usage

Evaluation of the "e-rater"® Scoring Engine for the "TOEFL"® Independent and Integrated Prompts. Research Report. ETS RR-12-06

Peer reviewed

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012

Scoring models for the "e-rater"® system were built and evaluated for the "TOEFL"® exam's independent and integrated writing prompts. Prompt-specific and generic scoring models were built, and evaluation statistics, such as weighted kappas, Pearson correlations, standardized differences in mean scores, and correlations with…

Descriptors: Scoring, Prompting, Evaluators, Computer Software

The Relationship between Raters' Prior Language Study and the Evaluation of Foreign Language Speech Samples. TOEFL iBT® Research Report. TOEFL iBT-16. ETS Research Report RR-11-30

Peer reviewed

Winke, Paula; Gass, Susan; Myford, Carol – ETS Research Report Series, 2011

This study investigated whether raters' second language (L2) background and the first language (L1) of test takers taking the TOEFL iBT® Speaking test were related through scoring. After an initial 4-hour training period, a group of 107 raters (mostly learners of Chinese, Korean, and Spanish) listened to a selection of 432 speech samples that…

Descriptors: Second Language Learning, Evaluators, Speech Tests, English (Second Language)

Automated Essay Scoring versus Human Scoring: A Correlational Study

Peer reviewed

Wang, Jinhao; Brown, Michelle Stallone – Contemporary Issues in Technology and Teacher Education (CITE Journal), 2008

The purpose of the current study was to analyze the relationship between automated essay scoring (AES) and human scoring in order to determine the validity and usefulness of AES for large-scale placement tests. Specifically, a correlational research design was used to examine the correlations between AES performance and human raters' performance.…

Descriptors: Scoring, Essays, Computer Assisted Testing, Sentence Structure

Analytic Scoring of TOEFL® CBT Essays: Scores from Humans and "E-rater"®. TOEFL® Research Reports. RR-81. ETS RR-08-01

Peer reviewed

Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – ETS Research Report Series, 2008

The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and "e-rater"® essay feature variables in the context of the TOEFL® computer-based test (CBT) writing assessment. Data analyzed in the study were analytic and holistic…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Scoring

Visual Inspection and Statistical Analysis in Single-Case Designs.

Peer reviewed

Park, Hyun-Sook; And Others – Journal of Experimental Education, 1990

The reliability of visual inspection in single-case research was investigated by determining agreement among 5 judges visually inspecting 44 graphs depicting behavior from baseline to intervention. Agreement between visual inspection and statistical procedures was determined. Implications for single-case research are discussed. (SLD)

Descriptors: Behavior Patterns, Evaluation Methods, Evaluators, Graphs

Some Practical Solutions to Standard-Setting Problems: The Georgia Teacher Certification Test Experience.

Cramer, Stephen E. – 1990

A standard-setting procedure was developed for the Georgia Teacher Certification Testing Program as tests in 30 teaching fields were revised. A list of important characteristics of a standard-setting procedure was derived, drawing on the work of R. A. Berk (1986). The best method was found to be a highly formalized judgmental, empirical Angoff…

Descriptors: Computer Assisted Testing, Cutting Scores, Data Collection, Elementary Secondary Education
