Showing all 6 results
Peer reviewed
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written in response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to each essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
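
A minimal sketch of the kind of cross-validation scheme the abstract describes, in which each essay is scored by a model trained only on the other folds. The features, model, and fold count below are illustrative assumptions, not the AES system studied; only the pool size of 250 comes from the abstract.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n_essays = 250                        # pool size matching the abstract
X = rng.normal(size=(n_essays, 10))   # stand-in essay features (lengths, error rates, ...)
true_score = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=n_essays)

# Cross-validation scheme: each essay is scored by a model that never saw it in training
cv = KFold(n_splits=5, shuffle=True, random_state=0)
aes_scores = cross_val_predict(Ridge(alpha=1.0), X, true_score, cv=cv)

print(np.corrcoef(aes_scores, true_score)[0, 1])   # agreement with the criterion
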
Peer reviewed
Yao, Lili; Haberman, Shelby J.; Zhang, Mo – ETS Research Report Series, 2019
Many assessments of writing proficiency that aid in making high-stakes decisions consist of several essay tasks evaluated by a combination of human holistic scores and computer-generated scores for essay features such as the rate of grammatical errors per word. Under typical conditions, a summary writing score is provided by a linear combination…
Descriptors: Prediction, True Scores, Computer Assisted Testing, Scoring
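
As a rough illustration of the linear combination the abstract mentions, the sketch below forms a composite from human holistic scores and automated feature scores. The weights, features, and score values are hypothetical placeholders; the report estimates its own weights.

def summary_score(human_scores, feature_scores, human_weights, feature_weights, intercept=0.0):
    # Weighted composite of human holistic scores and automated feature scores
    s = intercept
    s += sum(w * h for w, h in zip(human_weights, human_scores))
    s += sum(w * f for w, f in zip(feature_weights, feature_scores))
    return s

# Two essay tasks scored by humans, plus two automated features
# (e.g., grammatical errors per word); all numbers are hypothetical.
print(summary_score([4.0, 3.5], [0.02, 0.8],
                    human_weights=[0.45, 0.45],
                    feature_weights=[-2.0, 0.3]))
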
Peer reviewed
Zwick, Rebecca; And Others – Journal of Educational Measurement, 1995
In a simulation study of ability estimation and differential item functioning (DIF) in computerized adaptive tests, Rasch-based DIF statistics were highly correlated with the generating DIF, but the DIF statistics tended to be slightly smaller than in the three-parameter logistic model analyses. (SLD)
Descriptors: Ability, Adaptive Testing, Computer Assisted Testing, Computer Simulation
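
The abstract does not name the DIF statistics used, so the sketch below substitutes the Mantel-Haenszel procedure, a standard DIF index, applied to simulated Rasch data with a known, injected DIF effect. The sample sizes, item count, and DIF magnitude are all assumptions for illustration.

import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_items, dif_shift = 2000, 20, 0.5    # focal group finds item 0 harder

def simulate(theta, difficulties):
    # Rasch model: P(correct) = logistic(theta - b)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
    return (rng.random(p.shape) < p).astype(int)

b = rng.normal(size=n_items)
ref = simulate(rng.normal(size=n_per_group), b)
b_focal = b.copy()
b_focal[0] += dif_shift                            # inject DIF into item 0
foc = simulate(rng.normal(size=n_per_group), b_focal)

# Mantel-Haenszel common odds ratio for item 0, stratified by rest-score
rest_ref, rest_foc = ref[:, 1:].sum(axis=1), foc[:, 1:].sum(axis=1)
num = den = 0.0
for k in range(n_items):                           # strata 0..19 on the rest-score
    r, f = ref[rest_ref == k, 0], foc[rest_foc == k, 0]
    n_k = len(r) + len(f)
    if n_k == 0:
        continue
    num += r.sum() * (len(f) - f.sum()) / n_k      # ref correct * focal incorrect
    den += f.sum() * (len(r) - r.sum()) / n_k      # focal correct * ref incorrect
print(-2.35 * np.log(num / den))                   # MH D-DIF (negative: focal disadvantaged)
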
Wang, Jinhao; Brown, Michelle Stallone – Journal of Technology, Learning, and Assessment, 2007
This study investigated the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric [TM], with those assigned by human raters. Data collection included administering the Texas version of the WriterPlacer "Plus" test and obtaining scores assigned by IntelliMetric [TM] and by…
Descriptors: Test Scoring Machines, Scoring, Comparative Testing, Intermode Differences
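
A minimal sketch of the kind of group-mean comparison described, assuming paired scores for the same essays from the automated engine and from human raters. The simulated scores and the scipy-based paired t-test are illustrative assumptions, not the study's data or analysis.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
human = rng.normal(loc=3.6, scale=0.8, size=120).round(1)       # placeholder human scores
machine = (human + rng.normal(scale=0.4, size=120)).round(1)    # placeholder AES scores

t, p = stats.ttest_rel(machine, human)              # paired comparison of group means
print(f"mean(AES)={machine.mean():.2f}  mean(human)={human.mean():.2f}  t={t:.2f}  p={p:.3f}")
print(np.corrcoef(machine, human)[0, 1])            # agreement beyond mean equality
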
Peer reviewed
Attali, Yigal – ETS Research Report Series, 2007
This study examined the construct validity of the "e-rater"® automated essay scoring engine as an alternative to human scoring in the context of TOEFL® essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two "e-rater" scores were investigated in this study, the first…
Descriptors: Construct Validity, Computer Assisted Testing, Scoring, English (Second Language)
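
One way to read the repeat-tester design is as a correlation analysis across the two administrations and the two scoring methods. The sketch below simulates that pattern; the reliabilities, error levels, and sample size are assumptions, not the study's results.

import numpy as np

rng = np.random.default_rng(3)
n = 500
ability = rng.normal(size=n)                         # stable writing proficiency

def observed(error_sd):
    return ability + rng.normal(scale=error_sd, size=n)

human_t1, human_t2 = observed(0.6), observed(0.6)    # human scores, administrations 1 and 2
erater_t1, erater_t2 = observed(0.4), observed(0.4)  # automated scores, administrations 1 and 2

r = np.corrcoef([human_t1, human_t2, erater_t1, erater_t2])
print("human test-retest      :", round(r[0, 1], 2))
print("e-rater test-retest    :", round(r[2, 3], 2))
print("cross-method, same time:", round(r[0, 2], 2))
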
Peer reviewed
Hirsch, Thomas M. – Journal of Educational Measurement, 1989
Equatings were performed on both simulated and real data sets using a common-examinee design with two abilities estimated for each examinee. Results indicate that effective equating, as measured by the comparability of true scores, is possible with the techniques used in this study. However, the stability of the ability estimates proved unsatisfactory. (TJH)
Descriptors: Academic Ability, College Students, Comparative Analysis, Computer Assisted Testing
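
As an illustration of placing ability estimates from two forms on a common scale with a common-examinee design, the sketch below applies a mean-sigma linear transformation. The simulated estimates and the specific transformation are assumptions for illustration, not the study's procedure.

import numpy as np

rng = np.random.default_rng(4)
theta = rng.normal(size=1000)                                    # common examinees' abilities
theta_x = theta + rng.normal(scale=0.30, size=1000)              # estimates from form X
theta_y = 1.2 * theta + 0.5 + rng.normal(scale=0.35, size=1000)  # form Y, different scale

# Mean-sigma method: match the mean and SD over the common examinees
A = theta_x.std() / theta_y.std()
B = theta_x.mean() - A * theta_y.mean()
theta_y_on_x = A * theta_y + B

print(round(np.corrcoef(theta_x, theta_y_on_x)[0, 1], 2))        # comparability check
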