Showing all 12 results
Peer reviewed
Flodén, Jonas – British Educational Research Journal, 2025
This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT and human teachers perform in grading university exams. Aspects investigated include consistency, large discrepancies, and answer length. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Peer reviewed
Ventouras, Errikos; Triantis, Dimos; Tsiakas, Panagiotis; Stergiopoulos, Charalampos – Computers & Education, 2011
The aim of the present research was to compare the use of multiple-choice questions (MCQs) as an examination method against the oral examination (OE) method. MCQs are widely used and their importance seems likely to grow due to their inherent suitability for electronic assessment. However, MCQs are influenced by the tendency of examinees to guess…
Descriptors: Grades (Scholastic), Scoring, Multiple Choice Tests, Test Format
Peer reviewed
Attali, Yigal; Bridgeman, Brent; Trapani, Catherine – Journal of Technology, Learning, and Assessment, 2010
A generic approach in automated essay scoring produces scores that have the same meaning across all prompts, existing or new, of a writing assessment. This is accomplished by using a single set of linguistic indicators (or features), a consistent way of combining and weighting these features into essay scores, and a focus on features that are not…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Test Scoring Machines
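The generic approach described in this abstract amounts to one fixed set of prompt-independent features combined by one fixed weight vector. Below is a minimal sketch of that idea, assuming hypothetical features and made-up weights; it is not the actual e-rater feature set or model studied by Attali, Bridgeman, and Trapani.

```python
# A minimal sketch of generic automated essay scoring: a single fixed set of
# prompt-independent linguistic features, combined by a single fixed weight
# vector, so scores carry the same meaning across existing and new prompts.
# Features and weights below are hypothetical placeholders for illustration.

def extract_features(essay: str) -> dict[str, float]:
    """Compute illustrative prompt-independent linguistic indicators."""
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return {
        "word_count": float(len(words)),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }

# Reusing one weight vector for every prompt is what makes the resulting
# scores comparable across prompts. (Values are invented for this sketch.)
WEIGHTS = {"word_count": 0.01, "avg_word_length": 0.5, "avg_sentence_length": 0.1}

def score_essay(essay: str) -> float:
    """Combine the fixed features with the fixed weights into one score."""
    features = extract_features(essay)
    return sum(WEIGHTS[name] * value for name, value in features.items())

print(round(score_essay("Generic scoring reuses one feature set. "
                        "The weights never change across prompts."), 2))
```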
Peer reviewed
Lissitz, Robert W.; Hou, Xiaodong; Slater, Sharon Cadman – Journal of Applied Testing Technology, 2012
This article investigates several questions regarding the impact of different item formats on measurement characteristics. Constructed response (CR) items and multiple choice (MC) items obviously differ in their formats and in the resources needed to score them. As such, they have been the subject of considerable discussion regarding the impact of…
Descriptors: Computer Assisted Testing, Scoring, Evaluation Problems, Psychometrics
Peer reviewed
Miller, Mark J.; Cowger, Ernest, Jr.; Young, Tony; Tobacyk, Jerome; Sheets, Tillman; Loftus, Christina – College Student Journal, 2008
This study examined the degree of similarity between scores on the Self-Directed Search and an online instrument measuring Holland types. A relatively high congruency score was found between the two measures. Implications for career counselors are discussed.
Descriptors: Career Counseling, Personality Assessment, Congruence (Psychology), Personality Traits
Wang, Jinhao; Brown, Michelle Stallone – Journal of Technology, Learning, and Assessment, 2007
The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric [TM], and by human raters. Data collection included administering the Texas version of the WritePlacer "Plus" test and obtaining scores assigned by IntelliMetric [TM] and by…
Descriptors: Test Scoring Machines, Scoring, Comparative Testing, Intermode Differences
Peer reviewed
Bennett, Randy Elliot; And Others – Applied Psychological Measurement, 1990
The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Descriptors: College Students, Comparative Testing, Computer Assisted Testing, Computer Science
Anderson, Paul S.; Hyers, Albert D. – 1991
Three descriptive statistics (difficulty, discrimination, and reliability) of multiple-choice (MC) test items were compared to those of a new (1980s) format of machine-scored questions. The new method, answer-bank multi-digit testing (MDT), uses alphabetized lists of up to 1,000 alternatives and approximates the completion style of assessment…
Descriptors: College Students, Comparative Testing, Computer Assisted Testing, Correlation
Peer reviewed
Endler, Norman S.; Parker, James D. A. – Educational and Psychological Measurement, 1990
C. Davis and M. Cowles (1989) analyzed a total trait anxiety score on the Endler Multidimensional Anxiety Scales (EMAS), a unidimensional construct that this multidimensional measure does not assess. Data are reanalyzed using the appropriate scoring procedure for the EMAS. Subjects included 145 undergraduates in 1 of 4 testing conditions. (SLD)
Descriptors: Anxiety, Comparative Testing, Computer Assisted Testing, Construct Validity
Bergstrom, Betty A.; Lunz, Mary E. – 1991
The level of confidence in pass/fail decisions obtained with computer adaptive tests (CATs) was compared to decisions based on paper-and-pencil tests. Subjects included 645 medical technology students from 238 educational programs across the country. The tests used in this study constituted part of the subjects' review for the certification…
Descriptors: Adaptive Testing, Certification, Comparative Testing, Computer Assisted Testing
Hyers, Albert D.; Anderson, Paul S. – 1991
Using matched pairs of geography questions, a new testing method for machine-scored fill-in-the-blank, multiple-digit testing (MDT) questions was compared to the traditional multiple-choice (MC) style. Data came from 118 matched or parallel test items across 4 tests administered to 764 college students of geography. The new method produced superior results when…
Descriptors: College Students, Comparative Testing, Computer Assisted Testing, Difficulty Level
Bejar, Isaac I.; And Others – 1977
Information provided by typical and improved conventional classroom achievement tests was compared with information provided by an adaptive test covering the same subject matter. Both tests were administered to over 700 college students in a general biology course. Using the same scoring method, adaptive testing was found to yield substantially…
Descriptors: Academic Achievement, Achievement Tests, Adaptive Testing, Biology