Jonas Flodén – British Educational Research Journal, 2025
This study compares the performance of the generative AI (GenAI) large language model (LLM) ChatGPT with that of human teachers in grading university exams. Aspects investigated include consistency, large discrepancies, and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
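One way to picture an embedding-based AES pipeline of this kind: encode each essay as a sentence-embedding vector (here assumed precomputed, e.g. by mBERT or LaBSE) and predict a score from similarity to already-scored essays. A toy nearest-neighbour sketch, not the authors' actual model:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_score(essay_embedding, scored_essays):
    """Assign the score of the most similar already-scored essay.

    scored_essays: list of (embedding, human_score) pairs. With a
    language-agnostic embedding model, the scored essays and the new
    essay need not be in the same language.
    """
    _, best_score = max(
        scored_essays,
        key=lambda pair: cosine_similarity(essay_embedding, pair[0]))
    return best_score
```

Real multilingual AES systems typically train a regression model on top of the embeddings rather than using a single nearest neighbour, but the similarity-in-embedding-space idea is the same.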
Morrison, Keith – Educational Research and Evaluation, 2013
This paper reviews the literature on comparing online and paper course evaluations in higher education and provides a case study of a very large randomised trial on the topic. It presents a mixed but generally optimistic picture of online course evaluations with respect to response rates, what they indicate, and how to increase them. The paper…
Descriptors: Literature Reviews, Course Evaluation, Case Studies, Higher Education
Ventouras, Errikos; Triantis, Dimos; Tsiakas, Panagiotis; Stergiopoulos, Charalampos – Computers & Education, 2011
The aim of the present research was to compare the use of multiple-choice questions (MCQs) as an examination method against the oral examination (OE) method. MCQs are widely used and their importance seems likely to grow, due to their inherent suitability for electronic assessment. However, MCQs are influenced by the tendency of examinees to guess…
Descriptors: Grades (Scholastic), Scoring, Multiple Choice Tests, Test Format
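The guessing tendency noted in this abstract is classically addressed with formula scoring, which penalises wrong answers so that random guessing gains nothing in expectation; a generic sketch, not necessarily the scoring rule used in the study:

```python
def formula_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Classical correction for guessing: R - W/(k-1).

    With k options per item, a pure guesser expects one right answer
    for every k-1 wrong ones, so the penalty exactly cancels the
    expected gain from guessing.
    """
    return num_right - num_wrong / (num_options - 1)
```

For example, 30 right and 10 wrong on 5-option items yields 30 - 10/4 = 27.5.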
Park, Jooyong – British Journal of Educational Technology, 2010
This paper introduces the newly developed computerized Constructive Multiple-choice Testing system. The system combines short answer (SA) and multiple-choice (MC) formats by asking examinees to respond to the same question twice, first in the SA format, and then in the MC format. This manipulation was employed to collect information about the two…

Descriptors: Grade 5, Evaluation Methods, Multiple Choice Tests, Scores
Attali, Yigal; Bridgeman, Brent; Trapani, Catherine – Journal of Technology, Learning, and Assessment, 2010
A generic approach in automated essay scoring produces scores that have the same meaning across all prompts, existing or new, of a writing assessment. This is accomplished by using a single set of linguistic indicators (or features), a consistent way of combining and weighting these features into essay scores, and a focus on features that are not…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Test Scoring Machines
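The generic approach described here can be pictured as a fixed weighted combination of standardized linguistic features, where the standardization statistics come from a common reference corpus rather than from each prompt. A minimal sketch with hypothetical feature names, not the operational scoring model:

```python
def generic_essay_score(features, weights, reference_stats):
    """Score an essay as a fixed weighted sum of standardized features.

    features:        raw feature values for one essay,
                     e.g. {"word_count": 300, "vocab_diversity": 0.5}
    weights:         one fixed weight per feature, shared across prompts
    reference_stats: (mean, sd) per feature from a reference corpus, so
                     standardized values mean the same thing on new prompts
    """
    score = 0.0
    for name, weight in weights.items():
        mean, sd = reference_stats[name]
        score += weight * (features[name] - mean) / sd
    return score
```

Because neither the weights nor the reference statistics depend on the prompt, the resulting scores are directly comparable across existing and new prompts, which is the point of the generic approach.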
Kim, Do-Hong; Huynh, Huynh – Journal of Technology, Learning, and Assessment, 2007
This study examined comparability of student scores obtained from computerized and paper-and-pencil formats of the large-scale statewide end-of-course (EOC) examinations in the two subject areas of Algebra and Biology. Evidence in support of comparability of computerized and paper-based tests was sought by examining scale scores, item parameter…
Descriptors: Computer Assisted Testing, Measures (Individuals), Biology, Algebra
Puhan, Gautam; Boughton, Keith; Kim, Sooyeon – Journal of Technology, Learning, and Assessment, 2007
The study evaluated the comparability of two versions of a certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). An effect size measure known as Cohen's d and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that the effect…
Descriptors: Computer Assisted Testing, Effect Size, Test Bias, Mathematics Tests
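The test-level comparability measure named here, Cohen's d, is a standardized mean difference computed from summary statistics of the two groups (here, PPT and CBT examinees); a standard implementation with a pooled standard deviation:

```python
from math import sqrt

def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd
```

By common convention, |d| near 0.2 is a small effect; a mode difference that small would be taken as evidence for comparability at the test level.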
Wang, Jinhao; Brown, Michelle Stallone – Journal of Technology, Learning, and Assessment, 2007
The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric™, and human raters. Data collection included administering the Texas version of the WriterPlacer "Plus" test and obtaining scores assigned by IntelliMetric™ and by…
Descriptors: Test Scoring Machines, Scoring, Comparative Testing, Intermode Differences
van der Linden, Wim J. – Applied Psychological Measurement, 2006
Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Equated Scores
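The traditional equipercentile equating used as a baseline here maps a score on one form to the score with the same percentile rank on the other form. A discrete toy sketch (operational equating smooths and interpolates the score distributions rather than matching observed ranks directly):

```python
def percentile_rank(scores, x):
    """Midpoint percentile rank of x within an observed score distribution."""
    below = sum(s < x for s in scores)
    at = sum(s == x for s in scores)
    return (below + 0.5 * at) / len(scores)

def equipercentile_equate(x, form_x_scores, form_y_scores):
    """Map a form-X score to the form-Y score with the closest percentile rank."""
    target = percentile_rank(form_x_scores, x)
    return min(set(form_y_scores),
               key=lambda y: abs(percentile_rank(form_y_scores, y) - target))
```

For example, the median score on form X equates to the median score on form Y, whatever the two score scales look like.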
Mohanty, Ganesh; Gretes, John; Flowers, Claudia; Algozzine, Bob; Spooner, Fred – Journal of Personnel Evaluation in Education, 2005
Student evaluation of instruction in college and university courses has been a routine and mandatory part of undergraduate and graduate education for some time. A major shortcoming of the process is that it often relies exclusively on the opinions or qualitative judgments of students rather than the learning or transfer of knowledge that takes…
Descriptors: Evaluation Methods, Engineering Education, College Instruction, Student Evaluation of Teacher Performance
Weiss, David J., Ed. – 1980
This report is the Proceedings of the third conference of its type. Included are 23 of the 25 papers presented at the conference, discussion of these papers by invited discussants, and symposium papers by a group of leaders in adaptive testing and latent trait test theory research and applications. The papers are organized into the following…
Descriptors: Academic Ability, Academic Achievement, Comparative Testing, Computer Assisted Testing