Publication Date
In 2025 | 3 |
Since 2024 | 6 |
Since 2021 (last 5 years) | 19 |
Descriptor
Computer Assisted Testing | 19 |
Reliability | 19 |
Validity | 11 |
Evaluation Methods | 7 |
Scoring | 7 |
Foreign Countries | 5 |
Test Construction | 5 |
Artificial Intelligence | 4 |
Automation | 4 |
COVID-19 | 4 |
Feedback (Response) | 4 |
Author
Aaron McVay | 1 |
Al-Bahlani, Sara | 1 |
Al-Maqbali, Asma Hilal | 1 |
Alexandros Tantos | 1 |
Allehaiby, Wid Hasen | 1 |
Amanda Huee-Ping Wong | 1 |
Apple, Kristen | 1 |
Beaty, Roger E. | 1 |
Chen, Dandan | 1 |
Choi, Ikkyu | 1 |
Dalton, Sarah Grace | 1 |
Publication Type
Journal Articles | 15 |
Reports - Research | 13 |
Dissertations/Theses -… | 3 |
Information Analyses | 2 |
Reports - Evaluative | 2 |
Speeches/Meeting Papers | 1 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 9 |
Postsecondary Education | 9 |
Elementary Education | 3 |
Grade 5 | 2 |
Intermediate Grades | 2 |
Middle Schools | 2 |
Adult Education | 1 |
Early Childhood Education | 1 |
Grade 3 | 1 |
Grade 4 | 1 |
High School Equivalency… | 1 |
Jonas Flodén – British Educational Research Journal, 2025
This study compares the grading of university exams by ChatGPT, a generative AI (GenAI) large language model (LLM), with grading by human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Shard; Devesh Kumar; Sapna Koul – International Journal of Information and Learning Technology, 2024
Purpose: This study aims to gain insights into how students perceive online examination practices and evaluation, as well as identify the key factors that impact their intentions toward online exams. Design/methodology/approach: This empirical study conducted in India utilized an online survey method between May 24 and June 14, 2022. The data were…
Descriptors: Foreign Countries, Undergraduate Students, Graduate Students, Student Attitudes
Despina Papadopoulou; Nikolaos Amvrazis; Gerakini Douka; Alexandros Tantos – Modern Language Journal, 2024
The article introduces triangulation to converge evidence from corpus and experimental data, by means of two case studies in second language (L2) learners of Greek. The first case study investigates the acquisition of gender agreement, while the second probes the development of relative clauses. In both studies, findings from the corpus are tested…
Descriptors: Greek, Phrase Structure, Second Language Learning, Second Language Instruction
Elkhatat, Ahmed M. – International Journal for Educational Integrity, 2022
Examinations form part of the assessment processes that constitute the basis for benchmarking individual educational progress, and must consequently fulfill credibility, reliability, and transparency standards in order to promote learning outcomes and ensure academic integrity. A randomly selected question examination (RSQE) is considered to be an…
Descriptors: Integrity, Monte Carlo Methods, Credibility, Reliability
Doewes, Afrizal; Pechenizkiy, Mykola – International Educational Data Mining Society, 2021
Scoring essays is generally an exhausting and time-consuming task for teachers. Automated Essay Scoring (AES) makes the scoring process faster and more consistent. The most logical way to assess the performance of an automated scorer is by measuring the score agreement with the human raters. However, we provide empirical evidence that…
Descriptors: Man Machine Systems, Automation, Computer Assisted Testing, Scoring
Emily B. Goldberg; Sheila R. Pratt; Malcolm R. McNeil; Neil Szuminsky; Kenneth DeHaan; Leslie Q. Zhen – Journal of Speech, Language, and Hearing Research, 2025
Purpose: The present study assessed the test-retest reliability of the American Sign Language (ASL) version of the Computerized Revised Token Test (CRTT-ASL) and compared the differences and similarities between ASL and English reading by Deaf and hearing users of ASL. Method: Creation of the CRTT-ASL involved filming, editing, and validating CRTT…
Descriptors: American Sign Language, Reliability, Validity, Test Construction
Aaron McVay – ProQuest LLC, 2021
As assessments move towards computerized testing and making continuous testing available, the need for rapid assembly of forms is increasing. The objective of this study was to investigate variability in assembled forms through the lens of first- and second-order equity properties of equating, by examining three factors and their interactions. Two…
Descriptors: Automation, Computer Assisted Testing, Test Items, Reaction Time
Ramsey Lee Cardwell – ProQuest LLC, 2022
The emergence of digital-first assessments is prompting reconsideration of, and innovation in, aspects of psychometrics, test validation, and test use. Using the Duolingo English Test (DET) as an example, this three-paper series seeks to address issues concerning the estimation of classification consistency and the reporting of results for such…
Descriptors: Classification, Reliability, Language Proficiency, Computer Assisted Testing
Chen, Dandan; Hebert, Michael; Wilson, Joshua – American Educational Research Journal, 2022
We used multivariate generalizability theory to examine the reliability of hand-scoring and automated essay scoring (AES) and to identify how these scoring methods could be used in conjunction to optimize writing assessment. Students (n = 113) included subsamples of struggling writers and non-struggling writers in Grades 3-5 drawn from a larger…
Descriptors: Reliability, Scoring, Essays, Automation
Nejdet Karadag – Journal of Educational Technology and Online Learning, 2023
The purpose of this study is to examine the impact of artificial intelligence (AI) on online assessment in the context of opportunities and threats based on the literature. To this end, 19 articles related to the AI tool ChatGPT and online assessment were analysed through rapid literature review. In the content analysis, the themes of "AI's…
Descriptors: Artificial Intelligence, Computer Assisted Testing, Natural Language Processing, Grading
Kershree Padayachee; M. Matimolane – Teaching in Higher Education, 2025
In the shift to Emergency Remote Teaching and Learning (ERT&L) during the COVID-19 pandemic, remote assessment and feedback became a major source of discontent and challenge for students and staff. This paper is a reflection and analysis of assessment practices during ERT&L, and our theorisation of the possibilities for shifts towards…
Descriptors: Educational Quality, Social Justice, Distance Education, Feedback (Response)
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Beaty, Roger E.; Johnson, Dan R.; Zeitlen, Daniel C.; Forthmann, Boris – Creativity Research Journal, 2022
Semantic distance is increasingly used for automated scoring of originality on divergent thinking tasks, such as the Alternate Uses Task (AUT). Despite some psychometric support for semantic distance -- including positive correlations with human creativity ratings -- additional work is needed to optimize its reliability and validity, including…
Descriptors: Semantics, Scoring, Creative Thinking, Creativity
Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…
Descriptors: Validity, Discourse Analysis, Databases, Scoring
Sari, Elif; Han, Turgay – Reading Matrix: An International Online Journal, 2021
Providing effective feedback applications and ensuring reliable assessment practices are two central issues in ESL/EFL writing instruction contexts. Giving individual feedback is very difficult in crowded classes, as it requires a great amount of time and effort from instructors. Moreover, instructors likely employ inconsistent assessment procedures,…
Descriptors: Automation, Writing Evaluation, Artificial Intelligence, Natural Language Processing