Publication Date
In 2025: 1
Since 2024: 7
Since 2021 (last 5 years): 13
Since 2016 (last 10 years): 14
Since 2006 (last 20 years): 15
Descriptor
Artificial Intelligence: 15
Natural Language Processing: 15
Test Items: 15
Models: 7
Automation: 6
Test Construction: 6
Student Evaluation: 4
Taxonomy: 4
Accuracy: 3
Computer Assisted Testing: 3
Multiple Choice Tests: 3
Author
Alice Ng: 1
Andrew M. Olney: 1
Araya, Roberto: 1
Ayaka Sugawara: 1
Bohm, Isabell: 1
Brunnert, Kim: 1
Calders, Toon: 1
Chintala, Tejas: 1
Cole, Brian S.: 1
Conati, Cristina: 1
Condor, Aubrey: 1
Publication Type
Journal Articles: 9
Reports - Research: 9
Reports - Evaluative: 3
Speeches/Meeting Papers: 3
Books: 1
Collected Works - General: 1
Collected Works - Proceedings: 1
Information Analyses: 1
Tests/Questionnaires: 1
Education Level
Higher Education: 4
Postsecondary Education: 3
Elementary Education: 2
Elementary Secondary Education: 1
Grade 4: 1
Intermediate Grades: 1
Secondary Education: 1
Location
Germany: 1
Japan: 1
Netherlands: 1
A Method for Generating Course Test Questions Based on Natural Language Processing and Deep Learning
Hei-Chia Wang; Yu-Hung Chiang; I-Fan Chen – Education and Information Technologies, 2024
Assessment is viewed as an important means to understand learners' performance in the learning process. A good assessment method is based on high-quality examination questions. However, generating high-quality examination questions manually by teachers is a time-consuming task, and it is not easy for students to obtain question banks. To solve…
Descriptors: Natural Language Processing, Test Construction, Test Items, Models
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests that contain items (i.e. questions) which measure various types of skills for different levels of students equitably is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent, if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
Sharma, Harsh; Mathur, Rohan; Chintala, Tejas; Dhanalakshmi, Samiappan; Senthil, Ramalingam – Education and Information Technologies, 2023
Examination assessments undertaken by educational institutions are pivotal, since they are one of the fundamental steps in determining students' understanding and achievements for a distinct subject or course. Questions must be framed on the topics to meet the learning objectives and assess the student's capability in a particular subject. The…
Descriptors: Taxonomy, Student Evaluation, Test Items, Questioning Techniques
Condor, Aubrey; Litster, Max; Pardos, Zachary – International Educational Data Mining Society, 2021
We explore how different components of an Automatic Short Answer Grading (ASAG) model affect the model's ability to generalize to questions outside of those used for training. For supervised automatic grading models, human ratings are primarily used as ground truth labels. Producing such ratings can be resource heavy, as subject matter experts…
Descriptors: Automation, Grading, Test Items, Generalization
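The Condor, Litster, and Pardos entry above describes a supervised ASAG setup in which human ratings act as ground-truth labels. A minimal sketch of that general setup follows (not the authors' actual model; the answers, scores, and feature choices are invented placeholders):

```python
# Minimal supervised short-answer grading sketch (illustrative only).
# Human ratings serve as the ground-truth labels, as the abstract describes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical student answers with expert-assigned scores (0/1/2).
answers = [
    "Photosynthesis converts light energy into chemical energy.",
    "Plants eat sunlight.",
    "It is when plants make food from light, water, and CO2.",
    "I don't know.",
]
human_scores = [2, 1, 2, 0]

grader = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
grader.fit(answers, human_scores)

# Generalizing to questions outside the training set (the paper's focus)
# would require holding out entire questions, not just individual answers.
print(grader.predict(["Plants turn sunlight into sugar."]))
```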
Kate E. Walton; Cristina Anguiano-Carrasco – ACT, Inc., 2024
Large language models (LLMs), such as ChatGPT, are becoming increasingly prominent. Their use is becoming more and more popular for assisting with simple tasks, such as summarizing documents, translating languages, rephrasing sentences, or answering questions. Reports like McKinsey's (Chui & Yee, 2023) estimate that by implementing LLMs,…
Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction
Andrew M. Olney – Grantee Submission, 2023
Multiple choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully-controlled…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Algorithms
Urrutia, Felipe; Araya, Roberto – Journal of Educational Computing Research, 2024
Written answers to open-ended questions can have a greater long-term effect on learning than multiple-choice questions. However, it is critical that teachers immediately review the answers and ask students to redo those that are incoherent. This can be a difficult and time-consuming task for teachers. A possible solution is to automate the detection…
Descriptors: Elementary School Students, Grade 4, Elementary School Mathematics, Mathematics Tests
Mead, Alan D.; Zhou, Chenxuan – Journal of Applied Testing Technology, 2022
This study fit a Naïve Bayesian classifier to the words of exam items to predict the Bloom's taxonomy level of the items. We addressed five research questions, showing that reasonably good prediction of Bloom's level was possible, but accuracy varied across levels. In our study, performance for Level 2 was poor (Level 2 items were misclassified…
Descriptors: Artificial Intelligence, Prediction, Taxonomy, Natural Language Processing
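The Mead and Zhou entry above describes fitting a Naïve Bayesian classifier to item text to predict Bloom's taxonomy levels. Below is a minimal sketch of that kind of pipeline, assuming a bag-of-words representation and expert-labeled items (the items and labels are invented, not the study's data):

```python
# Illustrative Naive Bayes classifier for Bloom's-level prediction from item text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical exam items labeled with Bloom's levels by subject-matter experts.
items = [
    "Define the term operating system.",
    "Explain why TCP retransmits lost packets.",
    "Apply Dijkstra's algorithm to the graph shown.",
    "List the layers of the OSI model.",
]
bloom_levels = [1, 2, 3, 1]  # 1 = Remember, 2 = Understand, 3 = Apply

clf = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
clf.fit(items, bloom_levels)

# Per-level accuracy should be checked separately, since (as the abstract notes)
# performance can differ sharply across Bloom's levels.
print(clf.predict(["Describe how caching improves performance."]))
```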
Cole, Brian S.; Lima-Walton, Elia; Brunnert, Kim; Vesey, Winona Burt; Raha, Kaushik – Journal of Applied Testing Technology, 2020
Automatic item generation can rapidly generate large volumes of exam items, but this creates challenges for assembly of exams which aim to include syntactically diverse items. First, we demonstrate a diminishing marginal syntactic return for automatic item generation using a saturation detection approach. This analysis can help users of automatic…
Descriptors: Artificial Intelligence, Automation, Test Construction, Test Items
Gombert, Sebastian; Di Mitri, Daniele; Karademir, Onur; Kubsch, Marcus; Kolbe, Hannah; Tautz, Simon; Grimm, Adrian; Bohm, Isabell; Neumann, Knut; Drachsler, Hendrik – Journal of Computer Assisted Learning, 2023
Background: Formative assessments are needed to enable monitoring how student knowledge develops throughout a unit. Constructed response items which require learners to formulate their own free-text responses are well suited for testing their active knowledge. However, assessing such constructed responses in an automated fashion is a complex task…
Descriptors: Coding, Energy, Scientific Concepts, Formative Evaluation
Hong Jiao, Editor; Robert W. Lissitz, Editor – IAP - Information Age Publishing, Inc., 2024
With the exponential increase of digital assessment, different types of data in addition to item responses become available in the measurement process. One of the salient features in digital assessment is that process data can be easily collected. This non-conventional structured or unstructured data source may bring new perspectives to better…
Descriptors: Artificial Intelligence, Natural Language Processing, Psychometrics, Computer Assisted Testing

Sami Baral; Li Lucy; Ryan Knight; Alice Ng; Luca Soldaini; Neil T. Heffernan; Kyle Lo – Grantee Submission, 2024
In real-world settings, vision language models (VLMs) should robustly handle naturalistic, noisy visual content as well as domain-specific language and concepts. For example, K-12 educators using digital learning platforms may need to examine and provide feedback across many images of students' math work. To assess the potential of VLMs to support…
Descriptors: Visual Learning, Visual Perception, Natural Language Processing, Freehand Drawing
Lu, Owen H. T.; Huang, Anna Y. Q.; Tsai, Danny C. L.; Yang, Stephen J. H. – Educational Technology & Society, 2021
Human-guided machine learning can improve computing intelligence, and it can accurately assist humans in various tasks. In education research, artificial intelligence (AI) is applicable in many situations, such as predicting students' learning paths and strategies. In this study, we explore the benefits of repetitive practice of short-answer…
Descriptors: Test Items, Artificial Intelligence, Test Construction, Student Evaluation
Qiao Wang; Ralph L. Rose; Ayaka Sugawara; Naho Orita – Vocabulary Learning and Instruction, 2025
VocQGen is an automated tool designed to generate multiple-choice cloze (MCC) questions for vocabulary assessment in second language learning contexts. It leverages several natural language processing (NLP) tools and OpenAI's GPT-4 model to produce MCC items quickly from user-specified word lists. To evaluate its effectiveness, we used the first…
Descriptors: Vocabulary Skills, Artificial Intelligence, Computer Software, Multiple Choice Tests
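The VocQGen entry above describes combining NLP tools with OpenAI's GPT-4 to generate multiple-choice cloze items from a user-specified word list. The sketch below is only a guess at the general shape of such a pipeline, not the VocQGen implementation; the prompt wording and the generate_mcc_item helper are invented for illustration:

```python
# Hypothetical multiple-choice cloze (MCC) item generation via the OpenAI API.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def generate_mcc_item(target_word: str) -> str:
    # Illustrative prompt; VocQGen's actual prompts and post-processing differ.
    prompt = (
        f"Write one sentence that uses the word '{target_word}', then replace "
        "that word with a blank. Give the correct answer and three plausible "
        "distractors of the same part of speech."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_mcc_item("mitigate"))
```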
Pechenizkiy, Mykola; Calders, Toon; Conati, Cristina; Ventura, Sebastian; Romero, Cristobal; Stamper, John – International Working Group on Educational Data Mining, 2011
The 4th International Conference on Educational Data Mining (EDM 2011) brings together researchers from computer science, education, psychology, psychometrics, and statistics to analyze large datasets to answer educational research questions. The conference, held in Eindhoven, The Netherlands, July 6-9, 2011, follows the three previous editions…
Descriptors: Academic Achievement, Logical Thinking, Profiles, Tutoring