Publication Date
| Period | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 7 |
| Since 2017 (last 10 years) | 10 |
| Since 2007 (last 20 years) | 12 |
Descriptor
| Term | Count |
| --- | --- |
| Models | 16 |
| Test Items | 16 |
| Natural Language Processing | 11 |
| Artificial Intelligence | 7 |
| Test Construction | 7 |
| Automation | 5 |
| Language Processing | 5 |
| Prediction | 5 |
| Difficulty Level | 4 |
| Multiple Choice Tests | 4 |
| Bayesian Statistics | 2 |
Author
| Name | Count |
| --- | --- |
| Bohm, Isabell | 1 |
| Calders, Toon | 1 |
| Conati, Cristina | 1 |
| Condor, Aubrey | 1 |
| Di Mitri, Daniele | 1 |
| Doyle, Patrick J. | 1 |
| Drachsler, Hendrik | 1 |
| Dumas, Denis | 1 |
| Futagi, Yoko | 1 |
| Grasso, Floriana | 1 |
| Olney, Andrew M. | 1 |
Publication Type
| Type | Count |
| --- | --- |
| Reports - Research | 14 |
| Journal Articles | 11 |
| Speeches/Meeting Papers | 3 |
| Collected Works - Proceedings | 1 |
| Information Analyses | 1 |
| Numerical/Quantitative Data | 1 |
Location
| Country | Count |
| --- | --- |
| Germany | 1 |
| Netherlands | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| Graduate Record Examinations | 1 |
| Test of English as a Foreign Language | 1 |
A Method for Generating Course Test Questions Based on Natural Language Processing and Deep Learning
Hei-Chia Wang; Yu-Hung Chiang; I-Fan Chen – Education and Information Technologies, 2024
Assessment is viewed as an important means to understand learners' performance in the learning process. A good assessment method is based on high-quality examination questions. However, manually generating high-quality examination questions is time-consuming for teachers, and it is not easy for students to obtain question banks. To solve…
Descriptors: Natural Language Processing, Test Construction, Test Items, Models
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests that contain items (i.e. questions) which measure various types of skills for different levels of students equitably is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent, if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
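As a concrete illustration of the item-difficulty prediction task this entry covers, here is a minimal sketch assuming TF-IDF features and ridge regression; the items, difficulty labels, and model choice are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: predict item difficulty from question text.
# Features, labels, and model are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

items = [
    "Define photosynthesis.",
    "Compare and contrast mitosis and meiosis in terms of outcomes.",
    "List the planets of the solar system.",
]
difficulty = [0.2, 0.8, 0.1]  # e.g., proportion of students answering incorrectly

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(items, difficulty)
print(model.predict(["Explain how enzymes lower activation energy."]))
```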
Olney, Andrew M. – Grantee Submission, 2022
Multi-angle question answering models have recently been proposed that promise to perform related tasks like question generation. However, performance on related tasks has not been thoroughly studied. We investigate a leading model called Macaw on the task of multiple choice question generation and evaluate its performance on three angles that…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Models
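Macaw is AllenAI's publicly released multi-angle T5 model, so the question-generation angle the abstract evaluates can be sketched as below; the slot-string format follows the public allenai/macaw release and should be treated as an assumption to verify against the model card.

```python
# Sketch of multi-angle generation with Macaw (allenai/macaw-large).
# The "$question$ ; $mcoptions$ ; $answer$ = ..." slot format is taken
# from the public Macaw release; verify it against the model card.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/macaw-large")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/macaw-large")

# Request the question and answer-options angles, given an answer.
prompt = "$question$ ; $mcoptions$ ; $answer$ = photosynthesis"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```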
Condor, Aubrey; Litster, Max; Pardos, Zachary – International Educational Data Mining Society, 2021
We explore how different components of an Automatic Short Answer Grading (ASAG) model affect the model's ability to generalize to questions outside of those used for training. For supervised automatic grading models, human ratings are primarily used as ground-truth labels. Producing such ratings can be resource-heavy, as subject matter experts…
Descriptors: Automation, Grading, Test Items, Generalization
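A toy sketch of the supervised ASAG setup described here, with human ratings as ground-truth labels; the bag-of-words features and classifier are simplifying assumptions, not the paper's architecture.

```python
# Toy supervised short-answer grader: expert ratings serve as labels.
# Bag-of-words features are a simplifying assumption for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

answers = [
    "Plants convert sunlight into chemical energy.",
    "It is when plants eat dirt.",
    "Photosynthesis turns light, water, and CO2 into glucose.",
    "The sun is hot.",
]
ratings = [1, 0, 1, 0]  # 1 = correct, 0 = incorrect (expert labels)

grader = make_pipeline(CountVectorizer(), LogisticRegression())
grader.fit(answers, ratings)
print(grader.predict(["Light energy is stored as sugar by the plant."]))
```

Generalization to unseen questions, the paper's focus, would be probed by holding out entire questions rather than random answers.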
Andrew M. Olney – Grantee Submission, 2023
Multiple choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully controlled…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Algorithms
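A minimal sketch of eliciting a multiple-choice item from a ChatGPT-family model via the OpenAI Python client; the model name and prompt wording are placeholder assumptions, not the paper's controlled protocol.

```python
# Hypothetical MCQ-generation prompt via the OpenAI Python client.
# Model name and prompt are illustrative placeholders; the paper's
# controlled comparison used its own prompts and evaluation rubric.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": ("Write one multiple-choice question about photosynthesis "
                    "with four options, marking the correct answer."),
    }],
)
print(response.choices[0].message.content)
```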
Mead, Alan D.; Zhou, Chenxuan – Journal of Applied Testing Technology, 2022
This study fit a Naïve Bayesian classifier to the words of exam items to predict the Bloom's taxonomy level of the items. We addressed five research questions, showing that reasonably good prediction of Bloom's level was possible, but accuracy varied across levels. In our study, performance for Level 2 was poor (Level 2 items were misclassified…
Descriptors: Artificial Intelligence, Prediction, Taxonomy, Natural Language Processing
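The method maps directly onto a standard text-classification pipeline; a sketch with scikit-learn's multinomial Naïve Bayes, where the items and Bloom-level labels are invented for illustration.

```python
# Sketch: Naive Bayes over item words to predict Bloom's taxonomy level.
# Training items and their Bloom labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

items = [
    "List the stages of mitosis.",                      # remember
    "Explain why enzymes speed up reactions.",          # understand
    "Design an experiment to test osmosis in cells.",   # create
]
bloom_levels = [1, 2, 6]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(items, bloom_levels)
print(clf.predict(["Describe how vaccines trigger immunity."]))
```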
Peter Organisciak; Selcuk Acar; Denis Dumas; Kelly Berthiaume – Grantee Submission, 2023
Automated scoring for divergent thinking (DT) seeks to overcome a key obstacle to creativity measurement: the effort, cost, and reliability of scoring open-ended tests. For a common test of DT, the Alternate Uses Task (AUT), the primary automated approach casts the problem as a semantic distance between a prompt and the resulting idea in a text…
Descriptors: Automation, Computer Assisted Testing, Scoring, Creative Thinking
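A sketch of the semantic-distance scoring the abstract describes, using sentence embeddings and cosine similarity; the embedding model is an illustrative assumption, not the authors' system.

```python
# Sketch: originality as semantic distance between AUT prompt and response.
# The embedding model is an illustrative choice, not the paper's system.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
prompt = "brick"
responses = ["build a wall", "use as a paperweight", "grind into pigment for paint"]

prompt_vec = model.encode(prompt, convert_to_tensor=True)
for r in responses:
    sim = util.cos_sim(prompt_vec, model.encode(r, convert_to_tensor=True)).item()
    print(f"{r!r}: distance = {1 - sim:.3f}")  # larger distance ~ more original
```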
Gombert, Sebastian; Di Mitri, Daniele; Karademir, Onur; Kubsch, Marcus; Kolbe, Hannah; Tautz, Simon; Grimm, Adrian; Bohm, Isabell; Neumann, Knut; Drachsler, Hendrik – Journal of Computer Assisted Learning, 2023
Background: Formative assessments are needed to enable monitoring how student knowledge develops throughout a unit. Constructed response items which require learners to formulate their own free-text responses are well suited for testing their active knowledge. However, assessing such constructed responses in an automated fashion is a complex task…
Descriptors: Coding, Energy, Scientific Concepts, Formative Evaluation
Rao, Dhawaleswar; Saha, Sujan Kumar – IEEE Transactions on Learning Technologies, 2020
Automatic multiple choice question (MCQ) generation from a text is a popular research area. MCQs are widely accepted for large-scale assessment in various domains and applications. However, manual generation of MCQs is expensive and time-consuming. Therefore, researchers have been attracted toward automatic MCQ generation since the late 1990s.…
Descriptors: Multiple Choice Tests, Test Construction, Automation, Computer Software
Yi, Yeon-Sook – Language Testing, 2017
The present study examines the relative importance of attributes within and across items by applying four cognitive diagnostic assessment models. It uses the models' capacity to indicate inter-attribute relationships, which reflect examinees' response behaviors, to analyze scored test-taker responses to four forms…
Descriptors: Second Language Learning, Reading Comprehension, Listening Comprehension, Language Tests
Yang, Jianfeng; McCandliss, Bruce D.; Shu, Hua; Zevin, Jason D. – Journal of Memory and Language, 2009
Many theoretical models of reading assume that different writing systems require different processing assumptions. For example, it is often claimed that print-to-sound mappings in Chinese are not represented or processed sub-lexically. We present a connectionist model that learns the print-to-sound mappings of Chinese characters using the same…
Descriptors: Test Items, Speech, Models, Oral Language
Gitomer, Drew H.; And Others – Journal of Educational Psychology, 1987 (peer reviewed)
Processing of verbal analogies was evaluated by recording eye-fixation patterns during solution of problems that represented a broad range of difficulty. Findings on easier problems replicated previous work. On difficult items, high-verbal-ability individuals adapted processing strategies to a greater extent than did low-ability students.…
Descriptors: Analogy, Difficulty Level, Eye Fixations, Higher Education
McKenna, Michael C. – Journal of Educational Psychology, 1986 (peer reviewed)
The purpose of this study was to test a specific hypothesis with regard to how cloze retrieval takes place once semantic constraints are recognized. Results from human subjects and computer simulation suggested that the increase in latency between the two- and three-semantic-constraint conditions was not artifactual. (JAZ)
Descriptors: Cloze Procedure, Computer Simulation, Correlation, Graduate Students
Hula, William; Doyle, Patrick J.; McNeil, Malcolm R.; Mikolic, Joseph M. – Journal of Speech, Language, and Hearing Research, 2006
The purpose of this research was to examine the validity of the 55-item Revised Token Test (RTT) and to compare traditional and Rasch-based scores in their ability to detect group differences and change over time. The 55-item RTT was administered to 108 left- and right-hemisphere stroke survivors, and the data were submitted to Rasch analysis.…
Descriptors: Test Items, Brain Hemisphere Functions, Individual Differences, Difficulty Level
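For reference, the Rasch model behind the Rasch-based scores expresses the probability of a correct response in terms of person ability and item difficulty:

```latex
% Rasch model: probability that person n answers item i correctly,
% given person ability theta_n and item difficulty b_i
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}
```

Because ability and difficulty sit on a common logit scale, Rasch-based scores are interval-scaled, which motivates the paper's comparison against traditional raw scores.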
Sheehan, Kathleen M.; Kostin, Irene; Futagi, Yoko; Hemat, Ramin; Zuckerman, Daniel – ETS Research Report Series, 2006
This paper describes the development, implementation, and evaluation of an automated system for predicting the acceptability status of candidate reading-comprehension stimuli extracted from a database of journal and magazine articles. The system uses a combination of classification and regression techniques to predict the probability that a given…
Descriptors: Automation, Prediction, Reading Comprehension, Classification
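A sketch of the classification side of such a system: a probability-emitting classifier over simple text features. Every passage, label, feature, and model choice below is an assumption for illustration, not the ETS implementation.

```python
# Sketch: predict the probability that a candidate passage is acceptable
# as a reading-comprehension stimulus. Data and model are assumptions.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

passages = [
    "The committee reviewed the proposal and issued a detailed report.",
    "BUY NOW!!! limited offer click here",
    "Migration patterns of arctic terns span both hemispheres.",
    "lol idk random words words words",
]
acceptable = [1, 0, 1, 0]  # expert screening decisions

screener = make_pipeline(TfidfVectorizer(), GradientBoostingClassifier())
screener.fit(passages, acceptable)
print(screener.predict_proba(["Volcanic soils are unusually fertile."])[:, 1])
```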
