Publication Date
| Period | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 7 |
| Since 2017 (last 10 years) | 10 |
| Since 2007 (last 20 years) | 12 |
Descriptor
| Term | Count |
| --- | --- |
| Models | 16 |
| Test Items | 16 |
| Natural Language Processing | 11 |
| Artificial Intelligence | 7 |
| Test Construction | 7 |
| Automation | 5 |
| Language Processing | 5 |
| Prediction | 5 |
| Difficulty Level | 4 |
| Multiple Choice Tests | 4 |
| Bayesian Statistics | 2 |
Author
| Name | Count |
| --- | --- |
| Bohm, Isabell | 1 |
| Calders, Toon | 1 |
| Conati, Cristina | 1 |
| Condor, Aubrey | 1 |
| Di Mitri, Daniele | 1 |
| Doyle, Patrick J. | 1 |
| Drachsler, Hendrik | 1 |
| Dumas, Denis | 1 |
| Futagi, Yoko | 1 |
| Grasso, Floriana | 1 |
| Olney, Andrew M. | 1 |
Publication Type
| Type | Count |
| --- | --- |
| Reports - Research | 14 |
| Journal Articles | 11 |
| Speeches/Meeting Papers | 3 |
| Collected Works - Proceedings | 1 |
| Information Analyses | 1 |
| Numerical/Quantitative Data | 1 |
Location
| Country | Count |
| --- | --- |
| Germany | 1 |
| Netherlands | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| Graduate Record Examinations | 1 |
| Test of English as a Foreign Language | 1 |
A Method for Generating Course Test Questions Based on Natural Language Processing and Deep Learning
Hei-Chia Wang; Yu-Hung Chiang; I-Fan Chen – Education and Information Technologies, 2024
Assessment is viewed as an important means to understand learners' performance in the learning process. A good assessment method is based on high-quality examination questions. However, manually generating high-quality examination questions is time-consuming for teachers, and it is not easy for students to obtain question banks. To solve…
Descriptors: Natural Language Processing, Test Construction, Test Items, Models
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests that contain items (i.e. questions) which measure various types of skills for different levels of students equitably is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent, if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
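As a concrete illustration of the item-difficulty prediction task this entry covers, here is a minimal sketch assuming TF-IDF features and ridge regression; the items, difficulty labels, and model choice are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: predict item difficulty from question text.
# Features, labels, and model are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

items = [
    "Define photosynthesis.",
    "Compare and contrast mitosis and meiosis in terms of outcomes.",
    "List the planets of the solar system.",
]
difficulty = [0.2, 0.8, 0.1]  # e.g., proportion of students answering incorrectly

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(items, difficulty)
print(model.predict(["Explain how enzymes lower activation energy."]))
```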
Olney, Andrew M. – Grantee Submission, 2022
Multi-angle question answering models have recently been proposed that promise to perform related tasks like question generation. However, performance on related tasks has not been thoroughly studied. We investigate a leading model called Macaw on the task of multiple choice question generation and evaluate its performance on three angles that…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Models
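Macaw is AllenAI's publicly released multi-angle T5 model, so the question-generation angle the abstract evaluates can be sketched as below; the slot-string format follows the public allenai/macaw release and should be treated as an assumption to verify against the model card.

```python
# Sketch of multi-angle generation with Macaw (allenai/macaw-large).
# The "$question$ ; $mcoptions$ ; $answer$ = ..." slot format is taken
# from the public Macaw release; verify it against the model card.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/macaw-large")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/macaw-large")

# Request the question and answer-options angles, given an answer.
prompt = "$question$ ; $mcoptions$ ; $answer$ = photosynthesis"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```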
Condor, Aubrey; Litster, Max; Pardos, Zachary – International Educational Data Mining Society, 2021
We explore how different components of an Automatic Short Answer Grading (ASAG) model affect the model's ability to generalize to questions outside of those used for training. For supervised automatic grading models, human ratings are primarily used as ground-truth labels. Producing such ratings can be resource-heavy, as subject matter experts…
Descriptors: Automation, Grading, Test Items, Generalization
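A toy sketch of the supervised ASAG setup described here, with human ratings as ground-truth labels; the bag-of-words features and classifier are simplifying assumptions, not the paper's architecture.

```python
# Toy supervised short-answer grader: expert ratings serve as labels.
# Bag-of-words features are a simplifying assumption for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

answers = [
    "Plants convert sunlight into chemical energy.",
    "It is when plants eat dirt.",
    "Photosynthesis turns light, water, and CO2 into glucose.",
    "The sun is hot.",
]
ratings = [1, 0, 1, 0]  # 1 = correct, 0 = incorrect (expert labels)

grader = make_pipeline(CountVectorizer(), LogisticRegression())
grader.fit(answers, ratings)
print(grader.predict(["Light energy is stored as sugar by the plant."]))
```

Generalization to unseen questions, the paper's focus, would be probed by holding out entire questions rather than random answers.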
Andrew M. Olney – Grantee Submission, 2023
Multiple choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully controlled…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Algorithms
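A minimal sketch of eliciting a multiple-choice item from a ChatGPT-family model via the OpenAI Python client; the model name and prompt wording are placeholder assumptions, not the paper's controlled protocol.

```python
# Hypothetical MCQ-generation prompt via the OpenAI Python client.
# Model name and prompt are illustrative placeholders; the paper's
# controlled comparison used its own prompts and evaluation rubric.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": ("Write one multiple-choice question about photosynthesis "
                    "with four options, marking the correct answer."),
    }],
)
print(response.choices[0].message.content)
```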
Mead, Alan D.; Zhou, Chenxuan – Journal of Applied Testing Technology, 2022
This study fit a Naïve Bayesian classifier to the words of exam items to predict the Bloom's taxonomy level of the items. We addressed five research questions, showing that reasonably good prediction of Bloom's level was possible, but accuracy varied across levels. In our study, performance for Level 2 was poor (Level 2 items were misclassified…
Descriptors: Artificial Intelligence, Prediction, Taxonomy, Natural Language Processing
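The method maps directly onto a standard text-classification pipeline; a sketch with scikit-learn's multinomial Naïve Bayes, where the items and Bloom-level labels are invented for illustration.

```python
# Sketch: Naive Bayes over item words to predict Bloom's taxonomy level.
# Training items and their Bloom labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

items = [
    "List the stages of mitosis.",                      # remember
    "Explain why enzymes speed up reactions.",          # understand
    "Design an experiment to test osmosis in cells.",   # create
]
bloom_levels = [1, 2, 6]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(items, bloom_levels)
print(clf.predict(["Describe how vaccines trigger immunity."]))
```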
Peter Organisciak; Selcuk Acar; Denis Dumas; Kelly Berthiaume – Grantee Submission, 2023
Automated scoring for divergent thinking (DT) seeks to overcome a key obstacle to creativity measurement: the effort, cost, and reliability of scoring open-ended tests. For a common test of DT, the Alternate Uses Task (AUT), the primary automated approach casts the problem as a semantic distance between a prompt and the resulting idea in a text…
Descriptors: Automation, Computer Assisted Testing, Scoring, Creative Thinking
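A sketch of the semantic-distance scoring the abstract describes, using sentence embeddings and cosine similarity; the embedding model is an illustrative assumption, not the authors' system.

```python
# Sketch: originality as semantic distance between AUT prompt and response.
# The embedding model is an illustrative choice, not the paper's system.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
prompt = "brick"
responses = ["build a wall", "use as a paperweight", "grind into pigment for paint"]

prompt_vec = model.encode(prompt, convert_to_tensor=True)
for r in responses:
    sim = util.cos_sim(prompt_vec, model.encode(r, convert_to_tensor=True)).item()
    print(f"{r!r}: distance = {1 - sim:.3f}")  # larger distance ~ more original
```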
Gombert, Sebastian; Di Mitri, Daniele; Karademir, Onur; Kubsch, Marcus; Kolbe, Hannah; Tautz, Simon; Grimm, Adrian; Bohm, Isabell; Neumann, Knut; Drachsler, Hendrik – Journal of Computer Assisted Learning, 2023
Background: Formative assessments are needed to enable monitoring how student knowledge develops throughout a unit. Constructed response items which require learners to formulate their own free-text responses are well suited for testing their active knowledge. However, assessing such constructed responses in an automated fashion is a complex task…
Descriptors: Coding, Energy, Scientific Concepts, Formative Evaluation
Rao, Dhawaleswar; Saha, Sujan Kumar – IEEE Transactions on Learning Technologies, 2020
Automatic multiple choice question (MCQ) generation from a text is a popular research area. MCQs are widely accepted for large-scale assessment in various domains and applications. However, manual generation of MCQs is expensive and time-consuming. Therefore, researchers have been attracted toward automatic MCQ generation since the late 1990s.…
Descriptors: Multiple Choice Tests, Test Construction, Automation, Computer Software
Yi, Yeon-Sook – Language Testing, 2017
The present study examines the relative importance of attributes within and across items by applying four cognitive diagnostic assessment models. It uses the models' capacity to indicate inter-attribute relationships, which reflect examinees' response behaviors, to analyze scored test-taker responses to four forms…
Descriptors: Second Language Learning, Reading Comprehension, Listening Comprehension, Language Tests
Yang, Jianfeng; McCandliss, Bruce D.; Shu, Hua; Zevin, Jason D. – Journal of Memory and Language, 2009
Many theoretical models of reading assume that different writing systems require different processing assumptions. For example, it is often claimed that print-to-sound mappings in Chinese are not represented or processed sub-lexically. We present a connectionist model that learns the print-to-sound mappings of Chinese characters using the same…
Descriptors: Test Items, Speech, Models, Oral Language
Gitomer, Drew H.; And Others – Journal of Educational Psychology, 1987 (peer reviewed)
Processing of verbal analogies was evaluated by recording eye-fixation patterns during solution of problems that represented a broad range of difficulty. Findings on easier problems replicated previous work. On difficult items, high-verbal-ability individuals adapted processing strategies to a greater extent than did low-ability students.…
Descriptors: Analogy, Difficulty Level, Eye Fixations, Higher Education
McKenna, Michael C. – Journal of Educational Psychology, 1986 (peer reviewed)
The purpose of this study was to test a specific hypothesis with regard to how cloze retrieval takes place once semantic constraints are recognized. Results from human subjects and computer simulation suggested that the increase in latency between the two- and three-semantic-constraint conditions was not artifactual. (JAZ)
Descriptors: Cloze Procedure, Computer Simulation, Correlation, Graduate Students
Hula, William; Doyle, Patrick J.; McNeil, Malcolm R.; Mikolic, Joseph M. – Journal of Speech, Language, and Hearing Research, 2006
The purpose of this research was to examine the validity of the 55-item Revised Token Test (RTT) and to compare traditional and Rasch-based scores in their ability to detect group differences and change over time. The 55-item RTT was administered to 108 left- and right-hemisphere stroke survivors, and the data were submitted to Rasch analysis.…
Descriptors: Test Items, Brain Hemisphere Functions, Individual Differences, Difficulty Level
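For reference, the Rasch model behind the Rasch-based scores expresses the probability of a correct response in terms of person ability and item difficulty:

```latex
% Rasch model: probability that person n answers item i correctly,
% given person ability theta_n and item difficulty b_i
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}
```

Because ability and difficulty sit on a common logit scale, Rasch-based scores are interval-scaled, which motivates the paper's comparison against traditional raw scores.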
Sheehan, Kathleen M.; Kostin, Irene; Futagi, Yoko; Hemat, Ramin; Zuckerman, Daniel – ETS Research Report Series, 2006
This paper describes the development, implementation, and evaluation of an automated system for predicting the acceptability status of candidate reading-comprehension stimuli extracted from a database of journal and magazine articles. The system uses a combination of classification and regression techniques to predict the probability that a given…
Descriptors: Automation, Prediction, Reading Comprehension, Classification
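A sketch of the classification side of such a system: a probability-emitting classifier over simple text features. Every passage, label, feature, and model choice below is an assumption for illustration, not the ETS implementation.

```python
# Sketch: predict the probability that a candidate passage is acceptable
# as a reading-comprehension stimulus. Data and model are assumptions.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

passages = [
    "The committee reviewed the proposal and issued a detailed report.",
    "BUY NOW!!! limited offer click here",
    "Migration patterns of arctic terns span both hemispheres.",
    "lol idk random words words words",
]
acceptable = [1, 0, 1, 0]  # expert screening decisions

screener = make_pipeline(TfidfVectorizer(), GradientBoostingClassifier())
screener.fit(passages, acceptable)
print(screener.predict_proba(["Volcanic soils are unusually fertile."])[:, 1])
```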
