Showing 106 to 120 of 9,547 results
Joshua B. Gilbert; Zachary Himmelsbach; Luke W. Miratrix; Andrew D. Ho; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value added models (VAMs) attempt to estimate the causal effects of teachers and schools on student test scores. We apply Generalizability Theory to show how estimated VA effects depend upon the selection of test items. Standard VAMs estimate causal effects on the items that are included on the test. Generalizability demands consideration of how…
Descriptors: Value Added Models, Reliability, Effect Size, Test Items
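As background for the Generalizability Theory argument in the Gilbert et al. abstract, a minimal sketch of the standard person-by-item variance decomposition (generic notation, not taken from the paper) is:

    \sigma^2(X_{pi}) = \sigma^2_p + \sigma^2_i + \sigma^2_{pi,e},
    \qquad
    E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi,e}/n_i}

Under this framing, effects estimated from the n_i items actually administered generalize to a broader item domain only to the extent that the interaction/error component \sigma^2_{pi,e} is small relative to the target variance, which is the sense in which estimated VA effects depend on the selection of test items.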
Peer reviewed
Direct link
Owen Henkel; Libby Hills; Bill Roberts; Joshua McGrane – International Journal of Artificial Intelligence in Education, 2025
Formative assessment plays a critical role in improving learning outcomes by providing feedback on student mastery. Open-ended questions, which require students to produce multi-word, nontrivial responses, are a popular tool for formative assessment as they provide more specific insights into what students do and do not know. However, grading…
Descriptors: Artificial Intelligence, Grading, Reading Comprehension, Natural Language Processing
Peer reviewed
Direct link
Anupkumar D. Dhanvijay; Amita Kumari; Mohammed Jaffer Pinjar; Anita Kumari; Abhimanyu Ganguly; Ankita Priya; Ayesha Juhi; Pratima Gupta; Himel Mondal – Advances in Physiology Education, 2025
Multiple-choice questions (MCQs) are widely used for assessment in medical education. While human-generated MCQs benefit from pedagogical insight, creating high-quality items is time intensive. With the advent of artificial intelligence (AI), tools like DeepSeek R1 offer potential for automated MCQ generation, though their educational validity…
Descriptors: Multiple Choice Tests, Physiology, Artificial Intelligence, Test Items
Peer reviewed
Direct link
Hotaka Maeda; Yikai Lu – Journal of Educational Measurement, 2025
We fine-tuned and compared several encoder-based Transformer large language models (LLMs) to predict differential item functioning (DIF) from the item text. We then applied explainable artificial intelligence (XAI) methods to identify specific words associated with the DIF prediction. The data included 42,180 items designed for English language…
Descriptors: Artificial Intelligence, Prediction, Test Bias, Test Items
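As a rough illustration of the kind of pipeline the Maeda and Lu abstract describes (fine-tuning an encoder-based Transformer to classify items by DIF status from their text, followed by an XAI step), the Python sketch below uses hypothetical toy data and a placeholder model; none of the specifics are taken from the study.

    import torch
    from torch.optim import AdamW
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Hypothetical toy data: item text paired with a binary DIF flag (1 = flagged).
    item_texts = [
        "Read the passage and choose the best summary.",
        "Which sentence uses the word 'pitch' as it is used in cricket?",
    ]
    dif_labels = torch.tensor([0, 1])

    # Any encoder-based model could be used; "bert-base-uncased" is an assumption here.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    inputs = tokenizer(item_texts, padding=True, truncation=True, return_tensors="pt")
    optimizer = AdamW(model.parameters(), lr=2e-5)

    model.train()
    for _ in range(3):  # a few passes over the toy batch, purely illustrative
        outputs = model(**inputs, labels=dif_labels)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # A follow-up XAI step (e.g., token-level attribution on the fine-tuned model)
    # would be used to surface words associated with the DIF prediction.

In practice, a corpus such as the 42,180 items mentioned in the abstract would be split into training and held-out sets, with models compared on held-out prediction accuracy.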
Peer reviewed
Direct link
Mingfeng Xue; Mark Wilson – Applied Measurement in Education, 2024
Multidimensionality is common in psychological and educational measurements. This study focuses on dimensions that converge at the upper anchor (i.e. the highest acquisition status defined in a learning progression) and compares different ways of dealing with them using the multidimensional random coefficients multinomial logit model and scale…
Descriptors: Learning Trajectories, Educational Assessment, Item Response Theory, Evolution
Peer reviewed
Direct link
James D. Weese; Ronna C. Turner; Allison Ames; Xinya Liang; Brandon Crawford – Journal of Experimental Education, 2024
In this study a standardized effect size was created for use with the SIBTEST procedure. Using this standardized effect size, a single set of heuristics was developed that are appropriate for data fitting different item response models (e.g., 2-parameter logistic, 3-parameter logistic). The standardized effect size rescales the raw beta-uni value…
Descriptors: Test Bias, Test Items, Item Response Theory, Effect Size
Peer reviewed
Direct link
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests that contain items (i.e. questions) which measure various types of skills for different levels of students equitably is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent, if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
Peer reviewed
Direct link
Kuan-Yu Jin; Thomas Eckes – Educational and Psychological Measurement, 2024
Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable…
Descriptors: Item Response Theory, Test Items, Test Wiseness, Surveys
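The assumption cited in the Jin and Eckes abstract, that successive categories reflect increasing levels of the latent variable, can be made concrete with a standard polytomous IRT model; the partial credit model below is one common choice, shown in generic notation and not necessarily the model used in the article:

    P(X_{pi} = k \mid \theta_p) =
    \frac{\exp \sum_{j=0}^{k} (\theta_p - \delta_{ij})}
         {\sum_{h=0}^{m_i} \exp \sum_{j=0}^{h} (\theta_p - \delta_{ij})},
    \qquad k = 0, \dots, m_i,

with the convention that the sum for k = 0 is zero. Insufficient effort responding threatens this structure because careless category choices no longer track \theta_p.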
Peer reviewed
Direct link
Gregory M. Hurtz; Regi Mucino – Journal of Educational Measurement, 2024
The Lognormal Response Time (LNRT) model measures the speed of test-takers relative to the normative time demands of items on a test. The resulting speed parameters and model residuals are often analyzed for evidence of anomalous test-taking behavior associated with fast and poorly fitting response time patterns. Extending this model, we…
Descriptors: Student Reaction, Reaction Time, Response Style (Tests), Test Items
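For context on the Hurtz and Mucino entry, the lognormal response time model is commonly written (in van der Linden's formulation, stated here generically rather than as the authors' extension) as:

    \ln T_{pi} = \beta_i - \tau_p + \varepsilon_{pi},
    \qquad \varepsilon_{pi} \sim N(0, \alpha_i^{-2}),

where T_{pi} is person p's response time on item i, \beta_i is the item's time intensity, \tau_p is the person's speed, and \alpha_i governs residual precision. Large negative residuals flag responses that are faster than the model expects, which is the kind of anomalous test-taking pattern the abstract refers to.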
Peer reviewed
Direct link
Wuji Lin; Chenxi Lv; Jiejie Liao; Yuan Hu; Yutong Liu; Jingyuan Lin – npj Science of Learning, 2024
The debate about whether the capacity of working memory (WM) varies with the complexity of memory items continues. This study employed novel experimental materials to investigate the role of complexity in WM capacity. Across seven experiments, we explored the relationship between complexity and WM capacity. The results indicated that the…
Descriptors: Short Term Memory, Difficulty Level, Retention (Psychology), Test Items
Peer reviewed
Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S. McNamara – Grantee Submission, 2024
Assessing the difficulty of reading comprehension questions is crucial to educational methodologies and language understanding technologies. Traditional methods of assessing question difficulty rely frequently on human judgments or shallow metrics, often failing to accurately capture the intricate cognitive demands of answering a question. This…
Descriptors: Difficulty Level, Reading Tests, Test Items, Reading Comprehension
Peer reviewed
Direct link
Jochen Ranger; Christoph König; Benjamin W. Domingue; Jörg-Tobias Kuhn; Andreas Frey – Journal of Educational and Behavioral Statistics, 2024
In the existing multidimensional extensions of the log-normal response time (LNRT) model, the log response times are decomposed into a linear combination of several latent traits. These models are fully compensatory as low levels on traits can be counterbalanced by high levels on other traits. We propose an alternative multidimensional extension…
Descriptors: Models, Statistical Distributions, Item Response Theory, Response Rates (Questionnaires)
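The "fully compensatory" property noted in the Ranger et al. abstract can be seen from a generic multidimensional decomposition of the log response times (notation assumed here, not taken from the article):

    \ln T_{pi} = \beta_i - \sum_{d=1}^{D} a_{id}\, \tau_{pd} + \varepsilon_{pi}.

Because the speed traits enter as a weighted sum, a low value on one \tau_{pd} can be offset by a high value on another, which is the compensation the abstract contrasts with its proposed alternative extension.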
Peer reviewed
Direct link
Guher Gorgun; Okan Bulut – Education and Information Technologies, 2024
In light of the widespread adoption of technology-enhanced learning and assessment platforms, there is a growing demand for innovative, high-quality, and diverse assessment questions. Automatic Question Generation (AQG) has emerged as a valuable solution, enabling educators and assessment developers to efficiently produce a large volume of test…
Descriptors: Computer Assisted Testing, Test Construction, Test Items, Automation
Peer reviewed
Direct link
Sam von Gillern; Chad Rose; Amy Hutchison – British Journal of Educational Technology, 2024
As teachers are purveyors of digital citizenship and their perspectives influence classroom practice, it is important to understand teachers' views on digital citizenship. This study establishes the Teachers' Perceptions of Digital Citizenship Scale (T-PODS) as a survey instrument for scholars to investigate educators' views on digital citizenship…
Descriptors: Citizenship, Digital Literacy, Teacher Attitudes, Test Items
Peer reviewed
Direct link
Kofi Nkonkonya Mpuangnan – Review of Education, 2024
Assessment practices play a crucial role in fostering student learning and guiding instructional decision-making. The ability to construct effective test items is of utmost importance in evaluating student learning and shaping instructional strategies. This study aims to investigate the skills of Ghanaian basic schoolteachers in test item…
Descriptors: Test Items, Test Construction, Student Evaluation, Foreign Countries