Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 215 |
| Since 2022 (last 5 years) | 1084 |
| Since 2017 (last 10 years) | 2594 |
| Since 2007 (last 20 years) | 4955 |
Audience
| Audience | Records |
| --- | --- |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location
| Location | Records |
| --- | --- |
| Turkey | 226 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 66 |
What Works Clearinghouse Rating
| Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Joshua B. Gilbert; Zachary Himmelsbach; Luke W. Miratrix; Andrew D. Ho; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value added models (VAMs) attempt to estimate the causal effects of teachers and schools on student test scores. We apply Generalizability Theory to show how estimated VA effects depend upon the selection of test items. Standard VAMs estimate causal effects on the items that are included on the test. Generalizability demands consideration of how…
Descriptors: Value Added Models, Reliability, Effect Size, Test Items
Owen Henkel; Libby Hills; Bill Roberts; Joshua McGrane – International Journal of Artificial Intelligence in Education, 2025
Formative assessment plays a critical role in improving learning outcomes by providing feedback on student mastery. Open-ended questions, which require students to produce multi-word, nontrivial responses, are a popular tool for formative assessment as they provide more specific insights into what students do and do not know. However, grading…
Descriptors: Artificial Intelligence, Grading, Reading Comprehension, Natural Language Processing
Anupkumar D. Dhanvijay; Amita Kumari; Mohammed Jaffer Pinjar; Anita Kumari; Abhimanyu Ganguly; Ankita Priya; Ayesha Juhi; Pratima Gupta; Himel Mondal – Advances in Physiology Education, 2025
Multiple-choice questions (MCQs) are widely used for assessment in medical education. While human-generated MCQs benefit from pedagogical insight, creating high-quality items is time intensive. With the advent of artificial intelligence (AI), tools like DeepSeek R1 offer potential for automated MCQ generation, though their educational validity…
Descriptors: Multiple Choice Tests, Physiology, Artificial Intelligence, Test Items
Hotaka Maeda; Yikai Lu – Journal of Educational Measurement, 2025
We fine-tuned and compared several encoder-based Transformer large language models (LLM) to predict differential item functioning (DIF) from the item text. We then applied explainable artificial intelligence (XAI) methods to identify specific words associated with the DIF prediction. The data included 42,180 items designed for English language…
Descriptors: Artificial Intelligence, Prediction, Test Bias, Test Items
Mingfeng Xue; Mark Wilson – Applied Measurement in Education, 2024
Multidimensionality is common in psychological and educational measurements. This study focuses on dimensions that converge at the upper anchor (i.e. the highest acquisition status defined in a learning progression) and compares different ways of dealing with them using the multidimensional random coefficients multinomial logit model and scale…
Descriptors: Learning Trajectories, Educational Assessment, Item Response Theory, Evolution
James D. Weese; Ronna C. Turner; Allison Ames; Xinya Liang; Brandon Crawford – Journal of Experimental Education, 2024
In this study a standardized effect size was created for use with the SIBTEST procedure. Using this standardized effect size, a single set of heuristics was developed that are appropriate for data fitting different item response models (e.g., 2-parameter logistic, 3-parameter logistic). The standardized effect size rescales the raw beta-uni value…
Descriptors: Test Bias, Test Items, Item Response Theory, Effect Size
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests that contain items (i.e. questions) which measure various types of skills for different levels of students equitably is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent, if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
Kuan-Yu Jin; Thomas Eckes – Educational and Psychological Measurement, 2024
Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable…
Descriptors: Item Response Theory, Test Items, Test Wiseness, Surveys
Gregory M. Hurtz; Regi Mucino – Journal of Educational Measurement, 2024
The Lognormal Response Time (LNRT) model measures the speed of test-takers relative to the normative time demands of items on a test. The resulting speed parameters and model residuals are often analyzed for evidence of anomalous test-taking behavior associated with fast and poorly fitting response time patterns. Extending this model, we…
Descriptors: Student Reaction, Reaction Time, Response Style (Tests), Test Items
Wuji Lin; Chenxi Lv; Jiejie Liao; Yuan Hu; Yutong Liu; Jingyuan Lin – npj Science of Learning, 2024
The debate about whether the capacity of working memory (WM) varies with the complexity of memory items continues. This study employed novel experimental materials to investigate the role of complexity in WM capacity. Across seven experiments, we explored the relationship between complexity and WM capacity. The results indicated that the…
Descriptors: Short Term Memory, Difficulty Level, Retention (Psychology), Test Items
Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S. McNamara – Grantee Submission, 2024 (peer reviewed)
Assessing the difficulty of reading comprehension questions is crucial to educational methodologies and language understanding technologies. Traditional methods of assessing question difficulty rely frequently on human judgments or shallow metrics, often failing to accurately capture the intricate cognitive demands of answering a question. This…
Descriptors: Difficulty Level, Reading Tests, Test Items, Reading Comprehension
Jochen Ranger; Christoph König; Benjamin W. Domingue; Jörg-Tobias Kuhn; Andreas Frey – Journal of Educational and Behavioral Statistics, 2024
In the existing multidimensional extensions of the log-normal response time (LNRT) model, the log response times are decomposed into a linear combination of several latent traits. These models are fully compensatory as low levels on traits can be counterbalanced by high levels on other traits. We propose an alternative multidimensional extension…
Descriptors: Models, Statistical Distributions, Item Response Theory, Response Rates (Questionnaires)
Guher Gorgun; Okan Bulut – Education and Information Technologies, 2024
In light of the widespread adoption of technology-enhanced learning and assessment platforms, there is a growing demand for innovative, high-quality, and diverse assessment questions. Automatic Question Generation (AQG) has emerged as a valuable solution, enabling educators and assessment developers to efficiently produce a large volume of test…
Descriptors: Computer Assisted Testing, Test Construction, Test Items, Automation
Sam von Gillern; Chad Rose; Amy Hutchison – British Journal of Educational Technology, 2024
As teachers are purveyors of digital citizenship and their perspectives influence classroom practice, it is important to understand teachers' views on digital citizenship. This study establishes the Teachers' Perceptions of Digital Citizenship Scale (T-PODS) as a survey instrument for scholars to investigate educators' views on digital citizenship…
Descriptors: Citizenship, Digital Literacy, Teacher Attitudes, Test Items
Kofi Nkonkonya Mpuangnan – Review of Education, 2024
Assessment practices play a crucial role in fostering student learning and guiding instructional decision-making. The ability to construct effective test items is of utmost importance in evaluating student learning and shaping instructional strategies. This study aims to investigate the skills of Ghanaian basic schoolteachers in test item…
Descriptors: Test Items, Test Construction, Student Evaluation, Foreign Countries