Publication Date
In 2025 | 0
Since 2024 | 5
Since 2021 (last 5 years) | 43
Since 2016 (last 10 years) | 102
Since 2006 (last 20 years) | 159
Audience
Researchers | 151
Practitioners | 20
Teachers | 14
Administrators | 2
Counselors | 2
Policymakers | 1
Students | 1
Location
Australia | 18
Canada | 11
Netherlands | 10
Turkey | 8
United States | 8
Germany | 6
Israel | 6
Texas | 4
United Kingdom (England) | 4
Virginia | 4
California | 3
Laws, Policies, & Programs
Comprehensive Education… | 2
Elementary and Secondary… | 2
No Child Left Behind Act 2001 | 1

Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S. McNamara – Grantee Submission, 2024
Assessing the difficulty of reading comprehension questions is crucial to educational methodologies and language understanding technologies. Traditional methods of assessing question difficulty frequently rely on human judgments or shallow metrics, often failing to accurately capture the intricate cognitive demands of answering a question. This…
Descriptors: Difficulty Level, Reading Tests, Test Items, Reading Comprehension
Kaldes, Gal; Tighe, Elizabeth; He, Qiwei – AERA Online Paper Repository, 2023
This study used PIAAC process data to examine time allocation patterns (time to first action, total time, time of last action) of low-skilled adults, relative to higher-skilled adults, on digital literacy items. Results suggest that less-skilled (Level 2) and higher-skilled (Levels 3-5) adults exhibited similar time allocation patterns; however,…
Descriptors: Time Management, Literacy Education, Adult Literacy, Adult Education
Olney, Andrew M. – Grantee Submission, 2022
Multi-angle question answering models have recently been proposed that promise to perform related tasks like question generation. However, performance on related tasks has not been thoroughly studied. We investigate a leading model called Macaw on the task of multiple choice question generation and evaluate its performance on three angles that…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Models
Plackner, Christie; Kim, Dong-In – Online Submission, 2022
The application of item response theory (IRT) is almost universal in the development, implementation, and maintenance of large-scale assessments. Therefore, establishing the fit of IRT models to data is essential, as the viability of calibration and equating implementations depends on it. In a typical test administration situation, measurement…
Descriptors: COVID-19, Pandemics, Item Response Theory, Goodness of Fit
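The model-data fit check this abstract calls essential can be illustrated with a toy residual analysis; a minimal sketch under a Rasch model with simulated data (not the paper's method):

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch model probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def item_fit_table(thetas, responses, b, n_groups=5):
    """Crude fit check for one item: bin examinees by ability and compare
    observed vs. model-expected proportion correct in each bin."""
    order = np.argsort(thetas)
    for group in np.array_split(order, n_groups):
        observed = responses[group].mean()
        expected = rasch_p(thetas[group], b).mean()
        print(f"obs={observed:.2f}  exp={expected:.2f}  resid={observed - expected:+.2f}")

rng = np.random.default_rng(1)
thetas = rng.normal(0.0, 1.0, 2000)
b_item = 0.3
responses = (rng.random(2000) < rasch_p(thetas, b_item)).astype(float)
item_fit_table(thetas, responses, b_item)  # residuals near zero -> good fit
```

Because the responses are generated from the same Rasch model being checked, the residuals hover near zero; systematic residual patterns in real data would signal misfit.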
Sample Size and Item Parameter Estimation Precision When Utilizing the Masters' Partial Credit Model
Custer, Michael; Kim, Jongpil – Online Submission, 2023
This study uses an analysis of diminishing returns to examine the relationship between sample size and item parameter estimation precision when utilizing the Masters' Partial Credit Model for polytomous items. Item data from the standardization of the Battelle Developmental Inventory, 3rd Edition were used. Each item was scored with a…
Descriptors: Sample Size, Item Response Theory, Test Items, Computation
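Masters' Partial Credit Model, the model this study calibrates, has a closed form for category probabilities; a minimal sketch with illustrative step difficulties (not values from the BDI-3 standardization):

```python
import numpy as np

def pcm_probs(theta, deltas):
    """Masters' Partial Credit Model: P(score = k | theta) for k = 0..m.

    theta  : examinee ability (logits)
    deltas : step difficulties delta_1..delta_m for one polytomous item
    """
    # Cumulative sums of (theta - delta_j); the score-0 "empty sum" is 0.
    steps = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas))))
    expo = np.exp(steps - steps.max())  # subtract max for numerical stability
    return expo / expo.sum()

# A 0-3 point item with three illustrative step difficulties:
print(pcm_probs(theta=0.5, deltas=[-1.0, 0.0, 1.2]))
```

Each step difficulty governs the transition between adjacent score categories, which is why precision of these estimates depends on having enough examinees in every category.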
Jahangeer Mohamed Jahabar; Toh Tin Lam; Tay Eng Guan; Tong Cherng Luen – Mathematics Education Research Group of Australasia, 2024
Big Ideas can be seen as overarching concepts that occur in various mathematical topics and strands within a syllabus. Within our project on Big Ideas in School Mathematics, we developed instruments to measure two Big Ideas: Equivalence and Proportionality. These instruments seek to assess students' ability to see these Big Ideas as…
Descriptors: Mathematical Concepts, Mathematics Tests, Test Items, Test Construction
Condor, Aubrey; Litster, Max; Pardos, Zachary – International Educational Data Mining Society, 2021
We explore how different components of an Automatic Short Answer Grading (ASAG) model affect the model's ability to generalize to questions outside of those used for training. For supervised automatic grading models, human ratings are primarily used as ground-truth labels. Producing such ratings can be resource-intensive, as subject matter experts…
Descriptors: Automation, Grading, Test Items, Generalization
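The generalization question this abstract raises can be made concrete by holding out whole questions rather than individual responses; a minimal sketch with a TF-IDF + logistic-regression stand-in for an ASAG model and invented toy data (not the paper's model or dataset):

```python
# Hold out an entire question (q3) so the score measures transfer to
# unseen items, not memorization of seen ones.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

data = [  # (question_id, student_answer, human_score 0/1) -- invented examples
    ("q1", "photosynthesis converts light to chemical energy", 1),
    ("q1", "plants eat sunlight", 0),
    ("q2", "mitochondria produce ATP for the cell", 1),
    ("q2", "it is the control center", 0),
    ("q3", "osmosis moves water across a membrane", 1),
    ("q3", "osmosis is when cells divide", 0),
]
train = [d for d in data if d[0] != "q3"]  # train on q1 and q2 only
test = [d for d in data if d[0] == "q3"]   # evaluate on the unseen q3

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit([d[1] for d in train], [d[2] for d in train])
print(model.score([d[1] for d in test], [d[2] for d in test]))
```

Splitting by question ID rather than by response is what separates "grading new answers to known questions" from the harder cross-question generalization the study examines.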
Andrew M. Olney – Grantee Submission, 2023
Multiple choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully controlled…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Algorithms
Hanif Akhtar – International Society for Technology, Education, and Science, 2023
For efficiency, a Computerized Adaptive Test (CAT) algorithm selects items with the maximum information, typically with a 50% probability of being answered correctly. However, examinees may not be satisfied if they answer only 50% of the items correctly. Researchers discovered that changing the item selection algorithm to choose easier items (i.e.,…
Descriptors: Success, Probability, Computer Assisted Testing, Adaptive Testing
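The item-selection trade-off described here is easy to sketch under a 2PL model; `target_p` below is a hypothetical knob standing in for the easier-item rules the abstract mentions:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def select_item(theta, a, b, administered, target_p=None):
    """Pick the next item from the pool.

    target_p=None -> classic maximum-information selection (success
                     probability near 50% for equal discriminations).
    target_p=0.7  -> choose the item whose predicted success probability
                     is closest to 70%, an 'easier items' rule of the kind
                     the abstract describes (hypothetical parameterisation).
    """
    free = np.where(~administered)[0]
    if target_p is None:
        return free[np.argmax(item_information(theta, a[free], b[free]))]
    return free[np.argmin(np.abs(p_correct(theta, a[free], b[free]) - target_p))]

rng = np.random.default_rng(0)
a, b = rng.uniform(0.8, 2.0, 50), rng.normal(0.0, 1.0, 50)
used = np.zeros(50, dtype=bool)
print(select_item(0.0, a, b, used))                # max-information pick
print(select_item(0.0, a, b, used, target_p=0.7))  # easier-item pick
```

The easier-item rule trades some measurement efficiency per item for a higher examinee success rate, which is exactly the tension the study investigates.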
Zur, Amir; Applebaum, Isaac; Nardo, Jocelyn Elizabeth; DeWeese, Dory; Sundrani, Sameer; Salehi, Shima – International Educational Data Mining Society, 2023
Detailed learning objectives foster an effective and equitable learning environment by clarifying what instructors expect students to learn, rather than requiring students to use prior knowledge to infer these expectations. When questions are labeled with relevant learning goals, students understand which skills are tested by those questions.…
Descriptors: Equal Education, Prior Learning, Educational Objectives, Chemistry
Chioma C. Ezeh – AERA Online Paper Repository, 2023
Culturally relevant assessments (CRA) account for multiple socio-cultural identities, experiences, and values that mediate how students know, think, and respond to test items. Given the diversity of modern classrooms, it is critical that education researchers and practitioners understand and strive to implement CRA practices. This systematic…
Descriptors: Educational Practices, Culturally Relevant Education, Culture Fair Tests, Classroom Techniques
Michelle Cheung; Bronwyn Reid O’Connor; Ben Zunica – Mathematics Education Research Group of Australasia, 2024
Progressing from additive to multiplicative thinking is a key outcome of school mathematics, making ratios an essential topic of study in junior secondary. In this study, 15 Australian Year 8 students were administered a ratio test followed by semi-structured interviews to explore their conceptions of ratio prior to formal instruction. In this…
Descriptors: Secondary School Students, Mathematics Instruction, Foreign Countries, Multiplication
Xue, Kang; Huggins-Manley, Anne Corinne; Leite, Walter – Grantee Submission, 2020
In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rely on unbiased item parameter estimates associated with test items in the VLE. Without formal piloting of the items, one can expect a large amount of…
Descriptors: Virtual Classrooms, Item Response Theory, Test Bias, Test Items
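The bias concern can be illustrated with a deliberately naive difficulty estimate (an assumed shortcut, not the paper's approach): the same item looks easier when calibrated only on the self-selected, higher-ability students who attempt it in a VLE than on a representative pilot sample.

```python
import numpy as np

rng = np.random.default_rng(2)
B_TRUE = 0.5  # true Rasch difficulty of the item

def simulate(thetas):
    """Simulate 0/1 responses to the item under a Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(thetas - B_TRUE)))
    return (rng.random(thetas.size) < p).astype(float)

def naive_b(responses, assumed_mean_ability=0.0):
    """Unpiloted difficulty estimate: logit of proportion-incorrect,
    anchored at an assumed mean ability (hypothetical shortcut)."""
    p = responses.mean()
    return assumed_mean_ability + np.log((1.0 - p) / p)

pilot = rng.normal(0.0, 1.0, 5000)  # representative pilot sample
vle = rng.normal(0.8, 1.0, 5000)    # self-selected, higher-ability VLE users
print("pilot estimate:", naive_b(simulate(pilot)))  # near 0.5
print("VLE estimate:  ", naive_b(simulate(vle)))    # biased low
```

When the anchoring assumption about the respondents' ability is wrong, the difficulty estimate absorbs the difference, which is the kind of bias the study addresses for unpiloted VLE items.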
Martha L. Epstein; Hamza Malik; Kun Wang; Chandra Hawley Orrill – Grantee Submission, 2022
Response Process Validity (RPV) reflects the degree to which items are interpreted as intended by item developers. In this study, teachers' responses to constructed response (CR) items designed to assess the pedagogical content knowledge (PCK) of middle school mathematics teachers were evaluated to determine which types of responses signaled weak RPV. We…
Descriptors: Teacher Response, Test Items, Pedagogical Content Knowledge, Mathematics Teachers
Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023
Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…
Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests