Publication Date
In 2025: 72
Since 2024: 327
Since 2021 (last 5 years): 1278
Since 2016 (last 10 years): 3124
Since 2006 (last 20 years): 6248
Audience
Teachers: 480
Practitioners: 356
Researchers: 152
Administrators: 120
Policymakers: 51
Students: 44
Parents: 31
Counselors: 25
Community: 14
Media Staff: 5
Support Staff: 2
Location
Australia: 182
Turkey: 153
California: 133
Canada: 122
New York: 116
United States: 112
Florida: 107
China: 98
Texas: 72
United Kingdom: 72
Japan: 68
What Works Clearinghouse Rating
Meets WWC Standards without Reservations: 5
Meets WWC Standards with or without Reservations: 11
Does Not Meet Standards: 8
Jiayi Deng – ProQuest LLC, 2024
Test score comparability in international large-scale assessments (LSA) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To effectively compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic…
Descriptors: Item Response Theory, Scoring Rubrics, Scoring, Error of Measurement
Louise Badham – Oxford Review of Education, 2025
Different sources of assessment evidence are reviewed during International Baccalaureate (IB) grade awarding to convert marks into grades and ensure fair results for students. Qualitative and quantitative evidence are analysed to determine grade boundaries, with statistical evidence weighed against examiner judgement and teachers' feedback on…
Descriptors: Advanced Placement Programs, Grading, Interrater Reliability, Evaluative Thinking
Naima Debbar – International Journal of Contemporary Educational Research, 2024
Intelligent essay-grading systems are important tools in educational technology. They can replace much of the manual scoring effort and provide instructional feedback as well. These systems typically include two main parts: a feature extractor and an automatic grading model. The latter is generally based on computational and…
Descriptors: Test Scoring Machines, Computer Uses in Education, Artificial Intelligence, Essay Tests
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
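The agreement statistics typically computed in trend-scoring studies like this one can be illustrated with quadratically weighted kappa on a Time A × Time B rating contingency table. This is a generic sketch, not the authors' method; the 3×3 count table below is invented purely for illustration.

```python
# Quadratic weighted kappa on a Time A (rows) x Time B (cols) rating table.
# The 3x3 table of counts is invented for illustration only.

def quadratic_weighted_kappa(table):
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2         # quadratic disagreement weight
            num += w * table[i][j]                  # observed weighted disagreement
            den += w * row_tot[i] * col_tot[j] / n  # expected under independence
    return 1.0 - num / den

table = [[20, 5, 0],
         [4, 30, 6],
         [1, 5, 29]]
print(round(quadratic_weighted_kappa(table), 3))
```

A value of 1.0 indicates perfect score agreement between the two scoring occasions; values near 0 indicate chance-level agreement.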
Saenz, David Arron – Online Submission, 2023
There is a vast body of literature documenting the positive impacts that rater training and calibration sessions have on inter-rater reliability, as research indicates that several factors, including frequency and timing, play crucial roles in ensuring inter-rater reliability. Additionally, an increasing amount of research indicates possible links in…
Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics
Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023
Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. In order to fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios,…
Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems
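The connectedness requirement described in this abstract can be checked directly: treat raters and responses as nodes of a bipartite graph with an edge for each assigned rating, and test whether the graph forms a single component. A minimal union-find sketch, with rating assignments invented for illustration:

```python
# Check whether a rating design is connected: raters and responses form a
# bipartite graph, and rater effects are only jointly estimable within one
# connected component. Assignments below are invented for illustration.

def is_connected(assignments):
    """assignments: list of (rater, response) pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for rater, resp in assignments:
        # Union the rater node with the response node it scored.
        parent[find(("r", rater))] = find(("s", resp))

    roots = {find(node) for node in parent}
    return len(roots) == 1

# Two disjoint rater pools scoring disjoint response sets: not connected.
disconnected = [("A", 1), ("A", 2), ("B", 2), ("C", 3), ("C", 4)]
# One overlapping assignment links the pools into a single component.
connected = disconnected + [("B", 3)]
print(is_connected(disconnected), is_connected(connected))
```

In a disconnected design, rater severities in separate components cannot be placed on a common scale, which is the estimation problem the authors address.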
Ormerod, Christopher; Lottridge, Susan; Harris, Amy E.; Patel, Milan; van Wamelen, Paul; Kodeswaran, Balaji; Woolf, Sharon; Young, Mackenzie – International Journal of Artificial Intelligence in Education, 2023
We introduce a short answer scoring engine made up of an ensemble of deep neural networks and a Latent Semantic Analysis-based model to score short constructed responses for a large suite of questions from a national assessment program. We evaluate the performance of the engine and show that the engine achieves above-human-level performance on a…
Descriptors: Computer Assisted Testing, Scoring, Artificial Intelligence, Semantics
Dhini, Bachriah Fatwa; Girsang, Abba Suganda; Sufandi, Unggul Utan; Kurniawati, Heny – Asian Association of Open Universities Journal, 2023
Purpose: The authors constructed an automatic essay scoring (AES) model in a discussion forum where the result was compared with scores given by human evaluators. This research proposes essay scoring, which is conducted through two parameters, semantic and keyword similarities, using a SentenceTransformers pre-trained model that can construct the…
Descriptors: Computer Assisted Testing, Scoring, Writing Evaluation, Essays
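The two-parameter score this abstract describes (semantic similarity plus keyword similarity) can be sketched without the pretrained SentenceTransformers model: here bag-of-words count vectors stand in for real sentence embeddings, and the blend weights and keywords are invented, so this is only a toy illustration of the scoring structure.

```python
# Toy sketch of similarity-based essay scoring: a weighted blend of
# (a) cosine similarity between bag-of-words vectors (a crude stand-in for
# sentence-embedding similarity) and (b) keyword overlap with an answer key.
# Weights, keywords, and texts are invented for illustration.
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def score(reference, answer, keywords, w_sem=0.7, w_key=0.3):
    sem = cosine(Counter(reference.lower().split()),
                 Counter(answer.lower().split()))
    hits = sum(1 for k in keywords if k in answer.lower())
    return w_sem * sem + w_key * hits / len(keywords)

ref = "photosynthesis converts light energy into chemical energy"
ans = "plants use light energy and convert it to chemical energy"
print(round(score(ref, ans, ["light", "chemical", "energy"]), 3))
```

In the actual system, the bag-of-words vectors would be replaced by embeddings from a pretrained sentence-encoder model, which captures paraphrase similarity that raw word counts miss.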
Makiko Kato – Journal of Education and Learning, 2025
This study aims to examine whether differences exist in the factors influencing the difficulty of scoring English summaries and determining scores based on the raters' attributes, and to collect candid opinions, considerations, and tentative suggestions for future improvements to the analytic rubric of summary writing for English learners. In this…
Descriptors: Writing Evaluation, Scoring, Writing Skills, English (Second Language)
Ariely, Moriah; Nazaretsky, Tanya; Alexandron, Giora – International Journal of Artificial Intelligence in Education, 2023
Machine learning algorithms that automatically score scientific explanations can be used to measure students' conceptual understanding, identify gaps in their reasoning, and provide them with timely and individualized feedback. This paper presents the results of a study that uses Hebrew NLP to automatically score student explanations in Biology…
Descriptors: Artificial Intelligence, Algorithms, Natural Language Processing, Hebrew
Heather D. Hussey; Tara Lehan; Kate McConnell – Learning Assistance Review, 2024
Rubrics (e.g., Valid Assessment of Learning in Undergraduate Education (VALUE) rubrics) that measure specific skills exist, and researchers have demonstrated their benefits; however, most of them were designed for use with undergraduate students. Although some rubrics have been created to assess dissertations and oral defenses, few have been…
Descriptors: Scoring Rubrics, Doctoral Programs, Doctoral Dissertations, Online Courses
Sophie Litschwartz – Society for Research on Educational Effectiveness, 2021
Background/Context: Pass/fail standardized exams frequently selectively rescore failing exams and retest failing examinees. This practice distorts the test score distribution and can confuse those who do analysis on these distributions. In 2011, the Wall Street Journal showed large discontinuities in the New York City Regent test score…
Descriptors: Standardized Tests, Pass Fail Grading, Scoring Rubrics, Scoring Formulas
Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021
For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…
Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests
Akif Avcu – Malaysian Online Journal of Educational Technology, 2025
This scoping review presents the milestones of how Hierarchical Rater Models (HRMs) became usable in automated essay scoring (AES) to improve instructional evaluation. Although essay evaluations--a useful instrument for evaluating higher-order cognitive abilities--have always depended on human raters, concerns regarding rater bias,…
Descriptors: Automation, Scoring, Models, Educational Assessment
Olaghere, Ajima; Wilson, David B.; Kimbrell, Catherine – Research Synthesis Methods, 2023
A diversity of approaches for critically appraising qualitative and quantitative evidence exist and emphasize different aspects. These approaches lack clear processes to facilitate rating the overall quality of the evidence for aggregated findings that combine qualitative and quantitative evidence. We draw on a meta-aggregation of implementation…
Descriptors: Evidence, Synthesis, Scoring Rubrics, Standardized Tests