Publication Date
In 2025 | 3 |
Since 2024 | 6 |
Since 2021 (last 5 years) | 11 |
Since 2016 (last 10 years) | 29 |
Since 2006 (last 20 years) | 51 |
Descriptor
Evaluation Methods | 96 |
Test Reliability | 96 |
Test Validity | 55 |
Scoring | 48 |
Scoring Rubrics | 38 |
Student Evaluation | 31 |
Interrater Reliability | 20 |
Test Construction | 16 |
Foreign Countries | 15 |
Elementary Secondary Education | 14 |
Evaluation Criteria | 14 |
Author
Crawford, Angela R. | 2 |
Gearhart, Maryl | 2 |
Johnson, Evelyn S. | 2 |
Kane, Thomas J. | 2 |
Koretz, Daniel | 2 |
Moylan, Laura A. | 2 |
Novak, John R. | 2 |
Staiger, Douglas O. | 2 |
Zheng, Yuzhu | 2 |
Ackerman, Debra J. | 1 |
Aghbar, Ali-Asghar | 1 |
Location
United Kingdom (England) | 3 |
California | 2 |
Colorado (Denver) | 2 |
North Carolina (Charlotte) | 2 |
Tennessee (Memphis) | 2 |
Vermont | 2 |
Arkansas (Little Rock) | 1 |
Australia | 1 |
Canada | 1 |
Croatia | 1 |
Europe | 1 |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
Assessments and Surveys
National Assessment of… | 3 |
Advanced Placement… | 2 |
Childrens Depression Inventory | 1 |
Graduate Record Examinations | 1 |
Test of English as a Foreign… | 1 |
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Mohammad Hmoud; Hadeel Swaity; Eman Anjass; Eva María Aguaded-Ramírez – Electronic Journal of e-Learning, 2024
This research aimed to develop and validate a rubric to assess Artificial Intelligence (AI) chatbots' effectiveness in accomplishing tasks, particularly within educational contexts. Given the rapidly growing integration of AI in various sectors, including education, a systematic and robust tool for evaluating AI chatbot performance is essential.…
Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction
Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Flor de Lis González-Mujico – Education and Information Technologies, 2024
Over the past decade, self-assessment tools have garnered significant attention in the interest of measuring the skillset required by educators and students to function productively and ethically in digitally mediated environments, particularly in relation to education policy implementation. Since stated beliefs do not always align with actual…
Descriptors: Technological Literacy, Evaluation Methods, Test Validity, Test Construction
Swapneel Thite; Jayashri Ravishankar; Inmaculada Tomeo-Reyes; Araceli Martinez Ortiz – European Journal of Engineering Education, 2024
Effectively working in an engineering workplace requires strong teamwork skills, yet the existing literature within various disciplines reveals discrepancies in evaluating these skills. This complicates the design of a generic teamwork peer evaluation tool for engineering students. This study aims to address this gap by introducing the DRIVE…
Descriptors: Scoring Rubrics, Evaluation Methods, Peer Evaluation, Teamwork
Begicevic Redjep, Nina; Balaban, Igor; Zugec, Bojan – Technology, Pedagogy and Education, 2021
The European Commission emphasises the need for educational institutions to integrate digital technologies in their teaching, learning and organisational practices. This study contributes to the field of digital transformation of schools by proposing and validating a Framework for Digitally Mature Schools (FDMS) and an instrument for assessing the…
Descriptors: Technology Integration, Information Technology, Program Evaluation, Educational Assessment
Amirhossein Rasooli; Jim Turner; Tünde Varga-Atkins; Edd Pitt; Shaghayegh Asgari; Will Moindrot – Assessment & Evaluation in Higher Education, 2025
Groupwork is a crucial aspect of work contexts and a key twenty-first-century skill. Assessment of groupwork presents a persistent challenge for educators in university contexts, with students reporting experiences of unfairness from their peers during groupwork. This study developed a novel Peer Assessment Fairness Instrument to explore factors…
Descriptors: Foreign Countries, Undergraduate Students, Student Attitudes, College Faculty
Çifci, Musa; Kaplan, Kadir – Journal of Language and Linguistic Studies, 2020
This study aimed to develop a "Caricature Creation Rubric" that can be used to evaluate the products created by 6th-grade students at the end of their caricature creation process, and to conduct validity and reliability studies of it. The criteria in the graded key were determined by using the "Caricature Literacy Module" prepared by…
Descriptors: Cartoons, Scoring Rubrics, Evaluation Methods, Student Evaluation
Wenjing Guo – ProQuest LLC, 2021
Constructed response (CR) items are widely used in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district and state-level assessments in the United States. One unique feature of CR items is that they depend on human raters to assess the quality of examinees' work. The judgment of human…
Descriptors: National Competency Tests, Responses, Interrater Reliability, Error of Measurement
Knowing and Doing: The Development of Information Literacy Measures to Assess Knowledge and Practice
Nierenberg, Ellen; Låg, Torstein; Dahl, Tove Irene – Journal of Information Literacy, 2021
This study touches upon three major themes in the field of information literacy (IL): the assessment of IL, the association between IL knowledge and skills, and the dimensionality of the IL construct. Three quantitative measures were developed and tested with several samples of university students to assess knowledge and skills for core facets of…
Descriptors: Information Literacy, College Students, Evaluation Methods, Knowledge Level
Cesur, Kursat – Educational Policy Analysis and Strategic Research, 2019
Examinees' performances are assessed using a wide variety of techniques. Multiple-choice (MC) tests are among the most frequently used. Nearly all standardized achievement tests make use of MC test items, and there are various ways to score these tests. The study compares number right and liberal scoring (SAC) methods. Mixed…
Descriptors: Multiple Choice Tests, Scoring, Evaluation Methods, Guessing (Tests)
Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J. – Educational Assessment, 2017
This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…
Descriptors: Scores, Test Construction, Test Reliability, Test Validity
Feldman, Jo – Educational Leadership, 2018
Have teachers become too dependent on points? This article explores educators' dependency on their points systems, and the ways that points can distract teachers from really analyzing students' capabilities and achievements. Feldman argues that using a more subjective grading system can help illuminate crucial information about students and what…
Descriptors: Grading, Evaluation Methods, Evaluation Criteria, Achievement Rating
Developing a High Performance Digital Education Ecosystem: Institutional Self-Assessment Instruments
Volungeviciene, Airina; Brown, Mark; Greenspon, Rasa; Gaebel, Michael; Morrisroe, Alison – European University Association, 2021
Digitally enhanced learning and teaching is widely used across the European Higher Education Area, with general acceptance growing over the years and institutions widely acknowledging the benefits it brings to the student experience. The strategic focus being placed on digitally enhanced learning and teaching has increased, undoubtedly accelerated…
Descriptors: Educational Technology, Technology Uses in Education, Program Evaluation, Self Evaluation (Groups)