Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 14 |
| Since 2007 (last 20 years) | 36 |
Descriptor
| Reliability | 46 |
| Statistical Analysis | 46 |
| Scoring | 27 |
| Validity | 17 |
| Scoring Rubrics | 15 |
| Foreign Countries | 14 |
| Comparative Analysis | 9 |
| Student Evaluation | 9 |
| Correlation | 8 |
| Evaluation Methods | 8 |
| Interrater Reliability | 7 |
Education Level
| Higher Education | 12 |
| Postsecondary Education | 11 |
| Secondary Education | 9 |
| Junior High Schools | 5 |
| Middle Schools | 4 |
| Elementary Education | 3 |
| Elementary Secondary Education | 3 |
| Grade 7 | 3 |
| Adult Education | 1 |
| Grade 3 | 1 |
| Grade 5 | 1 |
Audience
| Parents | 1 |
| Practitioners | 1 |
| Teachers | 1 |
Location
| Canada | 2 |
| China | 2 |
| New York | 2 |
| Nigeria | 2 |
| Australia | 1 |
| California | 1 |
| Egypt | 1 |
| Florida | 1 |
| Illinois | 1 |
| Malaysia | 1 |
| Massachusetts | 1 |
Assessments and Surveys
| Flesch Kincaid Grade Level… | 1 |
| Massachusetts Comprehensive… | 1 |
| Praxis Series | 1 |
| Torrance Tests of Creative… | 1 |
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Guy B. deBrun – Journal of Outdoor Recreation, Education, and Leadership, 2025
Discussions of what it means to be an effective outdoor leader are common in outdoor education literature (Martin et al., 2025; Smith, 2021). Research has identified core competencies (Martin et al., 2025), conceptual frameworks (Pomfret et al., 2023), and course curricula/qualifications for effective leadership (Baker & O'Brien, 2019; Seaman…
Descriptors: Outdoor Leadership, Leadership Effectiveness, Evaluation Methods, Scoring Rubrics
Saito, Daisuke; Yajima, Risei; Washizaki, Hironori; Fukazawa, Yoshiaki – Education Sciences, 2021
In evaluating the learning achievement of programming-thinking skills, the method of using a rubric that describes evaluation items and evaluation stages is widely employed. However, few studies have evaluated the reliability, validity, and consistency of the rubrics themselves. In this study, we introduced a statistical method for evaluating the…
Descriptors: Scoring Rubrics, Computer Science Education, Programming, Reliability
Pérez-Ferreirós, Alexandra; Kalén, Anton; Gómez, Miguel-Ángel; Rey, Ezequiel – Research Quarterly for Exercise and Sport, 2019
In basketball, game-related statistics are the most common measure of performance. However, the literature assessing their reliability is scarce. Purpose: Analyze the number of games required to obtain a good relative and absolute reliability of teams' game-related statistics. Method: A total of 884 games from the 2015-2016 to 2017-2018 seasons of…
Descriptors: Team Sports, Statistics, Reliability, Foreign Countries
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability
Cetin, Bayram; Guler, Nese; Sarica, Rabia – Eurasian Journal of Educational Research, 2016
Problem Statement: In addition to being teaching tools, concept maps can be used as effective assessment tools. The use of concept maps for assessment has raised the issue of scoring them. Concept maps generated and used in different ways can be scored via various methods. Holistic and relational scoring methods are two of them. Purpose of the…
Descriptors: Generalizability Theory, Concept Mapping, Scoring, Scoring Formulas
Menéndez-Varela, José-Luis; Gregori-Giralt, Eva – Assessment & Evaluation in Higher Education, 2018
Rubrics are widely used in higher education to assess performance in project-based learning environments. To date, the sources of error that may affect their reliability have not been studied in depth. Using generalisability theory as its starting-point, this article analyses the influence of the assessors and the criteria of the rubrics on the…
Descriptors: Scoring Rubrics, Student Projects, Active Learning, Reliability
Britton, Emily; Simper, Natalie; Leger, Andrew; Stephenson, Jenn – Assessment & Evaluation in Higher Education, 2017
Effective teamwork skills are essential for success in an increasingly team-based workplace. However, research suggests that there is often confusion concerning how teamwork is measured and assessed, making it difficult to develop these skills in undergraduate curricula. The goal of the present study was to develop a sustainable tool for assessing…
Descriptors: Teamwork, Undergraduate Students, Skills, Student Evaluation
Ebuoh, Casmir N. – World Journal of Education, 2018
The literature shows that patterns/methods of scoring essay tests have been criticized as unreliable, and that this unreliability is likely to be greater in internal examinations than in external examinations. The purpose of this study is to find out the effects of analytical and holistic scoring patterns on scorer reliability in…
Descriptors: Holistic Approach, Scoring, Essay Tests, Biology
Benton, Tom; Elliott, Gill – Research Papers in Education, 2016
In recent years the use of expert judgement to set and maintain examination standards has been increasingly criticised in favour of approaches based on statistical modelling. This paper reviews existing research on this controversy and attempts to unify the evidence within a framework where expertise is utilised in the form of comparative…
Descriptors: Reliability, Expertise, Mathematical Models, Standard Setting (Scoring)
Levine, William H.; Betzner, Michelle; Autry, Kevin S. – Discourse Processes: A multidisciplinary journal, 2016
Recent research has provided evidence that the information provided before a story--a spoiler--may increase the enjoyment of that story, perhaps by increasing the processing fluency experienced during reading. In one experiment, we tested the reliability of these findings by closely replicating existing methods and the generality of these findings…
Descriptors: Literary Genres, Reading Fluency, Reliability, Reading Processes
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Vázquez-Alonso, Ángel; Manassero-Mas, María-Antonia; García-Carmona, Antonio; Montesano de Talavera, Marisa – Asia-Pacific Forum on Science Learning and Teaching, 2016
This study applies a new quantitative methodological approach to diagnose epistemology conceptions in a large sample. The analyses use seven multiple-rating items on the epistemology of science drawn from the item pool Views on Science-Technology-Society (VOSTS). The bases of the new methodological diagnostic approach are the empirical…
Descriptors: Epistemology, Statistical Analysis, Science and Society, Scientific Principles
Dan, Youngjun; Geng, Leisha; Li, Meng – Education, 2017
This study aimed to explore students' cognitive patterns based on their knowledge and levels. Participants were seventh graders from a junior high school in China. Three relatively distinct groups were specified by Cluster Analysis: high knowledge and low ability, low knowledge and low ability, and high knowledge and high ability. The group of low…
Descriptors: Cognitive Structures, Curriculum Design, Teaching Methods, Junior High School Students
Halpin, Peter F. – Society for Research on Educational Effectiveness, 2016
Recent research on multiple measures of teaching effectiveness has redefined the role of in-classroom observations in teacher evaluation systems. In particular, most states now mandate that teachers are observed on multiple occasions during the school year, and it is increasingly common that multiple raters are utilized across the different rating…
Descriptors: Models, Multivariate Analysis, Scoring Rubrics, Teacher Evaluation

