Publication Date
In 2025 | 2 |
Since 2024 | 4 |
Since 2021 (last 5 years) | 19 |
Since 2016 (last 10 years) | 58 |
Since 2006 (last 20 years) | 165 |
Descriptor
Correlation | 220 |
Evaluation Methods | 220 |
Reliability | 94 |
Test Reliability | 81 |
Interrater Reliability | 59 |
Test Validity | 52 |
Foreign Countries | 46 |
Validity | 46 |
Scores | 39 |
Statistical Analysis | 37 |
Measurement Techniques | 34 |
Author
Gill, Brian | 4 |
Booker, Kevin | 2 |
Bruch, Julie | 2 |
Dudley-Marling, Curt | 2 |
Elliott, Stephen N. | 2 |
Gresham, Frank M. | 2 |
Lambie, Glenn W. | 2 |
Matson, Johnny L. | 2 |
Onwuegbuzie, Anthony J. | 2 |
Owston, Ronald D. | 2 |
Smith, Erica | 2 |
Audience
Researchers | 6 |
Practitioners | 3 |
Counselors | 1 |
Teachers | 1 |
Location
China | 7 |
Florida | 6 |
United Kingdom | 6 |
Netherlands | 5 |
Spain | 5 |
Turkey | 5 |
Australia | 4 |
California | 4 |
Arizona | 3 |
Illinois | 3 |
Japan | 3 |
Kelvin Terrell Pompey – ProQuest LLC, 2021
Many methods are used to measure interrater reliability for studies where each target receives ratings by a different set of judges. The purpose of this study is to explore the use of hierarchical modeling for estimating interrater reliability using the intraclass correlation coefficient. This study provides a description of how the ICC can be…
Descriptors: Interrater Reliability, Evaluation Methods, Test Reliability, Correlation
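For readers unfamiliar with the intraclass correlation coefficient mentioned in this abstract, the following is a minimal sketch of the classical one-way random-effects estimate, ICC(1), which suits designs where each target is rated by a different set of judges. It is not the hierarchical-modeling procedure the study describes; it uses the ANOVA mean-squares estimator instead, and it assumes a balanced design (the same number of ratings per target). The function name `icc_oneway` and the sample data are hypothetical.

```python
from statistics import mean


def icc_oneway(ratings):
    """ICC(1) from a one-way random-effects ANOVA.

    `ratings` is a list of targets; each target is a list of k ratings.
    Assumption: balanced data (every target has the same k >= 2 ratings).
    """
    n = len(ratings)          # number of targets
    k = len(ratings[0])       # ratings per target
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 targets, each with the same k >= 2 ratings")

    grand_mean = mean(x for row in ratings for x in row)
    target_means = [mean(row) for row in ratings]

    # Between-target mean square (df = n - 1)
    msb = k * sum((m - grand_mean) ** 2 for m in target_means) / (n - 1)
    # Within-target mean square (df = n * (k - 1))
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, target_means) for x in row
    ) / (n * (k - 1))

    # ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW)
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    # Hypothetical data: 4 targets, 3 ratings each
    sample = [[7, 8, 7], [4, 5, 5], [9, 9, 8], [2, 3, 3]]
    print(round(icc_oneway(sample), 3))
```

Values near 1 indicate that most of the variance in ratings lies between targets rather than between judges of the same target; the estimate can be negative when judges of the same target disagree more than targets differ.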
Henrique Mohallem Paiva; Flávia Maria Santoro; Victor Takashi Hayashi; Bianca Cassemiro Lima – IEEE Transactions on Education, 2025
Contribution: This article analyzes student assessment within a computing faculty employing a full project-based learning (PBL) approach. Examining 2078 final grades across 60 classes and periods, the study reveals a significant correlation between graded self-studies, exams, and projects. This result contributes to understanding the reliability…
Descriptors: Student Evaluation, Computer Science Education, College Faculty, Correlation
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of [alpha], [lambda][subscript 2], [lambda][subscript 4], [lambda][subscript 2], [omega][subscript T], GLB[subscript MRFA], and GLB[subscript Algebraic] coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Pere J. Ferrando; David Navarro-González; Fabia Morales-Vives – Educational and Psychological Measurement, 2025
The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the…
Descriptors: Scores, Accuracy, Evaluation Methods, Factor Analysis
Andrea Vallevand; David E. Manthey; Kim Askew; Nicholas D. Hartman; Cynthia Burns; Lindsay C. Strowd; Claudio Violato – Advances in Health Sciences Education, 2024
Education in Doctor of Medicine programs has moved towards an emphasis on clinical competency, with entrustable professional activities providing a framework of learning objectives and outcomes to be assessed within the clinical environment. While the identification and structured definition of objectives and outcomes have evolved, many methods…
Descriptors: Clinical Experience, Graduate Medical Education, Validity, Evaluation Methods
Zahn, Daniela; Canton, Ursula; Boyd, Victoria; Hamilton, Laura; Mamo, Josianne; McKay, Jane; Proudfoot, Linda; Telfer, Dickson; Williams, Kim; Wilson, Colin – Studies in Higher Education, 2021
Evaluating the impact of Academic Literacies teaching (Lea and Street [1998. "Student Writing in Higher Education: An Academic Literacies Approach." "Studies in Higher Education" 23 (2): 157-72. doi:10.1080/03075079812331380364]) is difficult, as it involves gauging whether writers: (1) gain better understanding of what…
Descriptors: Writing Evaluation, Evaluation Methods, Undergraduate Students, Foreign Countries
Koehler, Jana Christina; Georgescu, Alexandra Livia; Weiske, Johanna; Spangemacher, Moritz; Burghof, Lana; Falkai, Peter; Koutsouleris, Nikolaos; Tschacher, Wolfgang; Vogeley, Kai; Falter-Wagner, Christine M. – Journal of Autism and Developmental Disorders, 2022
Reliably diagnosing autism spectrum disorders (ASD) in adulthood poses a challenge to clinicians due to the absence of specific diagnostic markers. This study investigated the potential of interpersonal synchrony (IPS), which has been found to be reduced in ASD, to augment the diagnostic process. IPS was objectively assessed in videos…
Descriptors: Autism, Pervasive Developmental Disorders, Clinical Diagnosis, Reliability
Olvera Astivia, Oscar L.; Zumbo, Bruno D. – Measurement: Interdisciplinary Research and Perspectives, 2019
Methods to generate random correlation matrices have been proposed in the literature, but very few instances exist where these correlation matrices are structured or where the statistical properties of the algorithms are known. By relying on the tetrad relation discovered by Spearman and the properties of the beta distribution, an algorithm is…
Descriptors: Correlation, Psychometrics, Benchmarking, Evaluation Methods
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Lambie, Glenn W.; Mullen, Patrick R.; Swank, Jacqueline M.; Blount, Ashley – Measurement and Evaluation in Counseling and Development, 2018
Supervisors evaluated counselors-in-training at multiple points during their practicum experience using the Counseling Competencies Scale (CCS; N = 1,070). The CCS evaluations were randomly split to conduct exploratory factor analysis and confirmatory factor analysis, resulting in a 2-factor model (61.5% of the variance explained).
Descriptors: Counselor Training, Counseling, Measures (Individuals), Competence
Azman Ong, Mohd Hanafi; Mohd Yasin, Norazlina; Ibrahim, Nur Syafikah – Asian Association of Open Universities Journal, 2022
Purpose: Measuring the internal response to online learning is seen as fundamental to absorptive capacity, which stimulates knowledge assimilation. However, the evaluation of practice and research of validated instruments that could effectively measure online learning response behavior is limited. Thus, in this study, a new instrument was designed…
Descriptors: Online Courses, Student Surveys, Student Attitudes, Factor Analysis
Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023
Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…
Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication
Looney, Marilyn A. – Measurement in Physical Education and Exercise Science, 2018
The purpose of this article was two-fold: (1) to provide an overview of the commonly reported and under-reported absolute agreement indices in the kinesiology literature for continuous data; and (2) to present examples of these indices for hypothetical data along with recommendations for future use. It is recommended that three types of information be…
Descriptors: Interrater Reliability, Evaluation Methods, Kinetics, Indexes
Gioia, Anthony R.; Ahmed, Yusra; Woods, Steven P.; Cirino, Paul T. – Reading and Writing: An Interdisciplinary Journal, 2023
There is significant overlap between reading and writing, but no known standardized measure assesses these jointly. The goal of the present study is to evaluate the properties of a novel measure, the Assessment of Writing, Self-Monitoring, and Reading (AWSM Reader), that simultaneously evaluates both reading comprehension and writing. In doing so,…
Descriptors: Reading Writing Relationship, Writing Evaluation, Self Evaluation (Individuals), Executive Function