Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 9 |
Since 2006 (last 20 years) | 16 |
Descriptor
Generalizability Theory | 26 |
Interrater Reliability | 26 |
Scores | 26 |
Test Reliability | 13 |
Error of Measurement | 8 |
Scoring | 8 |
Performance Based Assessment | 6 |
Evaluators | 5 |
Graduate Students | 4 |
Standardized Tests | 4 |
Test Validity | 4 |
More ▼ |
Source
Author
Abedi, Jamal | 1 |
Aktas, Mehtap | 1 |
Alves, Cecilia Brito | 1 |
Asiret, Semih | 1 |
Attali, Yigal | 1 |
Bahry, Louise M. | 1 |
Baldus, Robert | 1 |
Bennett, Randy Elliot | 1 |
Berenbon, Rebecca | 1 |
Bordage, Georges | 1 |
Bunch, Michael B. | 1 |
More ▼ |
Publication Type
Journal Articles | 20 |
Reports - Research | 17 |
Reports - Evaluative | 6 |
Speeches/Meeting Papers | 4 |
Reports - Descriptive | 3 |
Tests/Questionnaires | 2 |
Education Level
Higher Education | 7 |
Postsecondary Education | 5 |
Early Childhood Education | 1 |
Elementary Education | 1 |
Elementary Secondary Education | 1 |
Grade 1 | 1 |
Grade 8 | 1 |
Grade 9 | 1 |
Primary Education | 1 |
Audience
Researchers | 1 |
Location
Turkey (Ankara) | 2 |
China (Beijing) | 1 |
Idaho | 1 |
Oklahoma | 1 |
Laws, Policies, & Programs
Assessments and Surveys
United States Medical… | 1 |
What Works Clearinghouse Rating
D'Agostino, Jerome V.; Rodgers, Emily; Winkler, Christa; Johnson, Tracy; Berenbon, Rebecca – Reading Psychology, 2021
Running Records provide a standardized method for recording and assessing students' oral reading behaviors and are excellent formative assessment tools to guide instructional decision-making. This study expands on prior Running Record reliability work by evaluating the extent to which external raters and teachers consistently assessed students'…
Descriptors: Accuracy, Oral Reading, Generalizability Theory, Error Correction
Roduta Roberts, Mary; Alves, Cecilia Brito; Werther, Karin; Bahry, Louise M. – Journal of Psychoeducational Assessment, 2019
The purpose of this study was to examine the reliability and sources of score variation from a performance assessment of practice competencies within an occupational therapy program. Data from 99 students who participated in a practical exam were examined. A generalizability analysis of analytic, total, and overall holistic scores was completed…
Descriptors: Performance Based Assessment, Test Reliability, Scores, Occupational Therapy
McCaffrey, Daniel F.; Oliveri, Maria Elena; Holtzman, Steven – ETS Research Report Series, 2018
Scores from noncognitive measures are increasingly valued for their utility in helping to inform postsecondary admissions decisions. However, their use has presented challenges because of faking, response biases, or subjectivity, which standardized third-party evaluations (TPEs) can help minimize. Analysts and researchers using TPEs, however, need…
Descriptors: Generalizability Theory, Scores, College Admission, Admission Criteria
Irby, Sarah M.; Floyd, Randy G. – Psychology in the Schools, 2017
This study examined the exchangeability of total scores (i.e., intelligent quotients [IQs]) from three brief intelligence tests. Tests were administered to 36 children with intellectual giftedness, scored live by one set of primary examiners and later scored by a secondary examiner. For each student, six IQs were calculated, and all 216 values…
Descriptors: Intelligence Tests, Gifted, Error of Measurement, Scores
Uzun, N. Bilge; Aktas, Mehtap; Asiret, Semih; Yormaz, Seha – Asian Journal of Education and Training, 2018
The goal of this study is to determine the reliability of the performance points of dentistry students regarding communication skills and to examine the scoring reliability by generalizability theory in balanced random and fixed facet (mixed design) data, considering also the interactions of student, rater and duty. The study group of the research…
Descriptors: Foreign Countries, Generalizability Theory, Scores, Test Reliability
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Park, Yoon Soo; Hyderi, Abbas; Bordage, Georges; Xing, Kuan; Yudkowsky, Rachel – Advances in Health Sciences Education, 2016
Recent changes to the patient note (PN) format of the United States Medical Licensing Examination have challenged medical schools to improve the instruction and assessment of students taking the Step-2 clinical skills examination. The purpose of this study was to gather validity evidence regarding response process and internal structure, focusing…
Descriptors: Interrater Reliability, Generalizability Theory, Licensing Examinations (Professions), Physicians
Attali, Yigal – Language Testing, 2016
A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…
Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators
Semmelroth, Carrie Lisa; Johnson, Evelyn – Assessment for Effective Intervention, 2014
This study used generalizability theory to measure reliability on the Recognizing Effective Special Education Teachers (RESET) observation tool designed to evaluate special education teacher effectiveness. At the time of this study, the RESET tool included three evidence-based instructional practices (direct, explicit instruction; whole-group…
Descriptors: Observation, Special Education Teachers, Teacher Effectiveness, Teacher Evaluation
Han, Chao – Language Assessment Quarterly, 2016
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
Descriptors: Foreign Countries, Scores, English, Chinese
Gugiu, Mihaiela R.; Gugiu, Paul C.; Baldus, Robert – Journal of MultiDisciplinary Evaluation, 2012
Background: Educational researchers have long espoused the virtues of writing with regard to student cognitive skills. However, research on the reliability of the grades assigned to written papers reveals a high degree of contradiction, with some researchers concluding that the grades assigned are very reliable whereas others suggesting that they…
Descriptors: Grades (Scholastic), Grading, Scoring Rubrics, Research Design
Hill, Heather C.; Charalambous, Charalambos Y.; Kraft, Matthew A. – Educational Researcher, 2012
In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluation of classroom-based interventions. Although education practitioners and researchers have developed numerous observational instruments for these purposes, many developers fail to…
Descriptors: Generalizability Theory, Observation, Classroom Observation Techniques, Evaluation
Guler, Nese; Gelbal, Selahattin – Educational Sciences: Theory and Practice, 2010
In this study, the Classical test theory and generalizability theory were used for determination to reliability of scores obtained from measurement tool of mathematics success. 24 open-ended mathematics question of the TIMSS-1999 was applied to 203 students in 2007-spring semester. Internal consistency of scores was found as 0.92. For…
Descriptors: Generalizability Theory, Test Theory, Test Reliability, Interrater Reliability
Hathcoat, John D.; Penn, Jeremy D. – Research & Practice in Assessment, 2012
Critics of standardized testing have recommended replacing standardized tests with more authentic assessment measures, such as classroom assignments, projects, or portfolios rated by a panel of raters using common rubrics. Little research has examined the consistency of scores across multiple authentic assignments or the implications of this…
Descriptors: Generalizability Theory, Performance Based Assessment, Writing Across the Curriculum, Standardized Tests
Gebril, Atta – Assessing Writing, 2010
Integrated tasks are currently employed in a number of L2 exams since they are perceived as an addition to the writing-only task type. Given this trend, the current study investigates composite score generalizability of both reading-to-write and writing-only tasks. For this purpose, a multivariate generalizability analysis is used to investigate…
Descriptors: Scoring, Scores, Second Language Instruction, Writing Evaluation
Previous Page | Next Page »
Pages: 1 | 2