Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 7 |
Since 2006 (last 20 years) | 19 |
Descriptor
Evaluation Methods | 26 |
Generalizability Theory | 26 |
Reliability | 26 |
Scores | 6 |
Error of Measurement | 5 |
College Faculty | 4 |
Observation | 4 |
Scoring | 4 |
Student Evaluation | 4 |
Writing Skills | 4 |
College Students | 3 |
More ▼ |
Source
Author
Publication Type
Journal Articles | 21 |
Reports - Research | 18 |
Reports - Evaluative | 3 |
Dissertations/Theses -… | 2 |
Reports - Descriptive | 2 |
Speeches/Meeting Papers | 2 |
Information Analyses | 1 |
Tests/Questionnaires | 1 |
Education Level
Audience
Location
Turkey | 1 |
United Kingdom | 1 |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Sturgis, Paul W.; Marchand, Leslie; Miller, M. David; Xu, Wei; Castiglioni, Analia – Association for Institutional Research, 2022
This article introduces generalizability theory (G-theory) to institutional research and assessment practitioners, and explains how it can be utilized to evaluate the reliability of assessment procedures in order to improve student learning outcomes. The fundamental concepts associated with G-theory are briefly discussed, followed by a discussion…
Descriptors: Generalizability Theory, Institutional Research, Reliability, Computer Software
Gresham, Frank M.; Dart, Evan H.; Collins, Tai A. – School Psychology Review, 2017
The concept of treatment integrity is an essential component to databased decision making within a response-to-intervention model. Although treatment integrity is a topic receiving increased attention in the school-based intervention literature, relatively few studies have been conducted regarding the technical adequacy of treatment integrity…
Descriptors: Fidelity, Generalizability Theory, Observation, Measurement Techniques
McLaughlin, Tara W.; Snyder, Patricia A.; Algina, James – Grantee Submission, 2017
The Learning Target Rating Scale (LTRS) is a measure designed to evaluate the quality of teacher-developed learning targets for embedded instruction for early learning. In the present study, we examined the measurement dependability of LTRS scores by conducting a generalizability study (G-study). We used a partially nested, three-facet model to…
Descriptors: Generalizability Theory, Scores, Rating Scales, Evaluation Methods
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Kim, Young-Suk Grace; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie – Reading and Writing: An Interdisciplinary Journal, 2017
We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach a desirable level of 0.90 and 0.80 reliabilities for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written…
Descriptors: Writing Evaluation, Elementary School Students, Grade 3, Grade 4
Kim, Young-Suk Grace; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie – Grantee Submission, 2017
We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach a desirable level of 0.90 and 0.80 reliabilities for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written…
Descriptors: Writing Evaluation, Elementary School Students, Grade 3, Grade 4
Zhang, Bo; Xiao, Yunnan; Luo, Juan – Language Testing in Asia, 2015
Previous studies comparing holistic scoring to analytic scoring of second language writing have given mixed results. Some of them suffer from methodological drawbacks, such as limited writing sample size, limited number of raters, and lack of direct comparison of the two methods. Based on 300 writing samples graded by 14 raters, this research…
Descriptors: Evaluators, Reliability, Scores, Holistic Approach
Cetin, Bayram; Guler, Nese; Sarica, Rabia – Eurasian Journal of Educational Research, 2016
Problem Statement: In addition to being teaching tools, concept maps can be used as effective assessment tools. The use of concept maps for assessment has raised the issue of scoring them. Concept maps generated and used in different ways can be scored via various methods. Holistic and relational scoring methods are two of them. Purpose of the…
Descriptors: Generalizability Theory, Concept Mapping, Scoring, Scoring Formulas
Attali, Yigal – Educational and Psychological Measurement, 2014
This article presents a comparative judgment approach for holistically scored constructed response tasks. In this approach, the grader rank orders (rather than rate) the quality of a small set of responses. A prior automated evaluation of responses guides both set formation and scaling of rankings. Sets are formed to have similar prior scores and…
Descriptors: Responses, Item Response Theory, Scores, Rating Scales
Moonen-van Loon, J. M. W.; Overeem, K.; Donkers, H. H. L. M.; van der Vleuten, C. P. M.; Driessen, E. W. – Advances in Health Sciences Education, 2013
In recent years, postgraduate assessment programmes around the world have embraced workplace-based assessment (WBA) and its related tools. Despite their widespread use, results of studies on the validity and reliability of these tools have been variable. Although in many countries decisions about residents' continuation of training and…
Descriptors: Graduate Students, Graduate Study, Graduate Medical Education, Generalizability Theory
Orem, Chris D. – ProQuest LLC, 2012
Meta-assessment, or the assessment of assessment, can provide meaningful information about the trustworthiness of an academic program's assessment results (Bresciani, Gardner, & Hickmott, 2009; Palomba & Banta, 1999; Suskie, 2009). Many institutions conduct meta-assessments for their academic programs (Fulcher, Swain, & Orem, 2012),…
Descriptors: Validity, Evidence, Evaluation Methods, Meta Analysis
Shin, Yongyun; Raudenbush, Stephen W. – Psychometrika, 2012
Social scientists are frequently interested in assessing the qualities of social settings such as classrooms, schools, neighborhoods, or day care centers. The most common procedure requires observers to rate social interactions within these settings on multiple items and then to combine the item responses to obtain a summary measure of setting…
Descriptors: Generalizability Theory, Neighborhoods, Intervals, Child Care Centers
Brandt, Lorilynn – ProQuest LLC, 2010
Phonics was identified as one of the critical components in reading development by the National Reading Panel. Over time, research has repeatedly identified phonics as important to early reading development. Given the compelling evidence supporting the teaching of phonics in early reading, it is critical to make sure that instructional decisions…
Descriptors: Generalizability Theory, Phonics, Early Reading, Validity
Lei, Pui-Wa; Smith, Maria; Suen, Hoi K. – Psychology in the Schools, 2007
Direct observation of behaviors is a data collection method customarily used in clinical and educational settings. Repeated measures and small samples are inherent characteristics of observational studies that pose challenges to the numerical estimation of reliability for observational data. In this article, we review some debates about the use of…
Descriptors: Generalizability Theory, Data Collection, Observation, Evaluation Methods
Murphy, Douglas J.; Bruce, David A.; Mercer, Stewart W.; Eva, Kevin W. – Advances in Health Sciences Education, 2009
To investigate the reliability and feasibility of six potential workplace-based assessment methods in general practice training: criterion audit, multi-source feedback from clinical and non-clinical colleagues, patient feedback (the CARE Measure), referral letters, significant event analysis, and video analysis of consultations. Performance of GP…
Descriptors: Reliability, Graduate Medical Education, Family Practice (Medicine), Vocational Evaluation
Previous Page | Next Page ยป
Pages: 1 | 2