Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Chen, Jing; Yang, Huabo; Han, Chao – Interpreter and Translator Trainer, 2022
Rubric scoring has been gaining traction as an emergent method to assess spoken-language interpreting, with two of the most well-known methods being rating scale-based holistic and analytic scoring. While the former provides a single global score, the latter generates separate scores on different dimensions of interpreting performance. Despite the…
Descriptors: Holistic Approach, Speech Communication, Translation, Second Language Learning
Eskin, Daniel – Studies in Applied Linguistics & TESOL, 2022
For agencies that deliver high-stakes Second Language (L2) proficiency exams, a research agenda has been undertaken for years to examine the role of rater, task, and rubric as sources of variability into their performance assessments (Lee, 2006; Sawaki & Sinharay, 2013; Xi, 2007; Xi & Mollaun, 2006). However, these challenges are more…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Student Placement
Won, Yongkook – ProQuest LLC, 2019
Despite the benefits of performance-based oral communication tests, a plethora of variables, as illustrated in Ockey and Li's (2015) model of oral communication assessment, can create construct-irrelevant variance in test scores. In relation to human participants in the oral communication tests, previous studies mostly focused on the direct effect…
Descriptors: Oral Language, Language Tests, English (Second Language), Second Language Learning
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Orr, Margaret Terry; Hollingworth, Liz; Beaudin, Barbara – Journal of Educational Administration, 2019
Purpose: The purpose of this paper is to compare two years of results for one state's performance-based assessments for principal licensure, the Performance Assessment for Leaders (PAL). This includes the field trial (2014-2015) and first year of statewide implementation (2015-2016) when passing score requirements and fees were added. Survey results on…
Descriptors: Comparative Analysis, Principals, Performance Based Assessment, Personnel Evaluation
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Shin, Hyo Jeong – ProQuest LLC, 2015
This dissertation is comprised of three papers that propose and apply psychometric models to deal with complexities and challenges in large-scale assessments, focusing on modeling rater effects and complex learning progressions. In particular, three papers investigate extensions and applications of multilevel and multidimensional item response…
Descriptors: Item Response Theory, Psychometrics, Models, Measurement
Yeates, Peter; O'Neill, Paul; Mann, Karen; Eva, Kevin – Advances in Health Sciences Education, 2013
Assessors' scores in performance assessments are known to be highly variable. Attempted improvements through training or rating format have achieved minimal gains. The mechanisms that contribute to variability in assessors' scoring remain unclear. This study investigated these mechanisms. We used a qualitative approach to study…
Descriptors: Performance Based Assessment, Scores, Evaluators, Scoring
Xi, Xiaoming – Language Testing, 2007
This study explores the utility of analytic scoring for TAST in providing useful and reliable diagnostic information for operational use in three aspects of candidates' performance: delivery, language use and topic development. One hundred and forty examinees' responses to six TAST tasks were scored analytically on these three aspects of speech. G…
Descriptors: Scoring, Profiles, Performance Based Assessment, Academic Discourse
Harvey, Falicia; Gimbert, Belinda G. – Journal of the National Association for Alternative Certification, 2007
This study was designed to compare teachers certified in South Carolina that were trained in two different methods: through traditional college preparation programs and through the Program of Alternative Certification for Educators (PACE). This study explored three research questions. The first question addressed differences in pedagogical…
Descriptors: Pedagogical Content Knowledge, Performance Based Assessment, Comparative Analysis, Alternative Teacher Certification
Raymond, Mark R.; Viswesvaran, Chockalingam – 1991
This study illustrates the use of three least-squares models to control for rater effects in performance evaluation: (1) ordinary least squares (OLS); (2) weighted least squares (WLS); and (3) OLS subsequent to applying a logistic transformation to observed ratings (LOG-OLS). The three models were applied to ratings obtained from four…
Descriptors: Evaluators, Higher Education, Interrater Reliability, Least Squares Statistics

Tyson, LeaAnn; Silverman, Stephen – Journal of Personnel Evaluation in Education, 1994
Differences in the Texas Teacher Appraisal System scores of teacher subgroups over 2 years were examined for 2,366 teachers for scores on individual domains, sums of scores of the first four domains, and overall summary performance scores, as well as appraiser differences. Implications for teacher evaluation are discussed. (SLD)
Descriptors: Educational Assessment, Elementary Secondary Education, Evaluation Methods, Evaluators
Goldberg, Gail Lynn; Michaels, Hillary – 1995
Preliminary data was gathered to guide subsequent research that will shape training procedures and scoring practice for performance assessment activities that integrate multiple content areas. Content area integration is a key feature of many of the tasks in the Maryland School Performance Assessment Program (MSPAP), a large-scale assessment of…
Descriptors: Elementary Education, Evaluators, Grade 3, Grade 8
Rudner, Lawrence M. – 1992
Several common sources of error in assessment that depends on the use of judges are identified, and ways to reduce the impact of rating errors are examined. Numerous threats to the validity of scores based on ratings exist. These threats include: (1) the halo effect; (2) stereotyping; (3) perception differences; (4) leniency/stringency error; and…
Descriptors: Alternative Assessment, Error of Measurement, Evaluation Methods, Evaluators