ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	3
Since 2016 (last 10 years)	7
Since 2006 (last 20 years)	11

Descriptor

Evaluators	16
Performance Based Assessment	16
Scores	16
Scoring	8
Interrater Reliability	7
Second Language Learning	5
Evaluation Methods	4
Language Tests	4
Comparative Analysis	3
Decision Making	3
English (Second Language)	3
Error of Measurement	3
Rating Scales	3
Simulation	3
Training	3
Audio Equipment	2
Educational Assessment	2
Elementary Secondary Education	2
Feedback (Response)	2
Gender Differences	2
Generalizability Theory	2
Item Response Theory	2
Language Proficiency	2
Least Squares Statistics	2
Licensing Examinations…	2

Source

Language Testing	3
ProQuest LLC	2
Advances in Health Sciences…	1
Educational Measurement:…	1
Interpreter and Translator…	1
Journal of Educational…	1
Journal of Personnel…	1
Journal of the National…	1
Studies in Applied…	1

Publication Type

Journal Articles	10
Reports - Research	10
Reports - Evaluative	3
Dissertations/Theses -…	2
ERIC Digests in Full Text	1
ERIC Publications	1
Speeches/Meeting Papers	1
Tests/Questionnaires	1

Education Level

Higher Education	3
Postsecondary Education	3
Elementary Secondary Education	1

Location

Colombia	1
New York (New York)	1
South Carolina	1
Texas	1

Assessments and Surveys

Test of English as a Foreign…

Showing 1 to 15 of 16 results

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: to identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Holistic versus Analytic Scoring of Spoken-Language Interpreting: A Multi-Perspectival Comparative Analysis

Peer reviewed

Chen, Jing; Yang, Huabo; Han, Chao – Interpreter and Translator Trainer, 2022

Rubric scoring has been gaining traction as an emergent method to assess spoken-language interpreting, with two of the most well-known methods being rating scale-based holistic and analytic scoring. While the former provides a single global score, the latter generates separate scores on different dimensions of interpreting performance. Despite the…

Descriptors: Holistic Approach, Speech Communication, Translation, Second Language Learning

Generalizability of Writing Scores and Language Program Placement Decisions: Score Dependability, Task Variability, and Score Profiles on an ESL Placement Test

Peer reviewed
PDF on ERIC

Eskin, Daniel – Studies in Applied Linguistics & TESOL, 2022

For agencies that deliver high-stakes Second Language (L2) proficiency exams, a research agenda has been undertaken for years to examine the role of rater, task, and rubric as sources of variability in their performance assessments (Lee, 2006; Sawaki & Sinharay, 2013; Xi, 2007; Xi & Mollaun, 2006). However, these challenges are more…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Student Placement

The Effect of Task Complexity on Rater Severity in an Adaptive Performance-Based Second Language Oral Communication Test

Won, Yongkook – ProQuest LLC, 2019

Despite the benefits of performance-based oral communication tests, a plethora of variables, as illustrated in Ockey and Li's (2015) model of oral communication assessment, can create construct-irrelevant variance in test scores. In relation to human participants in the oral communication tests, previous studies mostly focused on the direct effect…

Descriptors: Oral Language, Language Tests, English (Second Language), Second Language Learning

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Performance Assessment for School Leaders: Comparing Field Trial and Implementation Results

Peer reviewed

Orr, Margaret Terry; Hollingworth, Liz; Beaudin, Barbara – Journal of Educational Administration, 2019

Purpose: The purpose of this paper is to compare two years of results for one state's performance-based assessment for principal licensure, the Performance Assessment for Leaders (PAL). This includes the field trial (2014-2015) and the first year of statewide implementation (2015-2016), when passing score requirements and fees were added. Survey results on…

Descriptors: Comparative Analysis, Principals, Performance Based Assessment, Personnel Evaluation

Measuring the Impact of Rater Negotiation in Writing Performance Assessment

Peer reviewed

Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017

Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…

Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators

Modeling Rater Effects and Complex Learning Progressions Using Item Response Models

Shin, Hyo Jeong – ProQuest LLC, 2015

This dissertation comprises three papers that propose and apply psychometric models to deal with complexities and challenges in large-scale assessments, focusing on modeling rater effects and complex learning progressions. In particular, the three papers investigate extensions and applications of multilevel and multidimensional item response…

Descriptors: Item Response Theory, Psychometrics, Models, Measurement

Seeing the Same Thing Differently

Peer reviewed

Yeates, Peter; O'Neill, Paul; Mann, Karen; Eva, Kevin – Advances in Health Sciences Education, 2013

Assessors' scores in performance assessments are known to be highly variable. Attempted improvements through training or rating format have achieved minimal gains. The mechanisms that contribute to variability in assessors' scoring remain unclear. This study investigated these mechanisms. We used a qualitative approach to study…

Descriptors: Performance Based Assessment, Scores, Evaluators, Scoring

Evaluating Analytic Scoring for the TOEFL® Academic Speaking Test (TAST) for Operational Use

Peer reviewed

Xi, Xiaoming – Language Testing, 2007

This study explores the utility of analytic scoring for TAST in providing useful and reliable diagnostic information for operational use in three aspects of candidates' performance: delivery, language use and topic development. One hundred and forty examinees' responses to six TAST tasks were scored analytically on these three aspects of speech. G…

Descriptors: Scoring, Profiles, Performance Based Assessment, Academic Discourse

Evaluation of Non-Traditionally and Traditionally Prepared Teachers' Pedagogical Content Knowledge and Practice Using Performance-Based Evidence

Peer reviewed
PDF on ERIC

Harvey, Falicia; Gimbert, Belinda G. – Journal of the National Association for Alternative Certification, 2007

This study was designed to compare teachers certified in South Carolina who were trained by two different methods: through traditional college preparation programs and through the Program of Alternative Certification for Educators (PACE). This study explored three research questions. The first question addressed differences in pedagogical…

Descriptors: Pedagogical Content Knowledge, Performance Based Assessment, Comparative Analysis, Alternative Teacher Certification

Least-Squares Models to Correct for Rater Effects in Performance Assessment

Raymond, Mark R.; Viswesvaran, Chockalingam – 1991

This study illustrates the use of three least-squares models to control for rater effects in performance evaluation: (1) ordinary least squares (OLS); (2) weighted least squares (WLS); and (3) OLS subsequent to applying a logistic transformation to observed ratings (LOG-OLS). The three models were applied to ratings obtained from four…

Descriptors: Evaluators, Higher Education, Interrater Reliability, Least Squares Statistics

A Detailed Analysis of Statewide Teacher Appraisal Scores

Peer reviewed

Tyson, LeaAnn; Silverman, Stephen – Journal of Personnel Evaluation in Education, 1994

Differences in the Texas Teacher Appraisal System scores of teacher subgroups over 2 years were examined for 2,366 teachers for scores on individual domains, sums of scores of the first four domains, and overall summary performance scores, as well as appraiser differences. Implications for teacher evaluation are discussed. (SLD)

Descriptors: Educational Assessment, Elementary Secondary Education, Evaluation Methods, Evaluators

Same-Scorer Judgments on Multiple Content Area Items in Integrated Performance Assessment

Goldberg, Gail Lynn; Michaels, Hillary – 1995

Preliminary data was gathered to guide subsequent research that will shape training procedures and scoring practice for performance assessment activities that integrate multiple content areas. Content area integration is a key feature of many of the tasks in the Maryland School Performance Assessment Program (MSPAP), a large-scale assessment of…

Descriptors: Elementary Education, Evaluators, Grade 3, Grade 8

Reducing Errors Due to the Use of Judges. ERIC/TM Digest.

Rudner, Lawrence M. – 1992

Several common sources of error in assessment that depends on the use of judges are identified, and ways to reduce the impact of rating errors are examined. Numerous threats to the validity of scores based on ratings exist. These threats include: (1) the halo effect; (2) stereotyping; (3) perception differences; (4) leniency/stringency error; and…

Descriptors: Alternative Assessment, Error of Measurement, Evaluation Methods, Evaluators

Author

Beaudin, Barbara	1
Chen, Jing	1
Eskin, Daniel	1
Eva, Kevin	1
Gimbert, Belinda G.	1
Goldberg, Gail Lynn	1
Han, Chao	1
Harvey, Falicia	1
Hollingworth, Liz	1
Janssen, Gerriet	1
Lin, Chih-Kai	1
Mann, Karen	1
Meier, Valerie	1
Michaels, Hillary	1
O'Neill, Paul	1
Orr, Margaret Terry	1
Raymond, Mark R.	1
Rudner, Lawrence M.	1
Shavelson, Richard J.	1
Shin, Hyo Jeong	1
Silverman, Stephen	1
Trace, Jonathan	1
Tyson, LeaAnn	1
Viswesvaran, Chockalingam	1