ERIC - Search Results

Showing 1 to 15 of 20 results

Examining the Reliability of Scores from a Performance Assessment of Practice-Based Competencies

Peer reviewed

Roduta Roberts, Mary; Alves, Cecilia Brito; Werther, Karin; Bahry, Louise M. – Journal of Psychoeducational Assessment, 2019

The purpose of this study was to examine the reliability and sources of score variation from a performance assessment of practice competencies within an occupational therapy program. Data from 99 students who participated in a practical exam were examined. A generalizability analysis of analytic, total, and overall holistic scores was completed…

Descriptors: Performance Based Assessment, Test Reliability, Scores, Occupational Therapy

Using Generalizability Theory to Assess the Score Reliability of Communication Skills of Dentistry Students

Peer reviewed

Uzun, N. Bilge; Aktas, Mehtap; Asiret, Semih; Yormaz, Seha – Asian Journal of Education and Training, 2018

The goal of this study is to determine the reliability of the performance points of dentistry students regarding communication skills and to examine the scoring reliability by generalizability theory in balanced random and fixed facet (mixed design) data, considering also the interactions of student, rater and duty. The study group of the research…

Descriptors: Foreign Countries, Generalizability Theory, Scores, Test Reliability

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Investigating Score Dependability in English/Chinese Interpreter Certification Performance Testing: A Generalizability Theory Approach

Peer reviewed

Han, Chao – Language Assessment Quarterly, 2016

As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…

Descriptors: Foreign Countries, Scores, English, Chinese

The Impact of Statistically Adjusting for Rater Effects on Conditional Standard Errors of Performance Ratings

Peer reviewed

Raymond, Mark R.; Harik, Polina; Clauser, Brian E. – Applied Psychological Measurement, 2011

Prior research indicates that the overall reliability of performance ratings can be improved by using ordinary least squares (OLS) regression to adjust for rater effects. The present investigation extends previous work by evaluating the impact of OLS adjustment on standard errors of measurement ("SEM") at specific score levels. In…

Descriptors: Performance Based Assessment, Licensing Examinations (Professions), Least Squares Statistics, Item Response Theory

The Effect of Raters and Rating Conditions on the Reliability of the Missionary Teaching Assessment

Ure, Abigail C. – ProQuest LLC, 2011

This study investigated how 2 different rating conditions, the controlled rating condition (CRC) and the uncontrolled rating condition (URC), affected rater behavior and the reliability of a performance assessment (PA) known as the Missionary Teaching Assessment (MTA). The CRC gives raters the capability to manipulate (pause, rewind, fast-forward)…

Descriptors: Teacher Evaluation, Performance Based Assessment, Performance Tests, Generalizability Theory

Generalizability of Student Writing across Multiple Tasks: A Challenge for Authentic Assessment

Peer reviewed

Hathcoat, John D.; Penn, Jeremy D. – Research & Practice in Assessment, 2012

Critics of standardized testing have recommended replacing standardized tests with more authentic assessment measures, such as classroom assignments, projects, or portfolios rated by a panel of raters using common rubrics. Little research has examined the consistency of scores across multiple authentic assignments or the implications of this…

Descriptors: Generalizability Theory, Performance Based Assessment, Writing Across the Curriculum, Standardized Tests

Scoring and Analysis of Performance Examinations: A Comparison of Methods and Interpretations.

Peer reviewed

Lunz, Mary E.; Schumacker, Randall E. – Journal of Outcome Measurement, 1997

Results and interpretations of the data from a performance examination were compared for four methods of analysis for 74 medical specialty certification candidates: (1) traditional summary statistics; (2) inter-judge correlations; (3) generalizability theory; and (4) the multifaceted Rasch model. Advantages of the Rasch model are outlined. (SLD)

Descriptors: Comparative Analysis, Data Analysis, Generalizability Theory, Interrater Reliability

A Discussion of Analytic Scoring for Writing Performance Assessments.

Crehan, Kevin D. – 1997

Writing fits well within the realm of outcomes suitable for observation by performance assessments. Studies of the reliability of performance assessments have suggested that interrater reliability can be consistently high. Scoring consistency, however, is only one aspect of quality in decisions based on assessment results. Another is…

Descriptors: Evaluation Methods, Feedback, Generalizability Theory, Interrater Reliability

Performance-Based Assessment: Implications of Task Specificity.

Peer reviewed

Linn, Robert L.; Burton, Elizabeth – Educational Measurement: Issues and Practice, 1994

Generalizability of performance-based assessment scores across raters and tasks is examined, focusing on implications of generalizability analyses for specific uses and interpretations of assessment results. Although it seems probable that assessment conditions, task characteristics, and interactions with instructional experiences affect the…

Descriptors: Educational Assessment, Educational Experience, Generalizability Theory, Interaction

Using GENOVA and FACETS to Set Multiple Standards on Performance Assessment for Certification in Medical Translation from Japanese into English

Peer reviewed

Kozaki, Y. – Language Testing, 2004

This article presents a standard-setting procedure for performance assessment in a foreign language, through which some of the major problems facing performance assessment in criterion-referenced testing can be addressed. The procedure, which was geared to revealing and accommodating inter-judge variability, employed the synergy of multiple…

Descriptors: Data Analysis, Testing, Performance Tests, Generalizability Theory

The Preferability of Constrained Optimization in Determining the Number of Prompts, Modes of Discourse, and Raters in a Direct Writing Assessment.

Parkes, Jay; Suen, Hoi K. – 1995

This study demonstrates the advantages of using a constrained optimization algorithm to explore the optimal number of prompts, modes of discourse, and raters for achieving an acceptable level of reliability during a direct writing assessment. Writing samples elicited from 50 college students were rated by 3 graduate students and the scores…

Descriptors: Algorithms, College Students, Educational Assessment, Generalizability Theory

Clarifying the Blurred Image: Estimating the Inter-Rater Reliability of Performance Assessments.

Moore, Alan D.; Young, Suzanne – 1997

As schools move toward performance assessment, there is increasing discussion of using these assessments for accountability purposes. When used for making decisions, performance assessments must meet high standards of validity and reliability. One major source of unreliability in performance assessments is interrater disagreement. In this paper,…

Descriptors: Accountability, Correlation, Elementary Secondary Education, Generalizability Theory

Statistical Test Specifications for Performance Assessments: Is This an Oxymoron?

Reckase, Mark D. – 1997

This paper argues that special procedures for constructing assessment tools containing performance assessment tasks are unnecessary and that current test methodology can easily be generalized to complex performance assessment tasks without destroying the desirable characteristics of those tasks. Reasonable statistical requirements for sound…

Descriptors: Educational Assessment, Generalizability Theory, High Stakes Tests, Interrater Reliability

Generalizability of Written-Response Scores for the Alberta Education English 30 Diploma Examination.

Peer reviewed

Gierl, Mark J. – Alberta Journal of Educational Research, 1998

Examined the generalizability of written-response scores on the English 30 diploma examination administered to Alberta 12th-grade students. Student scores differed as a function of rater, but this variance component was small across two tasks and two administrations; score generalizability was high using a two-rater system; and scale variability…

Descriptors: Error of Measurement, Foreign Countries, Generalizability Theory, High School Seniors
