ERIC - Search Results

Showing all 8 results

Non-Numeric Intrajudge Consistency Feedback in an Angoff Procedure

Peer reviewed

Harrison, George M. – Journal of Educational Measurement, 2015

The credibility of standard-setting cut scores depends in part on two sources of consistency evidence: intrajudge and interjudge consistency. Although intrajudge consistency feedback has often been provided to Angoff judges in practice, more evidence is needed to determine whether it achieves its intended effect. In this randomized experiment with…

Descriptors: Interrater Reliability, Standard Setting (Scoring), Cutting Scores, Feedback (Response)

A Comparison of Rubrics and Graded Category Rating Scales with Various Methods Regarding Raters' Reliability

Peer reviewed

Dogan, C. Deha; Uluman, Müge – Educational Sciences: Theory and Practice, 2017

The aim of this study was to determine the extent to which graded-category rating scales and rubrics contribute to inter-rater reliability. The research was designed as a correlational study. The study group consisted of 82 sixth-grade students and three writing course teachers in a private elementary school. A performance task was…

Descriptors: Comparative Analysis, Scoring Rubrics, Rating Scales, Interrater Reliability

Item Response Theory for Peer Assessment

Peer reviewed

Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016

As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…

Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation

A Comparison of Newly-Trained and Experienced Raters on a Standardized Writing Assessment

Peer reviewed

Attali, Yigal – Language Testing, 2016

A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…

Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators

Oral Performance Scoring Using Generalizability Theory and Many-Facet Rasch Measurement: A Comparison Study

Alkahtani, Saif F. – ProQuest LLC, 2012

The principal aim of the present study was to better guide the Quranic recitation appraisal practice by presenting an application of Generalizability theory and Many-facet Rasch Measurement Model for assessing the dependability and fit of two suggested rubrics. Recitations of 93 students were rated holistically and analytically by 3 independent…

Descriptors: Generalizability Theory, Item Response Theory, Verbal Tests, Islam

When Rater Reliability Is Not Enough: Teacher Observation Systems and a Case for the Generalizability Study

Peer reviewed

Hill, Heather C.; Charalambous, Charalambos Y.; Kraft, Matthew A. – Educational Researcher, 2012

In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluation of classroom-based interventions. Although education practitioners and researchers have developed numerous observational instruments for these purposes, many developers fail to…

Descriptors: Generalizability Theory, Observation, Classroom Observation Techniques, Evaluation

Multi-Source Evaluation of Interpersonal and Communication Skills of Family Medicine Residents

Peer reviewed

Leung, Kai-Kuen; Wang, Wei-Dan; Chen, Yen-Yuan – Advances in Health Sciences Education, 2012

There is a lack of information on the use of multi-source evaluation to assess trainees' interpersonal and communication skills in Oriental settings. This study is conducted to assess the reliability and applicability of assessing the interpersonal and communication skills of family medicine residents by patients, peer residents, nurses, and…

Descriptors: Foreign Countries, Clinical Teaching (Health Professions), Communication Skills, Patients

Scoring and Analysis of Performance Examinations: A Comparison of Methods and Interpretations

Peer reviewed

Lunz, Mary E.; Schumacker, Randall E. – Journal of Outcome Measurement, 1997

Results and interpretations of the data from a performance examination were compared for four methods of analysis for 74 medical specialty certification candidates: (1) traditional summary statistics; (2) inter-judge correlations; (3) generalizability theory; and (4) the multifaceted Rasch model. Advantages of the Rasch model are outlined. (SLD)

Descriptors: Comparative Analysis, Data Analysis, Generalizability Theory, Interrater Reliability