NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 8 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Harrison, George M. – Journal of Educational Measurement, 2015
The credibility of standard-setting cut scores depends in part on two sources of consistency evidence: intrajudge and interjudge consistency. Although intrajudge consistency feedback has often been provided to Angoff judges in practice, more evidence is needed to determine whether it achieves its intended effect. In this randomized experiment with…
Descriptors: Interrater Reliability, Standard Setting (Scoring), Cutting Scores, Feedback (Response)
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Dogan, C. Deha; Uluman, Müge – Educational Sciences: Theory and Practice, 2017
The aim of this study was to determine the extent at which graded-category rating scales and rubrics contribute to inter-rater reliability. The research was designed as a correlational study. Study group consisted of 82 students attending sixth grade and three writing course teachers in a private elementary school. A performance task was…
Descriptors: Comparative Analysis, Scoring Rubrics, Rating Scales, Interrater Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation
Peer reviewed Peer reviewed
Direct linkDirect link
Attali, Yigal – Language Testing, 2016
A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…
Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators
Alkahtani, Saif F. – ProQuest LLC, 2012
The principal aim of the present study was to better guide the Quranic recitation appraisal practice by presenting an application of Generalizability theory and Many-facet Rasch Measurement Model for assessing the dependability and fit of two suggested rubrics. Recitations of 93 students were rated holistically and analytically by 3 independent…
Descriptors: Generalizability Theory, Item Response Theory, Verbal Tests, Islam
Peer reviewed Peer reviewed
Direct linkDirect link
Hill, Heather C.; Charalambous, Charalambos Y.; Kraft, Matthew A. – Educational Researcher, 2012
In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluation of classroom-based interventions. Although education practitioners and researchers have developed numerous observational instruments for these purposes, many developers fail to…
Descriptors: Generalizability Theory, Observation, Classroom Observation Techniques, Evaluation
Peer reviewed Peer reviewed
Direct linkDirect link
Leung, Kai-Kuen; Wang, Wei-Dan; Chen, Yen-Yuan – Advances in Health Sciences Education, 2012
There is a lack of information on the use of multi-source evaluation to assess trainees' interpersonal and communication skills in Oriental settings. This study is conducted to assess the reliability and applicability of assessing the interpersonal and communication skills of family medicine residents by patients, peer residents, nurses, and…
Descriptors: Foreign Countries, Clinical Teaching (Health Professions), Communication Skills, Patients
Peer reviewed Peer reviewed
Lunz, Mary E.; Schumacker, Randall E. – Journal of Outcome Measurement, 1997
Results and interpretations of the data from a performance examination were compared for four methods of analysis for 74 medical specialty certification candidates: (1) traditional summary statistics; (2) inter-judge correlations; (3) generalizability theory; and (4) the multifaceted Rasch model. Advantages of the Rasch model are outlined. (SLD)
Descriptors: Comparative Analysis, Data Analysis, Generalizability Theory, Interrater Reliability