Showing all 6 results
Peer reviewed
Robert Schoen; Lanrong Li; Xiaotong Yang; Ahmet Guven; Claire Riddell – Society for Research on Educational Effectiveness, 2021
Many classroom-observation instruments have been developed (e.g., Gleason et al., 2017; Nava et al., 2019; Sawada et al., 2002), but very few studies published in refereed journals have rigorously examined the quality of the ratings and of the instrument itself using measurement models. For example, Gleason et al. developed a mathematics…
Descriptors: Item Response Theory, Models, Measurement, Mathematics Instruction
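The abstract stops short of showing what such a measurement-model check looks like. As a hedged illustration only (hypothetical data and a plain dichotomous Rasch model, not the authors' instrument or model), the sketch below fits a Rasch model to simulated 0/1 observation ratings by joint maximum likelihood and checks whether the item difficulties are recovered:

```python
# Minimal Rasch sketch on hypothetical classroom-observation ratings.
import numpy as np

rng = np.random.default_rng(0)
n_lessons, n_items = 200, 10
theta_true = rng.normal(0, 1, n_lessons)           # latent lesson quality
beta_true = np.linspace(-1.5, 1.5, n_items)        # item difficulties
p_true = 1 / (1 + np.exp(-(theta_true[:, None] - beta_true[None, :])))
X = (rng.random((n_lessons, n_items)) < p_true).astype(float)  # 0/1 ratings

theta = np.zeros(n_lessons)
beta = np.zeros(n_items)
for _ in range(50):                                # joint maximum likelihood
    P = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))
    theta += (X - P).sum(axis=1) / (P * (1 - P)).sum(axis=1)  # Newton step
    theta = np.clip(theta, -4, 4)                  # guard perfect-score rows
    P = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))
    beta -= (X - P).sum(axis=0) / (P * (1 - P)).sum(axis=0)
    beta -= beta.mean()                            # fix the scale origin

print(np.corrcoef(beta, beta_true)[0, 1])          # difficulty recovery check
```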
Peer reviewed
Raymond, Mark R.; Harik, Polina; Clauser, Brian E. – Applied Psychological Measurement, 2011
Prior research indicates that the overall reliability of performance ratings can be improved by using ordinary least squares (OLS) regression to adjust for rater effects. The present investigation extends previous work by evaluating the impact of OLS adjustment on standard errors of measurement ("SEM") at specific score levels. In…
Descriptors: Performance Based Assessment, Licensing Examinations (Professions), Least Squares Statistics, Item Response Theory
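As a rough illustration of the kind of adjustment described (a minimal sketch with simulated data; the design and variable names are assumptions, not the article's procedure), OLS with examinee and rater dummy variables estimates each rater's severity, which is then subtracted from the observed scores:

```python
# Minimal OLS rater-adjustment sketch on hypothetical scores.
import numpy as np

rng = np.random.default_rng(1)
n_examinees, n_raters, n_obs = 100, 8, 400
examinee = rng.integers(0, n_examinees, n_obs)
rater = rng.integers(0, n_raters, n_obs)
ability = rng.normal(70, 8, n_examinees)
severity = rng.normal(0, 2, n_raters)
score = ability[examinee] + severity[rater] + rng.normal(0, 3, n_obs)

# Design matrix: intercept + examinee dummies + rater dummies,
# dropping one level of each to avoid collinearity.
Xd = np.zeros((n_obs, 1 + (n_examinees - 1) + (n_raters - 1)))
Xd[:, 0] = 1.0
for i in range(n_obs):
    if examinee[i] > 0:
        Xd[i, examinee[i]] = 1.0
    if rater[i] > 0:
        Xd[i, n_examinees - 1 + rater[i]] = 1.0
coef, *_ = np.linalg.lstsq(Xd, score, rcond=None)

rater_effect = np.concatenate(([0.0], coef[n_examinees:]))
rater_effect -= rater_effect.mean()                # center severities
adjusted = score - rater_effect[rater]             # severity-adjusted scores
print(np.round(rater_effect, 2), np.round(severity - severity.mean(), 2))
```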
Peer reviewed
Longford, N. T. – Journal of Educational and Behavioral Statistics, 1994
Presents a model-based approach to rater reliability for essays read by multiple raters. The approach is motivated by generalizability theory, and variation of rater severity and rater inconsistency is considered in the presence of between-examinee variations. Illustrates methods with data from standardized educational tests. (Author/SLD)
Descriptors: Educational Testing, Essay Tests, Generalizability Theory, Interrater Reliability
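For readers unfamiliar with the decomposition the abstract alludes to, here is a minimal sketch (a fully crossed persons-by-raters design with simulated data, not Longford's estimator): ANOVA-style variance components separate between-examinee variation, rater severity, and rater inconsistency, and give a generalizability coefficient for a k-rater average.

```python
# Minimal G-theory variance-component sketch on hypothetical essay scores.
import numpy as np

rng = np.random.default_rng(2)
n_p, n_r = 150, 5
person = rng.normal(0, 1.0, (n_p, 1))
rater = rng.normal(0, 0.4, (1, n_r))               # severity differences
scores = 3.0 + person + rater + rng.normal(0, 0.7, (n_p, n_r))

grand = scores.mean()
ms_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
resid = (scores - scores.mean(axis=1, keepdims=True)
         - scores.mean(axis=0, keepdims=True) + grand)
ms_e = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

var_e = ms_e                                       # rater inconsistency
var_p = (ms_p - ms_e) / n_r                        # between-examinee
var_r = (ms_r - ms_e) / n_p                        # rater severity
k = 2                                              # raters per essay
g_coef = var_p / (var_p + var_e / k)               # relative G coefficient
print(round(var_p, 3), round(var_r, 3), round(var_e, 3), round(g_coef, 3))
```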
Peer reviewed
Johnson, Robert L.; Penny, James; Gordon, Belita; Shumate, Steven R.; Fisher, Steven P. – Language Assessment Quarterly, 2005
Many studies have indicated that at least 2 raters should score writing assessments to improve interrater reliability. However, even for assessments that characteristically demonstrate high levels of rater agreement, 2 raters of the same essay can occasionally report different, or discrepant, scores. If a single score, typically referred to as an…
Descriptors: Interrater Reliability, Scores, Evaluation, Reliability
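A hedged sketch of how discrepant two-rater scores might be resolved into a single reported score (the adjacency threshold and adjudication rule here are illustrative assumptions, not the study's operational rules):

```python
# Illustrative score-resolution rules for a two-rater design.
def resolve(r1: int, r2: int, adjudicator=None, max_gap: int = 1) -> float:
    if abs(r1 - r2) <= max_gap:
        return (r1 + r2) / 2                 # agreement or adjacency
    if adjudicator is None:
        raise ValueError("discrepant scores require a third rating")
    # Keep the two closest ratings; the third rating breaks the tie.
    pair = min(
        [(r1, r2), (r1, adjudicator), (r2, adjudicator)],
        key=lambda p: abs(p[0] - p[1]),
    )
    return sum(pair) / 2

print(resolve(4, 5))                         # adjacent -> 4.5
print(resolve(2, 5, adjudicator=4))          # discrepant -> 4.5
```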
Peer reviewed
Abedi, Jamal; Baker, Eva L. – Educational and Psychological Measurement, 1995
Results from a performance assessment in which 68 high school students wrote essays support the use of latent variable modeling for estimating reliability, concurrent validity, and generalizability of a scoring rubric. The latent variable modeling approach overcomes the limitations of certain conventional statistical techniques in handling…
Descriptors: Criteria, Essays, Estimation (Mathematics), Generalizability Theory
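As a loose illustration of the latent-variable approach (simulated rubric ratings and a one-factor model fitted with scikit-learn; not the authors' model or data), composite reliability can be computed from the estimated loadings and unique variances:

```python
# One-factor latent variable sketch: coefficient omega for a rubric.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n_essays, n_criteria = 68, 6
writing = rng.normal(0, 1, n_essays)               # latent writing ability
loadings_true = rng.uniform(0.6, 0.9, n_criteria)
ratings = np.outer(writing, loadings_true) + rng.normal(
    0, 0.5, (n_essays, n_criteria)
)

fa = FactorAnalysis(n_components=1).fit(ratings)
lam = fa.components_.ravel()                       # estimated loadings
psi = fa.noise_variance_                           # unique variances
omega = lam.sum() ** 2 / (lam.sum() ** 2 + psi.sum())
print(round(omega, 3))                             # composite reliability
```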
Dovell, Patricia; Buhr, Dianne C. – 1986
This study examined the difficulty level of essay topics used in the large-scale assessment of writing in relation to five different scoring models, and sought to determine what effects the scoring models would have on passing rates. In model one, the examinee's score is the direct result of a score assigned by the reader or the sum of scores assigned…
Descriptors: College Students, Difficulty Level, Essay Tests, Essays
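A minimal sketch of how a scoring model can change passing rates (simulated ratings and hypothetical cut scores; the two models shown are illustrative, not the study's five):

```python
# Passing rates under two illustrative scoring models.
import numpy as np

rng = np.random.default_rng(4)
n = 1000
true_quality = rng.normal(3.5, 0.8, n)             # latent essay quality
r1 = np.clip(np.rint(true_quality + rng.normal(0, 0.5, n)), 1, 6)
r2 = np.clip(np.rint(true_quality + rng.normal(0, 0.5, n)), 1, 6)

pass_single = (r1 >= 4).mean()                     # model A: one reader
pass_sum = ((r1 + r2) >= 8).mean()                 # model B: sum of two
print(f"single reader: {pass_single:.1%}, sum of two: {pass_sum:.1%}")
```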