Showing all 6 results
Peer reviewed
Robert Schoen; Lanrong Li; Xiaotong Yang; Ahmet Guven; Claire Riddell – Society for Research on Educational Effectiveness, 2021
Many classroom-observation instruments have been developed (e.g., Gleason et al., 2017; Nava et al., 2019; Sawada et al., 2002), but very few studies published in refereed journals have rigorously examined the quality of the ratings and of the instrument itself using measurement models. For example, Gleason et al. developed a mathematics…
Descriptors: Item Response Theory, Models, Measurement, Mathematics Instruction
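The abstract stops short of showing what such a measurement-model check looks like. As a hedged illustration only (hypothetical data and a plain dichotomous Rasch model, not the authors' instrument or model), the sketch below fits a Rasch model to simulated 0/1 observation ratings by joint maximum likelihood and checks whether the item difficulties are recovered:

```python
# Minimal Rasch sketch on hypothetical classroom-observation ratings.
import numpy as np

rng = np.random.default_rng(0)
n_lessons, n_items = 200, 10
theta_true = rng.normal(0, 1, n_lessons)           # latent lesson quality
beta_true = np.linspace(-1.5, 1.5, n_items)        # item difficulties
p_true = 1 / (1 + np.exp(-(theta_true[:, None] - beta_true[None, :])))
X = (rng.random((n_lessons, n_items)) < p_true).astype(float)  # 0/1 ratings

theta = np.zeros(n_lessons)
beta = np.zeros(n_items)
for _ in range(50):                                # joint maximum likelihood
    P = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))
    theta += (X - P).sum(axis=1) / (P * (1 - P)).sum(axis=1)  # Newton step
    theta = np.clip(theta, -4, 4)                  # guard perfect-score rows
    P = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))
    beta -= (X - P).sum(axis=0) / (P * (1 - P)).sum(axis=0)
    beta -= beta.mean()                            # fix the scale origin

print(np.corrcoef(beta, beta_true)[0, 1])          # difficulty recovery check
```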
Peer reviewed
Raymond, Mark R.; Harik, Polina; Clauser, Brian E. – Applied Psychological Measurement, 2011
Prior research indicates that the overall reliability of performance ratings can be improved by using ordinary least squares (OLS) regression to adjust for rater effects. The present investigation extends previous work by evaluating the impact of OLS adjustment on standard errors of measurement ("SEM") at specific score levels. In…
Descriptors: Performance Based Assessment, Licensing Examinations (Professions), Least Squares Statistics, Item Response Theory
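As a rough illustration of the kind of adjustment described (a minimal sketch with simulated data; the design and variable names are assumptions, not the article's procedure), OLS with examinee and rater dummy variables estimates each rater's severity, which is then subtracted from the observed scores:

```python
# Minimal OLS rater-adjustment sketch on hypothetical scores.
import numpy as np

rng = np.random.default_rng(1)
n_examinees, n_raters, n_obs = 100, 8, 400
examinee = rng.integers(0, n_examinees, n_obs)
rater = rng.integers(0, n_raters, n_obs)
ability = rng.normal(70, 8, n_examinees)
severity = rng.normal(0, 2, n_raters)
score = ability[examinee] + severity[rater] + rng.normal(0, 3, n_obs)

# Design matrix: intercept + examinee dummies + rater dummies,
# dropping one level of each to avoid collinearity.
Xd = np.zeros((n_obs, 1 + (n_examinees - 1) + (n_raters - 1)))
Xd[:, 0] = 1.0
for i in range(n_obs):
    if examinee[i] > 0:
        Xd[i, examinee[i]] = 1.0
    if rater[i] > 0:
        Xd[i, n_examinees - 1 + rater[i]] = 1.0
coef, *_ = np.linalg.lstsq(Xd, score, rcond=None)

rater_effect = np.concatenate(([0.0], coef[n_examinees:]))
rater_effect -= rater_effect.mean()                # center severities
adjusted = score - rater_effect[rater]             # severity-adjusted scores
print(np.round(rater_effect, 2), np.round(severity - severity.mean(), 2))
```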
Peer reviewed
Longford, N. T. – Journal of Educational and Behavioral Statistics, 1994
Presents a model-based approach to rater reliability for essays read by multiple raters. The approach is motivated by generalizability theory, and variation of rater severity and rater inconsistency is considered in the presence of between-examinee variations. Illustrates methods with data from standardized educational tests. (Author/SLD)
Descriptors: Educational Testing, Essay Tests, Generalizability Theory, Interrater Reliability
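For readers unfamiliar with the decomposition the abstract alludes to, here is a minimal sketch (a fully crossed persons-by-raters design with simulated data, not Longford's estimator): ANOVA-style variance components separate between-examinee variation, rater severity, and rater inconsistency, and give a generalizability coefficient for a k-rater average.

```python
# Minimal G-theory variance-component sketch on hypothetical essay scores.
import numpy as np

rng = np.random.default_rng(2)
n_p, n_r = 150, 5
person = rng.normal(0, 1.0, (n_p, 1))
rater = rng.normal(0, 0.4, (1, n_r))               # severity differences
scores = 3.0 + person + rater + rng.normal(0, 0.7, (n_p, n_r))

grand = scores.mean()
ms_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
resid = (scores - scores.mean(axis=1, keepdims=True)
         - scores.mean(axis=0, keepdims=True) + grand)
ms_e = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

var_e = ms_e                                       # rater inconsistency
var_p = (ms_p - ms_e) / n_r                        # between-examinee
var_r = (ms_r - ms_e) / n_p                        # rater severity
k = 2                                              # raters per essay
g_coef = var_p / (var_p + var_e / k)               # relative G coefficient
print(round(var_p, 3), round(var_r, 3), round(var_e, 3), round(g_coef, 3))
```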
Peer reviewed
Johnson, Robert L.; Penny, James; Gordon, Belita; Shumate, Steven R.; Fisher, Steven P. – Language Assessment Quarterly, 2005
Many studies have indicated that at least 2 raters should score writing assessments to improve interrater reliability. However, even for assessments that characteristically demonstrate high levels of rater agreement, 2 raters of the same essay can occasionally report different, or discrepant, scores. If a single score, typically referred to as an…
Descriptors: Interrater Reliability, Scores, Evaluation, Reliability
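A hedged sketch of how discrepant two-rater scores might be resolved into a single reported score (the adjacency threshold and adjudication rule here are illustrative assumptions, not the study's operational rules):

```python
# Illustrative score-resolution rules for a two-rater design.
def resolve(r1: int, r2: int, adjudicator=None, max_gap: int = 1) -> float:
    if abs(r1 - r2) <= max_gap:
        return (r1 + r2) / 2                 # agreement or adjacency
    if adjudicator is None:
        raise ValueError("discrepant scores require a third rating")
    # Keep the two closest ratings; the third rating breaks the tie.
    pair = min(
        [(r1, r2), (r1, adjudicator), (r2, adjudicator)],
        key=lambda p: abs(p[0] - p[1]),
    )
    return sum(pair) / 2

print(resolve(4, 5))                         # adjacent -> 4.5
print(resolve(2, 5, adjudicator=4))          # discrepant -> 4.5
```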
Peer reviewed
Abedi, Jamal; Baker, Eva L. – Educational and Psychological Measurement, 1995
Results from a performance assessment in which 68 high school students wrote essays support the use of latent variable modeling for estimating reliability, concurrent validity, and generalizability of a scoring rubric. The latent variable modeling approach overcomes the limitations of certain conventional statistical techniques in handling…
Descriptors: Criteria, Essays, Estimation (Mathematics), Generalizability Theory
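As a loose illustration of the latent-variable approach (simulated rubric ratings and a one-factor model fitted with scikit-learn; not the authors' model or data), composite reliability can be computed from the estimated loadings and unique variances:

```python
# One-factor latent variable sketch: coefficient omega for a rubric.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n_essays, n_criteria = 68, 6
writing = rng.normal(0, 1, n_essays)               # latent writing ability
loadings_true = rng.uniform(0.6, 0.9, n_criteria)
ratings = np.outer(writing, loadings_true) + rng.normal(
    0, 0.5, (n_essays, n_criteria)
)

fa = FactorAnalysis(n_components=1).fit(ratings)
lam = fa.components_.ravel()                       # estimated loadings
psi = fa.noise_variance_                           # unique variances
omega = lam.sum() ** 2 / (lam.sum() ** 2 + psi.sum())
print(round(omega, 3))                             # composite reliability
```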
Dovell, Patricia; Buhr, Dianne C. – 1986
This study examined the difficulty level of essay topics used in the large-scale assessment of writing in relation to five different scoring models, and sought to determine what effects the scoring models would have on passing rates. In model one, the examinee's score is the direct result of a score assigned by the reader or the sum of scores assigned…
Descriptors: College Students, Difficulty Level, Essay Tests, Essays
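A minimal sketch of how a scoring model can change passing rates (simulated ratings and hypothetical cut scores; the two models shown are illustrative, not the study's five):

```python
# Passing rates under two illustrative scoring models.
import numpy as np

rng = np.random.default_rng(4)
n = 1000
true_quality = rng.normal(3.5, 0.8, n)             # latent essay quality
r1 = np.clip(np.rint(true_quality + rng.normal(0, 0.5, n)), 1, 6)
r2 = np.clip(np.rint(true_quality + rng.normal(0, 0.5, n)), 1, 6)

pass_single = (r1 >= 4).mean()                     # model A: one reader
pass_sum = ((r1 + r2) >= 8).mean()                 # model B: sum of two
print(f"single reader: {pass_single:.1%}, sum of two: {pass_sum:.1%}")
```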