Showing all 11 results
Peer reviewed
Edwards, Kelly; Soland, James – Educational Assessment, 2024
Classroom observational protocols, in which raters observe and score the quality of teachers' instructional practices, are often used to evaluate teachers for consequential purposes despite evidence that scores from such protocols are frequently driven by factors, such as rater and temporal effects, that have little to do with teacher quality. In…
Descriptors: Classroom Observation Techniques, Teacher Evaluation, Accuracy, Scores
Peer reviewed
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
Peer reviewed
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Peer reviewed
PDF full text available on ERIC
Kelcey, Ben; Wang, Shanshan; Cox, Kyle – Society for Research on Educational Effectiveness, 2016
Valid and reliable measurement of unobserved latent variables is essential to understanding and improving education. A common and persistent approach to assessing latent constructs in education is the use of rater inferential judgment. The purpose of this study is to develop high-dimensional explanatory random item effects models designed for…
Descriptors: Test Items, Models, Evaluators, Longitudinal Studies
Peer reviewed
Aryadoust, Vahid – Educational Psychology, 2016
This study sought to examine the development of paragraph writing skills of 116 English as a second language university students over the course of 12 weeks and the relationship between the linguistic features of students' written texts as measured by Coh-Metrix--a computational system for estimating textual features such as cohesion and…
Descriptors: English (Second Language), Second Language Learning, Writing Skills, College Students
Peer reviewed
Lakes, Kimberley D.; Hoyt, William T. – Journal of Clinical Child and Adolescent Psychology, 2009
Using generalizability theory to evaluate the reliability of child and adolescent measures enables researchers to enhance precision of measurement and consequently increase confidence in research findings. With an observer-rated measure of child self-regulation, we illustrate how multiple sources of error variance (e.g., raters, items) affect the…
Descriptors: Generalizability Theory, Error of Measurement, Children, Adolescents
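The generalizability-theory logic the Lakes and Hoyt abstract describes can be sketched briefly. This is a minimal illustration with hypothetical variance components (the numbers are not from the article): in a persons-by-raters design, averaging over more raters shrinks rater-related error variance and raises the dependability of scores.

```python
# Hypothetical variance components for a persons x raters G-study design.
# Values are illustrative only, not taken from Lakes & Hoyt (2009).
var_person = 0.50   # universe-score (true-score) variance
var_rater = 0.10    # rater main-effect variance
var_resid = 0.40    # person x rater interaction plus residual variance

def g_coefficient(n_raters):
    # Relative (norm-referenced) generalizability coefficient:
    # only person x rater interaction/residual contributes to relative error,
    # and it is divided by the number of raters averaged over.
    rel_error = var_resid / n_raters
    return var_person / (var_person + rel_error)

def phi_coefficient(n_raters):
    # Absolute (criterion-referenced) dependability coefficient:
    # the rater main effect also counts as error.
    abs_error = (var_rater + var_resid) / n_raters
    return var_person / (var_person + abs_error)

for n in (1, 2, 4):
    print(n, round(g_coefficient(n), 2), round(phi_coefficient(n), 2))
```

Adding raters raises both coefficients, which is the "enhance precision of measurement" point the abstract makes.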
Peer reviewed
Chafouleas, Sandra M.; Christ, Theodore J.; Riley-Tillman, T. Chris – Educational and Psychological Measurement, 2009
Generalizability theory is used to examine the impact of scaling gradients on a single-item Direct Behavior Rating (DBR). A DBR refers to a type of rating scale used to efficiently record target behavior(s) following an observation occasion. Variance components associated with scale gradients are estimated using a random effects design for persons…
Descriptors: Generalizability Theory, Undergraduate Students, Scaling, Rating Scales
Peer reviewed
Zegers, Frits E. – Applied Psychological Measurement, 1991
The degree of agreement between two raters rating several objects for a single characteristic can be expressed through an association coefficient, such as the Pearson product-moment correlation. How to select an appropriate association coefficient, and the desirable properties and uses of a class of such coefficients--the Euclidean…
Descriptors: Classification, Correlation, Data Interpretation, Equations (Mathematics)
Rudner, Lawrence M. – 1992
Several common sources of error in assessment that depends on the use of judges are identified, and ways to reduce the impact of rating errors are examined. Numerous threats to the validity of scores based on ratings exist. These threats include: (1) the halo effect; (2) stereotyping; (3) perception differences; (4) leniency/stringency error; and…
Descriptors: Alternative Assessment, Error of Measurement, Evaluation Methods, Evaluators
Peer reviewed
PDF full text available on ERIC
Xi, Xiaoming; Mollaun, Pam – ETS Research Report Series, 2006
This study explores the utility of analytic scoring for the TOEFL® Academic Speaking Test (TAST) in providing useful and reliable diagnostic information in three aspects of candidates' performance: delivery, language use, and topic development. G studies were used to investigate the dependability of the analytic scores, the distinctness of the…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Oral Language
Shavelson, Richard J.; And Others – 1993
In this paper, performance assessments are cast within a sampling framework. A performance assessment score is viewed as a sample of student performance drawn from a complex universe defined by a combination of all possible tasks, occasions, raters, and measurement methods. Using generalizability theory, the authors present evidence bearing on the…
Descriptors: Academic Achievement, Educational Assessment, Error of Measurement, Evaluators