Showing 1 to 15 of 27 results
Peer reviewed
PDF on ERIC
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
Agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although the concepts of inter-rater agreement and inter-rater reliability are clearly defined, there is no clear guidance on the conditions under which agreement and reliability methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
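The distinction this entry draws between agreement and reliability can be made concrete with a small computation. The sketch below uses hypothetical ratings, not data from the article, and contrasts raw percent agreement with Cohen's kappa, which corrects agreement for chance:

    # Illustrative sketch (hypothetical ratings): raw agreement vs. a
    # chance-corrected index for two raters on a nominal scale.
    from collections import Counter

    rater_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
    rater_b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]

    # Raw agreement: proportion of cases coded identically by both raters.
    n = len(rater_a)
    agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Cohen's kappa: subtracts the agreement expected by chance.
    pa, pb = Counter(rater_a), Counter(rater_b)
    p_chance = sum(pa[c] * pb[c] for c in set(rater_a) | set(rater_b)) / n**2
    kappa = (agreement - p_chance) / (1 - p_chance)

    print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.2f}")

With data like the above, raw agreement looks high (0.80) while kappa is noticeably lower (about 0.70), which is one reason agreement-level and reliability-level indices can support different conclusions.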
Peer reviewed
Direct link
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Journal of Experimental Education, 2022
In this study, we examined the scoring and generalizability assumptions of an explicit instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Education, Classroom Observation Techniques, Validity
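As background, a standard rating-scale formulation of the many-faceted Rasch model (generic symbols; the protocol's actual facet structure may differ) is

    \[
      \log\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \lambda_j - \tau_k
    \]

where \theta_n is the ability of teacher n on the measured construct, \delta_i the difficulty of item i, \lambda_j the severity of rater j, and \tau_k the threshold between adjacent rating categories k-1 and k.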
Peer reviewed
Direct link
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Grantee Submission, 2020
In this study, we examined the scoring and generalizability assumptions of an Explicit Instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Evaluation, Classroom Observation Techniques, Validity
Peer reviewed
PDF on ERIC
Uzun, N. Bilge; Alici, Devrim; Aktas, Mehtap – European Journal of Educational Research, 2019
The purpose of this study is to examine the reliability of analytical rubrics and checklists developed for the assessment of story-writing skills by means of generalizability theory. The study group consisted of 52 students attending the fifth grade of primary school and 20 raters at Mersin University. The G study was carried out with the fully crossed…
Descriptors: Foreign Countries, Scoring Rubrics, Check Lists, Writing Tests
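For a fully crossed persons x raters (p x r) design like the one described here, variance components can be estimated from ANOVA mean squares and then projected forward in a D study. The sketch below simulates data with this study's sample sizes; the variance magnitudes are assumptions chosen purely for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    n_p, n_r = 52, 20                       # 52 students, 20 raters, fully crossed
    person = rng.normal(0, 1.0, (n_p, 1))   # object-of-measurement effects
    rater = rng.normal(0, 0.4, (1, n_r))    # rater severity effects
    resid = rng.normal(0, 0.7, (n_p, n_r))  # person-by-rater interaction / error
    x = 3.0 + person + rater + resid        # simulated ratings matrix

    # Mean squares for the two-way crossed design without replication
    grand = x.mean()
    ms_p = n_r * ((x.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
    ms_r = n_p * ((x.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
    dev = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + grand
    ms_pr = (dev ** 2).sum() / ((n_p - 1) * (n_r - 1))

    # Variance components from the expected mean squares
    var_pr = ms_pr
    var_p = (ms_p - ms_pr) / n_r
    var_r = (ms_r - ms_pr) / n_p

    # D study: relative (g) and absolute (phi) coefficients for n' raters
    for n_prime in (1, 2, 5, 20):
        g = var_p / (var_p + var_pr / n_prime)
        phi = var_p / (var_p + (var_r + var_pr) / n_prime)
        print(f"raters = {n_prime:2d}: g = {g:.2f}, phi = {phi:.2f}")

Under relative error only the person-by-rater component enters the error term; under absolute error, rater severity counts as well, so phi never exceeds g.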
Peer reviewed
PDF on ERIC
Borowiec, Katrina; Castle, Courtney – Practical Assessment, Research & Evaluation, 2019
Rater cognition or "think-aloud" studies have historically been used to enhance rater accuracy and consistency in writing and language assessments. As assessments are developed for new, complex constructs from the "Next Generation Science Standards (NGSS)," the present study illustrates the utility of extending…
Descriptors: Evaluators, Scoring, Scoring Rubrics, Protocol Analysis
Peer reviewed
Direct link
Johnson, Austin H.; Chafouleas, Sandra M.; Briesch, Amy M. – School Psychology Quarterly, 2017
In this study, generalizability theory was used to examine the extent to which (a) time-sampling methodology, (b) number of simultaneous behavior targets, and (c) individual raters influenced variance in ratings of academic engagement for an elementary-aged student. Ten graduate-student raters, with an average of 7.20 hr of previous training in…
Descriptors: Generalizability Theory, Sampling, Elementary School Students, Learner Engagement
Peer reviewed
Direct link
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Peer reviewed
Direct link
Park, Yoon Soo; Hyderi, Abbas; Bordage, Georges; Xing, Kuan; Yudkowsky, Rachel – Advances in Health Sciences Education, 2016
Recent changes to the patient note (PN) format of the United States Medical Licensing Examination have challenged medical schools to improve the instruction and assessment of students taking the Step-2 clinical skills examination. The purpose of this study was to gather validity evidence regarding response process and internal structure, focusing…
Descriptors: Interrater Reliability, Generalizability Theory, Licensing Examinations (Professions), Physicians
Peer reviewed
Direct link
Moser, Gary P.; Sudweeks, Richard R.; Morrison, Timothy G.; Wilcox, Brad – Reading Psychology, 2014
This study examined ratings of fourth graders' oral reading expression. Randomly assigned participants (n = 36) practiced repeated readings using narrative or informational passages for 7 weeks. After this period raters used the "Multidimensional Fluency Scale" (MFS) on two separate occasions to rate students' expressive…
Descriptors: Elementary School Students, Oral Reading, Reading Skills, Suprasegmentals
Alkahtani, Saif F. – ProQuest LLC, 2012
The principal aim of the present study was to better guide the Quranic recitation appraisal practice by presenting an application of Generalizability theory and Many-facet Rasch Measurement Model for assessing the dependability and fit of two suggested rubrics. Recitations of 93 students were rated holistically and analytically by 3 independent…
Descriptors: Generalizability Theory, Item Response Theory, Verbal Tests, Islam
Peer reviewed
Direct link
Yao, Yuankun; Foster, Karen; Aldrich, Jennifer – Journal of Technology and Teacher Education, 2009
This study applied generalizability theory to investigate the interrater reliability of a team-scored electronic portfolio required for initial teacher certification. The sample consisted of 31 preservice teacher portfolios which were assigned to three groups of portfolio review teams. The review teams, which had received several rounds of…
Descriptors: Interrater Reliability, Portfolio Assessment, Generalizability Theory, Electronic Publishing
Peer reviewed
Direct link
Gebril, Atta – Assessing Writing, 2010
Integrated tasks are currently employed in a number of L2 exams since they are perceived as an addition to the writing-only task type. Given this trend, the current study investigates composite score generalizability of both reading-to-write and writing-only tasks. For this purpose, a multivariate generalizability analysis is used to investigate…
Descriptors: Scoring, Scores, Second Language Instruction, Writing Evaluation
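For composite scores of this kind, multivariate generalizability theory offers a standard coefficient (this is the common Brennan-style formulation, not necessarily the exact specification used in the article): with fixed weights w over the task types, universe-score variance-covariance matrix \Sigma_p, and relative-error matrix \Sigma_\delta at the chosen D-study sample sizes,

    \[
      E\rho^2_C = \frac{\mathbf{w}^{\top}\Sigma_p\,\mathbf{w}}
                       {\mathbf{w}^{\top}\Sigma_p\,\mathbf{w} + \mathbf{w}^{\top}\Sigma_\delta\,\mathbf{w}}
    \]

so the weighting of the reading-to-write and writing-only sections directly shapes the reliability of the composite.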
Fan, Xitao; Chen, Michael – 1999
It is erroneous to extend or generalize an inter-rater reliability coefficient estimated from only a (small) proportion of the sample to the rest of the sample data, where only one rater is used for scoring, although such generalization is often made implicitly in practice. It is shown that if the inter-rater reliability estimate from part of a sample…
Descriptors: Estimation (Mathematics), Generalizability Theory, Interrater Reliability, Sample Size
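One simple way to see the caution being raised (a sketch under the classical parallel-raters assumption, not the authors' own demonstration): the double-scored portion of a sample is typically reported as an averaged score, whose reliability follows the Spearman-Brown relation, while the single-rated remainder stays at single-rater reliability.

    def spearman_brown(rho1, k):
        # Reliability of the mean of k parallel raters, given
        # single-rater reliability rho1.
        return k * rho1 / (1 + (k - 1) * rho1)

    rho1 = 0.70  # hypothetical inter-rater correlation from the double-scored part
    for k in (1, 2, 3):
        print(f"{k} rater(s): score reliability = {spearman_brown(rho1, k):.2f}")

With rho1 = 0.70, a two-rater average reaches about 0.82 while the single-rated scores remain at 0.70, so no single coefficient can describe the mixed data file.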
Ben-Simon, Anat; Bennett, Randy Elliott – Journal of Technology, Learning, and Assessment, 2007
This study evaluated a "substantively driven" method for scoring NAEP writing assessments automatically. The study used variations of an existing commercial program, e-rater®, to compare the performance of three approaches to automated essay scoring: a "brute-empirical" approach in which variables are selected and weighted solely according to…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays
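The "brute-empirical" end of the spectrum this abstract describes can be sketched in a few lines: regress human scores on whatever text features are available and let least squares pick the weights. The features and data below are hypothetical, not e-rater's:

    # Minimal sketch of a brute-empirical scoring approach (assumed feature
    # set): weight text features purely by their empirical fit to human scores.
    import numpy as np

    rng = np.random.default_rng(7)
    n_essays = 200
    # Hypothetical features, e.g. log word count, error rate, type-token ratio
    X = rng.normal(size=(n_essays, 3))
    human = X @ np.array([0.8, -0.5, 0.3]) + rng.normal(0, 0.5, n_essays)

    X1 = np.column_stack([np.ones(n_essays), X])    # add an intercept column
    w, *_ = np.linalg.lstsq(X1, human, rcond=None)  # weights chosen solely by fit
    machine = X1 @ w
    print("human-machine correlation:", np.corrcoef(human, machine)[0, 1].round(2))

A substantively driven approach, by contrast, constrains which features enter the model and the sign or size of their weights according to the writing construct, trading some empirical fit for interpretability.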