ERIC - Search Results

Showing 1 to 15 of 17 results

Multivariate Generalizability Analysis of Automated Scoring for Short Answer Items of Social Studies in Large-Scale Assessment

Peer reviewed

Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017

With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…

Descriptors: Automation, Scoring, Social Studies, Test Items

The Consistency of "TOEIC"® Speaking Scores across Ratings and Tasks. Research Report. ETS RR-17-46

Peer reviewed

Schmidgall, Jonathan E. – ETS Research Report Series, 2017

This report briefly reviews the design and scoring procedure for the "TOEIC"® Speaking test and summarizes existing evidence about the consistency of TOEIC Speaking test scores. It then describes several analyses conducted using generalizability theory to provide additional information about the consistency of scores across different…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Speech Tests

Using Generalizability Theory to Examine the Dependability of Scores from the Learning Target Rating Scale

Peer reviewed

McLaughlin, Tara W.; Snyder, Patricia A.; Algina, James – Grantee Submission, 2017

The Learning Target Rating Scale (LTRS) is a measure designed to evaluate the quality of teacher-developed learning targets for embedded instruction for early learning. In the present study, we examined the measurement dependability of LTRS scores by conducting a generalizability study (G-study). We used a partially nested, three-facet model to…

Descriptors: Generalizability Theory, Scores, Rating Scales, Evaluation Methods

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Teaching Historical Contextualization: The Construction of a Reliable Observation Instrument

Peer reviewed

Huijgen, Tim; van de Grift, Wim; van Boxtel, Carla; Holthuis, Paul – European Journal of Psychology of Education, 2017

Since the 1970s, many observation instruments have been constructed to map teachers' general pedagogic competencies. However, few of these instruments focus on teachers' subject-specific competencies. This study presents the development of the "Framework for Analyzing the Teaching of Historical Contextualization" (FAT-HC). This…

Descriptors: Measures (Individuals), History Instruction, Teacher Competencies, Content Validity

Rater Reliability and Score Discrepancy under Holistic and Analytic Scoring of Second Language Writing

Peer reviewed

Zhang, Bo; Xiao, Yunnan; Luo, Juan – Language Testing in Asia, 2015

Previous studies comparing holistic scoring to analytic scoring of second language writing have given mixed results. Some of them suffer from methodological drawbacks, such as limited writing sample size, limited number of raters, and lack of direct comparison of the two methods. Based on 300 writing samples graded by 14 raters, this research…

Descriptors: Evaluators, Reliability, Scores, Holistic Approach

Using Generalizability Theory to Examine Different Concept Map Scoring Methods

Peer reviewed

Cetin, Bayram; Guler, Nese; Sarica, Rabia – Eurasian Journal of Educational Research, 2016

Problem Statement: In addition to being teaching tools, concept maps can be used as effective assessment tools. The use of concept maps for assessment has raised the issue of scoring them. Concept maps generated and used in different ways can be scored via various methods. Holistic and relational scoring methods are two of them. Purpose of the…

Descriptors: Generalizability Theory, Concept Mapping, Scoring, Scoring Formulas

A Generalizability Analysis of Score Consistency for the Balanced Inventory of Desirable Responding

Peer reviewed

Vispoel, Walter P.; Tao, Shuqin – Psychological Assessment, 2013

Our goal in this investigation was to evaluate the reliability of scores from the Balanced Inventory of Desirable Responding (BIDR) more comprehensively than in prior research using a generalizability-theory framework based on both dichotomous and polytomous scoring of items. Generalizability coefficients accounting for specific-factor, transient,…

Descriptors: Reliability, Scores, Measures (Individuals), Generalizability Theory

Reliability and Validity of Inferences about Teachers Based on Student Scores. William H. Angoff Memorial Lecture Series

Haertel, Edward H. – Educational Testing Service, 2013

Policymakers and school administrators have embraced value-added models of teacher effectiveness as tools for educational improvement. Teacher value-added estimates may be viewed as complicated scores of a certain kind. This suggests using a test validation model to examine their reliability and validity. Validation begins with an interpretive…

Descriptors: Reliability, Validity, Inferences, Teacher Effectiveness

Effect of Observation Mode on Measures of Secondary Mathematics Teaching

Peer reviewed

Casabianca, Jodi M.; McCaffrey, Daniel F.; Gitomer, Drew H.; Bell, Courtney A.; Hamre, Bridget K.; Pianta, Robert C. – Educational and Psychological Measurement, 2013

Classroom observation of teachers is a significant part of educational measurement; measurements of teacher practice are being used in teacher evaluation systems across the country. This research investigated whether observations made live in the classroom and from video recording of the same lessons yielded similar inferences about teaching.…

Descriptors: Secondary School Mathematics, Mathematics Instruction, Classroom Observation Techniques, Algebra

Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners

Peer reviewed

Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012

We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…

Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers

Generalizability Theory: Measuring the Dependability of Selected Methods for Scoring Classroom Assessments

Lengh, Carolyn J. – ProQuest LLC, 2010

This study compares the dependability of four classroom assessment scoring methods. Generalizability theory (G) and alternative decision (D) are used to measure the results of students' classroom assessment scores and compare the results of the four scoring methods on variability of rater by person variance and the level of G and D coefficients…

Descriptors: Generalizability Theory, Scoring, Social Studies, Tests

Using Generalizability Theory To Estimate the Reliability of Writing Scores Derived from Holistic and Analytical Scoring Methods.

Peer reviewed

Swartz, Carl W.; Hooper, Stephen R.; Montgomery, James W.; Wakely, Melissa B.; De Kruif, Renee E. L.; Reed, Martha; Brown, Timothy T.; Levine, Melvin D.; White, Kinnard P. – Educational and Psychological Measurement, 1999

Used generalizability theory to investigate the impact of the number of raters and the type of decision (relative versus absolute) on the reliability of writing scores. Results from 251 middle school students and 20 intermediate grade students show that reliability coefficients decline as the number of raters declines and when absolute decisions…

Descriptors: Estimation (Mathematics), Generalizability Theory, Holistic Evaluation, Intermediate Grades

A Construct-Centered Generalizability Model: Analyzing Underlying Constructs of Cognitively Complex Performance Assessments.

Jiang, Ying Hong; Smith, Philip L. – 2000

With a construct-centered reliability analytical approach the reliability analysis should crystallize the multi-traits or constructs that the test specialists developed to measure from student performance and then estimate the degree of fit between the theoretical expectations of test developers and the performance exhibited by students. This…

Descriptors: Cognitive Tests, Construct Validity, Elementary Education, Elementary School Students

Dependability of New ESL Writing Test Scores: Evaluating Prototype Tasks and Alternative Rating Schemes. TOEFL® Monograph Series. MS-31. ETS RR-05-14

Peer reviewed

Lee, Yong-Won; Kantor, Robert – ETS Research Report Series, 2005

Possible integrated and independent tasks were pilot tested for the writing section of a new generation of TOEFL® (Test of English as a Foreign Language™) examination. This study examines the impact of various rating designs as well as the impact of the number of tasks and raters on the reliability of writing scores based on integrated and…

Descriptors: Language Tests, English (Second Language), Second Language Learning, Writing Tests
