ERIC - Search Results

Showing 1 to 15 of 17 results

Multivariate Generalizability Analysis of Automated Scoring for Short Answer Items of Social Studies in Large-Scale Assessment

Peer reviewed

Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017

With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…

Descriptors: Automation, Scoring, Social Studies, Test Items

Quantifying Error in Survey Measures of School and Classroom Environments

Peer reviewed

Schweig, Jonathan David – Applied Measurement in Education, 2014

Developing indicators that reflect important aspects of school and classroom environments has become central in a nationwide effort to develop comprehensive programs that measure teacher quality and effectiveness. Formulating teacher evaluation policy necessitates accurate and reliable methods for measuring these environmental variables. This…

Descriptors: Error of Measurement, Educational Environment, Classroom Environment, Surveys

Reliability of Scores on the Summative Performance Assessments

Peer reviewed

Yang, Yanyun; Oosterhof, Albert; Xia, Yan – Journal of Educational Research, 2015

The authors address the reliability of scores obtained on the summative performance assessments during the pilot year of our research. Contrary to classical test theory, we discussed the advantages of using generalizability theory for estimating reliability of scores for summative performance assessments. Generalizability theory was used as the…

Descriptors: Summative Evaluation, Comparative Analysis, Reliability, Scores

A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

Peer reviewed

Lee, Guemin; Park, In-Yong – Asia Pacific Education Review, 2012

Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…

Descriptors: Generalizability Theory, Simulation, Computation, Item Response Theory

How Many Classroom Observations Are Sufficient? Empirical Findings in the Context of a Longitudinal Study

Peer reviewed

Shih, Jeffrey C.; Ing, Marsha; Tarr, James E. – Middle Grades Research Journal, 2013

One method to investigate classroom quality is for a person to observe what is happening in the classroom. However, this method raises practical and technical concerns such as how many observations to collect, when to collect these observations and who should collect these observations. The purpose of this study is to provide empirical evidence to…

Descriptors: Observation, Longitudinal Studies, Mathematics, Middle School Teachers

Measurement Error in Multilevel Models of School and Classroom Environments: Implications for Reliability, Precision, and Prediction. CRESST Report 828

Schweig, Jonathan – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2013

Measuring school and classroom environments has become central in a nation-wide effort to develop comprehensive programs that measure teacher quality and teacher effectiveness. Formulating successful programs necessitates accurate and reliable methods for measuring these environmental variables. This paper uses a generalizability theory framework…

Descriptors: Error of Measurement, Hierarchical Linear Modeling, Educational Environment, Classroom Environment

Stability of Measures from Children's Interviews: The Effects of Time, Sample Length, and Topic

Peer reviewed

Heilmann, John; DeBrock, Lindsay; Riley-Tillman, T. Chris – American Journal of Speech-Language Pathology, 2013

Purpose: The purpose of this study was to examine the reliability of, and sources of variability in, language measures from interviews collected from young school-age children. Method: Two 10-min interviews were collected from 20 at-risk kindergarten children by an examiner using a standardized set of questions. Test-retest reliability…

Descriptors: Measures (Individuals), Structured Interviews, Reliability, Kindergarten

Demonstrating Validity Evidence of Meta-Assessment Scores Using Generalizability Theory

Orem, Chris D. – ProQuest LLC, 2012

Meta-assessment, or the assessment of assessment, can provide meaningful information about the trustworthiness of an academic program's assessment results (Bresciani, Gardner, & Hickmott, 2009; Palomba & Banta, 1999; Suskie, 2009). Many institutions conduct meta-assessments for their academic programs (Fulcher, Swain, & Orem, 2012),…

Descriptors: Validity, Evidence, Evaluation Methods, Meta Analysis

Generalizability Theory: Measuring the Dependability of Selected Methods for Scoring Classroom Assessments

Lengh, Carolyn J. – ProQuest LLC, 2010

This study compares the dependability of four classroom assessment scoring methods. Generalizability theory (G) and alternative decision (D) are used to measure the results of students' classroom assessment scores and compare the results of the four scoring methods on variability of rater by person variance and the level of G and D coefficients…

Descriptors: Generalizability Theory, Scoring, Social Studies, Tests

Estimating Reliability of School-Level Scores Using Multilevel and Generalizability Theory Models

Peer reviewed

Jeon, Min-Jeong; Lee, Guemin; Hwang, Jeong-Won; Kang, Sang-Jin – Asia Pacific Education Review, 2009

The purpose of this study was to investigate the methods of estimating the reliability of school-level scores using generalizability theory and multilevel models. Two approaches, "student within schools" and "students within schools and subject areas," were conceptualized and implemented in this study. Four methods resulting from the combination…

Descriptors: Generalizability Theory, Scores, Reliability, Statistical Analysis

How Many Heads Are Better than One? The Reliability and Validity of Teenagers' Self- and Peer Assessments

Peer reviewed

Sung, Yao-Ting; Chang, Kuo-En; Chang, Tzyy-Hua; Yu, Wen-Cheng – Journal of Adolescence, 2010

Self- and peer assessments are becoming more popular in classrooms, but there are few data on the reliability and validity of such assessments performed by school children. Because these factors are greatly affected by the number of raters, we conducted two studies to determine the rating behaviours of teenagers in self- and peer assessments, and…

Descriptors: Generalizability Theory, Peer Evaluation, Validity, Reliability

Reliability and the Nonequivalent Groups with Anchor Test Design. Research Report. ETS RR-07-16

Peer reviewed

Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007

This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…

Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis

How Accurate Are ESL Students' Holistic Writing Scores on Large-Scale Assessments?--A Generalizability Theory Approach

Peer reviewed

Huang, Jinyan – Assessing Writing, 2008

Using generalizability theory, this study examined both the rating variability and reliability of ESL students' writing in the provincial English examinations in Canada. Three years' data were used in order to complete the analyses and examine the stability of the results. The major research question that guided this study was: Are there any…

Descriptors: Generalizability Theory, Foreign Countries, English (Second Language), Writing Tests

Inferential Procedures for Multifaceted Coefficients of Generalizability.

Peer reviewed

Schroeder, Marsha L.; Hakstian, A. Ralph – Psychometrika, 1990

A 2-facet measurement model is identified, and its coefficient of generalizability (CG) is examined. Three other multifaceted measurement models and their CGs are identified. An empirical investigation of all four procedures is conducted using data from a study of the psychopathology of 71 prison inmates. (SLD)

Descriptors: Comparative Analysis, Equations (Mathematics), Generalizability Theory, Mathematical Models

Selecting Weighting Schemes in Multivariate Generalizability Studies.

Peer reviewed

Marcoulides, George A. – Educational and Psychological Measurement, 1994

Effects of different weighting schemes on selecting the optimal number of observations in multivariate-multifacet generalizability designs are studied when cost constraints are imposed. Comparison of four schemes through simulation indicates that all four produce similar optimal values and that reliability should be similar. (SLD)

Descriptors: Budgeting, Comparative Analysis, Costs, Factor Analysis
