ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	2
Since 2006 (last 20 years)	9

Descriptor

Error of Measurement	15
Generalizability Theory	15
Scoring	15
Interrater Reliability	7
Reliability	5
Scores	5
Measurement Techniques	3
Test Items	3
Test Reliability	3
Comparative Analysis	2
Correlation	2
Data Analysis	2
English	2
Evaluation Methods	2
Evaluators	2
Grade 4	2
Grade 5	2
Inferences	2
Item Response Theory	2
Mathematics	2
Performance Based Assessment	2
Psychometrics	2
Scoring Rubrics	2
Test Interpretation	2
Test Results	2

Source

Educational Measurement:…	2
Educational and Psychological…	2
ProQuest LLC	2
Applied Measurement in…	1
Educational Testing Service	1
Evaluation and the Health…	1
International Journal of…	1
Language Testing	1

Publication Type

Journal Articles	8
Reports - Research	6
Reports - Evaluative	5
Speeches/Meeting Papers	4
Dissertations/Theses -…	2
Opinion Papers	1
Reports - Descriptive	1

Education Level

Grade 4	2
Grade 5	2
Elementary Education	1
Elementary Secondary Education	1
Secondary Education	1

Audience

Researchers

Assessments and Surveys

National Assessment of…	1
Trends in International…	1
Work Keys (ACT)	1

Showing all 15 results

Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients

Peer reviewed

Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022

The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…

Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Reliability and Validity of Inferences about Teachers Based on Student Scores. William H. Angoff Memorial Lecture Series

Haertel, Edward H. – Educational Testing Service, 2013

Policymakers and school administrators have embraced value-added models of teacher effectiveness as tools for educational improvement. Teacher value-added estimates may be viewed as complicated scores of a certain kind. This suggests using a test validation model to examine their reliability and validity. Validation begins with an interpretive…

Descriptors: Reliability, Validity, Inferences, Teacher Effectiveness

Effect of Observation Mode on Measures of Secondary Mathematics Teaching

Peer reviewed

Casabianca, Jodi M.; McCaffrey, Daniel F.; Gitomer, Drew H.; Bell, Courtney A.; Hamre, Bridget K.; Pianta, Robert C. – Educational and Psychological Measurement, 2013

Classroom observation of teachers is a significant part of educational measurement; measurements of teacher practice are being used in teacher evaluation systems across the country. This research investigated whether observations made live in the classroom and from video recording of the same lessons yielded similar inferences about teaching.…

Descriptors: Secondary School Mathematics, Mathematics Instruction, Classroom Observation Techniques, Algebra

Oral Performance Scoring Using Generalizability Theory and Many-Facet Rasch Measurement: A Comparison Study

Alkahtani, Saif F. – ProQuest LLC, 2012

The principal aim of the present study was to better guide the Quranic recitation appraisal practice by presenting an application of Generalizability theory and Many-facet Rasch Measurement Model for assessing the dependability and fit of two suggested rubrics. Recitations of 93 students were rated holistically and analytically by 3 independent…

Descriptors: Generalizability Theory, Item Response Theory, Verbal Tests, Islam

Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners

Peer reviewed

Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012

We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…

Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers

Generalizability Theory: Measuring the Dependability of Selected Methods for Scoring Classroom Assessments

Lengh, Carolyn J. – ProQuest LLC, 2010

This study compares the dependability of four classroom assessment scoring methods. Generalizability theory (G) and alternative decision (D) are used to measure the results of students' classroom assessment scores and compare the results of the four scoring methods on variability of rater by person variance and the level of G and D coefficients…

Descriptors: Generalizability Theory, Scoring, Social Studies, Tests

Generalizability of Cognitive Interview-Based Measures across Cultural Groups

Peer reviewed

Solano-Flores, Guillermo; Li, Min – Educational Measurement: Issues and Practice, 2009

We addressed the challenge of scoring cognitive interviews in research involving multiple cultural groups. We interviewed 123 fourth- and fifth-grade students from three cultural groups to probe how they related a mathematics item to their personal lives. Item meaningfulness--the tendency of students to relate the content and/or context of an item…

Descriptors: Generalizability Theory, Scoring, Error of Measurement, Grade 5

Same-Form Retest Effects on Credentialing Examinations

Peer reviewed

Raymond, Mark R.; Neustel, Sandra; Anderson, Dan – Educational Measurement: Issues and Practice, 2009

Examinees who take high-stakes assessments are usually given an opportunity to repeat the test if they are unsuccessful on their initial attempt. To prevent examinees from obtaining unfair score increases by memorizing the content of specific test items, testing agencies usually assign a different test form to repeat examinees. The use of multiple…

Descriptors: Test Results, Test Items, Testing, Aptitude Tests

The Use of Aggregate Scoring for a Recertifying Examination.

Peer reviewed

Norcini, John J.; And Others – Evaluation and the Health Professions, 1990

Aggregate scoring was applied to a recertifying examination for medical professionals to generate an answer key and allow comparison of peer examinees. Results for 1,927 candidates for recertification indicate considerable agreement between the traditional answer key and the aggregate answer key. (TJH)

Descriptors: Answer Keys, Criterion Referenced Tests, Error of Measurement, Generalizability Theory

Latent Traits or Latent States? The Role of Discrete Models for Ability and Performance.

Haertel, Edward H. – 1992

Classical test theory, item response theory, and generalizability theory all treat the abilities to be measured as continuous variables, and the items of a test as independent probes of underlying continua. These models are well-suited to measuring the broad, diffuse traits of traditional differential psychology, but not for measuring the outcomes…

Descriptors: Ability, Data Analysis, Error of Measurement, Generalizability Theory

Essay Reliability: Form and Meaning.

Shale, Doug – 1986

This study is an attempt at a cohesive characterization of the concept of essay reliability. As such, it takes as a basic premise that previous and current practices in reporting reliability estimates for essay tests have certain shortcomings. The study provides an analysis of these shortcomings--partly to encourage a fuller understanding of the…

Descriptors: Analysis of Variance, Correlation, Error of Measurement, Essay Tests

Generalizability Analyses of Work Keys Listening and Writing Tests.

Peer reviewed

Brennan, Robert L.; And Others – Educational and Psychological Measurement, 1995

Generalizability theory is used to examine the psychometric characteristics of the Listening and Writing Tests developed by American College Testing for its Work Keys program. Results with samples of 50 suggest the desirability of a minimum number of the tests' tape-recorded messages and the use of at least 2 raters. (SLD)

Descriptors: Audiotape Recordings, Error of Measurement, Generalizability Theory, Interaction

The Generalizability of Scoring TIMSS Open-Ended Items.

Smith, Teresa A. – 1997

The Third International Mathematics and Science Study (TIMSS) measured mathematics and science achievement of middle school students in more than 40 countries. About one quarter of the tests' nearly 300 items were free response items requiring students to generate their own answers. Scoring these responses used a two-digit diagnostic code rubric…

Descriptors: Comparative Education, English, Error of Measurement, Foreign Countries

Sampling Variability of Performance Assessments. Report on the Status of Generalizability Performance: Generalizability and Transfer of Performance Assessments. Project 2.4: Design Theory and Psychometrics for Complex Performance Assessment in Science.

Shavelson, Richard J.; And Others – 1993

In this paper, performance assessments are cast within a sampling framework. A performance assessment score is viewed as a sample of student performance drawn from a complex universe defined by a combination of all possible tasks, occasions, raters, and measurement methods. Using generalizability theory, the authors present evidence bearing on the…

Descriptors: Academic Achievement, Educational Assessment, Error of Measurement, Evaluators

Author

Haertel, Edward H.	2
Solano-Flores, Guillermo	2
Aksu, Gökhan	1
Alkahtani, Saif F.	1
Anderson, Dan	1
Bell, Courtney A.	1
Brennan, Robert L.	1
Casabianca, Jodi M.	1
Eser, Mehmet Taha	1
Gitomer, Drew H.	1
Hamre, Bridget K.	1
Kachchaf, Rachel	1
Lengh, Carolyn J.	1
Li, Min	1
Lin, Chih-Kai	1
McCaffrey, Daniel F.	1
Neustel, Sandra	1
Norcini, John J.	1
Pianta, Robert C.	1
Raymond, Mark R.	1
Shale, Doug	1
Shavelson, Richard J.	1
Smith, Teresa A.	1