NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
Researchers1
Location
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing all 15 results Save | Export
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Peer reviewed Peer reviewed
Direct linkDirect link
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Haertel, Edward H. – Educational Testing Service, 2013
Policymakers and school administrators have embraced value-added models of teacher effectiveness as tools for educational improvement. Teacher value-added estimates may be viewed as complicated scores of a certain kind. This suggests using a test validation model to examine their reliability and validity. Validation begins with an interpretive…
Descriptors: Reliability, Validity, Inferences, Teacher Effectiveness
Peer reviewed Peer reviewed
Direct linkDirect link
Casabianca, Jodi M.; McCaffrey, Daniel F.; Gitomer, Drew H.; Bell, Courtney A.; Hamre, Bridget K.; Pianta, Robert C. – Educational and Psychological Measurement, 2013
Classroom observation of teachers is a significant part of educational measurement; measurements of teacher practice are being used in teacher evaluation systems across the country. This research investigated whether observations made live in the classroom and from video recording of the same lessons yielded similar inferences about teaching.…
Descriptors: Secondary School Mathematics, Mathematics Instruction, Classroom Observation Techniques, Algebra
Alkahtani, Saif F. – ProQuest LLC, 2012
The principal aim of the present study was to better guide the Quranic recitation appraisal practice by presenting an application of Generalizability theory and Many-facet Rasch Measurement Model for assessing the dependability and fit of two suggested rubrics. Recitations of 93 students were rated holistically and analytically by 3 independent…
Descriptors: Generalizability Theory, Item Response Theory, Verbal Tests, Islam
Peer reviewed Peer reviewed
Direct linkDirect link
Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…
Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers
Lengh, Carolyn J. – ProQuest LLC, 2010
This study compares the dependability of four classroom assessment scoring methods. Generalizability theory (G) and alternative decision (D) are used to measure the results of students' classroom assessment scores and compare the results of the four scoring methods on variability of rater by person variance and the level of G and D coefficients…
Descriptors: Generalizability Theory, Scoring, Social Studies, Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Solano-Flores, Guillermo; Li, Min – Educational Measurement: Issues and Practice, 2009
We addressed the challenge of scoring cognitive interviews in research involving multiple cultural groups. We interviewed 123 fourth- and fifth-grade students from three cultural groups to probe how they related a mathematics item to their personal lives. Item meaningfulness--the tendency of students to relate the content and/or context of an item…
Descriptors: Generalizability Theory, Scoring, Error of Measurement, Grade 5
Peer reviewed Peer reviewed
Direct linkDirect link
Raymond, Mark R.; Neustel, Sandra; Anderson, Dan – Educational Measurement: Issues and Practice, 2009
Examinees who take high-stakes assessments are usually given an opportunity to repeat the test if they are unsuccessful on their initial attempt. To prevent examinees from obtaining unfair score increases by memorizing the content of specific test items, testing agencies usually assign a different test form to repeat examinees. The use of multiple…
Descriptors: Test Results, Test Items, Testing, Aptitude Tests
Peer reviewed Peer reviewed
Norcini, John J.; And Others – Evaluation and the Health Professions, 1990
Aggregate scoring was applied to a recertifying examination for medical professionals to generate an answer key and allow comparison of peer examinees. Results for 1,927 candidates for recertification indicate considerable agreement between the traditional answer key and the aggregate answer key. (TJH)
Descriptors: Answer Keys, Criterion Referenced Tests, Error of Measurement, Generalizability Theory
Haertel, Edward H. – 1992
Classical test theory, item response theory, and generalizability theory all treat the abilities to be measured as continuous variables, and the items of a test as independent probes of underlying continua. These models are well-suited to measuring the broad, diffuse traits of traditional differential psychology, but not for measuring the outcomes…
Descriptors: Ability, Data Analysis, Error of Measurement, Generalizability Theory
Shale, Doug – 1986
This study is an attempt at a cohesive characterization of the concept of essay reliability. As such, it takes as a basic premise that previous and current practices in reporting reliability estimates for essay tests have certain shortcomings. The study provides an analysis of these shortcomings--partly to encourage a fuller understanding of the…
Descriptors: Analysis of Variance, Correlation, Error of Measurement, Essay Tests
Peer reviewed Peer reviewed
Brennan, Robert L.; And Others – Educational and Psychological Measurement, 1995
Generalizability theory is used to examine the psychometric characteristics of the Listening and Writing Tests developed by American College Testing for its Work Keys program. Results with samples of 50 suggest the desirability of a minimum number of the tests' tape-recorded messages and the use of at least 2 raters. (SLD)
Descriptors: Audiotape Recordings, Error of Measurement, Generalizability Theory, Interaction
Smith, Teresa A. – 1997
The Third International Mathematics and Science Study (TIMSS) measured mathematics and science achievement of middle school students in more than 40 countries. About one quarter of the tests' nearly 300 items were free response items requiring students to generate their own answers. Scoring these responses used a two-digit diagnostic code rubric…
Descriptors: Comparative Education, English, Error of Measurement, Foreign Countries
Shavelson, Richard J.; And Others – 1993
In this paper, performance assessments are cast within a sampling framework. A performance assessment score is viewed as a sample of student performance drawn from a complex universe defined by a combination of all possible tasks, occasions, raters, and measurement methods. Using generalizability theory, the authors present evidence bearing on the…
Descriptors: Academic Achievement, Educational Assessment, Error of Measurement, Evaluators