Publication Date
In 2025 | 2 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 14 |
Since 2006 (last 20 years) | 37 |
Descriptor
Generalizability Theory | 65 |
Test Validity | 65 |
Test Reliability | 37 |
Interrater Reliability | 15 |
Psychometrics | 15 |
Evaluation Methods | 12 |
Performance Based Assessment | 12 |
Scores | 12 |
Test Construction | 11 |
Statistical Analysis | 10 |
Factor Analysis | 9 |
More ▼ |
Source
Author
Bishop, Crystal Crowe | 2 |
Fox, Lise | 2 |
Hemmeter, Mary Louise | 2 |
Miller, M. David | 2 |
Shavelson, Richard J. | 2 |
Snyder, Patricia A. | 2 |
Abedi, Jamal | 1 |
Agboh, Darren | 1 |
Akaeze, Hope O. | 1 |
Alkhuwaiter, Munirah | 1 |
Allegra, Laurie | 1 |
More ▼ |
Publication Type
Education Level
Audience
Researchers | 6 |
Policymakers | 1 |
Location
Canada | 2 |
Tennessee | 2 |
Belgium | 1 |
California | 1 |
Colorado | 1 |
Cyprus | 1 |
Georgia | 1 |
Japan | 1 |
Michigan | 1 |
Netherlands | 1 |
Norway | 1 |
More ▼ |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Khodi, Ali – Language Testing in Asia, 2021
The present study attempted to to investigate factors which affect EFL writing scores through using generalizability theory (G-theory). To this purpose, one hundred and twenty students participated in one independent and one integrated writing tasks. Proceeding, their performances were scored by six raters: one self-rating, three peers,-rating and…
Descriptors: Writing Tests, Scores, Generalizability Theory, English (Second Language)
Lyrica Lucas; Anum Khushal; Robert Mayes; Brian A. Couch; Joseph Dauer – International Journal of Science Education, 2025
Educational reform priorities such as emphasis on quantitative modelling (QM) have positioned undergraduate biology instructors as designers of QM experiences to engage students in authentic science practices that support the development of data-driven and evidence-based reasoning. Yet, little is known about how biology instructors adapt to the…
Descriptors: Undergraduate Students, College Science, Biology, Classroom Observation Techniques
Paul T. von Hippel; Brendan A. Schuetze – Annenberg Institute for School Reform at Brown University, 2025
Researchers across many fields have called for greater attention to heterogeneity of treatment effects--shifting focus from the average effect to variation in effects between different treatments, studies, or subgroups. True heterogeneity is important, but many reports of heterogeneity have proved to be false, non-replicable, or exaggerated. In…
Descriptors: Educational Research, Replication (Evaluation), Generalizability Theory, Inferences
Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019
Validity is broadly defined as how well something measures what it's supposed to measure. The reliability and validity of scores from assessments are two concepts that are closely knit together and feed into each other.
Descriptors: Screening Tests, Scores, Test Validity, Test Reliability
Hatala, Rose; Gutman, Jacqueline; Lineberry, Matthew; Triola, Marc; Pusic, Martin – Advances in Health Sciences Education, 2019
Learning curves can support a competency-based approach to assessment for learning. When interpreting repeated assessment data displayed as learning curves, a key assessment question is: "How well is each learner learning?" We outline the validity argument and investigation relevant to this question, for a computer-based repeated…
Descriptors: Medicine, Metabolism, Physicians, Clinical Diagnosis
Clain, Alex E.; Alkhuwaiter, Munirah; Davidson, Kate; Martin-Harris, Bonnie – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The purpose of this study was to extend the assessment of the psychometric properties of the Modified Barium Swallow Impairment Profile (MBSImP). Here, we re-examined structural validity and internal consistency using a large clinical-registry data set and formally examined rater reliability in a smaller data set. Method: This study…
Descriptors: Diagnostic Tests, Disability Identification, Physical Disabilities, Eating Disorders
Akaeze, Hope O.; Wu, Jamie Heng-Chieh; Lawrence, Frank R.; Weber, Everett P. – Journal of Psychoeducational Assessment, 2023
This paper reports an investigation into the psychometric properties of the COR-Advantage1.5 (COR-Adv1.5) assessment tool, a criterion-referenced observation-based instrument designed to assess the developmental abilities of children from birth through kindergarten. Using data from 8534 children participating in a state-funded preschool program…
Descriptors: Criterion Referenced Tests, Evaluation Methods, Measures (Individuals), Measurement Techniques
May, Henry; Blackman, Horatio; Van Horne, Sam; Tilley, Katherine; Farley-Ripple, Elizabeth N.; Shewchuk, Samantha; Agboh, Darren; Micklos, Deborah Amsden – Center for Research Use in Education, 2022
In this technical report, the Center for Research Use in Education (CRUE) presents the methodological design of a large-scale quantitative investigation of research use by school-based practitioners through the "Survey of Evidence in Education for Schools (SEE-S)." It documents the major technical aspects of the development of SEE-S,…
Descriptors: Surveys, Schools, Educational Research, Research Utilization
Peck, Charles A.; Young, Maia Goodman; Zhang, Wenqi – National Academy of Education, 2021
In this paper the authors examine the uses of teaching performance assessments (TPAs) as resources for learning, program evaluation, and improvement in teacher education. The authors begin by outlining their conceptual framing and related research questions about the uses of TPAs as resources for program evaluation and improvement. They describe…
Descriptors: Performance Based Assessment, Preservice Teachers, Teacher Evaluation, Program Evaluation
Fosnacht, Kevin; Gonyea, Robert M. – Research & Practice in Assessment, 2018
This study utilized generalizability theory to assess the context where the National Survey of Student Engagement's (NSSE) summary measures, the Engagement Indicators, produce dependable group-level means. The dependability of NSSE group means is an important topic for the higher education assessment community given its wide utilization and usage…
Descriptors: College Freshmen, College Seniors, Learner Engagement, National Surveys
Irby, Sarah M.; Floyd, Randy G. – Psychology in the Schools, 2017
This study examined the exchangeability of total scores (i.e., intelligent quotients [IQs]) from three brief intelligence tests. Tests were administered to 36 children with intellectual giftedness, scored live by one set of primary examiners and later scored by a secondary examiner. For each student, six IQs were calculated, and all 216 values…
Descriptors: Intelligence Tests, Gifted, Error of Measurement, Scores
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Charalambous, Charalambos Y.; Kyriakides, Ermis; Tsangaridou, Niki; Kyriakides, Leonidas – School Effectiveness and School Improvement, 2017
Heightened accountability pressures and an increased emphasis on teaching quality have directed scholarly attention to scrutinizing instruction, particularly with respect to issues of validity and reliability. However, these attempts have largely been directed toward "core" content areas and investigated generic or content-specific…
Descriptors: Physical Education, Instructional Effectiveness, Lesson Plans, Interrater Reliability
Oh, Yoonkyung; Osgood, D. Wayne; Smith, Emilie P. – Journal of Early Adolescence, 2015
The importance of afterschool hours for youth development is widely acknowledged, and afterschool settings have recently received increasing attention as an important venue for youth interventions, bringing a growing need for reliable and valid measures of afterschool quality. This study examined the extent to which the two observational tools,…
Descriptors: After School Programs, Program Effectiveness, Observation, Rating Scales
Crawford, Angela R.; Johnson, Evelyn S.; Moylan, Laura A.; Zheng, Yuzhu – Grantee Submission, 2018
This study describes the development and initial psychometric evaluation of a Recognizing Effective Special Education Teachers (RESET) teacher observation instrument. Specifically, the study uses generalizability theory to compare two versions of a rubric, one with general descriptors of performance levels and one with item-specific descriptors of…
Descriptors: Special Education Teachers, Direct Instruction, Observation, Teaching Methods