Anthony, Christopher J.; Styck, Kara M.; Volpe, Robert J.; Robert, Christopher R. – School Psychology, 2023
Although originally conceived of as a marriage of direct behavioral observation and indirect behavior rating scales, recent research has indicated that Direct Behavior Ratings (DBRs) are affected by rater idiosyncrasies (rater effects) similar to other indirect forms of behavioral assessment. Most of this research has been conducted using…
Descriptors: Item Response Theory, Generalizability Theory, Interrater Reliability, Behavior Rating Scales
Deniz, Kaan Zulfikar; Ilican, Emel – International Journal of Assessment Tools in Education, 2021
This study aims to compare the G and Phi coefficients as estimated by D studies for a measurement tool with the G and Phi coefficients obtained from real cases in which items of differing difficulty levels were added and also to determine the conditions under which the D studies estimated reliability coefficients closer to reality. The study group…
Descriptors: Generalizability Theory, Test Items, Difficulty Level, Test Reliability
Shero, Jeffrey; Logan, Jessica – Society for Research on Educational Effectiveness, 2024
Background/Context: Previous research in educational assessment has consistently emphasized the importance of reliability as a cornerstone of test quality. Traditional measures of reliability, such as test-retest and split-half reliability, offer a broad view of how internally consistent a measure is but overlook the variability in this internal…
Descriptors: Educational Assessment, Special Education, Students with Disabilities, Learning Disabilities
Huebner, Alan; Skar, Gustaf B. – Practical Assessment, Research & Evaluation, 2021
Writing assessments often consist of students responding to multiple prompts, which are judged by more than one rater. To establish the reliability of these assessments, there exist different methods to disentangle variation due to prompts and raters, including classical test theory, Many Facet Rasch Measurement (MFRM), and Generalizability Theory…
Descriptors: Error of Measurement, Test Theory, Generalizability Theory, Item Response Theory
Atilgan, Hakan; Demir, Elif Kübra; Ogretmen, Tuncay; Basokcu, Tahsin Oguz – International Journal of Progressive Education, 2020
It has become a critical question what the reliability level would be when open-ended questions are used in large-scale selection tests. One of the aims of the present study is to determine what the reliability would be in the event that the answers given by test-takers are scored by experts when open-ended short answer questions are used in…
Descriptors: Foreign Countries, Secondary School Students, Test Items, Test Reliability
Solomon, Benjamin G.; VanDerHeyden, Amanda M.; Solomon, Emily C.; Korzeniewski, Erika R.; Payne, Lexy L.; Campaña, Kayla V.; Dillon, Chasen R. – School Psychology, 2022
Math curriculum-based measurement (CBM) is an essential tool for multi-tiered systems of support decision making, but the reliability of math CBMs has received little research, particularly using more rigorous methods such as generalizability (G) theory. Math CBM is historically organized into two domains: mastery measures and general outcome…
Descriptors: Mathematics Tests, Mathematics Skills, Mathematics Achievement, Curriculum Based Assessment
Merchant, Stefan; Rich, Jessica; Klinger, Don A. – Canadian Journal of Educational Administration and Policy, 2022
Both school and district administrators use the results of standardized, large-scale tests to inform decisions about the need for, or success of, educational programs and interventions. However, test results at the school level are subject to random fluctuations due to changes in cohort, test items, and other factors outside of the school's…
Descriptors: Standardized Tests, Foreign Countries, Generalizability Theory, Scores
Patrick, Helen; French, Brian F.; Mantzicopoulos, Panayota – Journal of Psychoeducational Assessment, 2020
We evaluated the score stability of the Framework for Teaching (FFT), a prominent observation instrument used for teacher evaluation. Three raters each scored 200 reading and mathematics lessons taught by 20 kindergarten teachers. Using Generalizability theory analyses, we decomposed the FFT's Classroom Environment, Instruction, and Total scores…
Descriptors: Teacher Evaluation, Observation, Scores, Test Reliability
D'Agostino, Jerome V.; Rodgers, Emily; Winkler, Christa; Johnson, Tracy; Berenbon, Rebecca – Reading Psychology, 2021
Running Records provide a standardized method for recording and assessing students' oral reading behaviors and are excellent formative assessment tools to guide instructional decision-making. This study expands on prior Running Record reliability work by evaluating the extent to which external raters and teachers consistently assessed students'…
Descriptors: Accuracy, Oral Reading, Generalizability Theory, Error Correction
Chan, Wendy; Oh, Jimin; Luo, Peihao – Journal of Research on Educational Effectiveness, 2021
Findings from experimental studies have increasingly been used to inform policy in school settings. Thus far, the populations in many of these studies are typically defined in a cross-sectional context; namely, the populations are defined in the same academic year in which the study took place or the population is defined at a fixed time point.…
Descriptors: Generalization, Research Design, Demography, Case Studies
Song, Juyeon; Gaspard, Hanna; Nagengast, Benjamin; Trautwein, Ulrich – Journal of Educational Psychology, 2020
Conscientiousness and interest are well-known predictors of academic effort and achievement. As hypothesized by the Conscientiousness × Interest Compensation (CONIC) model, conscientiousness and interest can (partly) compensate for each other, leading to (comparatively) high effort if either conscientiousness or interest is high. The present…
Descriptors: Personality Traits, Interests, Models, Prediction
Mantzicopoulos, Panayota; French, Brian F.; Patrick, Helen – Grantee Submission, 2018
Research Findings: We evaluated the score stability of the Mathematical Quality of Instruction (MQI), an observational measure of mathematics instruction. Three raters each scored, independently, 100 video-recorded lessons taught by 20 kindergarten teachers in the spring. Using generalizability theory analyses, we decomposed the MQI's score…
Descriptors: Kindergarten, Mathematics Instruction, Educational Quality, Classroom Observation Techniques
Briggs, Derek C.; Alzen, Jessica L. – Educational and Psychological Measurement, 2019
Observation protocol scores are commonly used as status measures to support inferences about teacher practices. When multiple observations are collected for the same teacher over the course of a year, some portion of a teacher's score on each occasion may be attributable to the rater, lesson, and the time of year of the observation. All three of…
Descriptors: Observation, Inferences, Generalizability Theory, Scores
Gresham, Frank M.; Dart, Evan H.; Collins, Tai A. – School Psychology Review, 2017
The concept of treatment integrity is an essential component to data-based decision making within a response-to-intervention model. Although treatment integrity is a topic receiving increased attention in the school-based intervention literature, relatively few studies have been conducted regarding the technical adequacy of treatment integrity…
Descriptors: Fidelity, Generalizability Theory, Observation, Measurement Techniques
Wilson, Joshua; Chen, Dandan; Sandbank, Micheal P.; Hebert, Michael – Journal of Educational Psychology, 2019
The present study examined issues pertaining to the reliability of writing assessment in the elementary grades, and among samples of struggling and nonstruggling writers. The present study also extended nascent research on the reliability and the practical applications of automated essay scoring (AES) systems in Response to Intervention frameworks…
Descriptors: Computer Assisted Testing, Automation, Scores, Writing Tests