Yun, Jiyeo – ProQuest LLC, 2017
Since researchers began investigating automatic scoring systems in writing assessments, they have examined relationships between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
McGough, David J. – AERA Online Paper Repository, 2017
This paper describes the implementation of an inter-rater reliability measure for assessing portfolio scores in a teacher education program. The reliability coefficient for the portfolio scores from completers of a newly revised program was compared with the reliability coefficient of the scores from a second set of reviewers who discussed the…
Descriptors: Interrater Reliability, Teacher Education Programs, Program Evaluation, Portfolio Assessment
Smolinsky, Lawrence; Marx, Brian D.; Olafsson, Gestur; Ma, Yanxia A. – Journal of Educational Computing Research, 2020
Computer-based testing is an expanding use of technology offering advantages to teachers and students. We studied Calculus II classes for science, technology, engineering, and mathematics majors using different testing modes. Three sections with 324 students employed paper-and-pencil testing, computer-based testing, and both. Computer tests gave…
Descriptors: Test Format, Computer Assisted Testing, Paper (Material), Calculus
Chan, Stephanie W. Y.; Cheung, Wai Ming; Huang, Yanli; Lam, Wai-Ip; Lin, Chin-Hsi – Language Testing, 2020
Demand for second-language (L2) Chinese education for kindergarteners has grown rapidly, but little is known about these kindergarteners' L2 skills, with existing studies focusing on school-age populations and alphabetic languages. Accordingly, we developed a six-subtest Chinese character acquisition assessment to measure L2 kindergarteners'…
Descriptors: Chinese, Second Language Learning, Second Language Instruction, Written Language
Kouo, Jennifer Lee – Focus on Autism and Other Developmental Disabilities, 2019
Deficits in social communication and interaction have been identified as distinguishing impairments for individuals with an autism spectrum disorder (ASD). As a pivotal skill, the successful development of social communication and interaction in individuals with ASD is a lifelong objective. Point-of-view video modeling (VM) has the potential to…
Descriptors: Interpersonal Competence, Autism, Pervasive Developmental Disorders, Video Technology
Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2015
Existing tests of interrater agreements have high statistical power; however, they lack specificity. If the ratings of the two raters do not show agreement but are not random, the current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of…
Descriptors: Interrater Reliability, Monte Carlo Methods, Measurement Techniques, Accuracy
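For readers unfamiliar with the statistic named in the abstract above, the following is a minimal sketch of Cohen's kappa for two raters, using the standard definition kappa = (p_o - p_e) / (1 - p_e). It is not the new test the authors propose, which the truncated abstract does not describe. The function name `cohens_kappa` and the example ratings are hypothetical and invented for illustration only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal counts, summed over categories.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings, invented for illustration only.
agree = cohens_kappa(["pass", "pass", "fail", "fail"],
                     ["pass", "pass", "fail", "fail"])
opposite = cohens_kappa(["pass", "pass", "fail", "fail"],
                        ["fail", "fail", "pass", "pass"])
print(agree, opposite)
```

The second pair of ratings is systematic but never agrees, which is the kind of non-random, non-agreeing pattern the abstract says existing tests can mishandle. Kappa itself is strongly negative in that case, so the sign of the statistic, and not only its distance from zero, matters when interpreting it.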
Kahraman, Nilufer; Brown, Crystal B. – Applied Measurement in Education, 2015
Psychometric models based on structural equation modeling framework are commonly used in many multiple-choice test settings to assess measurement invariance of test items across examinee subpopulations. The premise of the current article is that they may also be useful in the context of performance assessment tests to test measurement invariance…
Descriptors: Factor Analysis, Structural Equation Models, Medical Students, Performance Based Assessment
National Council on Teacher Quality, 2023
Up until 2020, National Assessment of Educational Progress (NAEP) reading scores had increased only slightly since the early 1990s with large achievement gaps for students of color and students living in poverty. Modest gains in fourth grade reading proficiency since 1992 were erased during the pandemic. The insufficient progress in reading even…
Descriptors: National Competency Tests, Reading Achievement, Reading Instruction, Scores
Cato, Heather; Walker, Katie – Journal of Language and Literacy Education, 2022
Standardized testing and accountability are currently unavoidable components of Texas Public Education. Through years of push-back, parents and educators have demanded that Texas consider alternative testing options that would reduce the high-stakes testing burden on students and schools. In 2015, the State of Texas passed legislation requiring…
Descriptors: Writing Evaluation, Writing Instruction, Pedagogical Content Knowledge, State Legislation
Lavesson, Ann; Lövdén, Martin; Hansson, Kristina – International Journal of Language & Communication Disorders, 2018
Background: The Swedish Program for health surveillance of preschool children includes screening of language and communication abilities. One important language screening is carried out at age 4 years as part of a general screening conducted by health nurses at child health centres. The instruments presently in use for this screening mainly focus…
Descriptors: Preschool Children, Language Impairments, Semantics, Allied Health Personnel
Eldar, Eitan; Ayvazo, Shiri; Hirschmann, Michal – Journal of International Special Needs Education, 2018
Classroom management remains a topic of major apprehension for teachers, especially for those teaching students who display challenging behaviors. This paper presents an empirical examination that supplemented an exceptional project of the ministry of education in a small Middle Eastern country to support students with severe problem…
Descriptors: Classroom Techniques, Student Behavior, Behavior Disorders, Self Contained Classrooms
van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018
In the Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability were determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…
Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills
Splett, Joni W.; Smith-Millman, Marissa; Raborn, Anthony; Brann, Kristy L.; Flaspohler, Paul D.; Maras, Melissa A. – School Psychology Quarterly, 2018
The current study examined between-teacher variance in teacher ratings of student behavioral and emotional risk to identify student, teacher and classroom characteristics that predict such differences and can be considered in future research and practice. Data were taken from seven elementary schools in one school district implementing universal…
Descriptors: Student Behavior, Risk, Behavior Problems, Emotional Problems
Wu, Siew Mei; Tan, Susan – Higher Education Research and Development, 2016
Rating essays is a complex task where students' grades could be adversely affected by test-irrelevant factors such as rater characteristics and rating scales. Understanding these factors and controlling their effects are crucial for test validity. Rater behaviour has been extensively studied through qualitative methods such as questionnaires and…
Descriptors: Scoring, Item Response Theory, Student Placement, College Students
Conati, Cristina; Gutica, Mirela – International Journal of Artificial Intelligence in Education, 2016
We present the results of a study that explored the emotions experienced by students during interaction with an educational game for math (Heroes of Math Island). Starting from emotion frameworks in affective computing and education, we considered a larger set of emotions than in related research. For emotion labeling, we started from a standard…
Descriptors: Educational Games, Emotional Response, Evaluators, Interrater Reliability

