Publication Date
| Date range | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 60 |
| Since 2022 (last 5 years) | 286 |
| Since 2017 (last 10 years) | 782 |
| Since 2007 (last 20 years) | 2044 |
Descriptor
| Descriptor | Results |
| --- | --- |
| Interrater Reliability | 3126 |
| Foreign Countries | 655 |
| Test Reliability | 504 |
| Evaluation Methods | 503 |
| Test Validity | 411 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Audience | Results |
| --- | --- |
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Location | Results |
| --- | --- |
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Rating | Results |
| --- | --- |
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Hampton, Lauren H.; Curtis, Philip R.; Roberts, Megan Y. – Autism: The International Journal of Research and Practice, 2019
Borrowing from a clinical psychology observational methodology, thin-slice observations were used to assess autism characteristics in toddlers. Thin slices are short observations, taken from a longer behavior stream, that multiple raters score on a 5-point scale. The raters' observations are averaged together to assign a…
Descriptors: Autism, Pervasive Developmental Disorders, Observation, Toddlers
Dogan, C. Deha; Uluman, Müge – Educational Sciences: Theory and Practice, 2017
The aim of this study was to determine the extent to which graded-category rating scales and rubrics contribute to inter-rater reliability. The research was designed as a correlational study. The study group consisted of 82 sixth-grade students and three writing course teachers in a private elementary school. A performance task was…
Descriptors: Comparative Analysis, Scoring Rubrics, Rating Scales, Interrater Reliability
Yamamoto, Kentaro; He, Qiwei; Shin, Hyo Jeong; von Davier, Mattias – ETS Research Report Series, 2017
Approximately a third of the Programme for International Student Assessment (PISA) items in the core domains (math, reading, and science) are constructed-response items that require human coding (scoring). This process is time-consuming, expensive, and prone to error, as (a) humans often code inconsistently, and (b) coding reliability in…
Descriptors: Foreign Countries, Achievement Tests, International Assessment, Secondary School Students
Dedering, Kathrin; Sowada, Moritz G. – Educational Assessment, Evaluation and Accountability, 2017
School inspections have become an important instrument of quality assurance and quality development in many European countries. So far, empirical research on school inspections has focused on the acceptance of the procedure among school-internal actors, its influence on internal quality development, and its effects on student…
Descriptors: Inspection, Administrative Policy, Administrative Principles, Teamwork
Chuang, Tsung-Yen; Huang, Yun-Hsuan – Creativity Research Journal, 2015
Mobile technology has rapidly made digital games a popular form of entertainment for this digital generation, and digital game design has thus received considerable attention in both the game industry and design education. Digital game design involves diverse dimensions, of which digital game story design (DGSD) particularly attracts our interest, as the…
Descriptors: Creativity, Interrater Reliability, Construct Validity, Creativity Tests
Raczynski, Kevin R.; Cohen, Allan S.; Engelhard, George, Jr.; Lu, Zhenqiu – Journal of Educational Measurement, 2015
There is a large body of research on the effectiveness of rater training methods in the industrial and organizational psychology literature. Less has been reported in the measurement literature on large-scale writing assessments. This study compared the effectiveness of two widely used rater training methods--self-paced and collaborative…
Descriptors: Interrater Reliability, Writing Evaluation, Training Methods, Pacing
Anderson, Daniel; Irvin, Shawn; Alonzo, Julie; Tindal, Gerald A. – Educational Measurement: Issues and Practice, 2015
The alignment of test items to content standards is critical to the validity of decisions made from standards-based tests. Generally, alignment is determined from the judgments of a panel of content experts, either by averaging their ratings or by reaching consensus through discussion. When the pool of items to be reviewed is large, or the…
Descriptors: Test Items, Alignment (Education), Standards, Online Systems
Stipancic, Kaila L.; Tjaden, Kris; Wilding, Gregory – Journal of Speech, Language, and Hearing Research, 2016
Purpose: This study obtained judgments of sentence intelligibility using orthographic transcription for comparison with previously reported intelligibility judgments obtained using a visual analog scale (VAS) for individuals with Parkinson's disease and multiple sclerosis and healthy controls (K. Tjaden, J. E. Sussman, & G. E. Wilding, 2014).…
Descriptors: Diseases, Neurological Impairments, Sentences, Measures (Individuals)
Derrick, Deirdre J. – TESOL Quarterly: A Journal for Teachers of English to Speakers of Other Languages and of Standard English as a Second Dialect, 2016
Second language (L2) researchers often have to develop or change the instruments they use to measure numerous constructs (Norris & Ortega, 2012). Given the prevalence of researcher-developed and -adapted data collection instruments, and given the profound effect instrumentation can have on results, thorough reporting of instrumentation is…
Descriptors: Second Language Learning, Language Research, Research Methodology, Interrater Reliability
McGrane, Joshua Aaron; Humphry, Stephen Mark; Heldsinger, Sandra – Applied Measurement in Education, 2018
National standardized assessment programs have increasingly included extended written performances, amplifying the need for reliable, valid, and efficient methods of assessment. This article examines a two-stage method using comparative judgments and calibrated exemplars as a complement and alternative to existing methods of assessing writing.…
Descriptors: Standardized Tests, Foreign Countries, Writing Tests, Writing Evaluation
Cankoy, Osman; Özder, Hasan – EURASIA Journal of Mathematics, Science & Technology Education, 2017
The aim of this study is to develop a scoring rubric to assess primary school students' problem-posing skills. The rubric included five dimensions: solvability, reasonability, mathematical structure, context, and language. The raters scored the students' problem-posing skills both with and without the scoring rubric to test the…
Descriptors: Generalizability Theory, Elementary School Students, Foreign Countries, Problem Solving
Garte, Rebecca – International Journal of Progressive Education, 2017
This paper provides a historical analysis of the past century of progressive education within the general socio-political context of schooling in the US. The purpose of this review is to create a social, historical, and philosophical context for understanding the current narrative of progressive education that exists in educational policy…
Descriptors: Progressive Education, Educational History, Educational Practices, Philosophy
Wan, Ming Wai; Brooks, Ami; Green, Jonathan; Abel, Kathryn; Elmadih, Alya – International Journal of Behavioral Development, 2017
This study investigated the psychometrics of a recently developed global rating measure of videotaped parent-infant interaction, the "Manchester Assessment of Caregiver-Infant Interaction" (MACI), in a normative sample. Inter-rater reliability, stability over time, and convergent and discriminant validity were tested. Six-minute play…
Descriptors: Rating Scales, Parent Child Relationship, Infants, Interaction
Nehring, Andreas; Päßler, Andreas; Tiemann, Rüdiger – International Journal of Science and Mathematics Education, 2017
With regard to the moderate performance of German students in international large-scale assessments, one branch of German science education research is concerned with the construction and evaluation of competence models. Based on the theory-driven definition of competence levels, these models imply a correlation between the complexity of a…
Descriptors: Foreign Countries, Science Education, Chemistry, Science Teachers
Roberts, William L.; Boulet, John; Sandella, Jeanne – Advances in Health Sciences Education, 2017
When the safety of the public is at stake, it is particularly relevant for licensing and credentialing exam agencies to use defensible standard setting methods to categorize candidates into competence categories (e.g., pass/fail). The aim of this study was to gather evidence to support change to the Comprehensive Osteopathic Medical Licensing-USA…
Descriptors: Standard Setting, Comparative Analysis, Clinical Experience, Skill Analysis

