Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 12 |
Descriptor
Interrater Reliability | 38 |
Test Interpretation | 38 |
Scoring | 13 |
Test Reliability | 10 |
Evaluators | 8 |
Test Validity | 8 |
Scores | 7 |
Foreign Countries | 6 |
Measurement Techniques | 6 |
Standard Setting (Scoring) | 6 |
Test Construction | 6 |
Education Level
Higher Education | 5 |
Postsecondary Education | 4 |
Early Childhood Education | 2 |
Secondary Education | 2 |
Elementary Education | 1 |
Grade 3 | 1 |
Grade 5 | 1 |
Grade 6 | 1 |
Grade 7 | 1 |
Grade 8 | 1 |
Grade 9 | 1 |
Location
China | 1 |
Germany | 1 |
Kentucky | 1 |
South Africa | 1 |
Sweden | 1 |
United Kingdom | 1 |
Assessments and Surveys
Adult Attachment Interview | 1 |
Bender Gestalt Test | 1 |
Early Childhood Longitudinal… | 1 |
Minnesota Multiphasic… | 1 |
Program for International… | 1 |
Self Directed Search | 1 |
Strong Campbell Interest… | 1 |
Trends in International… | 1 |
Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024
Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…
Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability
Tengberg, Michael – Language Assessment Quarterly, 2018
Reading comprehension is often treated as a multidimensional construct. In many reading tests, items are distributed over reading process categories to represent the subskills expected to constitute comprehension. This study explores (a) the extent to which specified subskills of reading comprehension tests are conceptually conceivable to…
Descriptors: Reading Tests, Reading Comprehension, Scores, Test Results
Skaggs, Gary – Measurement: Interdisciplinary Research and Perspectives, 2013
The construct map is a particularly good way to approach instrument development, and this author states that he was delighted to read Adam Wyse's thoughts about how to use construct maps for standard setting. For a number of popular standard-setting methods, Wyse shows how typical feedback to panelists fits within a construct map framework.…
Descriptors: Standard Setting (Scoring), Maps, Test Construction, Measurement
Rindermann, Heiner; Baumeister, Antonia E. E. – International Journal of Testing, 2015
Scholastic tests regard cognitive abilities to be domain-specific competences. However, high correlations between competences indicate either high task similarity or a dependence on common factors. The present rating study examined the validity of 12 Programme for International Student Assessment (PISA) and Third or Trends in International…
Descriptors: Test Validity, Test Interpretation, Competence, Reading Tests
Reed, Deborah K.; Sturges, Keith M. – Remedial and Special Education, 2013
Researchers have expressed concern about "implementation" fidelity in intervention research but have not extended that concern to "assessment" fidelity, or the extent to which pre-/posttests are administered and interpreted as intended. When studying reading interventions, data gathering heavily influences the identification of…
Descriptors: Reading Tests, Fidelity, Pretests Posttests, Intervention
Murley, Lisa D.; Stobaugh, Rebecca; Jukes, Pamela; Tassell, Janet – Educational Renaissance, 2014
The purpose of this article is to provide an overview of the process used to examine the inter-rater reliability of the Teacher Work Sample (TWS) Scoring Rubric involved with the senior culminating experience for teacher candidates used at a large comprehensive university. The study compared holistic and analytic scores reported by Student Teacher…
Descriptors: Teacher Education, Interrater Reliability, Scoring Rubrics, Preservice Teachers
Martinez, Jose Felipe; Stecher, Brian; Borko, Hilda – Educational Assessment, 2009
In this study we use data from the Early Childhood Longitudinal Survey third- and fifth-grade samples to investigate teacher judgments of student achievement, the extent to which they offer a similar picture of student mathematics achievement compared to standardized test scores, and whether classroom assessment practices moderate the relationship…
Descriptors: Mathematics Achievement, Standardized Tests, Grade 5, Student Evaluation
Amir, Tamar; Gati, Itamar; Kleiman, Tali – Journal of Career Assessment, 2008
This research develops and tests a procedure for interpreting individuals' responses in multiscale career assessments, using the Career Decision-Making Difficulties Questionnaire (CDDQ). In Study 1, criteria for ascertaining the credibility of responses were developed, based on the judgments of 39 career-counseling experts. In Study 2, the…
Descriptors: Career Choice, Decision Making Skills, Career Development, Questionnaires
Lang, W. Steve; Wilkerson, Judy R. – Online Submission, 2008
The National Council for Accreditation of Teacher Education (NCATE, 2002) requires teacher education units to develop assessment systems and evaluate both the success of candidates and unit operations. Because of a stated, but misguided, fear of statistics, NCATE fails to use accepted terminology to assure the quality of institutional evaluative…
Descriptors: State Standards, Validity, Resource Materials, Reliability

Vance, B.; And Others – Psychology in the Schools, 1983
Investigated the interscorer reliability between a novice and a professional psychologist for the Minnesota Percepto-Diagnostic Test-Revised (MPDT-R), using a sample of 30 individuals. Results indicated that for three of the four MPDT-R scores there was a significant positive correlation between expert and novice scoring criteria. (JAC)
Descriptors: Experimenter Characteristics, Interrater Reliability, Psychological Evaluation, Psychologists

Aucone, Ernest J.; Raphael, Alan J.; Golden, Charles J.; Espe-Pfeifer, Patricia; Seldon, Jen; Pospisil, Tanya; Dornheim, Liane; Proctor-Weber, Zoe; Calabria, Michael – Assessment, 1999
Assessed the interrater reliability of the revised Advanced Psychodiagnostic Interpretation (API) (A. Raphael and C. Golden, 1998) scoring system for the Bender Gestalt Test (L. Bender, 1938). Agreement across nine raters exceeded 90% for each of three clinical protocols, and kappa statistics indicated good interrater reliability. (SLD)
Descriptors: Diagnostic Tests, Interrater Reliability, Psychological Testing, Scoring
Bosch, Holger; Steinkamp, Fiona; Boller, Emil – Psychological Bulletin, 2006
H. Bosch, F. Steinkamp, and E. Boller's (see record 2006-08436-001) meta-analysis, which demonstrated (a) a small but highly significant overall effect, (b) a small-study effect, and (c) extreme heterogeneity, has provoked widely differing responses. After considering D. B. Wilson and W. R. Shadish's (see record 2006-08436-002) and D. Radin, R.…
Descriptors: Meta Analysis, Publications, Bias, Models

Bakermans-Kranenburg, Marian J.; van IJzendoorn, Marinus H. – Developmental Psychology, 1993
Examined the validity of the Adult Attachment Interview (AAI) measure by interviewing 83 mothers twice over 2 months, using different interviewers on each occasion. The results indicated that the reliability of the AAI classifications was quite high over time and across interviewers. The AAI classifications were independent of nonattachment…
Descriptors: Attachment Behavior, Examiners, Interrater Reliability, Mothers
Smith, Linda B.; Samuelson, Larissa – Developmental Psychology, 2006
Recently, "Developmental Psychology" published 2 articles on the shape bias; both rejected the authors' previous proposals about the role of attentional learning in the development of a shape bias in object name learning. A. Cimpian and E. Markman (2005; see record EJ733667) did so by arguing that the shape bias does not exist but is an…
Descriptors: Developmental Psychology, Cognitive Development, Misconceptions, Attention
Arnold, Margery E. – 1996
It is incorrect to say "the test is reliable" because reliability is a function not only of the test itself, but of many factors. The present paper explains how different factors affect classical reliability estimates such as test-retest, interrater, internal consistency, and equivalent forms coefficients. Furthermore, the limits of classical test…
Descriptors: Estimation (Mathematics), Generalizability Theory, Heuristics, Interrater Reliability