Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 8 |
Descriptor
Evaluators | 21 |
Interrater Reliability | 21 |
Test Reliability | 21 |
Test Validity | 12 |
Evaluation Methods | 7 |
Scoring | 7 |
Test Construction | 6 |
Educational Assessment | 5 |
Evaluation Criteria | 5 |
Higher Education | 4 |
Scores | 4 |
Author
Aaron Zimmerman | 1 |
Abedi, Jamal | 1 |
Angoff, William H. | 1 |
Apache, R. R. | 1 |
Ballard, Laura | 1 |
Bejar, Isaac I. | 1 |
Bethany L. Miller | 1 |
Brooks, Val | 1 |
Carifio, James | 1 |
Curtis, Philip R. | 1 |
Dempster, Edith R. | 1 |
Publication Type
Journal Articles | 11 |
Reports - Research | 10 |
Reports - Evaluative | 6 |
Speeches/Meeting Papers | 5 |
Reports - Descriptive | 4 |
Dissertations/Theses -… | 1 |
Numerical/Quantitative Data | 1 |
Tests/Questionnaires | 1 |
Education Level
Postsecondary Education | 2 |
Adult Education | 1 |
Elementary Secondary Education | 1 |
Higher Education | 1 |
Secondary Education | 1 |
Audience
Researchers | 1 |
Location
Hong Kong | 1 |
South Africa | 1 |
United Kingdom | 1 |
United States | 1 |
Assessments and Surveys
Alabama High School… | 1 |
National Assessment of… | 1 |
Test of English as a Foreign… | 1 |
Janice Kinghorn; Katherine McGuire; Bethany L. Miller; Aaron Zimmerman – Assessment Update, 2024
In this article, the authors share their reflections on how different experiences and paradigms have broadened their understanding of the work of assessment in higher education. As they collaborated to create a panel for the 2024 International Conference on Assessing Quality in Higher Education, they recognized that they, as assessment…
Descriptors: Higher Education, Assessment Literacy, Evaluation Criteria, Evaluation Methods
Dempster, Edith R.; Kirby, Nicola F. – Perspectives in Education, 2018
Taxonomies of cognitive demand are frequently used to ensure that assessment tasks include questions ranging from low to high cognitive demand. This paper investigates inter-rater agreement among four evaluators on the cognitive demand of the South African National Senior Certificate Life Sciences examinations after training, practice and…
Descriptors: Interrater Reliability, Biological Sciences, Cognitive Processes, Test Items
Ballard, Laura – ProQuest LLC, 2017
Rater scoring has an impact on writing test reliability and validity. Thus, there has been a continued call for researchers to investigate issues related to rating (Crusan, 2015). Investigating the scoring process and understanding how raters arrive at particular scores are critical "because the score is ultimately what will be used in making…
Descriptors: Evaluators, Schemata (Cognition), Eye Movements, Scoring Rubrics
Tam, Cheung On – International Journal of Art & Design Education, 2018
This article reports on the development and validation of a rubric for assessing students' written responses to artworks. Since the implementation of the Hong Kong New Senior Secondary Curriculum in 2009, art educators have seen responding to artworks as increasingly important. In this context, the Art Criticism Assessment Rubric (ACAR) was…
Descriptors: Foreign Countries, Art Education, Art Appreciation, Student Evaluation
Hampton, Lauren H.; Curtis, Philip R.; Roberts, Megan Y. – Autism: The International Journal of Research and Practice, 2019
Borrowing from a clinical psychology observational methodology, thin-slice observations were used to assess autism characteristics in toddlers. Thin slices are short observations, taken from a longer behavior stream, that multiple raters score on a 5-point scale. The raters' observations are averaged together to assign a…
Descriptors: Autism, Pervasive Developmental Disorders, Observation, Toddlers
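The averaging step can be sketched as follows. The truncated abstract does not specify the details, so the sketch assumes that each thin slice is rated by the same number of raters on the 1-to-5 scale. It also assumes that a child's score is the mean of the per-slice rater means. The function name and data layout are hypothetical and do not describe the authors' procedure.

```python
from statistics import mean

def thin_slice_score(ratings: list[list[int]]) -> float:
    """Hypothetical scoring: ratings[s][r] is rater r's 1-5 rating of slice s.

    Each slice's ratings are averaged across raters, then the slice means
    are averaged to give one score per child.
    """
    for slice_ratings in ratings:
        for r in slice_ratings:
            if not 1 <= r <= 5:
                raise ValueError(f"rating {r} is outside the 1-5 scale")
    slice_means = [mean(slice_ratings) for slice_ratings in ratings]
    return mean(slice_means)

# Invented example: three slices, each rated by three raters.
ratings = [
    [2, 3, 2],  # slice 1
    [4, 4, 3],  # slice 2
    [3, 2, 3],  # slice 3
]
print(round(thin_slice_score(ratings), 3))
```

With equal numbers of raters per slice, the mean of slice means equals the grand mean of all ratings. With unequal panels, the two would differ.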
Brooks, Val – Research Papers in Education, 2012
An aspect of assessment which has received little attention compared with perennial concerns, such as standards or reliability, is the role of judgment in marking. This paper explores marking as an act of judgment, paying particular attention to the nature of judgment and the processes involved. It brings together studies which have explored…
Descriptors: Educational Assessment, Test Reliability, Test Validity, Value Judgment
Feldman, Moshe; Lazzara, Elizabeth H.; Vanderbilt, Allison A.; DiazGranados, Deborah – Journal of Continuing Education in the Health Professions, 2012
Competency-based assessment and an emphasis on obtaining higher-level outcomes that reflect physicians' ability to demonstrate their skills have created a need for more advanced assessment practices. Simulation-based assessments provide medical education planners with tools to better evaluate the 6 Accreditation Council for Graduate Medical…
Descriptors: Performance Based Assessment, Physicians, Accuracy, High Stakes Tests

Abedi, Jamal – Multivariate Behavioral Research, 1996
The Interrater/Test Reliability System (ITRS) is described. The ITRS is a comprehensive computer tool for addressing questions of interrater reliability. It computes several indices of interrater reliability as well as the generalizability coefficient over raters and topics. The system is available in IBM-compatible or Macintosh format. (SLD)
Descriptors: Computer Software, Computer Software Evaluation, Evaluation Methods, Evaluators
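The abstract does not list which indices ITRS computes. The sketch below is a hypothetical illustration only, not ITRS code. It computes two widely used two-rater indices, percent agreement and Cohen's kappa, assuming each rater assigns one categorical score per response. The function names and ratings are invented for illustration.

```python
from collections import Counter

def percent_agreement(a: list, b: list) -> float:
    """Proportion of responses on which the two raters give the same score."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a: list, b: list) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), agreement corrected for chance."""
    p_o = percent_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    categories = set(freq_a) | set(freq_b)
    # Chance agreement: product of each rater's marginal proportions, summed.
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Invented scores from two raters on eight essays (1-3 scale).
rater_a = [1, 2, 3, 3, 2, 1, 2, 3]
rater_b = [1, 2, 3, 2, 2, 1, 3, 3]
print(percent_agreement(rater_a, rater_b), round(cohens_kappa(rater_a, rater_b), 3))
```

Kappa is included alongside raw agreement because two raters who favor the same few categories will agree often by chance alone. Kappa discounts that baseline.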
Assessing the Evidence: Different Types of NVQ Evidence and Their Impact on Reliability and Fairness
Greatorex, Jackie – Journal of Vocational Education and Training, 2005
The research literature reveals many factors that influence the consistency of assessors' or examiners' judgements. One issue that has not been considered is whether National Vocational Qualifications assessors' consistency of judgement is affected by different types of evidence. In this article, 15 Customer Service and 12 Assessor…
Descriptors: Qualifications, Examiners, Interrater Reliability, Job Applicants
Firmin, Michael W.; Proemmel, Elizabeth; Hwang, Chi-en – Educational Research Quarterly, 2005
Previous studies have compared the accuracy of parent, teacher, and clinician ratings of children's behavior, especially in diagnostic analysis. However, many have questioned the validity of the tests and the value of each rater. While some research has found differences among raters, few studies have looked at samples of non-referred children. We wanted to…
Descriptors: Parent Attitudes, Teacher Attitudes, Comparative Analysis, Child Behavior
Apache, R. R. – Physical Educator, 2006
This paper describes a behavioral assessment system for scoring the behaviors of parents and coaches at youth sports games. The Youth Sports Behavior Assessment System (YSBAS) contains nine behavioral categories describing behaviors commonly seen during youth sports. The development process of YSBAS and the observer-training program…
Descriptors: Evaluators, Training, Scoring, Parent Education

Magnan, Sally Sieloff – Canadian Modern Language Review, 1987
Differences in procedures used by academic institutions and government agencies in administering the American Council on the Teaching of Foreign Languages' Oral Proficiency Interview test are examined, and results and implications of two studies of interrater reliability are discussed. (MSE)
Descriptors: Comparative Analysis, Correlation, Evaluation Methods, Evaluators
Angoff, William H. – 1989
This study was undertaken to test the hypothesis that items of the Test of English as a Foreign Language (TOEFL) containing references to American people, places, customs, etc., tend to favor examinees who have spent some time living in the United States. Two samples of examinees were drawn from the March 1987 TOEFL administration, one tested in…
Descriptors: Context Effect, English (Second Language), Evaluators, Foreign Nationals
Nasser, Ramzi; Carifio, James – 1993
The validation of key contextual features of algebra word problems was studied in two phases. In the first phase, five experts were asked to assess the appropriateness of the concepts in the problems and the adequacy of the assignment of the contextual features to the problems. In the second phase, construct validity was established by having 6…
Descriptors: Algebra, Analysis of Variance, Construct Validity, Context Effect
Halpin, Glennelle; McLean, James E. – 1991
Although the standard-setting method of W. H. Angoff (1971) has broad-based support in the research literature, inconsistencies in the resulting standards do occur. Sources of these inconsistencies are examined in a study of judges, competencies (items), rounds (replications), and the interactions among them. A modified Angoff approach was used to…
Descriptors: Analysis of Variance, Error of Measurement, Evaluators, High Schools
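For readers unfamiliar with the method, the classic Angoff computation can be sketched as below. This is the textbook procedure, not the authors' modified approach, and the judges' estimates are invented for illustration. In a multi-round variant, judges would revise their estimates between rounds and the same calculation would be repeated for each round.

```python
from statistics import mean, pstdev

def angoff_cut_score(estimates: list[list[float]]) -> tuple[float, list[float]]:
    """Classic Angoff computation.

    estimates[j][i] is judge j's estimated probability that a minimally
    competent examinee answers item i correctly. Each judge's standard is
    the sum of their item estimates; the cut score is the mean of those
    standards across judges.
    """
    n_items = len(estimates[0])
    for judge in estimates:
        if len(judge) != n_items:
            raise ValueError("every judge must rate every item")
        if any(not 0.0 <= p <= 1.0 for p in judge):
            raise ValueError("estimates must be probabilities in [0, 1]")
    judge_standards = [sum(judge) for judge in estimates]
    return mean(judge_standards), judge_standards

# Hypothetical estimates: 3 judges x 4 items.
estimates = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.6, 0.4, 0.7],
    [0.7, 0.8, 0.6, 0.9],
]
cut, standards = angoff_cut_score(estimates)
# The spread of judges' standards is one simple view of between-judge inconsistency.
print(f"cut score = {cut:.2f} of 4 items; judge SD = {pstdev(standards):.3f}")
```

Studies of Angoff inconsistency, such as the one above, typically go further than a single standard deviation. They partition variance among judges, items, rounds, and their interactions.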