Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 9 |
| Since 2007 (last 20 years) | 17 |
Descriptor
| Evaluators | 33 |
| Test Validity | 33 |
| Test Reliability | 25 |
| Interrater Reliability | 20 |
| Evaluation Methods | 12 |
| Test Construction | 9 |
| Scoring | 7 |
| English (Second Language) | 6 |
| Evaluation Criteria | 6 |
| Language Tests | 6 |
| Scores | 6 |
Author
| Bejar, Isaac I. | 2 |
| Aaron Zimmerman | 1 |
| Ahmadi Safa, Mohammad | 1 |
| Angoff, William H. | 1 |
| Apache, R. R. | 1 |
| Barth, Amy E. | 1 |
| Barwell, Fred | 1 |
| Bethany L. Miller | 1 |
| Binghan Zheng | 1 |
| Bogorevich, Valeriia | 1 |
| Brodersen, R. Marc | 1 |
Education Level
| Elementary Secondary Education | 3 |
| Higher Education | 3 |
| Secondary Education | 3 |
| Elementary Education | 2 |
| Grade 7 | 2 |
| Postsecondary Education | 2 |
| Early Childhood Education | 1 |
| Grade 6 | 1 |
| Grade 8 | 1 |
| Middle Schools | 1 |
Audience
| Researchers | 2 |
| Administrators | 1 |
| Practitioners | 1 |
| Teachers | 1 |
Assessments and Surveys
| National Assessment of… | 2 |
| Test of English as a Foreign… | 2 |
Janice Kinghorn; Katherine McGuire; Bethany L. Miller; Aaron Zimmerman – Assessment Update, 2024
In this article, the authors share their reflections on how different experiences and paradigms have broadened their understanding of the work of assessment in higher education. As they collaborated to create a panel for the 2024 International Conference on Assessing Quality in Higher Education, they recognized that they, as assessment…
Descriptors: Higher Education, Assessment Literacy, Evaluation Criteria, Evaluation Methods
Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024
Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…
Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability
Hampton, Lauren H.; Curtis, Philip R.; Roberts, Megan Y. – Autism: The International Journal of Research and Practice, 2019
Borrowing from a clinical psychology observational methodology, thin-slice observations were used to assess autism characteristics in toddlers. Thin-slices are short observations taken from a longer behavior stream which are assigned ratings by multiple raters using a 5-point scale. The raters' observations are averaged together to assign a…
Descriptors: Autism, Pervasive Developmental Disorders, Observation, Toddlers
Tam, Cheung On – International Journal of Art & Design Education, 2018
This article reports on the development and validation of a rubric for assessing students' written responses to artworks. Since the implementation of the Hong Kong New Senior Secondary Curriculum in 2009, art educators have seen responding to artworks as increasingly important. In this context, the Art Criticism Assessment Rubric (ACAR) was…
Descriptors: Foreign Countries, Art Education, Art Appreciation, Student Evaluation
Doosti, Mehdi; Ahmadi Safa, Mohammad – International Journal of Language Testing, 2021
This study examined the effect of rater training on promoting inter-rater reliability in oral language assessment. It also investigated whether rater training and the consideration of the examinees' expectations by the examiners have any effect on test-takers' perceptions of being fairly evaluated. To this end, four raters scored 31 Iranian…
Descriptors: Oral Language, Language Tests, Interrater Reliability, Training
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Yalçin-Çolakoglu, Özlem; Selçuk, Merve – Advances in Language and Literary Studies, 2019
Criterion-referenced tests of second language speaking performance are administered in different institutions using different procedures. The present study reports raters' practices in second language speaking tests, in particular the correspondence between test-takers' grades when assessed individually and in groups. Data derived from…
Descriptors: Oral Language, Language Tests, Test Validity, Inferences
Moshinsky, Avital; Ziegler, David; Gafni, Naomi – International Journal of Testing, 2017
Many medical schools have adopted multiple mini-interviews (MMI) as an advanced selection tool. MMIs are expensive and used to test only a few dozen candidates per day, making it infeasible to develop a different test version for each test administration. Therefore, some items are reused both within and across years. This study investigated the…
Descriptors: Interviews, Medical Schools, Test Validity, Test Reliability
Hood, Lisa; Rodriguez, Sarai Coba; Rosa, Pamela Reimer; Hunt, Erika Lee – AERA Online Paper Repository, 2016
Teacher evaluation has become a key measure for determining teacher effectiveness, and the examination of the evaluation process at early childhood levels is a critical area of study. Responding to federal policies, many states have reformed their teacher evaluation systems. One of the more recent developments in state policy is the inclusion of…
Descriptors: Early Childhood Teachers, Teacher Evaluation, Evaluation Methods, Test Reliability
Cherasaro, Trudy L.; Brodersen, R. Marc; Yanoski, David C.; Welp, Laura C.; Reale, Marianne L. – Regional Educational Laboratory Central, 2015
This report presents a survey tool, developed by REL Central at Marzano Research, designed to gather information from teachers about their perceptions of and responses to evaluator feedback. District or state administrators can use this survey to systematically collect teacher perceptions on five key aspects of evaluation feedback: (1) feedback…
Descriptors: Teacher Surveys, Evaluators, Teacher Attitudes, Feedback (Response)
Brooks, Val – Research Papers in Education, 2012
An aspect of assessment which has received little attention compared with perennial concerns, such as standards or reliability, is the role of judgment in marking. This paper explores marking as an act of judgment, paying particular attention to the nature of judgment and the processes involved. It brings together studies which have explored…
Descriptors: Educational Assessment, Test Reliability, Test Validity, Value Judgment
Bogorevich, Valeriia – ProQuest LLC, 2018
Rater variation in performance assessment can impact test-takers' scores and compromise assessments' fairness and validity (Crooks, Kane, & Cohen, 1996); therefore, it is important to investigate raters' scoring patterns in order to inform rater training. Substantial work has…
Descriptors: Pronunciation, Familiarity, English (Second Language), Second Language Learning
Barth, Amy E.; Stuebing, Karla K.; Fletcher, Jack M.; Cirino, Paul T.; Romain, Melissa; Francis, David; Vaughn, Sharon – Reading Psychology, 2012
We evaluated the reliability and validity of two oral reading fluency scores for 1-minute equated passages: median score and mean score. These scores were calculated from measures of reading fluency administered up to five times over the school year to students in grades six to eight (n = 1,317). Both scores were highly reliable with strong…
Descriptors: Reading Fluency, Test Validity, Test Reliability, Scores
Development and Validation of the Cultural Competence of Program Evaluators (CCPE) Self-Report Scale
Dunaway, Krystall E.; Morrow, Jennifer A.; Porter, Bryan E. – American Journal of Evaluation, 2012
No self-report measure of cultural competence currently exists in program evaluation. Adapting items from cultural competence measures in fields such as counseling and nursing, the researchers developed the Cultural Competence of Program Evaluators (CCPE) self-report scale. The goals of this study were to validate the CCPE and to assess…
Descriptors: Test Validity, Measures (Individuals), Cultural Awareness, Program Evaluation
Somers, Marie-Andree; Zhu, Pei; Wong, Edmond – National Center for Education Evaluation and Regional Assistance, 2011
This study examines the practical implications of using state tests to measure student achievement in impact evaluations that span multiple states and grades. In particular, the study examines the sensitivity of impact findings to (1) the type of assessment used to measure achievement (state tests or an external assessment administered by the…
Descriptors: Evaluators, Grades (Scholastic), Academic Achievement, Program Effectiveness

