Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024
Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…
Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability
Janice Kinghorn; Katherine McGuire; Bethany L. Miller; Aaron Zimmerman – Assessment Update, 2024
In this article, the authors share their reflections on how different experiences and paradigms have broadened their understanding of the work of assessment in higher education. As they collaborated to create a panel for the 2024 International Conference on Assessing Quality in Higher Education, they recognized that they, as assessment…
Descriptors: Higher Education, Assessment Literacy, Evaluation Criteria, Evaluation Methods
Doosti, Mehdi; Ahmadi Safa, Mohammad – International Journal of Language Testing, 2021
This study examined the effect of rater training on promoting inter-rater reliability in oral language assessment. It also investigated whether rater training and the consideration of the examinees' expectations by the examiners have any effect on test-takers' perceptions of being fairly evaluated. To this end, four raters scored 31 Iranian…
Descriptors: Oral Language, Language Tests, Interrater Reliability, Training
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Tam, Cheung On – International Journal of Art & Design Education, 2018
This article reports on the development and validation of a rubric for assessing students' written responses to artworks. Since the implementation of the Hong Kong New Senior Secondary Curriculum in 2009, art educators have seen responding to artworks as increasingly important. In this context, the Art Criticism Assessment Rubric (ACAR) was…
Descriptors: Foreign Countries, Art Education, Art Appreciation, Student Evaluation
Hampton, Lauren H.; Curtis, Philip R.; Roberts, Megan Y. – Autism: The International Journal of Research and Practice, 2019
Borrowing from a clinical psychology observational methodology, thin-slice observations were used to assess autism characteristics in toddlers. Thin-slices are short observations taken from a longer behavior stream which are assigned ratings by multiple raters using a 5-point scale. The raters' observations are averaged together to assign a…
Descriptors: Autism, Pervasive Developmental Disorders, Observation, Toddlers
Yalçin-Çolakoglu, Özlem; Selçuk, Merve – Advances in Language and Literary Studies, 2019
Criterion-referenced tests of second language speaking performance are administered in different institutions using different procedures. The present study reports raters' practices in second language speaking tests, in particular the correspondence between test-takers' grades when assessed individually and in groups. Data derived from…
Descriptors: Oral Language, Language Tests, Test Validity, Inferences
Bogorevich, Valeriia – ProQuest LLC, 2018
Rater variation in performance assessment can impact test-takers' scores and compromise assessments' fairness and validity (Crooks, Kane, & Cohen, 1996); therefore, it is important to investigate raters' scoring patterns in order to inform rater training. Substantial work has…
Descriptors: Pronunciation, Familiarity, English (Second Language), Second Language Learning
Brooks, Val – Research Papers in Education, 2012
An aspect of assessment which has received little attention compared with perennial concerns, such as standards or reliability, is the role of judgment in marking. This paper explores marking as an act of judgment, paying particular attention to the nature of judgment and the processes involved. It brings together studies which have explored…
Descriptors: Educational Assessment, Test Reliability, Test Validity, Value Judgment

Nevo, Baruch – Journal of Educational Measurement, 1985
A literature review and a proposed means of measuring face validity, a test's appearance of being valid, are presented. Empirical evidence from examinees' perceptions of a college entrance examination supports the reliability of measuring face validity. (GDC)
Descriptors: College Entrance Examinations, Evaluation Methods, Evaluators, Foreign Countries
Firmin, Michael W.; Proemmel, Elizabeth; Hwang, Chi-en – Educational Research Quarterly, 2005
Previous studies have compared the accuracy of parent, teacher, and clinician ratings of children's behavior, especially in diagnostic analysis. However, many have questioned the validity of the tests and the value of each rater. While some research has found differences among raters, few have looked at samples of non-referred children. We wanted to…
Descriptors: Parent Attitudes, Teacher Attitudes, Comparative Analysis, Child Behavior
Apache, R. R. – Physical Educator, 2006
A behavioral assessment system for scoring the behaviors of parents and coaches at youth sports games is described within this paper. The Youth Sports Behavior Assessment System (YSBAS) contains nine behavioral categories describing behaviors commonly seen during youth sports. The developmental process of YSBAS and the observer-training program…
Descriptors: Evaluators, Training, Scoring, Parent Education
Zechner, Klaus; Bejar, Isaac I.; Hemat, Ramin – ETS Research Report Series, 2007
The increasing availability and performance of computer-based testing have prompted more research on the automatic assessment of language and speaking proficiency. In this investigation, we evaluated the feasibility of using an off-the-shelf speech-recognition system for scoring speaking prompts from the LanguEdge field test of 2002. We first…
Descriptors: Role, Computer Assisted Testing, Language Proficiency, Oral Language
Angoff, William H. – 1989
This study was undertaken to test the hypothesis that items of the Test of English as a Foreign Language (TOEFL) containing reference to American people, places, customs, etc., tend to favor examinees who have spent some time living in the United States. Two samples of examinees were drawn from the March 1987 TOEFL administration, one tested in…
Descriptors: Context Effect, English (Second Language), Evaluators, Foreign Nationals
Nasser, Ramzi; Carifio, James – 1993
The validation of key contextual features of algebra word problems was studied in two phases. In the first phase, five experts were asked to assess the appropriateness of the concepts in the problems and the adequacy of the assignment of the contextual features to the problems. In the second phase, construct validity was established by having 6…
Descriptors: Algebra, Analysis of Variance, Construct Validity, Context Effect