Showing 1 to 15 of 18 results
Peer reviewed
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design in which there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Peer reviewed
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2020
Rater fit analyses provide insight into the degree to which rater judgments correspond to expected properties, as defined within a measurement framework. Parametric models such as the Rasch model provide a useful framework for evaluating rating quality; however, these models are not appropriate for all assessment contexts. The purpose of this…
Descriptors: Evaluators, Goodness of Fit, Simulation, Psychometrics
Peer reviewed
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to that of person fit analyses: to identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Peer reviewed
Wind, Stefanie A.; Walker, A. Adrienne – Language Assessment Quarterly, 2020
Scoring procedures for many rater-mediated performance assessments include score resolution procedures in which a third rater adjudicates discrepancies between two raters' ratings of the same performance. There are numerous approaches for calculating resolved scores that involve different combinations of the original and third ratings. Using data…
Descriptors: Scoring, Evaluators, Goodness of Fit, Content Area Writing
Peer reviewed
Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019
Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…
Descriptors: Rating Scales, Models, Evaluators, Data Collection
Peer reviewed
Eckes, Thomas; Jin, Kuan-Yu – International Journal of Testing, 2021
Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang's (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing…
Descriptors: Language Tests, German, Second Languages, Writing Tests
Peer reviewed
Rossin, Emily G.; Bergee, Martin J. – Journal of Research in Music Education, 2021
This is the sixth and culminating study in a series whose purpose has been to acquire a conceptual understanding of school band performance and to develop an assessment based on this understanding. With the present study, we cross-validated and applied a rating scale for school band performance. In the cross-validation phase, college students…
Descriptors: Music Education, Music Activities, Music, Performance
Peer reviewed
Hidri, Sahbi – Language Testing in Asia, 2021
The study investigated the alignment process of the International English Language Competency Assessment (IELCA) suite examinations' four levels (B1, B2, C1, and C2) onto the Common European Framework of Reference (CEFR) by explaining and discussing the five linking stages (Council of Europe [CoE], 2009). Unlike previous studies, this study used the…
Descriptors: Literacy, Second Language Learning, Second Language Instruction, English (Second Language)
Peer reviewed
Wind, Stefanie A.; Schumacker, Randall E. – Educational Measurement: Issues and Practice, 2017
The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start-up, plodding, boredom,…
Descriptors: Measurement, Testing Problems, Writing Tests, Performance Based Assessment
Peer reviewed
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
Peer reviewed
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Peer reviewed
Lindner, Mark D.; Vancea, Adrian; Chen, Mei-Ching; Chacko, George – American Journal of Evaluation, 2016
The National Institutes of Health (NIH) is the largest source of funding for biomedical research in the world. Funding decisions are made largely based on the outcome of a peer review process that is intended to provide a fair, equitable, timely, and unbiased review of the quality, scientific merit, and potential impact of the research. There have…
Descriptors: Medical Research, Biomedicine, Peer Evaluation, Evaluation Criteria
Peer reviewed
Chen, Guangyan – International Journal of Language Testing, 2016
This study develops a model of analytic rating scales to assess L2 Chinese oral performance. It uses Exploratory Factor Analysis (EFA) to identify a model and employs Confirmatory Factor Analysis (CFA) on a separate dataset to test the degree of model fit. The researcher videotaped ten speeches, and ACTFL professional raters assessed the oral…
Descriptors: Oral Language, Factor Analysis, Chinese, Second Language Learning
Peer reviewed
Wind, Stefanie A.; Engelhard, George, Jr.; Wesolowski, Brian – Educational Assessment, 2016
When good model-data fit is observed, the Many-Facet Rasch (MFR) model acts as a linking and equating model that can be used to estimate student achievement, item difficulties, and rater severity on the same linear continuum. Given sufficient connectivity among the facets, the MFR model provides estimates of student achievement that are equated to…
Descriptors: Evaluators, Interrater Reliability, Academic Achievement, Music Education
Peer reviewed
Virtanen, T. E.; Pakarinen, E.; Lerkkanen, M.-K.; Poikkeus, A.-M.; Siekkinen, M.; Nurmi, J.-E. – Journal of Early Adolescence, 2018
This study examined the reliability and validity of the Classroom Assessment Scoring System-Secondary (CLASS-S) in Finnish classrooms. Trained observers coded classroom interactions based on video recordings of 46 Grade 6 classrooms (450 cycles). Concurrent associations were investigated with respect to teacher self-ratings (e.g., efficacy beliefs…
Descriptors: Factor Analysis, Classroom Observation Techniques, Foreign Countries, Factor Structure