ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	5
Since 2016 (last 10 years)	16
Since 2006 (last 20 years)	18

Descriptor

Evaluators	18
Goodness of Fit	18
Comparative Analysis	6
Rating Scales	6
Correlation	5
Evaluation Methods	5
Interrater Reliability	5
Foreign Countries	4
Language Proficiency	4
Language Tests	4
Reliability	4
Second Language Learning	4
Statistical Analysis	4
College Students	3
Decision Making	3
English (Second Language)	3
Factor Analysis	3
Item Response Theory	3
Performance Based Assessment	3
Scoring	3
Second Language Instruction	3
Simulation	3
Student Evaluation	3
Validity	3
Writing Tests	3

Source

American Journal of Evaluation	2
Educational Measurement:…	2
Language Testing	2
Measurement:…	2
Applied Measurement in…	1
Educational Assessment	1
Educational and Psychological…	1
International Journal of…	1
International Journal of…	1
Journal of Early Adolescence	1
Journal of Educational…	1
Journal of Research in Music…	1
Language Assessment Quarterly	1
Language Testing in Asia	1

Publication Type

Journal Articles	18
Reports - Research	17
Reports - Descriptive	1
Tests/Questionnaires	1

Education Level

Higher Education	5
Postsecondary Education	5
Grade 6	1
High Schools	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Location

California (San Francisco)	1
Europe	1
Finland	1
Germany	1
India	1
Maryland	1
New York (New York)	1

Assessments and Surveys

Test of English as a Foreign…

Showing 1 to 15 of 18 results

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
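The snippet above turns on how raters are linked to one another in a sparse design. As a minimal sketch (not the method from the article), assuming ratings are available as (rater, student) pairs, the Python below groups raters and students into connected subsets; estimates from two disconnected subsets cannot be placed on a common scale without additional assumptions. The function name `connected_subsets` and the data layout are hypothetical.

```python
from collections import defaultdict, deque


def connected_subsets(ratings):
    """Group raters and students into connected subsets of a rating design.

    `ratings` is an iterable of (rater_id, student_id) pairs (hypothetical
    layout). A rater and a student are linked when the rater scored that
    student; subsets are chains of such links.
    """
    adj = defaultdict(set)
    for rater, student in ratings:
        # Tag nodes so a rater and a student with the same id stay distinct.
        r, s = ("rater", rater), ("student", student)
        adj[r].add(s)
        adj[s].add(r)

    seen, subsets = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        subsets.append(component)
    return subsets
```

For example, `connected_subsets([("R1", "S1"), ("R2", "S1"), ("R2", "S2"), ("R3", "S3")])` returns two subsets, because R3 and S3 share no link with the other raters or students.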

Monotonicity as a Nonparametric Approach to Evaluating Rater Fit in Performance Assessments

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2020

Rater fit analyses provide insight into the degree to which rater judgments correspond to expected properties, as defined within a measurement framework. Parametric models such as the Rasch model provide a useful framework for evaluating rating quality; however, these models are not appropriate for all assessment contexts. The purpose of this…

Descriptors: Evaluators, Goodness of Fit, Simulation, Psychometrics
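The abstract does not spell out the author's procedure. One common way to operationalize monotonicity, borrowed from Mokken scale analysis, is to check that a rater's mean rating does not decrease as students' rest scores (their totals from the other raters) increase. The sketch below assumes a complete design in which every student is scored by the same raters; `monotonicity_violations` and `n_groups` are hypothetical names, and formal Mokken checks also impose a minimum group size and a significance test.

```python
from statistics import mean


def monotonicity_violations(scores, rater, n_groups=3):
    """Count decreases in one rater's mean rating across rest-score groups.

    `scores` maps student -> {rater: rating}. The rest score is the sum of
    the other raters' ratings for that student. Students are sorted by rest
    score and split into `n_groups` groups of (nearly) equal size.
    """
    rows = []
    for ratings in scores.values():
        if rater not in ratings:
            continue
        rest = sum(v for r, v in ratings.items() if r != rater)
        rows.append((rest, ratings[rater]))
    # Sort on rest score only; ties keep their original order.
    rows.sort(key=lambda row: row[0])
    size = len(rows)
    groups = [rows[i * size // n_groups:(i + 1) * size // n_groups]
              for i in range(n_groups)]
    means = [mean(v for _, v in g) for g in groups if g]
    # A violation is any drop in the mean rating from one group to the next.
    return sum(1 for a, b in zip(means, means[1:]) if b < a)
```

With three students whose ratings rise together across raters, the count is 0; if rater C instead gives 3, 2, 1 to students whose rest scores are 2, 4, 6, the function reports 2 violations.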

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Exploring the Impacts of Different Score Resolution Procedures on Person Fit and Estimated Achievement in Rater-Mediated Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Language Assessment Quarterly, 2020

Scoring procedures for many rater-mediated performance assessments include score resolution procedures in which a third rater adjudicates discrepancies between two raters' ratings of the same performance. There are numerous approaches for calculating resolved scores that involve different combinations of the original and third ratings. Using data…

Descriptors: Scoring, Evaluators, Goodness of Fit, Content Area Writing
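Resolution procedures differ in how the third rating is combined with the original two. The sketch below computes three rules that appear in the score-resolution literature (labelled here parity, tertium quid, and expert judgment). Exact definitions vary across programs, the one-point adjudication threshold is an assumption, and none of this is taken from the article itself; `resolve` is a hypothetical helper.

```python
def resolve(r1, r2, r3=None, threshold=1):
    """Return resolved scores for one performance under several rules.

    Adjudication is triggered only when |r1 - r2| exceeds `threshold`
    (an assumed rule). `r3` is the adjudicator's (third) rating.
    """
    rules = ("parity", "tertium_quid", "expert")
    if abs(r1 - r2) <= threshold:
        # No discrepancy: every rule reduces to the mean of the two ratings.
        agreed = (r1 + r2) / 2
        return {rule: agreed for rule in rules}
    if r3 is None:
        raise ValueError("discrepant ratings require a third rating")
    # Original rating closest to the third one; ties go to r1.
    closer = r1 if abs(r1 - r3) <= abs(r2 - r3) else r2
    return {
        "parity": (r1 + r2 + r3) / 3,       # average all three ratings
        "tertium_quid": (closer + r3) / 2,  # third rating plus closer original
        "expert": float(r3),                # third rating replaces the originals
    }
```

For ratings 2 and 4 adjudicated as 4, parity gives about 3.33 while tertium quid and expert judgment both give 4.0, which illustrates how the choice of rule can shift a student's resolved score.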

The Effects of Incomplete Rating Designs in Combination with Rater Effects

Peer reviewed

Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019

Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…

Descriptors: Rating Scales, Models, Evaluators, Data Collection

Examining Severity and Centrality Effects in TestDaF Writing and Speaking Assessments: An Extended Bayesian Many-Facet Rasch Analysis

Peer reviewed

Eckes, Thomas; Jin, Kuan-Yu – International Journal of Testing, 2021

Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang's (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing…

Descriptors: Language Tests, German, Second Languages, Writing Tests

Cross-Validation and Application of a Scale Assessing School Band Performance

Peer reviewed

Rossin, Emily G.; Bergee, Martin J. – Journal of Research in Music Education, 2021

This is the sixth and culminating study in a series whose purpose has been to acquire a conceptual understanding of school band performance and to develop an assessment based on this understanding. With the present study, we cross-validated and applied a rating scale for school band performance. In the cross-validation phase, college students…

Descriptors: Music Education, Music Activities, Music, Performance

Linking the International English Language Competency Assessment Suite of Examinations to the Common European Framework of Reference

Peer reviewed

Hidri, Sahbi – Language Testing in Asia, 2021

The study investigated the alignment process of the International English Language Competency Assessment (IELCA) suite examinations' four levels, B1, B2, C1 and C2, onto the Common European Framework of Reference (CEFR) by explaining and discussing the five linking stages (Council of Europe [CoE], 2009). Unlike previous studies, this study used the…

Descriptors: Literacy, Second Language Learning, Second Language Instruction, English (Second Language)

Detecting Measurement Disturbances in Rater-Mediated Assessments

Peer reviewed

Wind, Stefanie A.; Schumacker, Randall E. – Educational Measurement: Issues and Practice, 2017

The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start-up, plodding, boredom,…

Descriptors: Measurement, Testing Problems, Writing Tests, Performance Based Assessment

Investigation of Rater Effects Using Social Network Analysis and Exponential Random Graph Models

Peer reviewed

Lamprianou, Iasonas – Educational and Psychological Measurement, 2018

It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…

Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
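Comparative judgment data are frequently scaled with the Bradley-Terry paired-comparison model. The snippet does not name a model, so treat this as one possible scaling rather than the authors' procedure. The minimal sketch below fits Bradley-Terry strengths with the standard minorization-maximization (MM) updates and reports them on a centred log scale; `bradley_terry` and the (winner, loser) input format are hypothetical.

```python
import math
from collections import defaultdict


def bradley_terry(judgments, iterations=200):
    """Estimate essay quality from pairwise judgments given as (winner, loser).

    Uses MM updates for the Bradley-Terry model. Assumes every essay wins at
    least once and loses at least once; otherwise the maximum-likelihood
    estimates do not exist.
    """
    wins = defaultdict(int)
    pairs = defaultdict(int)  # comparisons per unordered pair of essays
    essays = set()
    for winner, loser in judgments:
        wins[winner] += 1
        pairs[frozenset((winner, loser))] += 1
        essays.update((winner, loser))

    strength = {e: 1.0 for e in essays}
    for _ in range(iterations):
        new = {}
        for e in essays:
            # Sum n_eo / (p_e + p_o) over every opponent o compared with e.
            denom = sum(n / (strength[e] + strength[o])
                        for pair, n in pairs.items() if e in pair
                        for o in pair if o != e)
            new[e] = wins[e] / denom
        total = sum(new.values())
        strength = {e: s * len(essays) / total for e, s in new.items()}

    # Report on a log (logit-like) scale centred at zero.
    logs = {e: math.log(s) for e, s in strength.items()}
    centre = sum(logs.values()) / len(logs)
    return {e: v - centre for e, v in logs.items()}
```

Because each judge only decides which of two essays is better, the scale comes entirely from the pattern of wins; a judge's overall leniency or harshness never enters the data, which is one reason comparative judgment is said to remove certain scorer biases.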

NIH Peer Review: Scored Review Criteria and Overall Impact

Peer reviewed

Lindner, Mark D.; Vancea, Adrian; Chen, Mei-Ching; Chacko, George – American Journal of Evaluation, 2016

The National Institutes of Health (NIH) is the largest source of funding for biomedical research in the world. Funding decisions are made largely based on the outcome of a peer review process that is intended to provide a fair, equitable, timely, and unbiased review of the quality, scientific merit, and potential impact of the research. There have…

Descriptors: Medical Research, Biomedicine, Peer Evaluation, Evaluation Criteria

Developing a Model of Analytic Rating Scales to Assess College Students' L2 Chinese Oral Performance

Peer reviewed
PDF on ERIC

Chen, Guangyan – International Journal of Language Testing, 2016

This study develops a model of analytic rating scales to assess L2 Chinese oral performance. It uses Exploratory Factor Analysis (EFA) to identify a model and employs Confirmatory Factor Analysis (CFA) on a separate dataset to test the degree of model fit. The researcher videotaped ten speeches and ACTFL professional raters assessed the oral…

Descriptors: Oral Language, Factor Analysis, Chinese, Second Language Learning

Exploring the Effects of Rater Linking Designs and Rater Fit on Achievement Estimates within the Context of Music Performance Assessments

Peer reviewed

Wind, Stefanie A.; Engelhard, George, Jr.; Wesolowski, Brian – Educational Assessment, 2016

When good model-data fit is observed, the Many-Facet Rasch (MFR) model acts as a linking and equating model that can be used to estimate student achievement, item difficulties, and rater severity on the same linear continuum. Given sufficient connectivity among the facets, the MFR model provides estimates of student achievement that are equated to…

Descriptors: Evaluators, Interrater Reliability, Academic Achievement, Music Education
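For reference, a common rating-scale form of the Many-Facet Rasch model writes the log-odds of category k over category k-1 as theta_n - delta_i - lambda_j - tau_k (student achievement, item difficulty, rater severity, category threshold), which is what places all facets on one logit continuum. The sketch below assumes this parameterization and illustrative parameter values; `mfr_category_probs` is a hypothetical helper, not an API from any Rasch software.

```python
import math


def mfr_category_probs(theta, delta, lam, taus):
    """Category probabilities under a rating-scale many-facet Rasch model.

    log(P_k / P_{k-1}) = theta - delta - lam - tau_k for k = 1..K, with
    category 0 as the reference. `taus` lists tau_1..tau_K.
    """
    log_numerators = [0.0]  # category 0
    for tau in taus:
        # Each step up adds one more adjacent-category log-odds term.
        log_numerators.append(log_numerators[-1] + theta - delta - lam - tau)
    m = max(log_numerators)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in log_numerators]
    total = sum(exps)
    return [e / total for e in exps]
```

With all parameters at 0 and a single threshold at 0, the function returns [0.5, 0.5]; raising `lam` (a more severe rater) moves probability toward lower categories and so lowers the expected rating for the same student.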

A Validation Study of Classroom Assessment Scoring System-Secondary in the Finnish School Context

Peer reviewed

Virtanen, T. E.; Pakarinen, E.; Lerkkanen, M.-K.; Poikkeus, A.-M.; Siekkinen, M.; Nurmi, J.-E. – Journal of Early Adolescence, 2018

This study examined the reliability and validity of the Classroom Assessment Scoring System-Secondary (CLASS-S) in Finnish classrooms. Trained observers coded classroom interactions based on video recordings of 46 Grade 6 classrooms (450 cycles). Concurrent associations were investigated with respect to teacher self-ratings (e.g., efficacy beliefs…

Descriptors: Factor Analysis, Classroom Observation Techniques, Foreign Countries, Factor Structure

Author

Wind, Stefanie A.	7
Walker, A. Adrienne	2
Bergee, Martin J.	1
Chacko, George	1
Chen, Guangyan	1
Chen, Mei-Ching	1
Christie, Christina A.	1
Eckes, Thomas	1
Engelhard, George, Jr.	1
Ferrara, Steve	1
Franke, Todd Michael	1
Hidri, Sahbi	1
Ho, Timothy	1
Hsu, Tammy Huei-Lien	1
Jin, Kuan-Yu	1
Jones, Eli	1
Lamprianou, Iasonas	1
Lerkkanen, M.-K.	1
Lindner, Mark D.	1
Nurmi, J.-E.	1
Pakarinen, E.	1
Poikkeus, A.-M.	1
Rossin, Emily G.	1
Schumacker, Randall E.	1
Siekkinen, M.	1