Showing 1 to 15 of 28 results
Peer reviewed
Direct link
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. This study uses a systematic literature review to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Hicks, Tyler; Rodríguez-Campos, Liliana; Choi, Jeong Hoon – American Journal of Evaluation, 2018
To begin statistical analysis, Bayesians quantify their confidence in modeling hypotheses with priors. A prior describes the probability of a certain modeling hypothesis apart from the data. Bayesians should be able to defend their choice of prior to a skeptical audience. Collaboration between evaluators and stakeholders could make their choices…
Descriptors: Bayesian Statistics, Evaluation Methods, Statistical Analysis, Hypothesis Testing
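The Hicks et al. abstract turns on the idea that a prior encodes confidence in a hypothesis before seeing data, and that stakeholders could help choose it. A minimal sketch of that mechanism, using a conjugate Beta-binomial update with entirely hypothetical numbers (the paper itself does not supply this example):

```python
# Sketch: how a stakeholder-informed prior combines with data.
# A Beta(a, b) prior on a pass rate, updated with binomial outcomes,
# yields a Beta posterior (conjugacy). All numbers are illustrative.

def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# "We believe roughly 70% of students pass, with moderate confidence,"
# encoded as Beta(7, 3); then 12 passes and 8 failures are observed.
a, b = beta_posterior(7, 3, successes=12, failures=8)
print(round(beta_mean(a, b), 3))  # posterior mean pass rate, 0.633
```

The point the abstract makes is that the choice Beta(7, 3) versus, say, a flat Beta(1, 1) is exactly what an evaluator must be able to defend to a skeptical audience.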
Peer reviewed
Direct link
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Peer reviewed
Direct link
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
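Lamprianou's abstract pairs rater qualifying sessions with social network analysis. One way to picture that pairing (a hypothetical sketch, not the article's actual method): treat raters as nodes and pairwise agreement on commonly rated scripts as edge weights.

```python
# Hypothetical sketch: raters as a weighted agreement network.
# Nodes are raters; edge weights are pairwise exact-agreement rates
# on the same five scripts. Data are invented for illustration.

from itertools import combinations

scores = {                 # rater -> scores on the same five scripts
    "r1": [3, 4, 2, 5, 3],
    "r2": [3, 4, 3, 5, 3],
    "r3": [2, 3, 2, 4, 2],
}

def agreement(a, b):
    """Proportion of scripts on which two raters give identical scores."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

edges = {
    (p, q): agreement(scores[p], scores[q])
    for p, q in combinations(sorted(scores), 2)
}
print(edges)  # e.g. ('r1', 'r2') -> 0.8
```

Low-weight edges (here r3's) flag raters whose scoring diverges from the rest before operational rating begins, which is the purpose of a qualifying session.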
Peer reviewed
Direct link
Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017
The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…
Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse
Peer reviewed
Direct link
Naumann, Fiona L.; Marshall, Stephen; Shulruf, Boaz; Jones, Philip D. – Advances in Health Sciences Education, 2016
Exercise physiology courses have transitioned to competency-based models, forcing universities to rethink assessment to ensure students are competent to practice. This study built on earlier research to explore rater cognition, capturing factors that contribute to assessor decision making about students' competency. The aims were to determine the source…
Descriptors: Exercise Physiology, Evaluators, Competency Based Education, Evaluation Methods
Peer reviewed
PDF on ERIC
Riordan, Julie; Lacireno-Paquet, Natalie; Shakman, Karen; Bocala, Candice – Society for Research on Educational Effectiveness, 2016
Many studies have called attention to the limitations of current teacher evaluation systems and the need for reform nationwide. This study addresses three research questions: (1) What are the features of the new teacher evaluation systems in New Hampshire's districts with SIG schools?; (2) To what extent did schools implement the evaluation system…
Descriptors: Teacher Evaluation, Program Implementation, Pilot Projects, Surveys
Peer reviewed
Direct link
Li, Hang; He, Lianzhen – Language Assessment Quarterly, 2015
This study used think-aloud protocols to compare essay-rating processes across holistic and analytic rating scales in the context of China's College English Test Band 6 (CET-6). A group of 9 experienced CET-6 raters scored the same batch of 10 CET-6 essays produced in an operational CET-6 administration twice, using both the CET-6 holistic…
Descriptors: Protocol Analysis, English (Second Language), Second Language Learning, Classification
Peer reviewed
Direct link
Tarsilla, Michele – Journal of MultiDisciplinary Evaluation, 2010
Background: Evaluation is sometimes viewed as a professional practice rather than a discipline corresponding to a well-defined set of theories. However, Shadish, Cook and Leviton (1991) were able to demonstrate that evaluators' work does have theoretical foundations. In particular, the authors identified five main elements for evaluation theory…
Descriptors: Evaluation, Theories, Interviews, Evaluators
Peer reviewed
Direct link
Howell, Peter; Soukup-Ascencao, Tajana; Davis, Stephen; Rusbridge, Sarah – Clinical Linguistics & Phonetics, 2011
Riley's Stuttering Severity Instrument (SSI) is widely used. The manuals allow SSI assessments to be made in different ways (e.g. from digital recordings or whilst listening to speech live). Digital recordings allow segments to be selected and listened to, whereas the entire recording has to be judged when listened to live. Comparison was made…
Descriptors: Stuttering, Evaluation Methods, Severity (of Disability), Scores
Peer reviewed
PDF on ERIC
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Scoring models for the "e-rater"® system were built and evaluated for the "TOEFL"® exam's independent and integrated writing prompts. Prompt-specific and generic scoring models were built, and evaluation statistics, such as weighted kappas, Pearson correlations, standardized differences in mean scores, and correlations with…
Descriptors: Scoring, Prompting, Evaluators, Computer Software
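The Ramineni et al. abstract names weighted kappa among the statistics used to evaluate e-rater scoring models against human scores. A self-contained sketch of quadratic-weighted kappa on invented human/machine essay scores (the score scale and data are hypothetical, not from the report):

```python
# Sketch: quadratic-weighted kappa for human vs. automated essay scores.
# Disagreements are penalized by the squared distance between score
# categories, so near-misses count less than large discrepancies.

from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    n_cats = max_score - min_score + 1
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))   # joint score counts
    hist_a = Counter(rater_a)                   # marginal counts, rater A
    hist_b = Counter(rater_b)                   # marginal counts, rater B
    num = den = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            w = ((i - j) ** 2) / ((n_cats - 1) ** 2)        # quadratic weight
            o = observed.get((i, j), 0) / n                 # observed prop.
            e = (hist_a.get(i, 0) / n) * (hist_b.get(j, 0) / n)  # chance prop.
            num += w * o
            den += w * e
    return 1.0 - num / den

human = [3, 4, 4, 2, 5, 3, 4]     # hypothetical scores on a 1-5 scale
machine = [3, 4, 3, 2, 5, 4, 4]
print(round(quadratic_weighted_kappa(human, machine, 1, 5), 3))  # 0.825
```

A value near 1 indicates the automated scores track the human scores well beyond chance; standardized mean differences and Pearson correlations, also named in the abstract, probe complementary aspects of the same comparison.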
Peer reviewed
Direct link
Sondergeld, Toni A.; Beltyukova, Svetlana A.; Fox, Christine M.; Stone, Gregory E. – Mid-Western Educational Researcher, 2012
Scientifically based research used to inform evidence based school reform efforts has been required by the federal government in order to receive grant funding since the reenactment of No Child Left Behind (2002). Educational evaluators are thus faced with the challenge to use rigorous research designs to establish causal relationships. However,…
Descriptors: Research Design, Research Tools, Simulation, Educational Research
Peer reviewed
PDF on ERIC
Yamini, Morteza; Tahmasebi, Soheila – Advances in Language and Literary Studies, 2012
A salient issue in EFL teaching contexts is students' dissatisfaction with their final scores, especially in oral courses. This study tried to bridge the gap between students' and teachers' rating systems through alternatives to existing measurement methods. Task-based language assessment has stimulated language teachers to question the way through which…
Descriptors: Self Evaluation (Individuals), Peer Evaluation, Reading Comprehension, Majors (Students)
Peer reviewed
Direct link
Brandon, Paul R.; Singh, J. Malkeet – American Journal of Evaluation, 2009
Considerable research has been conducted on the use of the findings of program evaluation, but little, if any, attention has been paid to the soundness of the methods of this research. If the methods are not sound or not well described in the research, the strength of the conclusions of the research is unknown. The authors examine the empirical…
Descriptors: Evaluators, Program Evaluation, Literature Reviews, Guidance
Peer reviewed
Direct link
Dekle, Dawn J.; Leung, Denis H. Y.; Zhu, Min – Psychological Methods, 2008
Across many areas of psychology, concordance is commonly used to measure the (intragroup) agreement in ranking a number of items by a group of judges. Sometimes, however, the judges come from multiple groups, and in those situations, the interest is to measure the concordance between groups, under the assumption that there is some within-group…
Descriptors: Item Response Theory, Statistical Analysis, Psychological Studies, Evaluators
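The Dekle et al. abstract starts from concordance as a measure of intragroup agreement among judges ranking items. The standard statistic for that starting point is Kendall's coefficient of concordance W; a minimal sketch with invented rankings (the paper's between-group extension goes beyond this):

```python
# Sketch: Kendall's W, the classic intragroup concordance measure for
# k judges ranking m items (no ties). W = 1 means identical rankings,
# W = 0 means no agreement. Rankings below are hypothetical.

def kendalls_w(rankings):
    """rankings: one rank list per judge, each a permutation of 1..m."""
    m = len(rankings[0])                         # number of items
    k = len(rankings)                            # number of judges
    totals = [sum(r[i] for r in rankings) for i in range(m)]
    mean_total = k * (m + 1) / 2                 # expected rank total
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (k ** 2 * (m ** 3 - m))

judges = [
    [1, 2, 3, 4],   # judge 1's ranks for four items
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
print(round(kendalls_w(judges), 3))  # 0.778
```

Measuring concordance *between* groups, the paper's focus, has to account for the within-group agreement this statistic captures, which is why the authors treat it as an explicit assumption.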