Showing 1 to 15 of 28 results
Peer reviewed
Direct link
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. This study uses a systematic literature review to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Hicks, Tyler; Rodríguez-Campos, Liliana; Choi, Jeong Hoon – American Journal of Evaluation, 2018
To begin statistical analysis, Bayesians quantify their confidence in modeling hypotheses with priors. A prior describes the probability of a certain modeling hypothesis apart from the data. Bayesians should be able to defend their choice of prior to a skeptical audience. Collaboration between evaluators and stakeholders could make their choices…
Descriptors: Bayesian Statistics, Evaluation Methods, Statistical Analysis, Hypothesis Testing
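The Hicks et al. abstract turns on the idea that a prior encodes confidence in a hypothesis before seeing data, and that stakeholders could help choose it. A minimal sketch of that mechanism, using a conjugate Beta-binomial update with entirely hypothetical numbers (the paper itself does not supply this example):

```python
# Sketch: how a stakeholder-informed prior combines with data.
# A Beta(a, b) prior on a pass rate, updated with binomial outcomes,
# yields a Beta posterior (conjugacy). All numbers are illustrative.

def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# "We believe roughly 70% of students pass, with moderate confidence,"
# encoded as Beta(7, 3); then 12 passes and 8 failures are observed.
a, b = beta_posterior(7, 3, successes=12, failures=8)
print(round(beta_mean(a, b), 3))  # posterior mean pass rate, 0.633
```

The point the abstract makes is that the choice Beta(7, 3) versus, say, a flat Beta(1, 1) is exactly what an evaluator must be able to defend to a skeptical audience.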
Peer reviewed
Direct link
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Peer reviewed
Direct link
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
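Lamprianou's abstract pairs rater qualifying sessions with social network analysis. One way to picture that pairing (a hypothetical sketch, not the article's actual method): treat raters as nodes and pairwise agreement on commonly rated scripts as edge weights.

```python
# Hypothetical sketch: raters as a weighted agreement network.
# Nodes are raters; edge weights are pairwise exact-agreement rates
# on the same five scripts. Data are invented for illustration.

from itertools import combinations

scores = {                 # rater -> scores on the same five scripts
    "r1": [3, 4, 2, 5, 3],
    "r2": [3, 4, 3, 5, 3],
    "r3": [2, 3, 2, 4, 2],
}

def agreement(a, b):
    """Proportion of scripts on which two raters give identical scores."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

edges = {
    (p, q): agreement(scores[p], scores[q])
    for p, q in combinations(sorted(scores), 2)
}
print(edges)  # e.g. ('r1', 'r2') -> 0.8
```

Low-weight edges (here r3's) flag raters whose scoring diverges from the rest before operational rating begins, which is the purpose of a qualifying session.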
Peer reviewed
Direct link
Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017
The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…
Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse
Peer reviewed
Direct link
Naumann, Fiona L.; Marshall, Stephen; Shulruf, Boaz; Jones, Philip D. – Advances in Health Sciences Education, 2016
Exercise physiology courses have transitioned to competency-based models, forcing universities to rethink assessment to ensure students are competent to practice. This study built on earlier research to explore rater cognition, capturing factors that contribute to assessor decision making about students' competency. The aims were to determine the source…
Descriptors: Exercise Physiology, Evaluators, Competency Based Education, Evaluation Methods
Peer reviewed
PDF on ERIC
Riordan, Julie; Lacireno-Paquet, Natalie; Shakman, Karen; Bocala, Candice – Society for Research on Educational Effectiveness, 2016
Many studies have called attention to the limitations of current teacher evaluation systems and the need for reform nationwide. This study addresses three research questions: (1) What are the features of the new teacher evaluation systems in New Hampshire's districts with SIG schools?; (2) To what extent did schools implement the evaluation system…
Descriptors: Teacher Evaluation, Program Implementation, Pilot Projects, Surveys
Peer reviewed
Direct link
Li, Hang; He, Lianzhen – Language Assessment Quarterly, 2015
This study used think-aloud protocols to compare essay-rating processes across holistic and analytic rating scales in the context of China's College English Test Band 6 (CET-6). A group of 9 experienced CET-6 raters scored the same batch of 10 CET-6 essays produced in an operational CET-6 administration twice, using both the CET-6 holistic…
Descriptors: Protocol Analysis, English (Second Language), Second Language Learning, Classification
Peer reviewed
Direct link
Tarsilla, Michele – Journal of MultiDisciplinary Evaluation, 2010
Background: Evaluation is sometimes viewed as a professional practice rather than a discipline corresponding to a well-defined set of theories. However, Shadish, Cook and Leviton (1991) were able to demonstrate that evaluators' work does have theoretical foundations. In particular, the authors identified five main elements for evaluation theory…
Descriptors: Evaluation, Theories, Interviews, Evaluators
Peer reviewed
Direct link
Howell, Peter; Soukup-Ascencao, Tajana; Davis, Stephen; Rusbridge, Sarah – Clinical Linguistics & Phonetics, 2011
Riley's Stuttering Severity Instrument (SSI) is widely used. The manuals allow SSI assessments to be made in different ways (e.g. from digital recordings or whilst listening to speech live). Digital recordings allow segments to be selected and listened to, whereas the entire recording has to be judged when listened to live. Comparison was made…
Descriptors: Stuttering, Evaluation Methods, Severity (of Disability), Scores
Peer reviewed
PDF on ERIC
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Scoring models for the "e-rater"® system were built and evaluated for the "TOEFL"® exam's independent and integrated writing prompts. Prompt-specific and generic scoring models were built, and evaluation statistics, such as weighted kappas, Pearson correlations, standardized differences in mean scores, and correlations with…
Descriptors: Scoring, Prompting, Evaluators, Computer Software
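The Ramineni et al. abstract names weighted kappa among the statistics used to evaluate e-rater scoring models against human scores. A self-contained sketch of quadratic-weighted kappa on invented human/machine essay scores (the score scale and data are hypothetical, not from the report):

```python
# Sketch: quadratic-weighted kappa for human vs. automated essay scores.
# Disagreements are penalized by the squared distance between score
# categories, so near-misses count less than large discrepancies.

from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    n_cats = max_score - min_score + 1
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))   # joint score counts
    hist_a = Counter(rater_a)                   # marginal counts, rater A
    hist_b = Counter(rater_b)                   # marginal counts, rater B
    num = den = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            w = ((i - j) ** 2) / ((n_cats - 1) ** 2)        # quadratic weight
            o = observed.get((i, j), 0) / n                 # observed prop.
            e = (hist_a.get(i, 0) / n) * (hist_b.get(j, 0) / n)  # chance prop.
            num += w * o
            den += w * e
    return 1.0 - num / den

human = [3, 4, 4, 2, 5, 3, 4]     # hypothetical scores on a 1-5 scale
machine = [3, 4, 3, 2, 5, 4, 4]
print(round(quadratic_weighted_kappa(human, machine, 1, 5), 3))  # 0.825
```

A value near 1 indicates the automated scores track the human scores well beyond chance; standardized mean differences and Pearson correlations, also named in the abstract, probe complementary aspects of the same comparison.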
Peer reviewed
Direct link
Sondergeld, Toni A.; Beltyukova, Svetlana A.; Fox, Christine M.; Stone, Gregory E. – Mid-Western Educational Researcher, 2012
Scientifically based research used to inform evidence based school reform efforts has been required by the federal government in order to receive grant funding since the reenactment of No Child Left Behind (2002). Educational evaluators are thus faced with the challenge to use rigorous research designs to establish causal relationships. However,…
Descriptors: Research Design, Research Tools, Simulation, Educational Research
Peer reviewed
PDF on ERIC
Yamini, Morteza; Tahmasebi, Soheila – Advances in Language and Literary Studies, 2012
A salient issue in EFL teaching contexts is students' dissatisfaction with their final scores, especially in oral courses. This study tried to bridge the gap between students' and teachers' rating systems through alternatives to existing measurement methods. Task-based language assessment has stimulated language teachers to question the way through which…
Descriptors: Self Evaluation (Individuals), Peer Evaluation, Reading Comprehension, Majors (Students)
Peer reviewed
Direct link
Brandon, Paul R.; Singh, J. Malkeet – American Journal of Evaluation, 2009
Considerable research has been conducted on the use of the findings of program evaluation, but little, if any, attention has been paid to the soundness of the methods of this research. If the methods are not sound or not well described in the research, the strength of the conclusions of the research is unknown. The authors examine the empirical…
Descriptors: Evaluators, Program Evaluation, Literature Reviews, Guidance
Peer reviewed
Direct link
Dekle, Dawn J.; Leung, Denis H. Y.; Zhu, Min – Psychological Methods, 2008
Across many areas of psychology, concordance is commonly used to measure the (intragroup) agreement in ranking a number of items by a group of judges. Sometimes, however, the judges come from multiple groups, and in those situations, the interest is to measure the concordance between groups, under the assumption that there is some within-group…
Descriptors: Item Response Theory, Statistical Analysis, Psychological Studies, Evaluators
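The Dekle et al. abstract starts from concordance as a measure of intragroup agreement among judges ranking items. The standard statistic for that starting point is Kendall's coefficient of concordance W; a minimal sketch with invented rankings (the paper's between-group extension goes beyond this):

```python
# Sketch: Kendall's W, the classic intragroup concordance measure for
# k judges ranking m items (no ties). W = 1 means identical rankings,
# W = 0 means no agreement. Rankings below are hypothetical.

def kendalls_w(rankings):
    """rankings: one rank list per judge, each a permutation of 1..m."""
    m = len(rankings[0])                         # number of items
    k = len(rankings)                            # number of judges
    totals = [sum(r[i] for r in rankings) for i in range(m)]
    mean_total = k * (m + 1) / 2                 # expected rank total
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (k ** 2 * (m ** 3 - m))

judges = [
    [1, 2, 3, 4],   # judge 1's ranks for four items
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
print(round(kendalls_w(judges), 3))  # 0.778
```

Measuring concordance *between* groups, the paper's focus, has to account for the within-group agreement this statistic captures, which is why the authors treat it as an explicit assumption.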