Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 16 |
Descriptor
Evaluation Methods | 47 |
Scoring | 47 |
Test Reliability | 47 |
Test Validity | 22 |
Interrater Reliability | 14 |
Student Evaluation | 13 |
Writing Evaluation | 12 |
Educational Assessment | 8 |
Elementary Secondary Education | 8 |
Higher Education | 8 |
Testing | 8 |
Author
Gearhart, Maryl | 2 |
Kane, Thomas J. | 2 |
Koretz, Daniel | 2 |
Staiger, Douglas O. | 2 |
Aksu, Gökhan | 1 |
Andrews, Jac | 1 |
Apache, R. R. | 1 |
Bae, Yunhee | 1 |
Baker, Eva L. | 1 |
Bejar, Isaac I. | 1 |
Boccaccini, Marcus T. | 1 |
Education Level
Elementary Secondary Education | 5 |
Elementary Education | 2 |
Kindergarten | 2 |
Grade 1 | 1 |
Grade 2 | 1 |
Higher Education | 1 |
Postsecondary Education | 1 |
Audience
Practitioners | 8 |
Policymakers | 3 |
Teachers | 3 |
Researchers | 2 |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
Assessments and Surveys
Advanced Placement… | 2 |
Childrens Depression Inventory | 1 |
Graduate Record Examinations | 1 |
Test of English as a Foreign… | 1 |
Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Cesur, Kursat – Educational Policy Analysis and Strategic Research, 2019
Examinees' performances are assessed using a wide variety of techniques. Multiple-choice (MC) tests are among the most frequently used. Nearly all standardized achievement tests make use of MC test items, and there is a variety of ways to score these tests. The study compares number right and liberal scoring (SAC) methods. Mixed…
Descriptors: Multiple Choice Tests, Scoring, Evaluation Methods, Guessing (Tests)
Gorbunova, Tatiana N. – European Journal of Contemporary Education, 2017
The subject of the research is to build methodologies to evaluate student knowledge by testing. The author points to the importance of feedback about the level of mastery in the learning process. Testing is considered as a tool. The object of the study is to create test system models for defence practice problems. Special attention is paid…
Descriptors: Testing, Evaluation Methods, Feedback (Response), Simulation
Sheehan, Kathleen M. – ETS Research Report Series, 2016
The "TextEvaluator"® text analysis tool is a fully automated text complexity evaluation tool designed to help teachers and other educators select texts that are consistent with the text complexity guidelines specified in the Common Core State Standards (CCSS). This paper provides an overview of the TextEvaluator measurement approach and…
Descriptors: Automation, Evaluation Methods, Reading Material Selection, Common Core State Standards
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Silvia, Paul J. – Thinking Skills and Creativity, 2011
The present research examined the reliability of three types of divergent thinking tasks (unusual uses, instances, consequences/implications) and two types of subjective scoring (an average across all responses vs. the responses people chose as their top-two responses) within a latent variable framework, using the maximal-reliability "H"…
Descriptors: Scoring, Creative Thinking, Thinking Skills, Test Reliability
Reed, Deborah K.; Vaughn, Sharon – Scientific Studies of Reading, 2012
The purpose of this narrative synthesis is to determine the reliability and validity of retell protocols for assessing reading comprehension of students in grades K-12. Fifty-four studies were systematically coded for data related to the administration protocol, scoring procedures, and technical adequacy of the retell component. Retell was…
Descriptors: Reading Comprehension, Reading Difficulties, Elementary Secondary Education, Learning Disabilities
Heldsinger, Sandra A.; Humphry, Stephen M. – Educational Research, 2013
Background: Many in education argue for the importance of incorporating teacher judgements in the assessment and reporting of student performance. Advocates of such an approach are cognisant, though, that obtaining a satisfactory level of consistency in teacher judgements poses a challenge. Purpose: This study investigates the extent to which the…
Descriptors: Evaluation Methods, Student Evaluation, Teacher Attitudes, Comparative Analysis
Rufino, Katrina A.; Boccaccini, Marcus T.; Guy, Laura S. – Assessment, 2011
Although reliability is essential to validity, most research on violence risk assessment tools has paid little attention to strategies for improving rater agreement. The authors evaluated the degree to which perceived subjectivity in scoring guidelines for items from two measures--the Psychopathy Checklist-Revised (PCL-R) and the Historical,…
Descriptors: Risk Management, Predictive Validity, Interrater Reliability, Scoring
Bae, Yunhee – Journal of Psychoeducational Assessment, 2012
This article presents a review of the Children's Depression Inventory 2 (CDI 2), published by Multi-Health Systems (MHS) to assess depressive symptoms in 7- to 17-year-old children and adolescents. Given the importance of early diagnosis and treatment (Kovacs & Devlin, 1998), the CDI 2 can assist professionals to pinpoint critical depressive…
Descriptors: Disability Identification, Depression (Psychology), Mental Disorders, Norms
Kane, Thomas J.; Staiger, Douglas O. – Bill & Melinda Gates Foundation, 2012
There is a growing consensus that teacher evaluation in the United States is fundamentally broken. Few would argue that a system that tells 98 percent of teachers they are "satisfactory" benefits anyone--including teachers. The nation's collective failure to invest in high-quality professional feedback to teachers is inconsistent with…
Descriptors: Teacher Effectiveness, Achievement Gains, Evaluation Methods, Teaching Methods
Kane, Thomas J.; Staiger, Douglas O. – Bill & Melinda Gates Foundation, 2012
Research has long been clear that teachers matter more to student learning than any other in-school factor. Improving the quality of teaching is critical to student success. Yet only recently have many states and districts begun to take seriously the importance of evaluating teacher performance and providing teachers with the feedback they need to…
Descriptors: Teacher Effectiveness, Achievement Gains, Evaluation Methods, Teaching Methods
Pakarinen, Eija; Lerkkanen, Marja-Kristiina; Poikkeus, Anna-Maija; Kiuru, Noona; Siekkinen, Martti; Rasku-Puttonen, Helena; Nurmi, Jari-Erik – Early Education and Development, 2010
Research Findings: This study examined the validity and reliability of the Classroom Assessment Scoring System (CLASS; R. C. Pianta, K. M. La Paro, & B. K. Hamre, 2008) in Finnish kindergartens. A pair of trained observers used the CLASS to observe 49 kindergarten teachers (47 female, 2 male) on two different days. Questionnaires measuring…
Descriptors: Scoring, Factor Analysis, Kindergarten, Foreign Countries
Burgin, John; Hughes, Gail D. – Assessing Writing, 2009
The authors explored the credibility of using informal reading inventories and writing samples for 138 students (K-4) to evaluate the effectiveness of a summer literacy program. Running Records (a measure of a child's reading level) and teacher experience during daily reading instruction were used to estimate the reliability of the more formal…
Descriptors: Informal Reading Inventories, Multiple Choice Tests, Program Effectiveness, Scoring
Gearhart, Maryl; Osmundson, Ellen – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2008
This report is an analysis of the role of assessment portfolios in teacher learning. Over 18 months, 19 experienced science teachers worked in grade-level teams to design, implement, and evaluate assessments to track student learning throughout a curriculum unit, supported by semi-structured tasks and resources in assessment portfolios.…
Descriptors: Portfolios (Background Materials), Student Evaluation, Focus Groups, Scoring