| Publication Date | Results |
|---|---|
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
| Descriptor | Results |
|---|---|
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
| Audience | Results |
|---|---|
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
| Location | Results |
|---|---|
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
| What Works Clearinghouse Rating | Results |
|---|---|
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Uzuner, Yildiz – Deafness and Education International, 2007
This action research study investigated the impacts of teaching strategies based on the balanced literacy approach on story grammar acquisition of three Turkish students with hearing loss. Data were collected from students' work, archival information, interviews and observations during lessons which were audio- and video-recorded. The pre- and…
Descriptors: Story Grammar, Test Results, Action Research, Hearing (Physiology)
Wang, Jinhao; Brown, Michelle Stallone – Journal of Technology, Learning, and Assessment, 2007
The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric™, with those assigned by human raters. Data collection included administering the Texas version of the WritePlacer "Plus" test and obtaining scores assigned by IntelliMetric™ and by…
Descriptors: Test Scoring Machines, Scoring, Comparative Testing, Intermode Differences
Elder, Catherine; Barkhuizen, Gary; Knoch, Ute; von Randow, Janet – Language Testing, 2007
The use of online rater self-training is growing in popularity and has obvious practical benefits, facilitating access to training materials and rating samples and allowing raters to reorient themselves to the rating scale and to self-monitor their behaviour at their own convenience. However, there has thus far been little research into rater…
Descriptors: Writing Evaluation, Writing Tests, Scoring Rubrics, Rating Scales
Chen, H. Julie – 1995
A study investigated 42 native English speakers' (NSs') perceptions of the pragmatic appropriateness of refusal statements. The NSs rated the appropriateness of 24 written statements in 4 different refusal scenarios; the statements had been collected from both native and non-native speakers. Four weeks later, as a reliability check, the subjects rated…
Descriptors: Attitudes, Comparative Analysis, English (Second Language), Interrater Reliability
Parkes, Jay; Suen, Hoi K. – 1995
This study demonstrates the advantages of using a constrained optimization algorithm to explore the optimal number of prompts, modes of discourse, and raters for achieving an acceptable level of reliability during a direct writing assessment. Writing samples elicited from 50 college students were rated by 3 graduate students and the scores…
Descriptors: Algorithms, College Students, Educational Assessment, Generalizability Theory
Brody, Leslie R.; Hay, Deborah H. – 1991
This paper reports on evaluations of a projective measure of self-esteem adapted from the Tasks of Emotional Development (TED). The evaluations were conducted in 7 studies with a total sample of 416 children and adults. The revised TED uses a five-point scoring system ranging from negative to positive self-esteem. Interrater reliability in the…
Descriptors: Adults, Children, Interrater Reliability, Measurement Techniques
Engelhard, George, Jr. – 1991
A many-faceted Rasch model (FACETS) is presented for the measurement of writing ability. The FACETS model is a multivariate extension of Rasch measurement models that can be used to provide a framework for calibrating both raters and writing tasks within the context of writing assessment. A FACETS model is described based on the current procedures…
Descriptors: Grade 8, Holistic Evaluation, Interrater Reliability, Item Response Theory
Fitz, Don – 1984
The Client Observation Checklist (COC) was developed to evaluate Project ADAPT's intervention in three behavioral areas: bathing, dressing, and socialization. Project ADAPT is designed to provide services to meet the needs of chronically mentally ill residents of nursing homes. Specifically, the project provides staff trained to work with the…
Descriptors: Client Characteristics (Human Services), Hygiene, Institutionalized Persons, Interrater Reliability
Gregory, Kemp – 1991
A balanced appraisal of holistic scoring of writing is presented through an examination of the present popularity of holistic scoring, an analysis of several weaknesses associated with the method, and recommendations for remedying these weaknesses. Six reasons are given for the popularity of holistic scoring: (1) relative lack of expense; (2)…
Descriptors: Child Development, Cost Effectiveness, Elementary Secondary Education, Holistic Evaluation
Merrill, Beverly; Peterson, Sarah – 1986
When the Mesa, Arizona, Public Schools initiated an ambitious writing instruction program in 1978, two assessments based on student writing samples were developed. The first is based on a ninth-grade proficiency test; students who do not pass it receive remediation in high school. After 1987, students must pass this test in order to…
Descriptors: Computer Assisted Testing, Elementary Secondary Education, Graduation Requirements, Holistic Evaluation
McIntyre, Kenneth E. – 1986
This paper deals with the use of classroom observation data for formative evaluation purposes and with a research project in which scores based on the observed performance of teachers in secondary school algebra and English classes were compared with efficiency scores based on an input-output model. The model, using Data Envelopment Analysis (DEA)…
Descriptors: Algebra, Classroom Observation Techniques, Classroom Research, Evaluation Methods
Humes, Ann – 1983
This paper, as an illustration of the procedures involved in a cooperative effort, describes a project in which the Southwest Regional Laboratory (SWRL) designed and developed a minimum standards test in collaboration with a large urban school district in California. The activity described focuses on the writing sample included in the test. The…
Descriptors: High Schools, Institutional Cooperation, Interrater Reliability, Minimum Competency Testing
Mitchell, Karen J.; Anderson, Judith A. – 1986
A pilot essay was included in the 1985 Spring and Fall administrations of the Medical College Admission Test. A sample of 320 of the essays written by Fall examinees who had expressed an interest in allopathic medicine was used to calculate interrater reliability estimates. Sixteen of 20 readers who had been trained according to White's suggestions for…
Descriptors: Analysis of Variance, College Entrance Examinations, Essay Tests, Higher Education
Cloud-Silva, Connie; Denton, Jon J. – 1988
A prototype low-inference observation instrument to measure the minimal teaching competencies of teaching candidates was deductively developed. The focus is on determining whether observers could be trained to use the observation instrument with a high degree of reliability and validity. The instrument, entitled Classroom Observation and Assessment Scale for…
Descriptors: Classroom Observation Techniques, Elementary Secondary Education, Evaluation Methods, Interrater Reliability
Nicolai, Michael T. – 1987
To determine whether there is a distinction between the forensics community's idea of quality and that of the general population, tournament rankings of forensics judges and those of a lay audience were compared. Undergraduate students enrolled in a variety of speech-related courses were asked to attend rounds of competition at a Midwest collegiate…
Descriptors: Communication Research, Comparative Analysis, Debate, Evaluation Criteria