New York State Education Department, 2014
This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes used to develop and implement the NYSAA program, and stakeholder involvement in those processes. The purpose of this report is to document the technical aspects of the 2013-14 NYSAA.…
Descriptors: Alternative Assessment, Educational Assessment, State Departments of Education, Student Evaluation
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Unger, Darian – American Journal of Business Education, 2010
Although there is significant research on improving college-level teaching practices, most literature in the field assumes an incentive for improvement. The research presented in this paper addresses the issue of poor incentives for improving university-level teaching. Specifically, it proposes instructor-designed common examinations as an…
Descriptors: Educational Innovation, Educational Improvement, Instructional Improvement, Business Administration Education

Jones, Terry; Cason, Carolyn L.; Mancini, Mary E. – Journal of Professional Nursing, 2002
Registered nurses (n=368) participated in a skills recredentialing program in which competencies were assessed by a knowledge test, a performance test under simulated conditions, and evaluator ratings in actual patient-care situations. The absence of significant differences in results between the simulated and actual conditions supports the validity of the…
Descriptors: Competence, Credentials, Interrater Reliability, Nurses

Miller, Jeff – College Teaching, 1999
A college faculty member who has graded Advanced Placement exam essays on U.S. government and politics, taken mostly by high school juniors and seniors, suggests that high school teachers and college faculty who assess the essays are not the best qualified persons to do so and that despite efforts to ensure consistency, the resulting scores are…
Descriptors: Advanced Placement, College Instruction, Essays, Evaluation Criteria

McLauchlan, William – College Teaching, 1999
A faculty consultant to the Educational Testing Service for Advanced Placement (AP) test reading in U.S. government and politics responds to an article criticizing essay evaluation methods and criteria, finding in it a fundamental misunderstanding of the AP reading process and explaining why the essays are subject to less scrutiny for style,…
Descriptors: Advanced Placement, College Instruction, Essays, Evaluation Criteria

Hollenbeck, Keith; Tindal, Gerald; Almond, Patricia – Educational Assessment, 1999
Studied the amount of measurement error in a state's performance-based writing task as it relates to high-stakes decision reproducibility. Using 175 eighth-grade writing samples, the study finds moderate correlations between the two raters' scores, with significant differences between the raters for the handwritten, but not the typed, essays. (SLD)
Descriptors: Decision Making, Error of Measurement, Essay Tests, Grade 8

Walter, Richard A.; Kapes, Jerome T. – Journal of Industrial Teacher Education, 2003
To identify a procedure for establishing cut scores for National Occupational Competency Testing Institute examinations in Pennsylvania, an expert panel assessed written and performance test items for minimally competent workers. Recommendations about the number, type, and training of judges used were made. (Contains 18 references.) (SK)
Descriptors: Cutting Scores, Interrater Reliability, Occupational Tests, Teacher Competency Testing
Gearhart, Maryl; Novak, John R.; Herman, Joan L. – 1994
Technical questions regarding the reliability and validity of large-scale portfolio assessment were studied, focusing on: (1) whether raters can score collections of writing reliably with rubrics designed for single samples; (2) whether ratings derived from different frameworks differ in their capacities to support technically sound…
Descriptors: Educational Assessment, Elementary Education, Elementary School Students, Essay Tests

Congdon, Peter J.; McQueen, Joy – Journal of Educational Measurement, 2000
Studied the stability of rater severity over an extended rating period by applying multifaceted Rasch analysis to ratings of 16 raters of writing performances of 8,285 elementary school students. Findings cast doubt on the practice of using a single calibration of rater severity as the basis for adjustment of person measures. (SLD)
Descriptors: Educational Assessment, Elementary Education, Elementary School Students, Interrater Reliability

Page, Ellis Batten – Journal of Experimental Education, 1994
National Assessment of Educational Progress writing sample essays from 1988 and 1990 (495 and 599 essays) were subjected to computerized grading and human ratings. Cross-validation suggests that computer scoring is superior to a two-judge panel, a finding encouraging for large programs of essay evaluation. (SLD)
Descriptors: Computer Assisted Testing, Computer Software, Essays, Evaluation Methods
Vendlinski, Terry P.; Nagashima, Sam; Herman, Joan L. – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2007
Current educational policy highlights the important role that assessment can play in improving education. State standards and the assessments that are aligned with them establish targets for learning and promote school accountability for helping all students succeed; at the same time, feedback from assessment results is expected to provide …
Descriptors: Elementary School Science, Federal Legislation, State Standards, Educational Improvement
University of South Florida, Tampa. Coll. of Education. – 1980
This report describes the procedures followed in scoring the October 1978 Florida Minimal Writing Production Skills Assessment and reports the results of that assessment. The assessment was conducted on a sample of Florida public school students in grades 3, 5, 8, and 11. Sections include descriptions of the rating scale and scorer's guide as well…
Descriptors: Educational Assessment, Elementary Secondary Education, Interrater Reliability, Minimum Competency Testing

Tyson, LeaAnn; Silverman, Stephen – Journal of Personnel Evaluation in Education, 1994
Differences in the Texas Teacher Appraisal System scores of teacher subgroups over 2 years were examined for 2,366 teachers, covering scores on individual domains, sums of scores on the first four domains, and overall summary performance scores, as well as appraiser differences. Implications for teacher evaluation are discussed. (SLD)
Descriptors: Educational Assessment, Elementary Secondary Education, Evaluation Methods, Evaluators
Ferrara, Steven F. – 1987
The necessity of controlling the order in which trained essay raters for a statewide writing assessment program receive student essays was studied. The underlying theoretical question concerns possible rater bias caused by raters reading long strings of essays of homogeneous quality; this problem is usually referred to as context effect or…
Descriptors: Context Effect, Essay Tests, Evaluators, Graduation Requirements