Tengberg, Michael – Language Assessment Quarterly, 2018
Reading comprehension is often treated as a multidimensional construct. In many reading tests, items are distributed over reading process categories to represent the subskills expected to constitute comprehension. This study explores (a) the extent to which specified subskills of reading comprehension tests are conceptually conceivable to…
Descriptors: Reading Tests, Reading Comprehension, Scores, Test Results

Lunz, Mary E.; And Others – Applied Measurement in Education, 1990
An extension of the Rasch model is used to obtain objective measurements for examinations graded by judges. The model calibrates elements of each facet of the examination on a common log-linear scale. Real examination data illustrate the way correcting for judge severity improves fairness of examinee measures. (SLD)
Descriptors: Certification, Difficulty Level, Interrater Reliability, Judges
Shale, Doug – 1986
This study is an attempt at a cohesive characterization of the concept of essay reliability. As such, it takes as a basic premise that previous and current practices in reporting reliability estimates for essay tests have certain shortcomings. The study provides an analysis of these shortcomings--partly to encourage a fuller understanding of the…
Descriptors: Analysis of Variance, Correlation, Error of Measurement, Essay Tests
Clariana, Roy B.; Koul, Ravinder; Salehi, Roya – International Journal of Instructional Media, 2006
This investigation seeks to confirm a computer-based approach that can be used to score concept maps (Poindexter & Clariana, 2004) and then describes the concurrent criterion-related validity of these scores. Participants enrolled in two graduate courses (n=24) were asked to read about and research online the structure and function of the heart…
Descriptors: Semantics, Human Body, Test Validity, Anatomy
Rudner, Lawrence M. – 1992
Several common sources of error in assessment that depends on the use of judges are identified, and ways to reduce the impact of rating errors are examined. Numerous threats to the validity of scores based on ratings exist. These threats include: (1) the halo effect; (2) stereotyping; (3) perception differences; (4) leniency/stringency error; and…
Descriptors: Alternative Assessment, Error of Measurement, Evaluation Methods, Evaluators

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques
Sullivan, Francis J. – 1986
A study examined how pragmatic form influences evaluation of student essays in university placement testing. Specifically, the study documented how patterns in students' use of information (assumed to be either old, inferable, or new for readers) affected the holistic scores for quality given to the essays. Subjects, 99 randomly selected entering…
Descriptors: College Freshmen, Essay Tests, Evaluation Criteria, Evaluation Methods