ERIC - Search Results

Descriptor

Evaluators	13
Interrater Reliability	13
Testing Problems	13
Scoring	7
Licensing Examinations…	4
Standard Setting (Scoring)	4
Evaluation Methods	3
Examiners	3
Latent Trait Theory	3
Models	3
Test Interpretation	3
Test Items	3
Test Reliability	3
Cutting Scores	2
Higher Education	2
Language Tests	2
Mathematical Models	2
Performance Based Assessment	2
Rating Scales	2
Reading Tests	2
Scores	2
Standardized Tests	2
Test Construction	2
Test Format	2
Analysis of Variance	1

Source

Educational Measurement:…

Author

Auchter, Joan Chikos	1
Crews, William E., Jr.	1
Engelhard, George, Jr.	1
Geisinger, Kurt F.	1
Halpin, Glennelle	1
Jaeger, Richard M.	1
Johnson, Eugene G.	1
Kaplan, Bruce A.	1
Kreeft, Henk	1
Linacre, John M.	1
Lunz, Mary E.	1
McLean, James E.	1
Patience, Wayne	1
Plake, Barbara S.	1
Raymond, Mark R.	1
Sanders, Piet	1
Shiflett, Samuel	1
Viswesvaran, Chockalingam	1

Publication Type

Speeches/Meeting Papers	8
Reports - Evaluative	7
Reports - Research	6
Journal Articles	3
Tests/Questionnaires	2
Opinion Papers	1

Location

Netherlands

Assessments and Surveys

Alabama High School…	1
General Educational…	1
National Assessment of…	1

Showing all 13 results

Objectivity for Judge-Intermediated Certification Examinations.

Linacre, John M. – 1989

An accepted criterion for gauging the fairness of examinees' scores, derived from judge-awarded ratings, has been the size of the correlation between the judges, i.e., the inter-rater reliability. Various means of achieving inter-rater reliability were reviewed, and a model to measure inter-rater reliability was proposed. Both theoretical and…

Descriptors: Evaluators, Interrater Reliability, Latent Trait Theory, Licensing Examinations (Professions)

Using Standard-Setting Data to Establish Cutoff Scores.

Peer reviewed

Geisinger, Kurt F. – Educational Measurement: Issues and Practice, 1991

Ways to use standard-setting data to adjust cutoff scores on examinations are reviewed. Ten sources of information to be used in determining standards are listed. The decision to modify passing scores should be based on these types of information and consideration of adverse impact or rating process irregularities. (SLD)

Descriptors: Cutting Scores, Evaluation Utilization, Evaluators, Interrater Reliability

Factors Influencing Intrajudge Consistency during Standard-Setting.

Peer reviewed

Plake, Barbara S.; And Others – Educational Measurement: Issues and Practice, 1991

Possible sources of intrajudge inconsistency in standard setting are reviewed, and approaches are presented to improve the accuracy of rating. Procedures for providing judges with feedback through discussion or computerized communication are discussed. Monitoring and maintaining judges' consistency throughout the rating process are essential. (SLD)

Descriptors: Computer Assisted Instruction, Evaluators, Examiners, Feedback

Analysis of Interrater Reliability on the Evaluation of Answers to Open-Ended Questions.

Crews, William E., Jr. – 1991

As part of a study of teacher evaluation of student replies to open-ended questions, a second question, namely the best method of determining interrater reliability, was examined. The standard method, the Pearson product-moment correlation, overestimated the degree of match between researchers' and teachers' scoring of tests. The simpler percent…

Descriptors: Comparative Analysis, Elementary School Teachers, Evaluation Methods, Evaluators
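Why a correlation can overstate agreement between scorers can be shown with a small sketch. The scores below are invented for illustration (they are not data from the study), and `pearson_r` and `percent_agreement` are hypothetical helper names. When one rater is consistently one point more lenient than the other, the Pearson correlation is perfect even though the two raters never award the same score.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def percent_agreement(x, y):
    """Share of responses given exactly the same score by both raters."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical scores on a 0-4 rubric; the "teacher" is always one point more lenient.
researcher = [0, 1, 2, 3, 1, 2, 0, 3]
teacher = [s + 1 for s in researcher]

print(round(pearson_r(researcher, teacher), 3))  # 1.0
print(percent_agreement(researcher, teacher))    # 0.0
```

The correlation only measures whether the two sets of scores rise and fall together, so a constant offset between raters leaves it unchanged; exact percent agreement is sensitive to that offset.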

Least-Squares Models to Correct for Rater Effects in Performance Assessment.

Raymond, Mark R.; Viswesvaran, Chockalingam – 1991

This study illustrates the use of three least-squares models to control for rater effects in performance evaluation: (1) ordinary least squares (OLS); (2) weighted least squares (WLS); and (3) OLS subsequent to applying a logistic transformation to observed ratings (LOG-OLS). The three models were applied to ratings obtained from four…

Descriptors: Evaluators, Higher Education, Interrater Reliability, Least Squares Statistics

Selection of Judges for Standard-Setting.

Peer reviewed

Jaeger, Richard M. – Educational Measurement: Issues and Practice, 1991

Issues concerning the selection of judges for standard setting are discussed. Determining the consistency of judges' recommendations, or their congruity with other expert recommendations, would help in selection. Enough judges must be chosen to allow estimation of recommendations by an entire population of judges. (SLD)

Descriptors: Cutting Scores, Evaluation Methods, Evaluators, Examiners

Accuracy of Bias Review Judges in Identifying Differential Item Functioning on Teacher Certification Tests.

Engelhard, George, Jr.; And Others – 1989

Whether judges on bias review committees can identify test items that function differently for black and white examinees was studied. Judges (n=42) on three bias review committees were asked to examine a set of items and predict differential item functioning (DIF) without empirical data. Test items from teacher certification tests in the content…

Descriptors: Black Students, Evaluators, Interrater Reliability, Item Analysis

Sources of Variability in the Angoff Standard-Setting Process.

Halpin, Glennelle; McLean, James E. – 1991

Although the standard-setting method of W. H. Angoff (1971) has broad-based support in the research literature, inconsistencies in the resulting standards do occur. Sources of these inconsistencies are examined in a study of judges, competencies (items), rounds (replications), and the interactions among them. A modified Angoff approach was used to…

Descriptors: Analysis of Variance, Error of Measurement, Evaluators, High Schools

Decentralized Large Scale Essay Scoring: Methods for Establishing and Evaluating Score Scale Stability and Reading Reliability.

Auchter, Joan Chikos; Patience, Wayne – 1989

The methods used by the General Educational Development Testing Service (GEDTS) to establish and maintain score stability and reading reliability on its direct assessment of writing are described. Using the 1988 site certification and monitoring results of several scoring sites, the focus is on describing how the score scale was established and…

Descriptors: Decentralization, Equivalency Tests, Essay Tests, Evaluators

Reliability of Professionally Scored Data: NAEP-Related Issues.

Kaplan, Bruce A.; Johnson, Eugene G. – 1992

Across the field of educational assessment the case has been made for alternatives to the multiple-choice item type. Most of the alternative types of items require a subjective evaluation by a rater. The reliability of this subjective rating is a key component of these types of alternative items. In this paper, measures of reliability are…

Descriptors: Educational Assessment, Elementary Secondary Education, Estimation (Mathematics), Evaluators

Model Responses for Examinations with Open-Ended Questions.

Kreeft, Henk; Sanders, Piet – 1983

In the Dutch national examinations, reading comprehension tests are used for all languages. For the native language, reading comprehension is tested with reading passages and related questions to which the test-taker provides his own response, not choosing from a group of alternatives. One problem encountered in testing with these items is…

Descriptors: Dutch, Evaluation Methods, Evaluators, Foreign Countries

Variation among Examiners and Protocols on Oral Examinations.

Lunz, Mary E.; And Others – 1989

A method for understanding and controlling the multiple facets of an oral examination (OE) or other judge-intermediated examination is presented and illustrated. This study focused on determining the extent to which the facets model (FM) analysis constructs meaningful variables for each facet of an OE involving protocols, examiners, and…

Descriptors: Computer Software, Difficulty Level, Evaluators, Examiners

The Definition and Measurement of Small Military Unit Team Functions. Final Report, July 1980-October 1981.

Shiflett, Samuel; And Others – 1985

A study was undertaken to improve the measurement of small team performance within the Army. A provisional taxonomy of team-level performance functions was field-validated; criteria and measures of the functions were developed; and their reliability was examined. The provisional taxonomy, used for observing Army field training exercises, was used…

Descriptors: Behavior Rating Scales, Classification, Evaluation Criteria, Evaluators