ERIC - Search Results

Showing all 7 results

Appraising the Scoring Performance of Automated Essay Scoring Systems--Some Additional Considerations: Which Essays? Which Human Raters? Which Scores?

Peer reviewed

Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…

Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators

Score Resolution: An Investigation of the Reliability and Validity of Resolved Scores

Peer reviewed

Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003

When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…

Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability

The Sampling Theory for the Intraclass Reliability Coefficient.

Peer reviewed

Feldt, Leonard S. – Applied Measurement in Education, 1990

Sampling theory for the intraclass reliability coefficient, a Spearman-Brown extrapolation of alpha to a single measurement for each examinee, is less recognized and less cited than that of coefficient alpha. Techniques for constructing confidence intervals and testing hypotheses for the intraclass coefficient are presented. (SLD)

Descriptors: Hypothesis Testing, Measurement Techniques, Reliability, Sampling

Quality Control in the Development and Use of Performance Assessments.

Peer reviewed

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991

Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)

Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques

Estimating the Reliability of a Test Containing Multiple Item Formats.

Peer reviewed

Qualls, Audrey L. – Applied Measurement in Education, 1995

Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)

Descriptors: Achievement Tests, Estimation (Mathematics), Measurement Techniques, Test Format

The Precision of Measurements.

Peer reviewed

Kane, Michael – Applied Measurement in Education, 1996

This overview of the role of error and tolerance for error in measurement asserts that the generic precision associated with a measurement procedure is defined as the root mean square error, or standard error, in some relevant population. This view of precision is explored in several applications of measurement. (SLD)

Descriptors: Error of Measurement, Error Patterns, Generalizability Theory, Measurement Techniques

Measuring the Impact of Judge Severity on Examination Scores.

Peer reviewed

Lunz, Mary E.; And Others – Applied Measurement in Education, 1990

An extension of the Rasch model is used to obtain objective measurements for examinations graded by judges. The model calibrates elements of each facet of the examination on a common log-linear scale. Real examination data illustrate the way correcting for judge severity improves fairness of examinee measures. (SLD)

Descriptors: Certification, Difficulty Level, Interrater Reliability, Judges