ERIC - Search Results

Publication Date

In 2026	0
Since 2025	2
Since 2022 (last 5 years)	2
Since 2017 (last 10 years)	2
Since 2007 (last 20 years)	6

Descriptor

Comparative Testing	10
Interrater Reliability	10
Test Reliability	10
Test Validity	5
Evaluation Criteria	4
Foreign Countries	4
College Students	3
Evaluation Methods	3
Scoring	3
Criterion Referenced Tests	2
Essay Tests	2
Higher Education	2
Multiple Choice Tests	2
Standardized Tests	2
Student Evaluation	2
Test Construction	2
Undergraduate Students	2
Writing Evaluation	2
Academic Standards	1
Alternative Assessment	1
Calculus	1
Child Development	1
Clinical Experience	1
College Entrance Examinations	1
College Freshmen	1

Source

Advances in Physiology…	1
Early Child Development and…	1
International Journal of…	1
Journal of Consulting and…	1
Journal of Educational…	1
Physical Review Special…	1
Studies in Higher Education	1

Author

Alcock, Lara	1
Barter, Alice K.	1
Breland, Hunter M.	1
Goldstein, Harvey	1
Hamid Mohammadi	1
Homer, Matthew S.	1
Jones, Ian	1
Korat, Ofra	1
Mark J. Gierl	1
O'Hara, Michael W.	1
Ole J. Kemi	1
Pell, Godfrey	1
Rehm, Lynn P.	1
Roberts, Trudie E.	1
Shiell, Ralph C.	1
Slepkov, Aaron D.	1
Tahereh Firoozi	1
Wolf, Alison	1

Publication Type

Reports - Research	8
Journal Articles	7
Reports - Evaluative	2
Tests/Questionnaires	2
Books	1
Speeches/Meeting Papers	1

Education Level

Higher Education	5
Postsecondary Education	4
Early Childhood Education	1

Audience

Researchers

Location

Canada	1
Israel	1
United Kingdom	1
United Kingdom (Leeds)	1

Assessments and Surveys

Hamilton Rating Scale for…	1
SAT (College Admission Test)	1
Student Descriptive…	1
Test of Standard Written…	1

Showing all 10 results

Using Automated Procedures to Score Educational Essays Written in Three Languages

Peer reviewed

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025

The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…

Descriptors: College Students, Slavic Languages, German, Italian

Evidence-Based Evaluation of Student and Marker Performances in Assessment and Examination

Peer reviewed

Ole J. Kemi – Advances in Physiology Education, 2025

Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…

Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards

Comparison of Integrated Testlet and Constructed-Response Question Formats

Peer reviewed

Slepkov, Aaron D.; Shiell, Ralph C. – Physical Review Special Topics - Physics Education Research, 2014

Constructed-response (CR) questions are a mainstay of introductory physics textbooks and exams. However, because of the time, cost, and scoring reliability constraints associated with this format, CR questions are being increasingly replaced by multiple-choice (MC) questions in formal exams. The integrated testlet (IT) is a recently developed…

Descriptors: Science Tests, Physics, Responses, Multiple Choice Tests

Peer Assessment without Assessment Criteria

Peer reviewed

Jones, Ian; Alcock, Lara – Studies in Higher Education, 2014

Peer assessment typically requires students to judge peers' work against assessment criteria. We tested an alternative approach in which students judged pairs of scripts against one another in the absence of assessment criteria. First year mathematics undergraduates (N = 194) sat a written test on conceptual understanding of multivariable…

Descriptors: Peer Evaluation, Evaluation Criteria, Alternative Assessment, Undergraduate Students

How Accurate Can Mothers and Teachers Be regarding Children's Emergent Literacy Development? A Comparison between Mothers with High and Low Education

Peer reviewed

Korat, Ofra – Early Child Development and Care, 2009

The relationship between mothers' and educators' evaluations of 75 children's emergent literacy levels and the children's actual levels was investigated. Two groups of mothers participated: mothers with a low education and mothers with a high education. The children's emergent literacy was measured. The mothers evaluated their own children and 40 teachers…

Descriptors: Mothers, Emergent Literacy, Interrater Reliability, Mother Attitudes

Assessor Training: Its Effects on Criterion-Based Assessment in a Medical Context

Pell, Godfrey; Homer, Matthew S.; Roberts, Trudie E. – International Journal of Research & Method in Education, 2008

Increasingly, academic institutions are being required to improve the validity of the assessment process; unfortunately, often this is at the expense of reliability. In medical schools (such as Leeds), standardized tests of clinical skills, such as "Objective Structured Clinical Examinations" (OSCEs) are widely used to assess clinical…

Descriptors: Medical Education, Standardized Tests, Clinical Experience, Criterion Referenced Tests

Hamilton Rating Scale for Depression: Reliability and Validity of Judgments of Novice Raters.

Peer reviewed

O'Hara, Michael W.; Rehm, Lynn P. – Journal of Consulting and Clinical Psychology, 1983

Used the intraclass correlation coefficient to estimate the interrater reliability of judgments of clinician and novice raters of depressed females (N=20) who took the Hamilton Rating Scale for Depression (HRSD). Expert and student raters both made reliable ratings on the HRSD. Criterion validity for student raters was also satisfactory.…

Descriptors: College Students, Comparative Testing, Cost Effectiveness, Counselor Role

Assessing Writing Skill. Research Monograph No. 11.

Breland, Hunter M.; And Others – 1987

Six university English departments collaborated in this examination of the differences between multiple-choice and essay tests in evaluating writing skills. The study also investigated ways the two tools can complement one another, ways to improve cost effectiveness of essay testing, and ways to integrate assessment and the educational process.…

Descriptors: Comparative Testing, Efficiency, Essay Tests, Higher Education

A Comparison of Two Instruments for Evaluating Composition.

Barter, Alice K.; And Others – 1980

A follow-up study of two instruments for evaluating college writing was conducted. The experimental scale (E Scale) was developed in 1976 and revised for this study. The control scale (C Scale) was described in the literature in 1977. Ten English majors graded ten essays from diagnostic entrance exams. Both the E Scale and the C Scale were used,…

Descriptors: College Entrance Examinations, Comparative Testing, Essay Tests, Evaluation Criteria

Practical Testing on Trial: A Study of the Reliability and Comparability of Results under Decentralized System of Practical Assessment.

Goldstein, Harvey; Wolf, Alison – 1986

Locally developed occupational tests were administered to 16- and 17-year-olds in a government-sponsored vocational education program in the United Kingdom over a six-month period in 1984. Job skills were tested in two occupational areas: use of a micrometer and invoice completion. Some performance tests were designed by researchers and some by…

Descriptors: Comparative Testing, Criterion Referenced Tests, Evaluation Criteria, Foreign Countries