ERIC - Search Results

Showing all 9 results

Using Automated Procedures to Score Educational Essays Written in Three Languages

Peer reviewed

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025

The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…

Descriptors: College Students, Slavic Languages, German, Italian

Evidence-Based Evaluation of Student and Marker Performances in Assessment and Examination

Peer reviewed

Ole J. Kemi – Advances in Physiology Education, 2025

Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…

Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards

Comparison of Integrated Testlet and Constructed-Response Question Formats

Peer reviewed

Slepkov, Aaron D.; Shiell, Ralph C. – Physical Review Special Topics - Physics Education Research, 2014

Constructed-response (CR) questions are a mainstay of introductory physics textbooks and exams. However, because of the time, cost, and scoring reliability constraints associated with this format, CR questions are being increasingly replaced by multiple-choice (MC) questions in formal exams. The integrated testlet (IT) is a recently developed…

Descriptors: Science Tests, Physics, Responses, Multiple Choice Tests

Peer Assessment without Assessment Criteria

Peer reviewed

Jones, Ian; Alcock, Lara – Studies in Higher Education, 2014

Peer assessment typically requires students to judge peers' work against assessment criteria. We tested an alternative approach in which students judged pairs of scripts against one another in the absence of assessment criteria. First year mathematics undergraduates (N = 194) sat a written test on conceptual understanding of multivariable…

Descriptors: Peer Evaluation, Evaluation Criteria, Alternative Assessment, Undergraduate Students

Assessor Training: Its Effects on Criterion-Based Assessment in a Medical Context

Pell, Godfrey; Homer, Matthew S.; Roberts, Trudie E. – International Journal of Research & Method in Education, 2008

Increasingly, academic institutions are being required to improve the validity of the assessment process; unfortunately, often this is at the expense of reliability. In medical schools (such as Leeds), standardized tests of clinical skills, such as "Objective Structured Clinical Examinations" (OSCEs) are widely used to assess clinical…

Descriptors: Medical Education, Standardized Tests, Clinical Experience, Criterion Referenced Tests

Peer Assessment in Thesis Oral Presentation

Peer reviewed

Liow, Jong-Leng – European Journal of Engineering Education, 2008

Peer assessment has been studied in various situations and actively pursued as a means by which students are given more control over their learning and assessment achievement. This study investigated the reliability of staff and student assessments in two oral presentations with limited feedback for a school-based thesis course in engineering…

Descriptors: Feedback (Response), Student Evaluation, Grade Point Average, Peer Evaluation

The Clinical Validity of the MMPI-168.

Edinger, Jack D.; Vosk, Barbara N. – 1983

Of the many short forms of the Minnesota Multiphasic Personality Inventory (MMPI) that have been developed, the MMPI-168 is among the most promising. To determine whether clinical judgments based on the MMPI-168 are comparable to judgments based on the standard MMPI, 30 clinical psychologists participated in a randomized block, repeated treatment…

Descriptors: Comparative Testing, Diagnostic Tests, Interrater Reliability, Personality Measures

Behaviorally Anchored Rating Scales vs. Summated Rating Scales: Psychometric Properties and Susceptibility to Rating Bias.

Peer reviewed

Kinicki, Angelo J.; And Others – Educational and Psychological Measurement, 1985

Using both the Behaviorally Anchored Rating Scales (BARS) and the Purdue University Scales, 727 undergraduates rated 32 instructors. The BARS had less halo effect, more leniency error, and lower interrater reliability. Both formats were valid. The two tests did not differ in rate discrimination or susceptibility to rating bias. (Author/GDC)

Descriptors: Behavior Rating Scales, College Faculty, Comparative Testing, Higher Education

Introducing a New Comprehensive Test of Oral Proficiency.

Peer reviewed

Shohamy, Elana; And Others – ELT Journal, 1986

Describes a study of the development of a new oral proficiency test which could replace the existing English as a Foreign Language Oral Matriculation test in Israel. The deficiencies of the existing Oral Matriculation test are specified, and the components of the experimental test are described. (Author/SED)

Descriptors: Communicative Competence (Languages), Comparative Testing, English (Second Language), Foreign Countries