NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 9 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Peer reviewed Peer reviewed
Direct linkDirect link
Ole J. Kemi – Advances in Physiology Education, 2025
Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…
Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards
Peer reviewed Peer reviewed
Direct linkDirect link
Slepkov, Aaron D.; Shiell, Ralph C. – Physical Review Special Topics - Physics Education Research, 2014
Constructed-response (CR) questions are a mainstay of introductory physics textbooks and exams. However, because of the time, cost, and scoring reliability constraints associated with this format, CR questions are being increasingly replaced by multiple-choice (MC) questions in formal exams. The integrated testlet (IT) is a recently developed…
Descriptors: Science Tests, Physics, Responses, Multiple Choice Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Jones, Ian; Alcock, Lara – Studies in Higher Education, 2014
Peer assessment typically requires students to judge peers' work against assessment criteria. We tested an alternative approach in which students judged pairs of scripts against one another in the absence of assessment criteria. First year mathematics undergraduates (N?=?194) sat a written test on conceptual understanding of multivariable…
Descriptors: Peer Evaluation, Evaluation Criteria, Alternative Assessment, Undergraduate Students
Pell, Godfrey; Homer, Matthew S.; Roberts, Trudie E. – International Journal of Research & Method in Education, 2008
Increasingly, academic institutions are being required to improve the validity of the assessment process; unfortunately, often this is at the expense of reliability. In medical schools (such as Leeds), standardized tests of clinical skills, such as "Objective Structured Clinical Examinations" (OSCEs) are widely used to assess clinical…
Descriptors: Medical Education, Standardized Tests, Clinical Experience, Criterion Referenced Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Liow, Jong-Leng – European Journal of Engineering Education, 2008
Peer assessment has been studied in various situations and actively pursued as a means by which students are given more control over their learning and assessment achievement. This study investigated the reliability of staff and student assessments in two oral presentations with limited feedback for a school-based thesis course in engineering…
Descriptors: Feedback (Response), Student Evaluation, Grade Point Average, Peer Evaluation
Edinger, Jack D.; Vosk, Barbara N. – 1983
Of the many short forms of the Minnesota Multiphasic Personality Inventory (MMPI) that have been developed, the MMPI-168 is among the most promising. To determine whether clinical judgments based on the MMPI-168 are comparable to judgments based on the standard MMPI, 30 clinical psychologists participated in a randomized block, repeated treatment…
Descriptors: Comparative Testing, Diagnostic Tests, Interrater Reliability, Personality Measures
Peer reviewed Peer reviewed
Kinicki, Angelo J.; And Others – Educational and Psychological Measurement, 1985
Using both the Behaviorally Anchored Rating Scales (BARS) and the Purdue University Scales, 727 undergraduates rated 32 instructors. The BARS had less halo effect, more leniency error, and lower interrater reliability. Both formats were valid. The two tests did not differ in rate discrimination or susceptibility to rating bias. (Author/GDC)
Descriptors: Behavior Rating Scales, College Faculty, Comparative Testing, Higher Education
Peer reviewed Peer reviewed
Shohamy, Elana; And Others – ELT Journal, 1986
Describes a study of the development of a new oral proficiency test which could replace the existing English as a Foreign Language Oral Matriculation test in Israel. The deficiencies of the existing Oral Matriculation test are specified, and the components of the experimental test are described. (Author/SED)
Descriptors: Communicative Competence (Languages), Comparative Testing, English (Second Language), Foreign Countries