Publication Date
| In 2026 | 0 |
| Since 2025 | 2 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 2 |
| Since 2007 (last 20 years) | 6 |
Descriptor
| Comparative Testing | 10 |
| Interrater Reliability | 10 |
| Test Reliability | 10 |
| Test Validity | 5 |
| Evaluation Criteria | 4 |
| Foreign Countries | 4 |
| College Students | 3 |
| Evaluation Methods | 3 |
| Scoring | 3 |
| Criterion Referenced Tests | 2 |
| Essay Tests | 2 |
Source
| Advances in Physiology… | 1 |
| Early Child Development and… | 1 |
| International Journal of… | 1 |
| Journal of Consulting and… | 1 |
| Journal of Educational… | 1 |
| Physical Review Special… | 1 |
| Studies in Higher Education | 1 |
Author
| Alcock, Lara | 1 |
| Barter, Alice K. | 1 |
| Breland, Hunter M. | 1 |
| Goldstein, Harvey | 1 |
| Hamid Mohammadi | 1 |
| Homer, Matthew S. | 1 |
| Jones, Ian | 1 |
| Korat, Ofra | 1 |
| Mark J. Gierl | 1 |
| O'Hara, Michael W. | 1 |
| Ole J. Kemi | 1 |
Publication Type
| Reports - Research | 8 |
| Journal Articles | 7 |
| Reports - Evaluative | 2 |
| Tests/Questionnaires | 2 |
| Books | 1 |
| Speeches/Meeting Papers | 1 |
Education Level
| Higher Education | 5 |
| Postsecondary Education | 4 |
| Early Childhood Education | 1 |
Audience
| Researchers | 1 |
Location
| Canada | 1 |
| Israel | 1 |
| United Kingdom | 1 |
| United Kingdom (Leeds) | 1 |
Assessments and Surveys
| Hamilton Rating Scale for… | 1 |
| SAT (College Admission Test) | 1 |
| Student Descriptive… | 1 |
| Test of Standard Written… | 1 |
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Ole J. Kemi – Advances in Physiology Education, 2025
Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…
Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards
Slepkov, Aaron D.; Shiell, Ralph C. – Physical Review Special Topics - Physics Education Research, 2014
Constructed-response (CR) questions are a mainstay of introductory physics textbooks and exams. However, because of the time, cost, and scoring reliability constraints associated with this format, CR questions are being increasingly replaced by multiple-choice (MC) questions in formal exams. The integrated testlet (IT) is a recently developed…
Descriptors: Science Tests, Physics, Responses, Multiple Choice Tests
Jones, Ian; Alcock, Lara – Studies in Higher Education, 2014
Peer assessment typically requires students to judge peers' work against assessment criteria. We tested an alternative approach in which students judged pairs of scripts against one another in the absence of assessment criteria. First year mathematics undergraduates (N = 194) sat a written test on conceptual understanding of multivariable…
Descriptors: Peer Evaluation, Evaluation Criteria, Alternative Assessment, Undergraduate Students
Korat, Ofra – Early Child Development and Care, 2009
The relationship between mothers' and educators' evaluations of 75 children's emergent literacy levels and the children's actual levels was investigated. Two groups of mothers participated: mothers with a low education and mothers with a high education. The children's emergent literacy was measured. The mothers evaluated their own children and 40 teachers…
Descriptors: Mothers, Emergent Literacy, Interrater Reliability, Mother Attitudes
Pell, Godfrey; Homer, Matthew S.; Roberts, Trudie E. – International Journal of Research & Method in Education, 2008
Increasingly, academic institutions are being required to improve the validity of the assessment process; unfortunately, often this is at the expense of reliability. In medical schools (such as Leeds), standardized tests of clinical skills, such as "Objective Structured Clinical Examinations" (OSCEs) are widely used to assess clinical…
Descriptors: Medical Education, Standardized Tests, Clinical Experience, Criterion Referenced Tests
O'Hara, Michael W.; Rehm, Lynn P. – Journal of Consulting and Clinical Psychology, 1983 (peer reviewed)
Used the intraclass correlation coefficient to estimate the interrater reliability of judgments of clinician and novice raters of depressed females (N=20) who took the Hamilton Rating Scale for Depression (HRSD). Expert and student raters both made reliable ratings on the HRSD. Criterion validity for student raters was also satisfactory.…
Descriptors: College Students, Comparative Testing, Cost Effectiveness, Counselor Role
Breland, Hunter M.; And Others – 1987
Six university English departments collaborated in this examination of the differences between multiple-choice and essay tests in evaluating writing skills. The study also investigated ways the two tools can complement one another, ways to improve cost effectiveness of essay testing, and ways to integrate assessment and the educational process.…
Descriptors: Comparative Testing, Efficiency, Essay Tests, Higher Education
Barter, Alice K.; And Others – 1980
A follow-up study of two instruments for evaluating college writing was conducted. The experimental scale (E Scale) was developed in 1976 and revised for this study. The control scale (C Scale) was described in the literature in 1977. Ten English majors graded ten essays from diagnostic entrance exams. Both the E Scale and the C Scale were used,…
Descriptors: College Entrance Examinations, Comparative Testing, Essay Tests, Evaluation Criteria
Goldstein, Harvey; Wolf, Alison – 1986
Locally developed occupational tests were administered to 16- and 17-year-olds in a government-sponsored vocational education program in the United Kingdom over a six-month period in 1984. Job skills were tested in two occupational areas: use of a micrometer and invoice completion. Some performance tests were designed by researchers and some by…
Descriptors: Comparative Testing, Criterion Referenced Tests, Evaluation Criteria, Foreign Countries

