ERIC - Search Results

Showing all 10 results

Effects of Using Double Ratings as Item Scores on IRT Proficiency Estimation

Peer reviewed

Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022

This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…

Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

High-Dimensional Explanatory Random Item Effects Models for Rater-Mediated Assessments

Peer reviewed

Kelcey, Ben; Wang, Shanshan; Cox, Kyle – Society for Research on Educational Effectiveness, 2016

Valid and reliable measurement of unobserved latent variables is essential to understanding and improving education. A common and persistent approach to assessing latent constructs in education is the use of rater inferential judgment. The purpose of this study is to develop high-dimensional explanatory random item effects models designed for…

Descriptors: Test Items, Models, Evaluators, Longitudinal Studies

Are Interpreters Better Respeakers?

Peer reviewed

Szarkowska, Agnieszka; Krejtz, Krzysztof; Dutka, Lukasz; Pilipczuk, Olga – Interpreter and Translator Trainer, 2018

In this study, we examined whether interpreters and interpreting trainees are better predisposed to respeaking than people with no interpreting skills. We tested 57 participants (22 interpreters, 23 translators and 12 controls) while respeaking 5-minute videos with two parameters: speech rate (fast/slow) and number of speakers (one/many). Having…

Descriptors: Translation, Comparative Analysis, Professional Personnel, Video Technology

Modeling Rater Effects and Complex Learning Progressions Using Item Response Models

Shin, Hyo Jeong – ProQuest LLC, 2015

This dissertation is comprised of three papers that propose and apply psychometric models to deal with complexities and challenges in large-scale assessments, focusing on modeling rater effects and complex learning progressions. In particular, three papers investigate extensions and applications of multilevel and multidimensional item response…

Descriptors: Item Response Theory, Psychometrics, Models, Measurement

Investigation of Standardized Patient Ratings of Humanistic Competence on a Medical Licensure Examination Using Many-Facet Rasch Measurement and Generalizability Theory

Peer reviewed

Zhang, Xiuyuan; Roberts, William L. – Advances in Health Sciences Education, 2013

Humanistic doctor-patient interaction has been measured for eight years using the Global Patient Assessment (GPA) tool in the national osteopathic clinical skills medical licensure examination. Standardized patients (SPs) apply the GPA tool to rate examinees' competence on doctor-patient communication, interpersonal skills, and professionalism.…

Descriptors: Licensing Examinations (Professions), Patients, Rating Scales, Evaluators

Creativity Processes of Students in the Design Studio

Peer reviewed

Huber, Amy Mattingly; Leigh, Katharine E.; Tremblay, Kenneth R., Jr. – College Student Journal, 2012

The creative process is a multifaceted and dynamic path of thinking required to execute a project in design-based disciplines. The goal of this research was to test a model outlining the creative design process by investigating student experiences in a design project assignment. The study used an exploratory design to collect data from student…

Descriptors: Interior Design, Creativity, Creative Thinking, Evaluators

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests

Justifying the Use of a Second Language Oral Test as an Exit Test in Hong Kong: An Application of Assessment Use Argument Framework

Jia, Yujie – ProQuest LLC, 2013

This study employed Bachman and Palmer's (2010) Assessment Use Argument framework to investigate to what extent the use of a second language oral test as an exit test in a Hong Kong university can be justified. It also aimed to help test developers of this oral test identify the most critical areas in the current test design that might need…

Descriptors: Test Use, Language Tests, Oral Language, Second Language Learning

Least-Squares Models to Correct for Rater Effects in Performance Assessment.

Raymond, Mark R.; Viswesvaran, Chockalingam – 1991

This study illustrates the use of three least-squares models to control for rater effects in performance evaluation: (1) ordinary least squares (OLS); (2) weighted least squares (WLS); and (3) OLS subsequent to applying a logistic transformation to observed ratings (LOG-OLS). The three models were applied to ratings obtained from four…

Descriptors: Evaluators, Higher Education, Interrater Reliability, Least Squares Statistics
