Showing all 10 results
Peer reviewed
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
Peer reviewed
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Peer reviewed
Full-text PDF on ERIC
Kelcey, Ben; Wang, Shanshan; Cox, Kyle – Society for Research on Educational Effectiveness, 2016
Valid and reliable measurement of unobserved latent variables is essential to understanding and improving education. A common and persistent approach to assessing latent constructs in education is the use of rater inferential judgment. The purpose of this study is to develop high-dimensional explanatory random item effects models designed for…
Descriptors: Test Items, Models, Evaluators, Longitudinal Studies
Peer reviewed
Szarkowska, Agnieszka; Krejtz, Krzysztof; Dutka, Lukasz; Pilipczuk, Olga – Interpreter and Translator Trainer, 2018
In this study, we examined whether interpreters and interpreting trainees are better predisposed to respeaking than people with no interpreting skills. We tested 57 participants (22 interpreters, 23 translators and 12 controls) while respeaking 5-minute videos with two parameters: speech rate (fast/slow) and number of speakers (one/many). Having…
Descriptors: Translation, Comparative Analysis, Professional Personnel, Video Technology
Shin, Hyo Jeong – ProQuest LLC, 2015
This dissertation is comprised of three papers that propose and apply psychometric models to deal with complexities and challenges in large-scale assessments, focusing on modeling rater effects and complex learning progressions. In particular, three papers investigate extensions and applications of multilevel and multidimensional item response…
Descriptors: Item Response Theory, Psychometrics, Models, Measurement
Peer reviewed
Zhang, Xiuyuan; Roberts, William L. – Advances in Health Sciences Education, 2013
Humanistic doctor-patient interaction has been measured for eight years using the Global Patient Assessment (GPA) tool in the national osteopathic clinical skills medical licensure examination. Standardized patients (SPs) apply the GPA tool to rate examinees' competence on doctor-patient communication, interpersonal skills, and professionalism.…
Descriptors: Licensing Examinations (Professions), Patients, Rating Scales, Evaluators
Peer reviewed
Huber, Amy Mattingly; Leigh, Katharine E.; Tremblay, Kenneth R., Jr. – College Student Journal, 2012
The creative process is a multifaceted and dynamic path of thinking required to execute a project in design-based disciplines. The goal of this research was to test a model outlining the creative design process by investigating student experiences in a design project assignment. The study used an exploratory design to collect data from student…
Descriptors: Interior Design, Creativity, Creative Thinking, Evaluators
Peer reviewed
Full-text PDF on ERIC
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Jia, Yujie – ProQuest LLC, 2013
This study employed Bachman and Palmer's (2010) Assessment Use Argument framework to investigate to what extent the use of a second language oral test as an exit test in a Hong Kong university can be justified. It also aimed to help test developers of this oral test identify the most critical areas in the current test design that might need…
Descriptors: Test Use, Language Tests, Oral Language, Second Language Learning
Raymond, Mark R.; Viswesvaran, Chockalingam – 1991
This study illustrates the use of three least-squares models to control for rater effects in performance evaluation: (1) ordinary least squares (OLS); (2) weighted least squares (WLS); and (3) OLS subsequent to applying a logistic transformation to observed ratings (LOG-OLS). The three models were applied to ratings obtained from four…
Descriptors: Evaluators, Higher Education, Interrater Reliability, Least Squares Statistics