Showing all 12 results
Jennifer Hill; George Perrett; Vincent Dorie – Grantee Submission, 2023
Estimation of causal effects requires making comparisons across groups of observations exposed and not exposed to a treatment or cause (intervention, program, drug, etc.). To interpret differences between groups causally, we need to ensure that they have been constructed in such a way that the comparisons are "fair." This can be…
Descriptors: Causal Models, Statistical Inference, Artificial Intelligence, Data Analysis
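A minimal sketch of the regression-adjusted comparison this line of work builds on, with simulated data and simple linear outcome models standing in for the flexible models such methods typically use (everything below is an illustrative assumption, not the authors' implementation):

```python
# Illustrative only: simulated data, linear outcome models (g-computation).
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                      # confounder
z = rng.binomial(1, 1 / (1 + np.exp(-x)))   # treatment depends on x
y = 2.0 * z + 1.5 * x + rng.normal(size=n)  # true effect = 2.0

# Naive difference in means is biased because x drives both z and y.
naive = y[z == 1].mean() - y[z == 0].mean()

# Fit an outcome model within each group, then average the predicted
# difference over the whole sample so the comparison is "fair" in x.
b1 = np.polyfit(x[z == 1], y[z == 1], 1)
b0 = np.polyfit(x[z == 0], y[z == 0], 1)
ate = np.mean(np.polyval(b1, x) - np.polyval(b0, x))

print(f"naive: {naive:.2f}  adjusted: {ate:.2f}")  # adjusted is near 2.0
```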
Peer reviewed
Direct link
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017
With increased use of constructed response items in large-scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to these scoring cost issues, various forms of automated systems for scoring…
Descriptors: Automation, Scoring, Social Studies, Test Items
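A minimal sketch of a common automated-scoring baseline (TF-IDF features plus ridge regression); the responses, scores, and model choice below are illustrative assumptions, not the system examined in the article:

```python
# Toy automated constructed-response scorer: TF-IDF + ridge regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

train_responses = [
    "the assembly makes laws and the court interprets them",
    "laws are made by the legislature",
    "i dont know",
    "courts decide if laws follow the constitution",
]
train_scores = [3, 2, 0, 3]  # human reference scores

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(train_responses, train_scores)

print(model.predict(["the legislature makes the laws"]))
```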
Peer reviewed
Direct link
Tipton, Elizabeth; Fellers, Lauren; Caverly, Sarah; Vaden-Kiernan, Michael; Borman, Geoffrey; Sullivan, Kate; Ruiz de Castilla, Veronica – Journal of Research on Educational Effectiveness, 2016
Recently, statisticians have begun developing methods to improve the generalizability of results from large-scale experiments in education. This work has included the development of methods for improved site selection when random sampling is infeasible, including the use of stratification and targeted recruitment strategies. This article provides…
Descriptors: Generalizability Theory, Site Selection, Experiments, Comparative Analysis
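A minimal sketch of covariate-based stratification for site recruitment, assuming k-means strata and proportional allocation; the covariates, stratum count, and budget are invented for illustration:

```python
# Cluster the population of sites on covariates, then recruit
# proportionally from each stratum.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# 500 districts, 3 standardized covariates (e.g., poverty, size, urbanicity)
sites = rng.normal(size=(500, 3))

k = 5
strata = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(sites)

budget = 60  # total sites we can afford to recruit
for s in range(k):
    members = np.flatnonzero(strata == s)
    n_s = round(budget * len(members) / len(sites))  # proportional allocation
    recruited = rng.choice(members, size=n_s, replace=False)
    print(f"stratum {s}: recruit {n_s} of {len(members)} sites, e.g. {recruited[:3]}")
```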
Peer reviewed
Direct link
Zaidi, Nikki L.; Swoboda, Christopher M.; Kelcey, Benjamin M.; Manuel, R. Stephen – Advances in Health Sciences Education, 2017
The extant literature has largely ignored a potentially significant source of variance in multiple mini-interview (MMI) scores by "hiding" the variance attributable to the sample of attributes used on an evaluation form. This potential source of hidden variance can be defined as rating items, which typically comprise an MMI evaluation…
Descriptors: Interviews, Scores, Generalizability Theory, Monte Carlo Methods
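A minimal sketch of a persons-by-items G-study of the kind used to expose item-attributable variance: simulate scores with known components, then recover them from ANOVA mean squares (the component values are assumptions):

```python
# G-study for a crossed persons x items design via expected mean squares.
import numpy as np

rng = np.random.default_rng(2)
n_p, n_i = 200, 8                        # persons (candidates) x rating items
var_p, var_i, var_res = 1.0, 0.4, 0.6    # true variance components

person = rng.normal(0, np.sqrt(var_p), size=(n_p, 1))
item = rng.normal(0, np.sqrt(var_i), size=(1, n_i))
scores = person + item + rng.normal(0, np.sqrt(var_res), size=(n_p, n_i))

grand = scores.mean()
mp, mi = scores.mean(axis=1), scores.mean(axis=0)
ms_p = n_i * np.sum((mp - grand) ** 2) / (n_p - 1)
ms_i = n_p * np.sum((mi - grand) ** 2) / (n_i - 1)
resid = scores - mp[:, None] - mi[None, :] + grand
ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1))

print("sigma2_person:", (ms_p - ms_res) / n_i)   # ~1.0
print("sigma2_item:  ", (ms_i - ms_res) / n_p)   # ~0.4  <- "hidden" item variance
print("sigma2_resid: ", ms_res)                  # ~0.6
```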
Peer reviewed
Direct link
Volpe, Robert J.; Briesch, Amy M. – School Psychology Review, 2016
This study examines the dependability of two scaling approaches for using a five-item Direct Behavior Rating multi-item scale to assess student disruptive behavior. A series of generalizability theory studies were used to compare a traditional frequency-based scaling approach with an approach wherein the informant compares a target student's…
Descriptors: Scaling, Behavior Rating Scales, Behavior Problems, Student Behavior
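A minimal sketch of the companion D-study step: projecting a dependability (Phi) coefficient for different numbers of scale items, assuming variance components like those above:

```python
# Given G-study variance components (assumed values below), project
# dependability for absolute decisions as the number of items grows.
var_p, var_i, var_res = 1.0, 0.4, 0.6   # person, item, residual

def phi(n_items):
    # Phi coefficient for a persons x items design, absolute decisions.
    return var_p / (var_p + (var_i + var_res) / n_items)

for n in (1, 3, 5, 10):
    print(f"items={n:2d}  Phi={phi(n):.3f}")
```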
Peer reviewed
PDF on ERIC (download full text)
Dogan, C. Deha; Uluman, Müge – Educational Sciences: Theory and Practice, 2017
The aim of this study was to determine the extent to which graded-category rating scales and rubrics contribute to inter-rater reliability. The research was designed as a correlational study. The study group consisted of 82 sixth-grade students and three writing course teachers in a private elementary school. A performance task was…
Descriptors: Comparative Analysis, Scoring Rubrics, Rating Scales, Interrater Reliability
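A minimal sketch of an inter-rater consistency check for three raters; the scores are invented, and the article's own correlational analyses are not reproduced here:

```python
# Quick inter-rater consistency index: mean pairwise Pearson correlation.
import numpy as np

# rows = raters, columns = students scored on the same performance task
scores = np.array([
    [4, 3, 5, 2, 4, 3, 5, 1],   # rater A
    [4, 2, 5, 2, 3, 3, 4, 1],   # rater B
    [5, 3, 4, 2, 4, 2, 5, 2],   # rater C
])

r = np.corrcoef(scores)                      # 3 x 3 correlation matrix
pairs = r[np.triu_indices_from(r, k=1)]      # the three above-diagonal entries
print("pairwise r:", np.round(pairs, 2))
print("mean inter-rater r:", round(pairs.mean(), 2))
```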
Peer reviewed
Direct link
Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. In peer assessment, however, a problem remains: reliability depends on rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. These models are expected to improve…
Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation
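A minimal sketch of a rating model with a rater-severity parameter, in the spirit of many-facet Rasch / rater-effect IRT; the models proposed in the article differ, and all parameter values here are assumptions:

```python
# Rating-scale-style model with a rater facet: the log-odds of each step
# depend on examinee ability minus rater severity. Illustrative values.
import numpy as np

def category_probs(theta, severity, thresholds):
    """P(score = k), k = 0..K, for K step thresholds (here 3 -> 4 categories)."""
    steps = theta - severity - np.asarray(thresholds)   # one term per step
    logits = np.concatenate(([0.0], np.cumsum(steps)))  # category 0 is baseline
    p = np.exp(logits - logits.max())                   # stable softmax
    return p / p.sum()

thresholds = [-1.0, 0.0, 1.0]
print(category_probs(0.5, severity=-0.5, thresholds=thresholds))  # lenient rater
print(category_probs(0.5, severity=+0.5, thresholds=thresholds))  # severe rater
```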
Peer reviewed
Direct link
Attali, Yigal – Language Testing, 2016
A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session, and 14 trainees who passed an…
Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators
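A minimal sketch of a certification check against reference scores on 20 essays; the agreement statistics and the pass rule are assumed for illustration, not taken from the study:

```python
# Compare a trainee's scores to reference ("correct") scores and apply
# an assumed pass criterion based on adjacent agreement.
import numpy as np

rng = np.random.default_rng(3)
true = rng.integers(1, 7, size=20)                          # reference scores, 1-6 scale
rater = np.clip(true + rng.integers(-1, 2, size=20), 1, 6)  # trainee's scores

exact = np.mean(rater == true)
adjacent = np.mean(np.abs(rater - true) <= 1)
print(f"exact agreement: {exact:.0%}, within one point: {adjacent:.0%}")
print("certified:", adjacent >= 0.90)   # assumed pass rule, not the study's
```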
Peer reviewed
PDF on ERIC (download full text)
Möller, Jens; Müller-Kalthoff, Hanno; Helm, Friederike; Nagy, Nicole; Marsh, Herb W. – Frontline Learning Research, 2016
The dimensional comparison theory (DCT) focuses on the effects of internal, dimensional comparisons (e.g., "How good am I in math compared to English?") on academic self-concepts with widespread consequences for students' self-evaluation, motivation, and behavioral choices. DCT is based on the internal/external frame of reference model…
Descriptors: Comparative Analysis, Comparative Testing, Self Concept, Self Concept Measures
Li, Dongmei; Yi, Qing; Harris, Deborah – ACT, Inc., 2017
In preparation for online administration of the ACT® test, ACT conducted studies to examine the comparability of scores between online and paper administrations, including a timing study in fall 2013, a mode comparability study in spring 2014, and a second mode comparability study in spring 2015. This report presents major findings from these…
Descriptors: College Entrance Examinations, Computer Assisted Testing, Comparative Analysis, Test Format
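A minimal sketch of a basic mode-comparability check, the standardized mean difference between online and paper scores; the simulated scores are assumptions, and the ACT studies' designs and analyses are far richer:

```python
# Standardized mean difference (Cohen's d) between administration modes.
import numpy as np

rng = np.random.default_rng(4)
paper = rng.normal(20.8, 5.0, size=1500)    # illustrative composite scores
online = rng.normal(20.6, 5.1, size=1500)

pooled_sd = np.sqrt((paper.var(ddof=1) + online.var(ddof=1)) / 2)
d = (online.mean() - paper.mean()) / pooled_sd
print(f"standardized mode difference d = {d:.3f}")  # near 0 -> comparable
```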
Peer reviewed
Direct link
Cramer, Nicholas; Asmar, Abdo; Gorman, Laurel; Gros, Bernard; Harris, David; Howard, Thomas; Hussain, Mujtaba; Salazar, Sergio; Kibble, Jonathan D. – Advances in Physiology Education, 2016
Multiple-choice questions are a gold-standard tool in medical school for the assessment of knowledge and are the mainstay of licensing examinations. However, multiple-choice items can be criticized for lacking the ability to test higher-order learning or integrative thinking across multiple disciplines. Our objective was to develop a novel…
Descriptors: Physiology, Pharmacology, Multiple Choice Tests, Cost Effectiveness
Peer reviewed
Direct link
Strietholt, Rolf; Scherer, Ronny – Scandinavian Journal of Educational Research, 2018
The present paper aims to discuss how data from international large-scale assessments (ILSAs) can be utilized and combined, even with other existing data sources, to monitor educational outcomes and study the effectiveness of educational systems. We consider different purposes of linking data, namely, extending outcome measures,…
Descriptors: International Assessment, Group Testing, Outcomes of Education, Outcome Measures