Showing 1 to 15 of 59 results
Jennifer Hill; George Perrett; Vincent Dorie – Grantee Submission, 2023
Estimation of causal effects requires making comparisons across groups of observations exposed and not exposed to a treatment or cause (intervention, program, drug, etc.). To interpret differences between groups causally, we need to ensure that they have been constructed in such a way that the comparisons are "fair." This can be…
Descriptors: Causal Models, Statistical Inference, Artificial Intelligence, Data Analysis
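To make the "fair comparison" problem concrete, here is a minimal simulation sketch with hypothetical data. It is not the authors' method (their work concerns machine-learning approaches to causal inference); it only shows why a naive treated-versus-control difference is biased when treatment assignment depends on a covariate, and how comparing within covariate strata recovers the effect.

```python
# Minimal sketch (hypothetical data): naive vs. stratified treatment-effect
# estimates when treatment assignment depends on a confounder.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                    # confounder (e.g., prior achievement)
p_treat = 1 / (1 + np.exp(-2 * x))        # treatment more likely when x is high
z = rng.binomial(1, p_treat)              # treatment indicator
y = 3 * x + 1.0 * z + rng.normal(size=n)  # true treatment effect = 1.0

naive = y[z == 1].mean() - y[z == 0].mean()   # confounded: far from 1.0

# Compare treated and control units only within similar strata of x.
edges = np.quantile(x, np.linspace(0, 1, 11))
strata = np.digitize(x, edges[1:-1])
effects, weights = [], []
for s in np.unique(strata):
    m = strata == s
    if 0 < z[m].sum() < m.sum():              # need both groups present
        effects.append(y[m][z[m] == 1].mean() - y[m][z[m] == 0].mean())
        weights.append(m.sum())
adjusted = np.average(effects, weights=weights)

print(f"naive: {naive:.2f}, stratified: {adjusted:.2f}, truth: 1.00")
```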
Peer reviewed
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017
With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to these scoring cost issues, various forms of automated systems for scoring…
Descriptors: Automation, Scoring, Social Studies, Test Items
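As a toy illustration of the cheapest end of the automated-scoring spectrum (the systems studied in this literature are far more sophisticated; everything below is hypothetical), a bag-of-words scorer rates a constructed response by its similarity to a model answer:

```python
# Toy sketch: score a constructed response by cosine similarity between its
# bag-of-words vector and a model answer's. Real scoring engines use much
# richer linguistic features and trained models.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

model_answer = Counter("supply fell while demand stayed constant so prices rose".split())
response = Counter("prices rose because supply fell and demand was constant".split())
similarity = cosine(model_answer, response)
print(f"similarity {similarity:.2f} -> score {round(similarity * 4)} of 4")
```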
Peer reviewed
Tipton, Elizabeth; Fellers, Lauren; Caverly, Sarah; Vaden-Kiernan, Michael; Borman, Geoffrey; Sullivan, Kate; Ruiz de Castilla, Veronica – Journal of Research on Educational Effectiveness, 2016
Recently, statisticians have begun developing methods to improve the generalizability of results from large-scale experiments in education. This work has included the development of methods for improved site selection when random sampling is infeasible, including the use of stratification and targeted recruitment strategies. This article provides…
Descriptors: Generalizability Theory, Site Selection, Experiments, Comparative Analysis
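One piece of the stratified-recruitment idea can be sketched in a few lines. This is a cartoon with hypothetical numbers, not the article's procedure: proportional allocation spreads a fixed recruitment budget across population strata so that the recruited sites resemble the inference population.

```python
# Hypothetical sketch: allocate a recruitment budget across population strata
# in proportion to stratum size, so recruited sites mirror the population.
strata_counts = {"urban-large": 240, "urban-small": 160,
                 "rural-large": 90, "rural-small": 310}   # districts per stratum
n_recruit = 60                                            # sites we can afford

total = sum(strata_counts.values())
targets = {s: round(n_recruit * c / total) for s, c in strata_counts.items()}
# Rounding can leave the total a site or two off n_recruit; adjust if needed.
print(targets, "->", sum(targets.values()), "sites")
```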
Peer reviewed
Zaidi, Nikki L.; Swoboda, Christopher M.; Kelcey, Benjamin M.; Manuel, R. Stephen – Advances in Health Sciences Education, 2017
The extant literature has largely ignored a potentially significant source of variance in multiple mini-interview (MMI) scores by "hiding" the variance attributable to the sample of attributes used on an evaluation form. This potential source of hidden variance can be defined as rating items, which typically comprise an MMI evaluation…
Descriptors: Interviews, Scores, Generalizability Theory, Monte Carlo Methods
Peer reviewed
Harrison, George M. – Journal of Educational Measurement, 2015
The credibility of standard-setting cut scores depends in part on two sources of consistency evidence: intrajudge and interjudge consistency. Although intrajudge consistency feedback has often been provided to Angoff judges in practice, more evidence is needed to determine whether it achieves its intended effect. In this randomized experiment with…
Descriptors: Interrater Reliability, Standard Setting (Scoring), Cutting Scores, Feedback (Response)
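For readers unfamiliar with the procedure, a bare-bones Angoff calculation looks like the sketch below (hypothetical ratings; the study's actual design and feedback conditions are richer). Each judge estimates the probability that a minimally competent examinee answers each item correctly; a judge's cut score is the sum of those probabilities, and one intrajudge-consistency check correlates a judge's ratings with empirical item difficulties.

```python
# Minimal Angoff sketch with hypothetical data: panel cut score plus a simple
# intrajudge-consistency check against empirical item p-values.
import numpy as np

ratings = np.array([          # rows = judges, cols = items
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.8],
    [0.5, 0.9, 0.6, 0.9],
])
judge_cuts = ratings.sum(axis=1)        # each judge's expected raw score
panel_cut = judge_cuts.mean()

empirical_p = np.array([0.65, 0.85, 0.45, 0.88])   # hypothetical field-test data
for j, row in enumerate(ratings):
    r = np.corrcoef(row, empirical_p)[0, 1]
    print(f"judge {j}: cut {judge_cuts[j]:.2f}, consistency r = {r:.2f}")
print(f"panel cut score: {panel_cut:.2f}")
```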
Peer reviewed
Volpe, Robert J.; Briesch, Amy M. – School Psychology Review, 2016
This study examines the dependability of two scaling approaches for using a five-item Direct Behavior Rating multi-item scale to assess student disruptive behavior. A series of generalizability theory studies were used to compare a traditional frequency-based scaling approach with an approach wherein the informant compares a target student's…
Descriptors: Scaling, Behavior Rating Scales, Behavior Problems, Student Behavior
Peer reviewed
Dogan, C. Deha; Uluman, Müge – Educational Sciences: Theory and Practice, 2017
The aim of this study was to determine the extent to which graded-category rating scales and rubrics contribute to inter-rater reliability. The research was designed as a correlational study. The study group consisted of 82 sixth-grade students and three writing course teachers in a private elementary school. A performance task was…
Descriptors: Comparative Analysis, Scoring Rubrics, Rating Scales, Interrater Reliability
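A standard way to quantify inter-rater reliability for categorical rubric scores is Cohen's kappa, which corrects raw agreement for chance. The sketch below uses hypothetical scores from two raters (the study itself compares rating scales and rubrics across three teachers).

```python
# Cohen's kappa for two raters on the same responses (hypothetical scores):
# agreement corrected for the agreement expected by chance alone.
import numpy as np

rater_a = np.array([3, 2, 4, 4, 1, 3, 2, 4, 3, 1])
rater_b = np.array([3, 2, 4, 3, 1, 3, 3, 4, 3, 2])

levels = np.union1d(rater_a, rater_b)
observed = np.mean(rater_a == rater_b)              # raw agreement
pa = np.array([(rater_a == c).mean() for c in levels])
pb = np.array([(rater_b == c).mean() for c in levels])
expected = float(pa @ pb)                           # chance agreement
kappa = (observed - expected) / (1 - expected)
print(f"agreement {observed:.2f}, chance {expected:.2f}, kappa {kappa:.2f}")
```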
Peer reviewed
Teker, Gulsen Tasdelen; Guler, Nese; Uyanik, Gulden Kaya – Educational Sciences: Theory and Practice, 2015
Generalizability theory (G theory) provides a broad conceptual framework for social sciences such as psychology and education, and a comprehensive construct for numerous measurement events, by using analysis of variance, a powerful statistical method. G theory, as an extension of both classical test theory and analysis of variance, is a model which…
Descriptors: Guidelines, Generalizability Theory, Computer Software, Statistical Analysis
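The ANOVA machinery the abstract refers to can be shown on a tiny one-facet design. The sketch below, with hypothetical data, estimates variance components for a fully crossed persons-by-raters G study from mean squares and computes a generalizability coefficient for relative decisions.

```python
# One-facet (persons x raters) G study on hypothetical data: variance
# components from ANOVA expected mean squares, then the G coefficient
# for the mean over n_r raters (relative error = var_pr / n_r).
import numpy as np

scores = np.array([           # rows = persons, cols = raters
    [7, 8, 6],
    [5, 5, 4],
    [9, 8, 9],
    [4, 3, 5],
    [6, 7, 6],
], dtype=float)
n_p, n_r = scores.shape

grand = scores.mean()
ms_p = n_r * np.sum((scores.mean(axis=1) - grand) ** 2) / (n_p - 1)
ms_r = n_p * np.sum((scores.mean(axis=0) - grand) ** 2) / (n_r - 1)
resid = scores - scores.mean(axis=1, keepdims=True) - scores.mean(axis=0) + grand
ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

var_pr = ms_pr                           # person x rater interaction (+ error)
var_p = max((ms_p - ms_pr) / n_r, 0.0)   # universe-score (person) variance
var_r = max((ms_r - ms_pr) / n_p, 0.0)   # rater main-effect variance

g_coef = var_p / (var_p + var_pr / n_r)
print(f"var_p={var_p:.2f}, var_r={var_r:.2f}, var_pr={var_pr:.2f}, G={g_coef:.2f}")
```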
Peer reviewed
Schweig, Jonathan David – Applied Measurement in Education, 2014
Developing indicators that reflect important aspects of school and classroom environments has become central in a nationwide effort to develop comprehensive programs that measure teacher quality and effectiveness. Formulating teacher evaluation policy necessitates accurate and reliable methods for measuring these environmental variables. This…
Descriptors: Error of Measurement, Educational Environment, Classroom Environment, Surveys
Peer reviewed
Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. A persistent problem, however, is that reliability in peer assessment depends on rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation
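The core idea of a rater-parameter IRT model can be conveyed with the simplest case, a many-facet Rasch-style model in which rater severity shifts the log-odds of success alongside task difficulty. This is an illustrative stand-in, not the specific models proposed in the article.

```python
# Many-facet Rasch-style sketch: the log-odds of a successful rating subtract
# both task difficulty and rater severity from examinee ability.
import numpy as np

def p_success(theta, difficulty, severity):
    return 1.0 / (1.0 + np.exp(-(theta - difficulty - severity)))

theta, difficulty = 0.5, 0.0             # hypothetical ability and task
for severity in (-1.0, 0.0, 1.0):        # lenient, neutral, harsh rater
    print(f"severity {severity:+.1f}: P(success) = {p_success(theta, difficulty, severity):.2f}")
```

The same examinee on the same task gets different expected ratings from lenient and harsh raters, which is exactly the reliability problem rater parameters are meant to absorb.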
Peer reviewed
Attali, Yigal – Language Testing, 2016
A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session, and 14 trainees who passed an…
Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators
Peer reviewed
Möller, Jens; Müller-Kalthoff, Hanno; Helm, Friederike; Nagy, Nicole; Marsh, Herb W. – Frontline Learning Research, 2016
The dimensional comparison theory (DCT) focuses on the effects of internal, dimensional comparisons (e.g., "How good am I in math compared to English?") on academic self-concepts with widespread consequences for students' self-evaluation, motivation, and behavioral choices. DCT is based on the internal/external frame of reference model…
Descriptors: Comparative Analysis, Comparative Testing, Self Concept, Self Concept Measures
Peer reviewed
Yang, Yanyun; Oosterhof, Albert; Xia, Yan – Journal of Educational Research, 2015
The authors address the reliability of scores obtained on summative performance assessments during the pilot year of their research. In contrast to classical test theory, they discuss the advantages of using generalizability theory for estimating the reliability of scores for summative performance assessments. Generalizability theory was used as the…
Descriptors: Summative Evaluation, Comparative Analysis, Reliability, Scores
Li, Dongmei; Yi, Qing; Harris, Deborah – ACT, Inc., 2017
In preparation for online administration of the ACT® test, ACT conducted studies to examine the comparability of scores between online and paper administrations, including a timing study in fall 2013, a mode comparability study in spring 2014, and a second mode comparability study in spring 2015. This report presents major findings from these…
Descriptors: College Entrance Examinations, Computer Assisted Testing, Comparative Analysis, Test Format
Peer reviewed
Cramer, Nicholas; Asmar, Abdo; Gorman, Laurel; Gros, Bernard; Harris, David; Howard, Thomas; Hussain, Mujtaba; Salazar, Sergio; Kibble, Jonathan D. – Advances in Physiology Education, 2016
Multiple-choice questions are a gold-standard tool in medical school for assessment of knowledge and are the mainstay of licensing examinations. However, multiple-choice items can be criticized for lacking the ability to test higher-order learning or integrative thinking across multiple disciplines. Our objective was to develop a novel…
Descriptors: Physiology, Pharmacology, Multiple Choice Tests, Cost Effectiveness