NotesFAQContact Us
Collection
Advanced
Search Tips
Publication Type
Reports - Research15
Journal Articles11
Speeches/Meeting Papers3
Reports - Evaluative1
Audience
Researchers1
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing all 15 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Jonas Flodén – British Educational Research Journal, 2025
This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Peer reviewed Peer reviewed
Direct linkDirect link
Tülin Otbiçer Acar – Measurement: Interdisciplinary Research and Perspectives, 2024
The aim of this study is to compare the results of correlation coefficient estimation of reliability with those obtained through the Bland-Altman plot technique. The scale was first divided into two halves using three different approaches. A linear and high-level relationship was found between the scale scores obtained from the halved forms.…
Descriptors: High School Students, Measurement Techniques, Psychometrics, Comparative Testing
Peer reviewed Peer reviewed
Direct linkDirect link
Edward G. J. Stevenson; Jil Molenaar; David-Paul Pertaub; Dessalegn Tekle – Field Methods, 2025
Is it possible to measure wealth and poverty across settings while being faithful to local understandings? The stages of progress method (SoP) attempts to do this by building ladders of wealth in locally relevant terms and using these in comparisons across groups. This approach is potentially useful among pastoralist populations where monetary…
Descriptors: Foreign Countries, Poverty, Social Mobility, Evaluation Methods
Peer reviewed Peer reviewed
Direct linkDirect link
Kelsey Harkness; Signe Bray; Chelsea M. Durber; Deborah Dewey; Kara Murias – Journal of Autism and Developmental Disorders, 2025
Attention and executive function (EF) dysregulation are common in a number of disorders including autism and attention-deficit/hyperactivity disorder (ADHD). Better understanding of the relationship between indirect and direct measures of attention and EF and common neurodevelopmental diagnoses may contribute to more efficient and effective…
Descriptors: Adolescents, Autism Spectrum Disorders, Attention Deficit Hyperactivity Disorder, Executive Function
Peer reviewed Peer reviewed
Direct linkDirect link
Ke-Hai Yuan; Zhiyong Zhang; Lijuan Wang – Grantee Submission, 2024
Mediation analysis plays an important role in understanding causal processes in social and behavioral sciences. While path analysis with composite scores was criticized to yield biased parameter estimates when variables contain measurement errors, recent literature has pointed out that the population values of parameters of latent-variable models…
Descriptors: Structural Equation Models, Path Analysis, Weighted Scores, Comparative Testing
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Rusticus, Shayna A.; Lovato, Chris Y. – Practical Assessment, Research & Evaluation, 2014
The question of equivalence between two or more groups is frequently of interest to many applied researchers. Equivalence testing is a statistical method designed to provide evidence that groups are comparable by demonstrating that the mean differences found between groups are small enough that they are considered practically unimportant. Few…
Descriptors: Sample Size, Equivalency Tests, Simulation, Error of Measurement
Peer reviewed Peer reviewed
Direct linkDirect link
Wing, Coady; Cook, Thomas D. – Journal of Policy Analysis and Management, 2013
The sharp regression discontinuity design (RDD) has three key weaknesses compared to the randomized clinical trial (RCT). It has lower statistical power, it is more dependent on statistical modeling assumptions, and its treatment effect estimates are limited to the narrow subpopulation of cases immediately around the cutoff, which is rarely of…
Descriptors: Regression (Statistics), Research Design, Statistical Analysis, Research Problems
Peer reviewed Peer reviewed
Direct linkDirect link
Innes, Richard G. – Journal of School Choice, 2012
This article provides examples of how serious misconceptions can result when only "all student" scores from the National Assessment of Educational Progress (NAEP) are used for simplistic state-to-state comparisons. Suggestions for better treatment are presented. The article also compares Kentucky's eighth grade EXPLORE testing to NAEP…
Descriptors: National Competency Tests, Scoring, Misconceptions, Academic Achievement
Peer reviewed Peer reviewed
Direct linkDirect link
Elosua, Paula; Iliescu, Dragos – International Journal of Testing, 2012
Psychometric practice does not always converge with the advances of psychometric theory. In order to investigate this gap, the authors focus on the 10 most used psychological tests in Europe, as identified by recent surveys. The article analyzes test manuals published in 6 different European countries for these 10 most used tests. A total of 32…
Descriptors: Psychological Testing, Personality Measures, Error of Measurement, Foreign Countries
Peer reviewed Peer reviewed
Direct linkDirect link
Nugent, William R. – Educational and Psychological Measurement, 2006
One of the most important effect sizes used in meta-analysis is the standardized mean difference (SMD). In this article, the conditions under which SMD effect sizes based on different measures of the same construct are directly comparable are investigated. The results show that SMD effect sizes from different measures of the same construct are…
Descriptors: Effect Size, Meta Analysis, True Scores, Error of Measurement
Peer reviewed Peer reviewed
Bergstrom, Betty A.; Lunz, Mary E. – Evaluation and the Health Professions, 1992
The level of confidence in pass/fail decisions obtained with computerized adaptive tests and paper-and-pencil tests was greater for 645 medical technology students when the computer adaptive test implemented a 90 percent confidence stopping rule than for paper-and-pencil tests of comparable length. (SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Confidence Testing
Legg, Sue M.; Buhr, Dianne C. – 1990
Possible causes of a 16-point mean score increase for the computer adaptive form of the College Level Academic Skills Test (CLAST) in reading over the paper-and-pencil test (PPT) in reading are examined. The adaptive form of the CLAST was used in a state-wide field test in which reading, writing, and computation scores for approximately 1,000…
Descriptors: Adaptive Testing, College Entrance Examinations, Community Colleges, Comparative Testing
Bejar, Isaac I.; And Others – 1977
Information provided by typical and improved conventional classroom achievement tests was compared with information provided by an adaptive test covering the same subject matter. Both tests were administered to over 700 college students in a general biology course. Using the same scoring method, adaptive testing was found to yield substantially…
Descriptors: Academic Achievement, Achievement Tests, Adaptive Testing, Biology
Murchan, Damian P. – 1989
The reliability, content validity, and construct validity were compared for two test formats in a public examination used to assess a secondary school geography course. The 11-item geography portion of the Intermediate Certificate Examination (essay examination) was administered in June 1987 to 400 secondary school students in Ireland who also…
Descriptors: Achievement Tests, Comparative Testing, Construct Validity, Content Validity
Macpherson, Colin R.; Rowley, Glenn L. – 1986
Teacher-made mastery tests were administered in a classroom-sized sample to study their decision consistency. Decision-consistency of criterion-referenced tests is usually defined in terms of the proportion of examinees who are classified in the same way after two test administrations. Single-administration estimates of decision consistency were…
Descriptors: Classroom Research, Comparative Testing, Criterion Referenced Tests, Cutting Scores