Publication Date
In 2025 | 4 |
Since 2024 | 7 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 7 |
Since 2006 (last 20 years) | 18 |
Descriptor
Comparative Testing | 27 |
Error of Measurement | 27 |
Test Reliability | 9 |
Evaluation Methods | 8 |
Measurement Techniques | 6 |
Scores | 6 |
Test Validity | 6 |
Comparative Analysis | 5 |
Computer Assisted Testing | 5 |
Foreign Countries | 5 |
Scoring | 5 |
Author
Anderson, Dan | 1 |
Bejar, Isaac I. | 1 |
Bergstrom, Betty A. | 1 |
Buhr, Dianne C. | 1 |
Chang, Yu-Wen | 1 |
Chelsea M. Durber | 1 |
Cook, Thomas D. | 1 |
David-Paul Pertaub | 1 |
Davison, Mark L. | 1 |
Deborah Dewey | 1 |
Dessalegn Tekle | 1 |
Publication Type
Journal Articles | 17 |
Reports - Research | 15 |
Reports - Evaluative | 10 |
Speeches/Meeting Papers | 4 |
Dissertations/Theses -… | 1 |
Reports - Descriptive | 1 |
Education Level
Elementary Secondary Education | 5 |
Higher Education | 4 |
Postsecondary Education | 3 |
High Schools | 1 |
Secondary Education | 1 |
Audience
Researchers | 1 |
Assessments and Surveys
College Level Academic Skills… | 1 |
General Educational… | 1 |
Iowa Tests of Basic Skills | 1 |
National Assessment of… | 1 |
Wechsler Intelligence Scale… | 1 |
Jonas Flodén – British Educational Research Journal, 2025
This study compares how ChatGPT, a generative AI (GenAI) large language model (LLM), performs against human teachers in grading university exams. Aspects investigated include consistency, large discrepancies, and answer length. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Jiayi Deng – ProQuest LLC, 2024
Test score comparability in international large-scale assessments (LSA) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To effectively compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic…
Descriptors: Item Response Theory, Scoring Rubrics, Scoring, Error of Measurement
Tülin Otbiçer Acar – Measurement: Interdisciplinary Research and Perspectives, 2024
The aim of this study is to compare the results of correlation coefficient estimation of reliability with those obtained through the Bland-Altman plot technique. The scale was first divided into two halves using three different approaches. A linear and high-level relationship was found between the scale scores obtained from the halved forms.…
Descriptors: High School Students, Measurement Techniques, Psychometrics, Comparative Testing
Edward G. J. Stevenson; Jil Molenaar; David-Paul Pertaub; Dessalegn Tekle – Field Methods, 2025
Is it possible to measure wealth and poverty across settings while being faithful to local understandings? The stages of progress method (SoP) attempts to do this by building ladders of wealth in locally relevant terms and using these in comparisons across groups. This approach is potentially useful among pastoralist populations where monetary…
Descriptors: Foreign Countries, Poverty, Social Mobility, Evaluation Methods
Ole J. Kemi – Advances in Physiology Education, 2025
Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…
Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards
Kelsey Harkness; Signe Bray; Chelsea M. Durber; Deborah Dewey; Kara Murias – Journal of Autism and Developmental Disorders, 2025
Attention and executive function (EF) dysregulation are common in a number of disorders including autism and attention-deficit/hyperactivity disorder (ADHD). Better understanding of the relationship between indirect and direct measures of attention and EF and common neurodevelopmental diagnoses may contribute to more efficient and effective…
Descriptors: Adolescents, Autism Spectrum Disorders, Attention Deficit Hyperactivity Disorder, Executive Function
Ke-Hai Yuan; Zhiyong Zhang; Lijuan Wang – Grantee Submission, 2024
Mediation analysis plays an important role in understanding causal processes in the social and behavioral sciences. While path analysis with composite scores has been criticized for yielding biased parameter estimates when variables contain measurement errors, recent literature has pointed out that the population values of parameters of latent-variable models…
Descriptors: Structural Equation Models, Path Analysis, Weighted Scores, Comparative Testing
Rusticus, Shayna A.; Lovato, Chris Y. – Practical Assessment, Research & Evaluation, 2014
The question of equivalence between two or more groups is frequently of interest to many applied researchers. Equivalence testing is a statistical method designed to provide evidence that groups are comparable by demonstrating that the mean differences found between groups are small enough that they are considered practically unimportant. Few…
Descriptors: Sample Size, Equivalency Tests, Simulation, Error of Measurement
Wing, Coady; Cook, Thomas D. – Journal of Policy Analysis and Management, 2013
The sharp regression discontinuity design (RDD) has three key weaknesses compared to the randomized clinical trial (RCT). It has lower statistical power, it is more dependent on statistical modeling assumptions, and its treatment effect estimates are limited to the narrow subpopulation of cases immediately around the cutoff, which is rarely of…
Descriptors: Regression (Statistics), Research Design, Statistical Analysis, Research Problems
Innes, Richard G. – Journal of School Choice, 2012
This article provides examples of how serious misconceptions can result when only "all student" scores from the National Assessment of Educational Progress (NAEP) are used for simplistic state-to-state comparisons. Suggestions for better treatment are presented. The article also compares Kentucky's eighth grade EXPLORE testing to NAEP…
Descriptors: National Competency Tests, Scoring, Misconceptions, Academic Achievement
Isenberg, Eric; Hock, Heinrich – Mathematica Policy Research, Inc., 2011
This report presents the value-added models that will be used to measure school and teacher effectiveness in the District of Columbia Public Schools (DCPS) in the 2010-2011 school year. It updates the earlier technical report, "Measuring Value Added for IMPACT and TEAM in DC Public Schools." The earlier report described the methods used…
Descriptors: Public Schools, Teacher Effectiveness, School Effectiveness, Models
Elosua, Paula; Iliescu, Dragos – International Journal of Testing, 2012
Psychometric practice does not always converge with the advances of psychometric theory. In order to investigate this gap, the authors focus on the 10 most used psychological tests in Europe, as identified by recent surveys. The article analyzes test manuals published in 6 different European countries for these 10 most used tests. A total of 32…
Descriptors: Psychological Testing, Personality Measures, Error of Measurement, Foreign Countries
Raymond, Mark R.; Neustel, Sandra; Anderson, Dan – Educational Measurement: Issues and Practice, 2009
Examinees who take high-stakes assessments are usually given an opportunity to repeat the test if they are unsuccessful on their initial attempt. To prevent examinees from obtaining unfair score increases by memorizing the content of specific test items, testing agencies usually assign a different test form to repeat examinees. The use of multiple…
Descriptors: Test Results, Test Items, Testing, Aptitude Tests
Setzer, J. Carl; He, Yi – GED Testing Service, 2009
Reliability Analysis for the Internationally Administered 2002 Series GED (General Educational Development) Tests. Reliability refers to the consistency, or stability, of test scores when the authors administer the measurement procedure repeatedly to groups of examinees (American Educational Research Association [AERA], American Psychological…
Descriptors: Educational Research, Error of Measurement, Scores, Test Reliability
Kluge, Annette – Applied Psychological Measurement, 2008
The use of microworlds (MWs), or complex dynamic systems, in educational testing and personnel selection is hampered by systematic measurement errors because these new and innovative item formats are not adequately controlled for their difficulty. This empirical study introduces a way to operationalize an MW's difficulty and demonstrates the…
Descriptors: Personnel Selection, Self Efficacy, Educational Testing, Computer Uses in Education