Jonas Flodén – British Educational Research Journal, 2025
This study compares the generative AI (GenAI) large language model (LLM) ChatGPT with human teachers in grading university exams. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Pere J. Ferrando; David Navarro-González; Fabia Morales-Vives – Educational and Psychological Measurement, 2025
The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the…
Descriptors: Scores, Accuracy, Evaluation Methods, Factor Analysis
Radu Bogdan Toma – Journal of Early Adolescence, 2024
The Expectancy-Value model has been extensively used to understand students' achievement motivation. However, recent studies propose the inclusion of cost as a separate construct from values, leading to the development of the Expectancy-Value-Cost model. This study aimed to adapt Kosovich et al.'s ("The Journal of Early Adolescence", 35,…
Descriptors: Student Motivation, Student Attitudes, Academic Achievement, Mathematics Achievement
Raykov, Tenko; Marcoulides, George A.; Li, Tenglong – Educational and Psychological Measurement, 2017
The measurement error in principal components extracted from a set of fallible measures is discussed and evaluated. It is shown that as long as one or more measures in a given set of observed variables contains error of measurement, so also does any principal component obtained from the set. The error variance in any principal component is shown…
Descriptors: Error of Measurement, Factor Analysis, Research Methodology, Psychometrics
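The claim in this abstract can be illustrated with a minimal sketch. It assumes the classical setup in which each observed measure is a true score plus an error term and the errors are mutually uncorrelated; under that assumption a principal component, being a weighted sum of the measures, carries an error variance equal to the sum of squared weights times the measures' error variances. The function name, weights, and error variances below are hypothetical and are not taken from the article.

```python
def component_error_variance(weights, error_variances):
    """Error variance of the composite sum(w_i * x_i).

    Assumes (hypothetically) that measurement errors are uncorrelated
    across measures, so the error variances simply add after weighting.
    """
    if len(weights) != len(error_variances):
        raise ValueError("weights and error_variances must have equal length")
    return sum(w * w * v for w, v in zip(weights, error_variances))


# Hypothetical unit-length component weights for four measures.
weights = [0.5, 0.5, 0.5, 0.5]

# Only the first measure is fallible; the component still contains error.
one_fallible = component_error_variance(weights, [0.2, 0.0, 0.0, 0.0])

# All measures error-free; the component is error-free as well.
none_fallible = component_error_variance(weights, [0.0, 0.0, 0.0, 0.0])

print(one_fallible, none_fallible)
```

With a nonzero weight on a fallible measure the result is positive, which is the point the abstract makes; the exact expression derived in the article may differ from this simplified form.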
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Kim, Young-Suk Grace; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie – Reading and Writing: An Interdisciplinary Journal, 2017
We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach desirable reliabilities of 0.90 and 0.80 for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written…
Descriptors: Writing Evaluation, Elementary School Students, Grade 3, Grade 4
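The question of how many raters and tasks are needed to reach a target reliability is typically answered with a decision study in generalizability theory. The sketch below assumes a fully crossed persons x raters x tasks design and uses the standard relative generalizability coefficient for that design; the abstract does not state the design or the variance components, so every number and name here is hypothetical.

```python
def g_coefficient(var_p, var_pr, var_pt, var_prt_e, n_raters, n_tasks):
    """Relative generalizability coefficient for a crossed p x r x t design."""
    relative_error = (
        var_pr / n_raters
        + var_pt / n_tasks
        + var_prt_e / (n_raters * n_tasks)
    )
    return var_p / (var_p + relative_error)


def smallest_design(components, target, max_raters=10, max_tasks=10):
    """Return (n_raters, n_tasks) with the fewest ratings per child that
    reaches the target coefficient, or None if no design in range does."""
    best = None
    for n_r in range(1, max_raters + 1):
        for n_t in range(1, max_tasks + 1):
            g = g_coefficient(**components, n_raters=n_r, n_tasks=n_t)
            cost = n_r * n_t  # total ratings needed per child
            if g >= target and (best is None or cost < best[0]):
                best = (cost, n_r, n_t)
    return None if best is None else (best[1], best[2])


# Hypothetical variance components, NOT estimates from the study.
components = {
    "var_p": 0.50,      # children (object of measurement)
    "var_pr": 0.02,     # child x rater interaction
    "var_pt": 0.30,     # child x task interaction
    "var_prt_e": 0.20,  # residual
}

for target in (0.80, 0.90):
    print(target, smallest_design(components, target))
```

In this made-up example the child x task component dominates the error, so adding tasks raises the coefficient faster than adding raters. Which facet matters most in practice depends on the estimated components.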
Kim, Young-Suk Grace; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie – Grantee Submission, 2017
We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach desirable reliabilities of 0.90 and 0.80 for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written…
Descriptors: Writing Evaluation, Elementary School Students, Grade 3, Grade 4
Yuan, Ke-Hai; Zhang, Zhiyong; Zhao, Yanyun – Grantee Submission, 2017
The normal-distribution-based likelihood ratio statistic $T_{ml} = nF_{ml}$ is widely used for power analysis in structural equation modeling (SEM). In such an analysis, power and sample size are computed by assuming that $T_{ml}$ follows a central chi-square distribution under $H_0$ and a noncentral chi-square…
Descriptors: Statistical Analysis, Evaluation Methods, Structural Equation Models, Reliability
Hansen, Michael; Lemke, Mariann; Sorensen, Nicholas – National Center for Analysis of Longitudinal Data in Education Research (CALDER), 2014
Teacher and principal evaluation systems now emerging in response to federal, state and/or local policy initiatives typically require that a component of teacher evaluation be based on multiple performance metrics, which must be combined to produce summative ratings of teacher effectiveness. Districts have utilized three common approaches to…
Descriptors: Teacher Evaluation, Measures (Individuals), Error of Measurement, Teacher Effectiveness
Pokropek, Artur – Sociological Methods & Research, 2015
This article combines statistical and applied research perspectives to show problems that might arise when measurement error in multilevel compositional effects analysis is ignored. It focuses on data where independent variables are constructed measures. Simulation studies are conducted evaluating methods that could overcome the…
Descriptors: Error of Measurement, Hierarchical Linear Modeling, Simulation, Evaluation Methods
Pelanek, Radek – Journal of Educational Data Mining, 2015
Researchers use many different metrics for evaluation of performance of student models. The aim of this paper is to provide an overview of commonly used metrics, to discuss properties, advantages, and disadvantages of different metrics, to summarize current practice in educational data mining, and to provide guidance for evaluation of student…
Descriptors: Models, Data Analysis, Data Processing, Evaluation Criteria
Pan, Tianshu; Yin, Yue – Psychological Methods, 2012
In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than $2(\mathrm{SEM})^2$ and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First,…
Descriptors: Error of Measurement, Geometric Concepts, Tests, Structural Equation Models
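The relation between MSD and SEM discussed in this abstract can be checked with a small simulation. This is a sketch under classical test theory assumptions (each score is a true score plus independent normal error with standard deviation equal to the SEM), not a reproduction of either article's argument. For parallel forms the expected MSD is 2(SEM)^2; a constant mean shift on the second form is one simple, hypothetical way to make the forms non-parallel. All parameter values are made up.

```python
import random


def simulate_msd(n, sem, shift=0.0, seed=0):
    """Mean square difference between two simulated test forms.

    Each examinee has one true score; each form adds independent
    normal error with standard deviation `sem`. `shift` adds a
    constant to the second form, which breaks parallelism.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    total = 0.0
    for _ in range(n):
        true_score = rng.gauss(50.0, 10.0)
        x1 = true_score + rng.gauss(0.0, sem)
        x2 = true_score + shift + rng.gauss(0.0, sem)
        total += (x1 - x2) ** 2
    return total / n


sem = 4.0
parallel = simulate_msd(20000, sem)              # expected near 2 * sem**2
shifted = simulate_msd(20000, sem, shift=3.0)    # expected near 2 * sem**2 + 3**2

print(2 * sem ** 2, parallel, shifted)
```

In the shifted case the MSD exceeds 2(SEM)^2 by roughly the squared mean difference, which matches the direction of the conclusion attributed to Barchard (2012); the limitations raised by the authors are not addressed by this sketch.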
Dory, Valerie; Gagnon, Robert; Charlin, Bernard – Advances in Health Sciences Education, 2010
Case-specificity, i.e., variability of a subject's performance across cases, has been a consistent finding in medical education. It has important implications for assessment validity and reliability. Its root causes remain a matter of discussion. One hypothesis, content-specificity, links variability of performance to variable levels of relevant…
Descriptors: Medical Education, Trainees, English (Second Language), Error of Measurement
Brandt, Lorilynn – ProQuest LLC, 2010
Phonics was identified as one of the critical components in reading development by the National Reading Panel. Over time, research has repeatedly identified phonics as important to early reading development. Given the compelling evidence supporting the teaching of phonics in early reading, it is critical to make sure that instructional decisions…
Descriptors: Generalizability Theory, Phonics, Early Reading, Validity
Ziegler, Albert; Ziegler, Albert – High Ability Studies, 2009
The aim of this paper is to demonstrate the dramatic consequences the application of cut-off points can have in the practice of identifying gifted individuals. The paradoxical attenuation effect describes the frequent situation in which measurements of the gifts and talents individuals possess are lower than their true values. However, in…
Descriptors: Gifted, Academic Achievement, Test Theory, Measurement