ERIC - Search Results

Showing 1 to 15 of 35 results

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Linear and Nonlinear Indices of Score Accuracy and Item Effectiveness for Measures That Contain Locally Dependent Items

Peer reviewed

Pere J. Ferrando; David Navarro-González; Fabia Morales-Vives – Educational and Psychological Measurement, 2025

The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the…

Descriptors: Scores, Accuracy, Evaluation Methods, Factor Analysis

Cultural Adaptation of the Expectancy-Value-Cost Scale for Spanish-Speaking Students

Peer reviewed

Radu Bogdan Toma – Journal of Early Adolescence, 2024

The Expectancy-Value model has been extensively used to understand students' achievement motivation. However, recent studies propose the inclusion of cost as a separate construct from values, leading to the development of the Expectancy-Value-Cost model. This study aimed to adapt Kosovich et al.'s ("The Journal of Early Adolescence", 35,…

Descriptors: Student Motivation, Student Attitudes, Academic Achievement, Mathematics Achievement

On the Fallibility of Principal Components in Research

Peer reviewed

Raykov, Tenko; Marcoulides, George A.; Li, Tenglong – Educational and Psychological Measurement, 2017

The measurement error in principal components extracted from a set of fallible measures is discussed and evaluated. It is shown that as long as one or more measures in a given set of observed variables contains error of measurement, so also does any principal component obtained from the set. The error variance in any principal component is shown…

Descriptors: Error of Measurement, Factor Analysis, Research Methodology, Psychometrics

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Writing Evaluation: Rater and Task Effects on the Reliability of Writing Scores for Children in Grades 3 and 4

Peer reviewed

Kim, Young-Suk Grace; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie – Reading and Writing: An Interdisciplinary Journal, 2017

We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach a desirable level of 0.90 and 0.80 reliabilities for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written…

Descriptors: Writing Evaluation, Elementary School Students, Grade 3, Grade 4

Writing Evaluation: Rater and Task Effects on the Reliability of Writing Scores for Children in Grades 3 and 4

Peer reviewed

Kim, Young-Suk Grace; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie – Grantee Submission, 2017

Descriptors: Writing Evaluation, Elementary School Students, Grade 3, Grade 4

Reliable and More Powerful Methods for Power Analysis in Structural Equation Modeling

Peer reviewed

Yuan, Ke-Hai; Zhang, Zhiyong; Zhao, Yanyun – Grantee Submission, 2017

The normal-distribution-based likelihood ratio statistic $T_{ml} = nF_{ml}$ is widely used for power analysis in structural equation modeling (SEM). In such an analysis, power and sample size are computed by assuming that $T_{ml}$ follows a central chi-square distribution under $H_0$ and a noncentral chi-square…

Descriptors: Statistical Analysis, Evaluation Methods, Structural Equation Models, Reliability

Combining Multiple Performance Measures: Do Common Approaches Undermine Districts' Personnel Evaluation Systems? CALDER Working Paper No. 118

Hansen, Michael; Lemke, Mariann; Sorensen, Nicholas – National Center for Analysis of Longitudinal Data in Education Research (CALDER), 2014

Teacher and principal evaluation systems now emerging in response to federal, state and/or local policy initiatives typically require that a component of teacher evaluation be based on multiple performance metrics, which must be combined to produce summative ratings of teacher effectiveness. Districts have utilized three common approaches to…

Descriptors: Teacher Evaluation, Measures (Individuals), Error of Measurement, Teacher Effectiveness

Phantom Effects in Multilevel Compositional Analysis: Problems and Solutions

Peer reviewed

Pokropek, Artur – Sociological Methods & Research, 2015

This article combines statistical and applied research perspectives, showing problems that might arise when measurement error in multilevel compositional effects analysis is ignored. This article focuses on data where independent variables are constructed measures. Simulation studies are conducted evaluating methods that could overcome the…

Descriptors: Error of Measurement, Hierarchical Linear Modeling, Simulation, Evaluation Methods

Metrics for Evaluation of Student Models

Peer reviewed

Pelanek, Radek – Journal of Educational Data Mining, 2015

Researchers use many different metrics for evaluation of performance of student models. The aim of this paper is to provide an overview of commonly used metrics, to discuss properties, advantages, and disadvantages of different metrics, to summarize current practice in educational data mining, and to provide guidance for evaluation of student…

Descriptors: Models, Data Analysis, Data Processing, Evaluation Criteria

The Relationship between Mean Square Differences and Standard Error of Measurement: Comment on Barchard (2012)

Peer reviewed

Pan, Tianshu; Yin, Yue – Psychological Methods, 2012

In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than $2(\mathrm{SEM})^2$ and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First,…

Descriptors: Error of Measurement, Geometric Concepts, Tests, Structural Equation Models

Is Case-Specificity Content-Specificity? An Analysis of Data from Extended-Matching Questions

Peer reviewed

Dory, Valerie; Gagnon, Robert; Charlin, Bernard – Advances in Health Sciences Education, 2010

Case-specificity, i.e., variability of a subject's performance across cases, has been a consistent finding in medical education. It has important implications for assessment validity and reliability. Its root causes remain a matter of discussion. One hypothesis, content-specificity, links variability of performance to variable levels of relevant…

Descriptors: Medical Education, Trainees, English (Second Language), Error of Measurement

Investigating the Reliability and Validity of the Consortium on Reading Excellence (CORE) Phonics Survey

Brandt, Lorilynn – ProQuest LLC, 2010

Phonics was identified as one of the critical components in reading development by the National Reading Panel. Over time, research has repeatedly identified phonics as important to early reading development. Given the compelling evidence supporting the teaching of phonics in early reading, it is critical to make sure that instructional decisions…

Descriptors: Generalizability Theory, Phonics, Early Reading, Validity

The Paradoxical Attenuation Effect in Tests Based on Classical Test Theory: Mathematical Background and Practical Implications for the Measurement of High Abilities

Peer reviewed

Ziegler, Albert; Ziegler, Albert – High Ability Studies, 2009

The aim of this paper is to demonstrate the dramatic consequences the application of cut-off points can have in the practice of identifying gifted individuals. The paradoxical attenuation effect describes the frequent situation in which measurements of the gifts and talents individuals possess are lower than their true values. However, in…

Descriptors: Gifted, Academic Achievement, Test Theory, Measurement
