ERIC - Search Results

Publication Date

In 2025	1
Since 2024	4
Since 2021 (last 5 years)	6
Since 2016 (last 10 years)	16
Since 2006 (last 20 years)	22

Descriptor

Accuracy	22
Error of Measurement	22
Evaluation Methods	22
Sample Size	8
Scores	7
Statistical Bias	7
Classification	6
Simulation	6
Sampling	5
Statistical Analysis	5
Comparative Analysis	4
Interrater Reliability	4
Item Response Theory	4
Monte Carlo Methods	4
Computation	3
Correlation	3
Data Analysis	3
Factor Analysis	3
Item Analysis	3
Regression (Statistics)	3
Test Bias	3
Computer Software	2
Educational Research	2
Error Correction	2
Evidence	2
More ▼

Source

Educational and Psychological…	3
ProQuest LLC	3
Society for Research on…	2
Structural Equation Modeling:…	2
AERA Online Paper Repository	1
Carnegie Foundation for the…	1
Grantee Submission	1
International Journal of…	1
International Journal of…	1
International Journal of…	1
Journal of Educational…	1
Journal of Research on…	1
Language Testing	1
Measurement and Evaluation in…	1
Psicologica: International…	1
Sociological Methods &…	1
More ▼

Publication Type

Reports - Research	15
Journal Articles	14
Dissertations/Theses -…	3
Reports - Evaluative	3
Opinion Papers	1
Reports - Descriptive	1
Speeches/Meeting Papers	1

Education Level

High Schools	1
Higher Education	1
Secondary Education	1

Audience

Location

Netherlands	1
United States	1

Laws, Policies, & Programs

Assessments and Surveys

National Assessment of…

What Works Clearinghouse Rating

Showing 1 to 15 of 22 results Save | Export

A Note on Standard Errors for Multidimensional Two-Parameter Logistic Models Using Gaussian Variational Estimation

Peer reviewed

Direct link

Jiaying Xiao; Chun Wang; Gongjun Xu – Grantee Submission, 2024

Accurate item parameters and standard errors (SEs) are crucial for many multidimensional item response theory (MIRT) applications. A recent study proposed the Gaussian Variational Expectation Maximization (GVEM) algorithm to improve computational efficiency and estimation accuracy (Cho et al., 2021). However, the SE estimation procedure has yet to…

Descriptors: Error of Measurement, Models, Evaluation Methods, Item Analysis

Temporal Misalignment in Intensive Longitudinal Data: Consequences and Solutions Based on Dynamic Structural Equation Models

Peer reviewed

Direct link

Xiaohui Luo; Yueqin Hu – Structural Equation Modeling: A Multidisciplinary Journal, 2024

Intensive longitudinal data has been widely used to examine reciprocal or causal relations between variables. However, these variables may not be temporally aligned. This study examined the consequences and solutions of the problem of temporal misalignment in intensive longitudinal data based on dynamic structural equation models. First the impact…

Descriptors: Structural Equation Models, Longitudinal Studies, Data Analysis, Causal Models

Comparing Mimic and Mimic-Interaction to Alignment Methods for Investigating Measurement Invariance Concerning a Continuous Violator

Peer reviewed

Direct link

Yuanfang Liu; Mark H. C. Lai; Ben Kelcey – Structural Equation Modeling: A Multidisciplinary Journal, 2024

Measurement invariance holds when a latent construct is measured in the same way across different levels of background variables (continuous or categorical) while controlling for the true value of that construct. Using Monte Carlo simulation, this paper compares the multiple indicators, multiple causes (MIMIC) model and MIMIC-interaction to a…

Descriptors: Classification, Accuracy, Error of Measurement, Correlation

Linear and Nonlinear Indices of Score Accuracy and Item Effectiveness for Measures That Contain Locally Dependent Items

Peer reviewed

Direct link

Pere J. Ferrando; David Navarro-González; Fabia Morales-Vives – Educational and Psychological Measurement, 2025

The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the…

Descriptors: Scores, Accuracy, Evaluation Methods, Factor Analysis

A Comparison of Type I Error and Power Rates in Procedures Used Determining Test Dimensionality

Peer reviewed
PDF on ERIC

Download full text

Guler, Gul; Cikrikci, Rahime Nukhet – International Journal of Assessment Tools in Education, 2022

The purpose of this study was to investigate the Type I Error findings and power rates of the methods used to determine dimensionality in unidimensional and bidimensional psychological constructs for various conditions (characteristic of the distribution, sample size, length of the test, and interdimensional correlation) and to examine the joint…

Descriptors: Comparative Analysis, Error of Measurement, Decision Making, Factor Analysis

Classification Consistency and Accuracy with Atypical Score Distributions

Peer reviewed

Direct link

Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020

The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…

Descriptors: Classification, Accuracy, Scores, Cutting Scores

Exploring Rating Quality in the Context of High-Stakes Rater-Mediated Educational Assessments

Direct link

Wenjing Guo – ProQuest LLC, 2021

Constructed response (CR) items are widely used in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district and state-level assessments in the United States. One unique feature of CR items is that they depend on human raters to assess the quality of examinees' work. The judgment of human…

Descriptors: National Competency Tests, Responses, Interrater Reliability, Error of Measurement

An Unbiased Estimate of Global Interrater Agreement

Peer reviewed

Direct link

Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2017

Assessing global interrater agreement is difficult as most published indices are affected by the presence of mixtures of agreements and disagreements. A previously proposed method was shown to be specifically sensitive to global agreement, excluding mixtures, but also negatively biased. Here, we propose two alternatives in an attempt to find what…

Descriptors: Interrater Reliability, Evaluation Methods, Statistical Bias, Accuracy

Methods to Estimate the Variance of Some Indices of the Signal Detection Theory: A Simulation Study

Peer reviewed
PDF on ERIC

Download full text

Suero, Manuel; Privado, Jesús; Botella, Juan – Psicologica: International Journal of Methodology and Experimental Psychology, 2017

A simulation study is presented to evaluate and compare three methods to estimate the variance of the estimates of the parameters d and "C" of the signal detection theory (SDT). Several methods have been proposed to calculate the variance of their estimators, "d'" and "c." Those methods have been mostly assessed by…

Descriptors: Evaluation Methods, Theories, Simulation, Statistical Analysis

Multivariate Regression with Small Samples: A Comparison of Estimation Methods

Peer reviewed

Direct link

Finch, William Holmes; Hernandez Finch, Maria E. – AERA Online Paper Repository, 2017

High dimensional multivariate data, where the number of variables approaches or exceeds the sample size, is an increasingly common occurrence for social scientists. Several tools exist for dealing with such data in the context of univariate regression, including regularization methods such as Lasso, Elastic net, Ridge Regression, as well as the…

Descriptors: Multivariate Analysis, Regression (Statistics), Sampling, Sample Size

Estimating Causal Effects of Education Interventions Using a Two-Rating Regression Discontinuity Design: Lessons from a Simulation Study and an Application

Peer reviewed

Direct link

Porter, Kristin E.; Reardon, Sean F.; Unlu, Fatih; Bloom, Howard S.; Cimpian, Joseph R. – Journal of Research on Educational Effectiveness, 2017

A valuable extension of the single-rating regression discontinuity design (RDD) is a multiple-rating RDD (MRRDD). To date, four main methods have been used to estimate average treatment effects at the multiple treatment frontiers of an MRRDD: the "surface" method, the "frontier" method, the "binding-score" method, and…

Descriptors: Regression (Statistics), Intervention, Quasiexperimental Design, Simulation

Processes and Procedures for Estimating Score Reliability and Precision

Peer reviewed

Direct link

Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017

Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, a), test-retest, alternate forms, interscorer, and…

Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests

Cross-Battery Assessment? XBA PSW? A Case of Mistaken Identity: A Commentary on Kranzler and Colleagues' "Classification Agreement Analysis of Cross-Battery Assessment in the Identification of Specific Learning Disorders in Children and Youth"

Peer reviewed

Direct link

Flanagan, Dawn P.; Schneider, W. Joel – International Journal of School & Educational Psychology, 2016

When education works, it creates productive, innovative citizens eager to contribute to a well-functioning democracy. In contrast, educational failure has lifelong consequences, with some individuals experiencing decades of preventable hardship. Dawn Flanagan and Joel Schneider write in this response that, like Kranzler, Floyd, Benson, Zabowski,…

Descriptors: Learning Disabilities, Identification, Diagnostic Tests, Criticism

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Direct link

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Evaluating Bias of Sequential Mixed-Mode Designs against Benchmark Surveys

Peer reviewed

Direct link

Klausch, Thomas; Schouten, Barry; Hox, Joop J. – Sociological Methods & Research, 2017

This study evaluated three types of bias--total, measurement, and selection bias (SB)--in three sequential mixed-mode designs of the Dutch Crime Victimization Survey: telephone, mail, and web, where nonrespondents were followed up face-to-face (F2F). In the absence of true scores, all biases were estimated as mode effects against two different…

Descriptors: Evaluation Methods, Statistical Bias, Sequential Approach, Benchmarking

Previous Page | Next Page »

Pages: 1 | 2

Bardhoshi, Gerta	1
Ben Kelcey	1
Bloom, Howard S.	1
Botella, Juan	1
Cheema, Jehanzeb	1
Chun Wang	1
Cikrikci, Rahime Nukhet	1
Cimpian, Joseph R.	1
Citkowicz, Martyna	1
Cousineau, Denis	1
David Navarro-González	1
DeMars, Christine E.	1
Erford, Bradley T.	1
Fabia Morales-Vives	1
Finch, William Holmes	1
Flanagan, Dawn P.	1
Gongjun Xu	1
Guler, Gul	1
Hedges, Larry V.	1
Hernandez Finch, Maria E.	1
Hox, Joop J.	1
Jiaying Xiao	1
Keller, Bryan S. B.	1
Kim, Jee-Seon	1
Kim, Stella Y.	1
More ▼