Deke, John; Finucane, Mariel; Thal, Daniel – National Center for Education Evaluation and Regional Assistance, 2022
BASIE is a framework for interpreting impact estimates from evaluations. It is an alternative to null hypothesis significance testing. This guide walks researchers through the key steps of applying BASIE, including selecting prior evidence, reporting impact estimates, interpreting impact estimates, and conducting sensitivity analyses. The guide…
Descriptors: Bayesian Statistics, Educational Research, Data Interpretation, Hypothesis Testing
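The Bayesian interpretation BASIE favors over null hypothesis significance testing can be illustrated with a minimal normal-normal conjugate sketch: combine a prior on the effect with a normally distributed impact estimate, then report the posterior probability of a positive effect. The prior mean, prior SD, and effect-size numbers below are invented for illustration, not taken from the guide.

```python
import math

def posterior_prob_positive(prior_mean, prior_sd, est, se):
    """Normal-normal conjugate update: posterior P(effect > 0)
    given a normal prior and a normal impact estimate (est, se)."""
    prior_var, like_var = prior_sd ** 2, se ** 2
    post_var = 1.0 / (1.0 / prior_var + 1.0 / like_var)
    post_mean = post_var * (prior_mean / prior_var + est / like_var)
    z = post_mean / math.sqrt(post_var)
    # Standard normal CDF Phi(z) via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Skeptical prior centered at 0 (SD 0.10); estimate 0.15 with SE 0.08
p = posterior_prob_positive(0.0, 0.10, 0.15, 0.08)
```

Reporting "the probability the effect is positive is p" is the kind of direct statement this framework enables, in place of a p-value.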
Greifer, Noah – ProQuest LLC, 2018
There has been some research in the use of propensity scores in the context of measurement error in the confounding variables; one recommended method is to generate estimates of the mis-measured covariate using a latent variable model, and to use those estimates (i.e., factor scores) in place of the covariate. I describe a simulation study…
Descriptors: Evaluation Methods, Probability, Scores, Statistical Analysis
Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H. – Educational and Psychological Measurement, 2015
When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically…
Descriptors: Competence, Tests, Evaluation Methods, Adults
Lee, HwaYoung; Beretvas, S. Natasha – Educational and Psychological Measurement, 2014
Conventional differential item functioning (DIF) detection methods (e.g., the Mantel-Haenszel test) can be used to detect DIF only across observed groups, such as gender or ethnicity. However, research has found that DIF is not typically fully explained by an observed variable. True sources of DIF may include unobserved, latent variables, such as…
Descriptors: Item Analysis, Factor Structure, Bayesian Statistics, Goodness of Fit
Pelanek, Radek – Journal of Educational Data Mining, 2015
Researchers use many different metrics for evaluation of performance of student models. The aim of this paper is to provide an overview of commonly used metrics, to discuss properties, advantages, and disadvantages of different metrics, to summarize current practice in educational data mining, and to provide guidance for evaluation of student…
Descriptors: Models, Data Analysis, Data Processing, Evaluation Criteria
Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M. – Society for Research on Educational Effectiveness, 2013
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
Descriptors: Probability, Scores, Statistical Analysis, Statistical Bias
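The logistic-regression propensity model the abstract describes can be sketched in a few lines; the synthetic data and the hand-rolled gradient descent below are assumptions for a self-contained example, not the authors' implementation.

```python
import math
import random

def fit_logistic(X, z, lr=0.1, iters=2000):
    """Minimal logistic regression by averaged gradient descent,
    returning coefficients for the propensity model P(Z=1 | X).
    w[0] is the intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            eta = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            pi = 1.0 / (1.0 + math.exp(-eta))
            err = pi - zi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def propensity(w, xi):
    """Estimated probability of treatment given covariates."""
    eta = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-eta))

random.seed(1)
# One covariate; treatment more likely when x is large (selection on x)
X = [[random.gauss(0, 1)] for _ in range(400)]
z = [1 if random.random() < 1 / (1 + math.exp(-1.5 * x[0])) else 0 for x in X]
w = fit_logistic(X, z)
scores = [propensity(w, x) for x in X]
```

In a real analysis the estimated scores would then feed into matching, stratification, or weighting to balance the observed covariates across groups.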
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
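The consequence the abstract warns about can be sketched with Kish's design effect for equal-size clusters, DEFF = 1 + (m - 1) * rho: ignoring clustering understates the standard error by a factor of sqrt(DEFF). The cluster size and intraclass correlation below are illustrative values, not figures from the article.

```python
import math

def design_effect(cluster_size, icc):
    """Kish design effect for equal-size clusters:
    DEFF = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc

def adjusted_se(srs_se, cluster_size, icc):
    """Standard error inflated by the design effect; treating the
    sample as simple random understates the SE by sqrt(DEFF)."""
    return srs_se * math.sqrt(design_effect(cluster_size, icc))

# Class-sized clusters of 25 with a modest ICC of 0.10
deff = design_effect(25, 0.10)        # 1 + 24 * 0.10 = 3.4
se = adjusted_se(0.02, 25, 0.10)
```

Even a modest ICC more than triples the variance here, which is why unrecognized design effects can make item and test statistics look far more precise than they are.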
Cai, Li; Monroe, Scott – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2014
We propose a new limited-information goodness of fit test statistic C[subscript 2] for ordinal IRT models. The construction of the new statistic lies formally between the M[subscript 2] statistic of Maydeu-Olivares and Joe (2006), which utilizes first and second order marginal probabilities, and the M*[subscript 2] statistic of Cai and Hansen…
Descriptors: Item Response Theory, Models, Goodness of Fit, Probability
Moses, Tim – Journal of Educational and Behavioral Statistics, 2008
Equating functions are supposed to be population invariant, meaning that the choice of subpopulation used to compute the equating function should not matter. The extent to which equating functions are population invariant is typically assessed in terms of practical difference criteria that do not account for equating functions' sampling…
Descriptors: Equated Scores, Error of Measurement, Sampling, Evaluation Methods
Gundersen, Craig; Kreider, Brent – Journal of Human Resources, 2008
Policymakers have been puzzled to observe that food stamp households appear more likely to be food insecure than observationally similar eligible nonparticipating households. We reexamine this issue allowing for nonclassical reporting errors in food stamp participation and food insecurity. Extending the literature on partially identified…
Descriptors: Security (Psychology), Poverty, Family (Sociological Unit), Measurement Techniques
DeMars, Christine E. – Applied Psychological Measurement, 2004
Type I error rates were examined for several fit indices available in GGUM2000: extensions of Infit, Outfit, Andrich's X(2), and the log-likelihood ratio X(2). Infit and Outfit had Type I error rates much lower than nominal alpha. Andrich's X(2) had Type I error rates much higher than nominal alpha, particularly for shorter tests or larger sample…
Descriptors: Likert Scales, Error of Measurement, Goodness of Fit, Psychological Studies
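A Type I error study of the kind described above can be mimicked in miniature: generate data under the null hypothesis many times and count how often the test rejects at the nominal alpha. A generic two-sided z-test on a sample mean stands in here for the GGUM2000 fit indices, which are not reimplemented; a well-calibrated test should reject close to alpha of the time.

```python
import math
import random

def type1_error_rate(n_reps=2000, n=50, seed=7):
    """Monte Carlo Type I error check: data generated under H0
    (standard normal, mean 0), two-sided z-test at alpha = 0.05."""
    rng = random.Random(seed)
    crit = 1.959963984540054          # Phi^{-1}(0.975)
    rejections = 0
    for _ in range(n_reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        zstat = (sum(xs) / n) / (1.0 / math.sqrt(n))
        if abs(zstat) > crit:
            rejections += 1
    return rejections / n_reps
```

A rejection rate well below 0.05 (as DeMars found for Infit and Outfit) signals a conservative index; well above (as for Andrich's X^2 with short tests) signals an inflated one.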
Dirkzwager, Arie – International Journal of Testing, 2003
The crux in psychometrics is how to estimate the probability that a respondent answers an item correctly on one occasion out of many. Under the current testing paradigm this probability is estimated using all kinds of statistical techniques and mathematical modeling. Multiple evaluation is a new testing paradigm using the person's own personal…
Descriptors: Psychometrics, Probability, Models, Measurement
Rasor, Richard E.; Barr, James – 1998
This paper provides an overview of common sampling methods (both the good and the bad) likely to be used in community college self-evaluations and presents the results from several simulated trials. The report begins by reviewing various survey techniques, discussing the negative and positive aspects of each method. The increased accuracy and…
Descriptors: Community Colleges, Comparative Analysis, Cost Effectiveness, Data Collection