Controlling for Measurement Error in Evaluations When Treatment Group Assignment Is Based on Noisy Measures

Robert Meyer; Sara Hu; Michael Christian

Background: This paper develops a new method for estimating quasi-experimental evaluation models when it is necessary to control for measurement error in predictors and individuals are assigned to the treatment group on the basis of those same fallible variables. A major methodological finding of the study is that standard methods for estimating models that control for measurement error, such as the errors-in-variables method implemented in Stata and R (Fuller, 1987), yield severely biased parameter estimates in this setting. The model was developed in response to a school report card redesign initiative recently launched in one state. This initiative expanded the state school report card to focus on the growth performance of students in the bottom quartile of prior achievement in each school. This paper evaluates the impact of this gap-closing intervention and uses it to motivate the new estimation method.

Research Questions: Because student achievement is inevitably measured with error, the gap-closing model must be estimated with methods that control for measurement error. The distinctive feature of the intervention is that true prior achievement (corrected for measurement error) and the bottom-quartile indicator are tightly connected yet must be treated differently: the model specification requires that measurement error control be applied to measured prior achievement, but not to the bottom-quartile indicator, even though that indicator is itself based on measured prior achievement.
RQ1: Does the standard errors-in-variables method of controlling for measurement error yield consistent parameter estimates of the gap-closing model when assignment to the gap-closing initiative is based on measured prior achievement?
RQ2: What alternative (newly developed) estimator yields consistent and efficient parameter estimates of the gap-closing model? What are its key features and assumptions? How robust is it?
RQ3: What is the impact of the gap-closing initiative overall and across schools?

Setting, Participants, and Intervention: The gap-closing initiative expanded the state school report card to report on the growth performance of students in the bottom quartile of prior achievement in each school in the state. The names of these students are provided to each school with the expectation that schools will implement policies and practices to improve these students' achievement, as measured by the end-of-year state assessment. In the subsequent school report card, the state publishes growth in student achievement for both the target and non-target groups in each school. Our research group was contracted to develop the methodology for appropriately estimating the gap-closing model and is producing estimates based on this model for inclusion in the state school report card in summer 2023.

Research Design: To address RQ1, the first part of the paper evaluates the standard errors-in-variables (EV) method of controlling for measurement error using simulated Monte Carlo data. We demonstrate that the EV method yields severely biased estimates of the gap-closing model. This bias arises because the continuous pretest variable must be treated as measured with error (endogenous) while the bottom-quartile indicator must simultaneously be treated as not measured with error (exogenous). The EV estimator addresses the first need but not the second. In effect, pretest measurement is partly causal: because students are assigned to the bottom-quartile target group on the basis of the error-laden test score, the measurement error itself helps determine who is treated. To address RQ2, the second part of the paper develops an augmented errors-in-variables (AEV) estimator that allows for measurement error (endogeneity) in the continuous pretest predictor and no error (exogeneity) in the bottom-quartile indicator.
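The bias mechanism behind RQ1 can be illustrated with a small Monte Carlo sketch in Python (NumPy). It is not the paper's simulation design. The following are all hypothetical: the data-generating process, the parameter values (reliability 0.8, pretest slope 0.7, treatment effect 0.2), the single statewide quartile cutoff, and the helper names `simulate` and `ev_estimator`. The EV estimator here is a textbook moment correction with a known error variance, in the spirit of Fuller (1987), rather than the Stata or R routine. The indicator `d` is computed from the error-laden score `x`, but the correction treats `d` as uncorrelated with the measurement error. As a result, the treatment coefficient stays badly biased, while an infeasible regression on the true score `t` recovers it.

```python
import numpy as np


def simulate(n, beta=0.7, delta=0.2, reliability=0.8, seed=0):
    """Hypothetical data-generating process (illustrative values only)."""
    rng = np.random.default_rng(seed)
    t = rng.standard_normal(n)                       # true prior achievement, variance 1
    sigma2 = 1.0 / reliability - 1.0                 # error variance implied by the reliability
    x = t + rng.normal(0.0, np.sqrt(sigma2), n)      # measured (fallible) pretest
    d = (x < np.quantile(x, 0.25)).astype(float)     # bottom quartile of the *measured* pretest
    y = beta * t + delta * d + rng.normal(0.0, 0.5, n)  # posttest depends on true t and on d
    return y, x, t, d, sigma2


def ev_estimator(y, Z, err_col, sigma2):
    """Moment-corrected errors-in-variables estimator: subtract n * sigma2 from
    the Z'Z diagonal entry of the fallible column, treating all other columns
    (including d) as error-free and uncorrelated with the measurement error."""
    A = Z.T @ Z
    A[err_col, err_col] -= len(y) * sigma2
    return np.linalg.solve(A, Z.T @ y)


y, x, t, d, sigma2 = simulate(200_000)
ones = np.ones_like(y)

# EV: corrects the pretest slope, but d's cross-moment with x still carries the error.
b_ev = ev_estimator(y, np.column_stack([ones, x, d]), err_col=1, sigma2=sigma2)

# Infeasible benchmark: OLS on the true prior achievement t (unobservable in practice).
b_true, *_ = np.linalg.lstsq(np.column_stack([ones, t, d]), y, rcond=None)

print("EV estimate of delta:       ", round(b_ev[2], 3))
print("Benchmark estimate of delta:", round(b_true[2], 3))
```

Re-running with `reliability` closer to 1 should shrink the gap, because less error leaks into the assignment rule. The AEV estimator introduced above targets exactly this leakage.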
The AEV estimator, which we evaluate using the same Monte Carlo data as for RQ1, proceeds in two steps. First, a measure of true prior achievement is constructed from measured prior achievement and the other control variables included in the model, including demographic variables and school effects; this constructed measure replaces the fallible pretest variable. Second, the vector of observed predictor variables, including the fallible pretest variable, is used as the set of instrumental variables (IV). This approach thus reverses the traditional IV method, in which a set of instrumental variables is used to "instrument" variables measured with error: here the fallible pretest instead serves as an instrument for the constructed measure. One useful property of the AEV method is that it yields estimates identical to those of the EV method when the latter is appropriate. As a result, a data set constructed to allow application of the AEV method can also be used to estimate other, possibly restricted, models. We show that, unlike the EV estimator, the AEV estimator must explicitly allow for heteroscedasticity in measurement errors. We also consider model extensions that allow the relationship between post- and pre-achievement to be nonlinear. RQ3 is addressed using both the standard EV estimator and the new AEV estimator developed for RQ2.

Data Sources: The study uses simulated Monte Carlo data to evaluate the properties of the alternative estimation methods. The impact of the gap-closing initiative is evaluated using student-level data from a medium-sized state. The data set includes test scores and demographic variables for all students in the state, including students enrolled in traditional public and charter schools. Data are available for school years 2016-17 through 2022-23 (the latter available this summer) and thus reflect student growth during three periods: pre-COVID, peak-COVID, and "post" COVID recovery.

Findings and Conclusions: We have estimated the gap-closing model using state data for school years 2018-19 and 2021-22.
Updated estimates will be provided using data from 2022-23. The state is particularly interested in the impact estimates from the most recent school year, given strong policy interest in spurring achievement among low-achieving students, a group that studies show experienced especially large learning losses during the pandemic. Comparing the EV and AEV estimates based on the earlier data indicates that the bias in the overall effect of the gap-closing intervention is substantial, equal to an effect size of 0.37. A major methodological finding of the study is that standard methods of estimating models that control for measurement error yield severely biased parameter estimates in this setting. The new estimator may be applicable more generally to evaluations of education, social welfare, and health sciences interventions in which individuals are assigned to treatments based on fallible measures.
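The two-step AEV procedure described under Research Design can be sketched in a simulated setting. This is a hypothetical illustration, not the authors' estimator. It rests on several simplifying assumptions:

- normally distributed true scores;
- a known, homoscedastic error variance, so it does not address the heteroscedasticity the paper finds essential;
- no demographic controls or school effects;
- a single statewide quartile cutoff.

The names `simulate`, `ev_estimator`, and `aev_sketch` are illustrative. Step 1 builds the true-score measure as the linear projection of true prior achievement on the error-free information. With no controls, this reduces to shrinking `x` toward its mean by the estimated reliability. Step 2 runs a just-identified IV regression in which the observed predictors, including the fallible pretest, instrument the constructed measure. The second half of the script checks the property that AEV and EV estimates coincide when EV is appropriate, here under random assignment.

```python
import numpy as np


def simulate(n, beta=0.7, delta=0.2, reliability=0.8, random_assignment=False, seed=0):
    """Hypothetical data-generating process (illustrative values only)."""
    rng = np.random.default_rng(seed)
    t = rng.standard_normal(n)                       # true prior achievement
    sigma2 = 1.0 / reliability - 1.0                 # known, homoscedastic error variance
    x = t + rng.normal(0.0, np.sqrt(sigma2), n)      # measured pretest
    if random_assignment:                            # assignment unrelated to x
        d = (rng.random(n) < 0.25).astype(float)
    else:                                            # gap-closing rule: bottom quartile of x
        d = (x < np.quantile(x, 0.25)).astype(float)
    y = beta * t + delta * d + rng.normal(0.0, 0.5, n)
    return y, x, d, sigma2


def ev_estimator(y, Z, err_col, sigma2):
    """Moment-corrected EV estimator (all columns except err_col treated as error-free)."""
    A = Z.T @ Z
    A[err_col, err_col] -= len(y) * sigma2
    return np.linalg.solve(A, Z.T @ y)


def aev_sketch(y, x, d, sigma2, d_is_exogenous=False):
    """Hypothetical two-step sketch of the AEV idea; returns [const, pretest, d]."""
    n = len(y)
    ones = np.ones(n)
    # Step 1: construct the true-score measure as the linear projection of t on
    # error-free information. An indicator assigned from x shares x's error, so
    # it enters the projection only when it is genuinely exogenous.
    P = np.column_stack([ones, x, d] if d_is_exogenous else [ones, x])
    cross = P.T @ x                    # sample moments P'x ...
    cross[1] -= n * sigma2             # ... corrected to estimate P't (x't = x'x - error part)
    t_hat = P @ np.linalg.solve(P.T @ P, cross)
    # Step 2: just-identified IV. Regressors use t_hat; the observed predictors,
    # including the fallible pretest x, serve as the instruments.
    R = np.column_stack([ones, t_hat, d])
    Z = np.column_stack([ones, x, d])
    return np.linalg.solve(Z.T @ R, Z.T @ y)


# Gap-closing assignment: EV and the AEV sketch diverge.
y, x, d, s2 = simulate(200_000, seed=1)
Z = np.column_stack([np.ones_like(y), x, d])
b_ev = ev_estimator(y, Z, 1, s2)
b_aev = aev_sketch(y, x, d, s2)

# Random assignment: EV is appropriate, and the sketch reproduces it exactly.
y_r, x_r, d_r, s2_r = simulate(200_000, random_assignment=True, seed=2)
Z_r = np.column_stack([np.ones_like(y_r), x_r, d_r])
b_ev_r = ev_estimator(y_r, Z_r, 1, s2_r)
b_aev_r = aev_sketch(y_r, x_r, d_r, s2_r, d_is_exogenous=True)

print("gap-closing rule  EV:", b_ev.round(3), " AEV sketch:", b_aev.round(3))
print("random assignment EV:", b_ev_r.round(3), " AEV sketch:", b_aev_r.round(3))
```

Under the gap-closing rule, the sketch's treatment coefficient should sit near the simulated 0.2 while the EV coefficient does not. Under random assignment, the two coefficient vectors agree to floating-point precision. With purely linear, homoscedastic structure, step 2 gives the same answer as OLS on the constructed measure. The two can differ once the constructed measure is a nonlinear function of `x`, for example when error variances vary with the score level.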