Taking Arrows Seriously: Assessing Causality within Structural Equations with Latent Variables.

Steffen Erickson

Background: Structural Equation Modeling (SEM) is a powerful and widely used statistical framework. Researchers employ these models to decompose relationships into direct, indirect, and total effects (Bollen, 1989). The models help open the "black box" of cause-and-effect studies by examining the theoretical mechanisms that generate outcomes. They also allow researchers to use latent variables, which helps account for measurement error in observed variables. However, these models rest on a strong set of assumptions about the causal links among the variables within an SEM. Drawing an arrow from one variable to another, whether latent or observed, represents a definitive hypothesis about a causal link. Yet these causal hypotheses are seldom subjected to thorough testing in practice. Instead, models are estimated, global fit indices are examined, and models are refined until they adequately describe the observed data. The consequences of ignoring the strong assumptions required by SEM are numerous, ranging from misspecified measurement models and biased estimates of structural path coefficients to erroneous claims about causality.

The Methodological Approach: This paper proposes an approach for assessing causality in SEM. The essence of the method lies in using causal graphical models to identify the conditional independencies implied by the models. The paper then demonstrates how each assumption may be evaluated in a structured way. Although path diagrams and graphical models trace their origins to similar influences, such as Wright's Method of Path Coefficients (1934), the contemporary application of graphical models offers a rigorous approach for stating assumptions and identifying causal effects.
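As an illustration of how a graphical model yields conditional independencies, the following minimal sketch enumerates them for a small directed acyclic graph using the standard moralized-ancestral-graph criterion for d-separation. The four-node chain and its node names are hypothetical simplifications chosen for this example; they are not the model in Figure 1.

```python
from itertools import combinations

# Hypothetical, simplified DAG (NOT the paper's Figure 1).
# Each key is a node; each value is the set of its parents.
PARENTS = {
    "treatment": set(),
    "unpack": {"treatment"},
    "self_instruction": {"unpack"},
    "self_regulation": {"self_instruction"},
}

def ancestors(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen

def d_separated(x, y, given, parents):
    """True if x and y are d-separated by `given` in the DAG."""
    # Step 1: keep only x, y, the conditioning set, and their ancestors.
    keep = ancestors({x, y} | set(given), parents)
    # Step 2: moralize (undirected parent-child edges, plus edges
    # between every pair of parents that share a child).
    edges = set()
    for child in keep:
        for p in parents[child]:
            edges.add(frozenset((child, p)))
        for a, b in combinations(sorted(parents[child]), 2):
            edges.add(frozenset((a, b)))
    # Step 3: drop the conditioning nodes and look for a path from x to y.
    blocked, seen, stack = set(given), set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False  # still connected, so not d-separated
        if node in seen:
            continue
        seen.add(node)
        for edge in edges:
            if node in edge:
                (other,) = edge - {node}
                if other not in blocked and other not in seen:
                    stack.append(other)
    return True

def implied_independencies(parents, max_given=1):
    """List (x, y, given) triples with conditioning sets up to max_given nodes."""
    nodes = sorted(parents)
    found = []
    for x, y in combinations(nodes, 2):
        others = [n for n in nodes if n not in (x, y)]
        for size in range(max_given + 1):
            for given in combinations(others, size):
                if d_separated(x, y, given, parents):
                    found.append((x, y, given))
    return found

for x, y, given in implied_independencies(PARENTS):
    print(f"{x} _||_ {y} | {', '.join(given) or '(nothing)'}")
```

Each listed triple is a testable implication of the hypothesized arrows: if the data contradict one of them, at least one arrow, or one missing arrow, in the graph is called into question.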
With assumptions clearly stated using graphical models, I show that it becomes possible to probe, if not directly test, the conditional independencies implied by both the structural and measurement components of SEMs. Strategies for evaluating these assumptions include auxiliary regressions in recursive systems (Wold, 1960), model-implied instrumental variables (Bollen & Kerby, 2009), and a test of whether indicators of latent variables are independent of other variables conditional on the latent variable they measure (VanderWeele, 2022).

Illustrative Example: To demonstrate the use of graphical models for stating assumptions and devising empirical tests, I apply this method to a longitudinal model that tracks pre-service teachers learning a multi-step teaching practice called metacognitive modeling (Archer & Hughes, 2010). Teachers learned how to metacognitively model their approach to mathematical word problems using three sequential skills: they unpack the word problem, make self-instruction statements, and then make self-regulation statements. The model is used to explain why teachers' skill quality differs initially, how a randomly assigned intervention improves the skills, and how using the skills in the classroom affects classroom instruction. This example is particularly illustrative of the method because the model makes strong assumptions about the skill development of teacher candidates. First, the model posits that skill development follows a stochastic process in which the quality of a skill in the current period depends only on its quality in the previous period and on any exogenous or predetermined variables present in the current period (Chetty et al., 2014).
Furthermore, it assumes that teachers use skills sequentially within each period, which means that a dependent skill cannot be acquired without first learning the predetermined skill that precedes it (e.g., self-instruction cannot occur without first unpacking the word problem). An implication of this assumption is that exogenous variables affect dependent skills only indirectly, by first improving predetermined skills (e.g., an intervention improves self-instruction only if teachers first improve at unpacking word problems). Lastly, because skill quality is measured using latent variables, I assume that the indicators of these skills are independent of other variables in the system, given the latent variable. Figure 1 provides an annotated graphical model that formally describes these relationships.

Analysis: During the estimation phase, I empirically assess the extent to which the assumptions required by the model are met, as well as the strength of the relationships in the SEM. Data were collected from an RCT evaluating the impact of a metacognitive modeling intervention on teacher candidates' skill development (Cohen & Jones, 2022). The sample consisted of 146 teacher candidates enrolled at two teacher preparation sites in the United States. Teacher candidates practiced metacognitive modeling on three occasions: two practice sessions, one at baseline and one after the intervention, and a classroom observation that also took place after the intervention. I began by specifying the theoretical model (Figure 1) as an SEM. The quality of the metacognitive modeling skills during rehearsals is measured using short, standardized performance tasks (see Figure 2), in which trained observers rate recordings of candidates' modeling using a short rubric (see Figure 3). The quality of modeling skills during the classroom observation is measured using the same rubric.
Prior knowledge is assessed with the Mathematical Knowledge for Teaching Assessment (MKT), and classroom instruction quality is evaluated using the Mathematical Quality of Instruction (MQI) instrument. I next checked the conditional independencies implied by the model through auxiliary regressions, model-implied instrumental variables, and the structural rejection test proposed by VanderWeele (2022). The paper demonstrates how these tests can assist in interpreting the causal mechanisms implied by the model.

Findings: Using the diagnostics outlined above, I find that the assumptions encoded in the causal graphical model are supported with varying degrees of certainty. The analysis reveals that certain tests, especially those that capitalize on the randomly assigned intervention, offer stronger evidence in support of the model's assumptions. I conclude by interpreting the causal mechanisms depicted in the graph, which gives a clearer basis for evaluating the validity of the resulting causal claims.

Contributions: This paper shows that interpreting SEMs causally is feasible, provided that stringent assumptions are satisfied. By first defining the relationships between variables through a theoretically informed directed acyclic graph (DAG), researchers can explicitly state their hypotheses about the causal connections among variables. They can also spell out the assumptions that a causal analysis of these relationships requires. Formalizing these assumptions lets researchers introduce empirical diagnostics that assess the validity of a causal interpretation of the SEM and that flag when such an interpretation is unjustified.
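To make the idea of an auxiliary regression concrete, here is a minimal sketch on simulated data. It probes the assumption that skill quality in a period depends only on the previous period by regressing period-3 quality on period-2 and period-1 quality and inspecting the coefficient on period 1. Everything in it is an assumption made for illustration: the three simulated scores, the coefficient values, the sample size, and the use of error-free observed scores in place of latent variables. It does not reproduce the paper's data or results.

```python
import random

def slope_and_residuals(y, x):
    """Simple OLS of y on x with an intercept; return slope and residuals."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    resid = [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]
    return b, resid

def lag2_coefficient(y3, y2, y1):
    """Coefficient on y1 in the auxiliary regression y3 ~ y2 + y1.

    Uses the Frisch-Waugh-Lovell result: partial y2 out of both y3 and y1,
    then regress one set of residuals on the other.
    """
    _, r3 = slope_and_residuals(y3, y2)
    _, r1 = slope_and_residuals(y1, y2)
    b, _ = slope_and_residuals(r3, r1)
    return b

def simulate(n, direct_lag2, seed=0):
    """Simulate three periods of a skill score (hypothetical coefficients).

    direct_lag2 = 0 satisfies the assumption; a nonzero value adds a direct
    path from period 1 to period 3 and therefore violates it.
    """
    rng = random.Random(seed)
    y1 = [rng.gauss(0, 1) for _ in range(n)]
    y2 = [0.7 * a + rng.gauss(0, 1) for a in y1]
    y3 = [0.7 * b + direct_lag2 * a + rng.gauss(0, 1) for a, b in zip(y1, y2)]
    return y1, y2, y3

# Case 1: the data-generating process satisfies the assumption.
y1, y2, y3 = simulate(5000, direct_lag2=0.0)
coef_ok = lag2_coefficient(y3, y2, y1)

# Case 2: a direct period-1 -> period-3 path violates it.
y1, y2, y3 = simulate(5000, direct_lag2=0.5)
coef_violated = lag2_coefficient(y3, y2, y1)

print(f"lag-2 coefficient, assumption holds:    {coef_ok:.3f}")
print(f"lag-2 coefficient, assumption violated: {coef_violated:.3f}")
```

A coefficient near zero is consistent with the assumption, while a clearly nonzero coefficient is evidence against it. With real indicators that contain measurement error, this observed-score regression would generally be biased, which is one reason approaches such as model-implied instrumental variables are listed alongside it.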