ERIC Number: ED677719
Record Type: Non-Journal
Publication Date: 2025-Oct-8
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Do Measurement Invariance Issues Play a Part in Explaining When Intervention Impacts Fade or Persist?
Emma R. Hart; Josh Gilbert; Drew H. Bailey; Tyler W. Watts; Ben Domingue
Society for Research on Educational Effectiveness
Background: While there is substantial heterogeneity in the extent to which educational intervention impacts persist, it has proven challenging to identify outcomes and intervention features that reliably explain this variation (Hart et al., 2024). The degree to which intervention impacts capture "real" effects on the underlying construct that researchers hoped to change -- versus narrow, potentially "surface-level" changes suggestive of teaching-to-the-test dynamics -- may be an important factor in determining which interventions generate enduring benefits (Bailey et al., 2020). Recent innovations in the application of differential item functioning (DIF) techniques have made it possible to examine how consistently interventions affect the individual items that compose measures, potentially indicative of the "depth" versus "narrowness" of intervention effects, and to statistically downweight idiosyncratic item-level effects (Halpin & Gilbert, 2025). Whether the (in)consistency of item-level intervention impacts matters for impact persistence is an open question.

Research Questions: The current meta-analysis explored whether intervention-impact DIF explained variation in intervention persistence rates across outcome assessments. Specifically, we set out to test whether accounting for measurement invariance issues when estimating intervention impacts affects the conditional persistence of initial intervention impacts. In preparation for SREE, we plan to examine a variety of additional questions outlined in a project pre-registration.

Research Design: Individual item-response data came from the Gilbert meta-analysis (see Gilbert et al., 2024), publicly available in the Item Response Warehouse (Domingue et al., 2024). To be included in the current study, interventions had to be evaluated via (1) a randomized controlled trial (RCT) design, (2) with item-level data, (3) reported at post-test and follow-up, (4) on the same outcome measured consistently over time. Of the 73 RCTs in the dataset, nine met these criteria and were included in our analysis. The sample comprised diverse programs implemented internationally that targeted individuals across various phases of life, including academic interventions, cognitive behavioral training programs, vocational trainings, and civic education. Across these nine studies, there were 13 outcomes for which item-level data were reported consistently at post-test and follow-up. Samples included 7,103 participants on average (range = 143 to 27,201). On average, follow-up assessments occurred 9 months after interventions ended (range = 1 to 12 months). Both cognitive (e.g., math, vocabulary) and social-emotional (e.g., depression, violent behaviors) outcomes were assessed. While diverse, the interventions shared an interest in generating persistent impacts on outcomes measured consistently across waves. Though the sample was small and our analyses were, thus, exploratory, this meta-analysis was nonetheless important given how little is known about the relation between impact DIF and impact persistence. We pre-registered our analysis and interpretation plans on OSF.

Analysis: We performed a preliminary analysis, which we will expand on prior to the SREE conference. To examine the persistence of intervention impacts, we first ran a meta-regression model with study-level random effects, inverse-variance weighting, and cluster-robust standard errors. Our models built on the following basic model:

Level 1: ES_fsi = β_0s + β_1·ES_psi + ε_fsi
Level 2: β_0s = γ_0 + γ_s

where ES indicated effect sizes in standardized units, f indicated the follow-up assessment wave, p indicated the post-test assessment wave, s indicated study, and i indicated each grouping of consistently measured outcomes. In these analyses, we considered all follow-up effects (collected on average 9 months after the intervention ended; range = 1 to 12 months). At level 2, β_0s was a random intercept for study. The slope β_1 captured the proportion of the post-test impact that was expected to persist at follow-up ("conditional persistence"), and the intercept γ_0 captured the average follow-up impact that was unexplained by initial effects on the same skill (see Hart et al., 2024). To examine the extent to which conditional persistence varied by whether the impacts were adjusted for DIF ("DIF-Robust" impacts) or not ("DIF-Naïve" impacts, indicative of traditional approaches to effect size estimation), we ran this model using effects estimated with both methods (see Halpin & Gilbert, 2024), adding a main effect for method and the interaction between the post-test impact and method.
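To make the estimation strategy concrete, the following is a minimal sketch of this meta-regression. Everything here is illustrative rather than taken from the study: the file name effect_sizes.csv, the column names, and the use of Python's statsmodels with weighted least squares plus cluster-robust standard errors as a simplified stand-in for the authors' full random-effects specification.

```python
# Minimal sketch of the persistence meta-regression described above.
# Hypothetical long-format data: one row per outcome x estimation method.
#   es_followup / es_posttest : standardized effect sizes (ES_fsi, ES_psi)
#   se_followup               : SE of the follow-up effect (for 1/SE^2 weights)
#   method                    : "DIF-Naive" or "DIF-Robust"
#   study                     : study identifier (clustering variable)
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("effect_sizes.csv")  # hypothetical file

# Inverse-variance weights, as in the described model.
df["w"] = 1.0 / df["se_followup"] ** 2

# "es_posttest * method" expands to main effects for the post-test impact
# and method plus their interaction; the coefficient on es_posttest plays
# the role of beta_1 (conditional persistence) and the intercept of gamma_0
# (follow-up impact unexplained by the post-test impact).
model = smf.wls("es_followup ~ es_posttest * method", data=df, weights=df["w"])

# Cluster-robust standard errors by study.
fit = model.fit(cov_type="cluster", cov_kwds={"groups": df["study"]})
print(fit.summary())
```

A fuller implementation would also include the study-level random intercept γ_s (e.g., via statsmodels' MixedLM, or metafor's rma.mv in R), which this sketch omits for brevity.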
Findings: Figure 1 depicts the descriptive trajectories of DIF-Naïve and DIF-Robust effects across time. Figure 2 and Table 1 present the regression-based results. Largely in line with the results in Hart et al. (2024), which relied on a much larger sample, post-test impacts generated without regard for DIF persisted at a rate of about 56%. Interestingly, effects estimated using the DIF-Robust method appeared to persist at a lower rate of 32%, though the difference in rates was not statistically detectable (p = 0.13). For effects generated through both methods, a sizeable portion of the observed average follow-up effect (approximately 0.10 SD) was explained by intercept effects (approximately 0.04 to 0.08 SD depending on the model), suggesting that unknown mediational processes not captured by the post-test impacts contributed to observed follow-up effects.

Conclusions: These preliminary results indicated that adjusting intervention impacts for DIF did not result in more persistent post-test intervention impacts. If anything, intervention impacts estimated using DIF-Robust techniques demonstrated more fadeout than DIF-Naïve effects. On the one hand, this pattern conflicts with the expectation that consistent item-level impacts may reflect more "true" latent or trait-level impacts on the construct of interest -- not reflective of specific teaching-to-the-test processes -- that are more likely to persist over time (Alvarez-Vargas et al., 2023; Halpin & Gilbert, 2024; Pages et al., 2022). On the other hand, greater persistence for DIF-Naïve effects, which do not downweight the contribution of DIF, may be consistent with the possibility of "true," yet narrow and specific, intervention impacts on discrete aspects of functioning that are perhaps less likely to develop under counterfactual conditions (Ahmed et al., 2024; Gilbert, Kim, & Miratrix, 2023; Rosengarten et al., 2024). In preparation for SREE, we will conduct additional analyses examining whether persistence varies with the degree of divergence between DIF-Naïve and DIF-Robust effects, as well as the role of measure type (researcher-created versus standardized assessment) and measure difficulty.
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A
Author Affiliations: N/A

