ERIC Number: ED656921
Record Type: Non-Journal
Publication Date: 2021-Sep-29
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
New Curriculum Efficacy Study and Lord's Paradox
Perman Gochyyev; Mark Wilson
Society for Research on Educational Effectiveness
Lord's paradox arises from the conflicting inferences obtained from two alternative approaches that are typically used to evaluate a treatment effect in a pretest-posttest design. The two main approaches for analyzing such data are: (1) regress the change from pretest to posttest on the treatment indicator (change score approach; CS); or (2) regress the posttest on the treatment indicator and the pretest (regressor variable approach; RV). These two approaches can yield conflicting results--hence the paradox; a simulation sketch below makes the potential conflict concrete. Lord (1967) warned of this problem decades ago and started a debate that continues to the present day. In this paper, we demonstrate that Lord's paradox can occur even when both approaches account for measurement error in the variables within the IRT framework.

In an empirical example, we use both approaches to investigate whether the treatment--a new mathematics curriculum ([BLINDED])--had an effect on student-level outcomes. Schools in the study were randomly assigned to either the treatment or the control condition, and pre- and post-tests were administered to students at these schools before and after the treatment. The measure administered as both pretest and posttest is the [BLINDED] measure developed by [BLINDED] in conjunction with the [BLINDED]. Composite scores (Statistical Reasoning) for each student were produced from IRT analyses using the Rasch model. Reliability was very good for both tests: for the pretest, the EAP/PV person separation reliability was estimated at 0.89 and Cronbach's alpha at 0.84; for the posttest, the EAP reliability and Cronbach's alpha were both estimated at 0.87. A common metric was established between the pretest and posttest through a third administration of the test, anchoring on common items.

Of the 768 students, 406 are in the control group and 362 are in the treatment group. There were differences in the proportions of Hispanic and white students between the groups. Schools were initially randomly allocated to treatment and control, but a number of schools opted out after learning the randomization results: four schools assigned to the treatment group withdrew immediately after the results were known, and three more followed somewhat later; five schools from the control group opted out right after the results were announced. The reasons for opting out are unknown. Considering all the factors that can undermine the virtues of randomization (e.g., clusters dropping out after learning the randomization results), the design we ended up with may well be considered a nonequivalent control group design. In addition, due to delayed IRB approval, a subset of students (n=173) in the treatment group was pre-tested after the treatment had been initiated. Because of this late pretest, the pretest score for that subset of treated students was not the "pretreatment variable" that had been planned. We therefore first compare the remaining 189 treated students with the control group, and then compare the subset of n=173 treatment-group students with the control group to investigate the effect of the partial treatment.
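Before turning to the results, the two analytic approaches are easy to state side by side. The sketch below is a hypothetical simulation in Python (statsmodels), not the study's data or code: the group sizes are borrowed from the study for flavor, but the data-generating process, the pretest gap, and the error variances are invented purely to show how the CS and RV estimates can disagree when groups differ at pretest and scores contain measurement error.

# Minimal sketch (hypothetical simulation, not the study's data): with a
# pretest gap between groups and measurement error in the scores, the CS
# and RV estimates of the same treatment "effect" can disagree.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_control, n_treat = 406, 362                 # group sizes as in the study
treat = np.repeat([0, 1], [n_control, n_treat])

# Latent ability: the treatment group starts lower, and there is NO true
# treatment effect in this simulation.
ability = rng.normal(0.0, 1.0, treat.size) - 0.3 * treat
pre = ability + rng.normal(0.0, 0.5, treat.size)    # noisy pretest score
post = ability + rng.normal(0.0, 0.5, treat.size)   # noisy posttest score

df = pd.DataFrame({"treat": treat, "pre": pre, "post": post,
                   "gain": post - pre})

# (1) Change score (CS) approach: regress pre-to-post change on treatment.
cs = smf.ols("gain ~ treat", data=df).fit()
# (2) Regressor variable (RV) approach: regress posttest on treatment + pretest.
rv = smf.ols("post ~ treat + pre", data=df).fit()

print(f"CS estimate: {cs.params['treat']:.3f}")  # near 0 (correct here)
print(f"RV estimate: {rv.params['treat']:.3f}")  # pulled away from 0, because
# the noisy pretest only partially adjusts for the groups' baseline gap.

In this particular simulation the CS answer is the accurate one, but under other data-generating assumptions the RV answer would be; the point is only that the two can conflict, which is the paradox.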
In Table 1, we present estimated treatment effects when we (a) ignore the measurement error in the scores (a regular regression of EAP scores on a treatment dummy) and (b) attempt to account for the measurement error in these scores by using latent regression (a simplified, reliability-based stand-in for such a correction is sketched below). As shown in Table 1, we find a statistically significant treatment effect under the CS approach: the treatment group's composite score is statistically significantly higher (at the 0.01 level) than the control group's, with an effect size of 0.28. The RV approach, however, does not indicate any statistically significant difference between the groups. Also note that there was a significant difference in pretest means: the mean pretest score of the treatment group was statistically significantly lower (at the 0.01 level) than that of the control group. This is a different form of Lord's Paradox. Findings from the RV approach will tend to be closer to those from the CS approach for outcome variables that show no difference at pretest.

In addition to the pretest mean difference between the groups, the exchangeability assumption is questionable here due to (1) the imbalance between the treatment and control groups on an important covariate (race) and (2) the withdrawal of clusters after learning the results of the randomization. In an evaluation of this sort, the first question to answer is whether the groups the researcher wants to compare are exchangeable with respect to the outcome of interest. For instance, Group A and Group B might be exchangeable if the outcome of interest is the mean number of hours spent in the gym, but not exchangeable if the outcome is the mean number of calories burned. Subject-matter expertise is required to answer this question. In this study, the control and treatment groups differ on pretreatment covariates. The lack of balance between the groups at pretest might be a result of nonignorable withdrawal or another form of selection into or out of the treatment. Therefore, using the RV approach might be misleading, a point we elaborate on in the full paper. Lord's Paradox is considered "by far, the most difficult paradox to disentangle and requires clear thinking" (Wainer & Brown, 2007, p. 25; in comparison to Simpson's Paradox and Kelley's Paradox). To the question of which approach to use, the safest answer is usually "it depends." The literature will greatly benefit from a careful examination of instances of this paradox, which we aim to provide.
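To make the measurement-error adjustment concrete, here is a minimal sketch of one standard errors-in-variables correction: disattenuating the pooled within-group pretest slope by the pretest reliability before netting out the baseline gap. This is an illustrative stand-in, not the latent-regression model used in the paper (which handles measurement error inside the IRT model itself); the function name and the method-of-moments correction are our own simplification.

# Sketch of a reliability-based errors-in-variables correction for the RV
# approach (an illustrative stand-in for the paper's latent regression).
import numpy as np

def rv_effect(pre, post, treat, reliability=1.0):
    """RV-style treatment effect; reliability < 1 disattenuates the
    pretest slope to adjust for the latent (error-free) pretest."""
    pre, post, treat = (np.asarray(a, dtype=float) for a in (pre, post, treat))

    # Pooled within-group covariance, so group mean differences do not
    # contaminate the slope estimate.
    def pooled_cov(x, y):
        return sum(((treat == g).sum() - 1) *
                   np.cov(x[treat == g], y[treat == g])[0, 1]
                   for g in (0, 1)) / (treat.size - 2)

    beta = pooled_cov(pre, post) / pooled_cov(pre, pre)  # observed-score slope
    beta_latent = beta / reliability                     # disattenuated slope

    gap_post = post[treat == 1].mean() - post[treat == 0].mean()
    gap_pre = pre[treat == 1].mean() - pre[treat == 0].mean()
    # With reliability = 1 this reproduces the ordinary RV (ANCOVA) estimate;
    # as the corrected slope approaches 1 it converges to the CS estimate.
    return gap_post - beta_latent * gap_pre

Called as, e.g., rv_effect(df.pre, df.post, df.treat, reliability=0.84) with the pretest alpha reported above, the corrected estimate typically lands between the naive RV and CS answers, which illustrates why accounting for measurement error can narrow, but need not eliminate, the gap between the two approaches.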
Descriptors: Students, Mathematics Curriculum, Mathematics Instruction, Curriculum Development, Item Response Theory, Regression (Statistics), Error of Measurement, Test Reliability, Academic Achievement
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A
Author Affiliations: N/A