The Product and Difference Fallacies for Indirect Effects (2/2)
Conclusion
This article has demonstrated that even when a linear model is appropriate at the individual level, and when control variables are sufficient to approximate randomized experiments, the product and difference heuristics can produce highly misleading estimates of the indirect effect. It was shown that stratification and the inclusion of interactions can ameliorate some of this problem, but unless we are willing to make untestable assumptions about the covariance terms in (8), (18), (19), and (24), we will be unable to identify the average indirect effects.
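The role of the covariance terms can be illustrated with a small simulation. The following is a minimal sketch under assumed conditions, not the article's own analysis: the coefficient names a_i (individual effect of X on Z) and b_i (individual effect of Z on Y), their distributions, and the function name simulate are all hypothetical. It relies only on the identity E[a_i b_i] = E[a_i]E[b_i] + Cov(a_i, b_i), so the product of the average effects differs from the average indirect effect by exactly the covariance.

```python
import random


def simulate(n=100_000, cov_ab=0.5, seed=0):
    """Compare the average of individual indirect effects a_i * b_i
    with the product of the average coefficients.

    Assumed (hypothetical) setup: a_i and b_i each have mean 1 and
    share a common component scaled so that Cov(a_i, b_i) = cov_ab.
    """
    rng = random.Random(seed)
    scale = abs(cov_ab) ** 0.5
    sign = 1.0 if cov_ab >= 0 else -1.0

    a, b = [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # component common to both coefficients
        a.append(1.0 + scale * shared + rng.gauss(0, 0.5))
        b.append(1.0 + sign * scale * shared + rng.gauss(0, 0.5))

    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Average indirect effect: the mean of the individual products.
    avg_indirect = sum(x * y for x, y in zip(a, b)) / n
    # Quantity targeted by multiplying the two average effects.
    product = mean_a * mean_b
    # Covariance with divisor n, so the identity holds exactly in-sample.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    return {"avg_indirect": avg_indirect, "product": product, "cov": cov}


if __name__ == "__main__":
    for c in (0.0, 0.5, -0.5):
        r = simulate(cov_ab=c)
        print(c, round(r["avg_indirect"], 3), round(r["product"], 3))
```

When cov_ab is zero the two quantities agree up to sampling noise; otherwise the gap equals the covariance, which the observed data alone do not reveal in this setup.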
Furthermore, this article has demonstrated that by restricting inference to subpopulations, some of the assumptions underlying the estimation of indirect effects can be checked. In particular, by restricting the population of interest to the treated units, balance and overlap can be more easily achieved for the regression of Z on X (by pruning noncomparable control units), and balance and overlap can be more easily checked for the regression of Y on Z and X. Unfortunately, if the effect of X on Z is strong, it may be impossible to achieve balance and overlap for the regression of Y on Z and X.
The implications of this work for future research design are threefold. First, as demonstrated in this article, it is easy enough to stratify and include interactions so as to reduce the problems associated with effect heterogeneity and the untestable covariance terms. Furthermore, although linear models were used in this article in order to simplify presentation, the procedures described do not depend on the use of linear regression. See Pearl (2011) and Imai et al. (2010b) (with its associated R package [Imai et al. 2010c]) for approaches with nonlinear models.
Second, if the explanatory variable is binary, inference should be restricted to the treated (or control) units, even if only as a first step. It is especially important to assess overlap problems that may occur for the regression of Y on X and Z when the effect of X on Z is strong. If the explanatory variable is continuous, model dependence will likely be unavoidable, and this should be acknowledged.
Third, because randomization is not sufficient to identify average indirect effects (see Bullock et al. 2010; Bullock and Ha 2011; Green et al. 2010; Robins and Greenland 1992; and Sobel 2008 for additional discussion), the analyst should explicitly acknowledge the additional assumptions implicit in the analysis of indirect effects. In this article, these assumptions were presented in terms of the covariances in (8), (18), (19), and (24). If these assumptions are suspect, then it may be necessary to perform a sensitivity analysis on the basis of these covariances. Recent work provides an alternative approach to sensitivity analysis and bounding (Imai et al. 2010a, 2010b, 2010c, 2010d).
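One simple form such a covariance-based sensitivity analysis might take is sketched below. This is an illustrative assumption rather than the procedure used in the article or in the cited work: the function name indirect_effect_range and its arguments are hypothetical, and the standard deviations of the individual-level effects (sd_a, sd_b) are analyst-supplied guesses, not quantities estimated from data. The sketch uses the fact that a covariance equals a correlation times the two standard deviations, so varying the correlation from -1 to 1 traces out the range of average indirect effects compatible with the assumed heterogeneity.

```python
def indirect_effect_range(mean_a, mean_b, sd_a, sd_b, n_grid=5):
    """Return (rho, implied average indirect effect) pairs.

    Uses E[a*b] = E[a]*E[b] + rho * sd_a * sd_b for rho in [-1, 1].
    mean_a, mean_b: estimated average effects (X on Z, Z on Y).
    sd_a, sd_b: ASSUMED spreads of the individual-level effects.
    """
    if n_grid < 2:
        raise ValueError("n_grid must be at least 2")
    rows = []
    for k in range(n_grid):
        rho = -1.0 + 2.0 * k / (n_grid - 1)  # evenly spaced correlations
        rows.append((rho, mean_a * mean_b + rho * sd_a * sd_b))
    return rows


if __name__ == "__main__":
    for rho, effect in indirect_effect_range(0.5, 0.4, 0.2, 0.3):
        print(f"rho={rho:+.1f}  implied indirect effect={effect:.3f}")
```

If the sign of the implied effect changes within the plausible range of correlations, the product-based estimate is not robust to the covariance assumption.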