9. Sources of Leverage in Causal Inference: Toward an Alternative View of Methodology


David Collier, Henry E. Brady, and Jason Seawright

 

The challenge of identifying, assessing, and eliminating rival explanations is a fundamental concern in social research. The goal of this chapter is to synthesize the view of methodology offered in the present volume by considering further the contribution of alternative quantitative and qualitative tools in evaluating rival explanations.

 

We seek to clarify several methodological distinctions that are essential to understanding causal inference. We also propose a new distinction: between data-set observations and causal-process observations. Our discussion considers the contrasting, yet complementary, forms of inferential leverage provided by each type of observation. In the final section of the chapter, we offer some observations about balancing methodological priorities in the face of the ongoing technification of method and theory in many branches of the social sciences.

 

REVISITING SOME KEY DISTINCTIONS

 

Understanding the leverage for causal inference provided by different styles of research requires close attention to several basic distinctions. If these are not treated carefully, conclusions about alternative sources of leverage may be misleading.

 

Two broad contrasts are indispensable to the argument we seek to develop: between experiments and observational studies, and between mainstream quantitative methods and perspectives drawn from statistical theory. We then consider three other distinctions, involving more specific statistical issues: determinate versus indeterminate research designs, data mining vis-à-vis specification searches, and the assumptions of conditional independence versus the specification assumption. Readers may refer to the glossary for a compilation of the definitions we employ.

 

Experiments, Quasi-Experiments, Observational Studies, and Inferential Monsters

 

As is well known, in experiments analysts randomly assign cases to different treatments, that is, to different values of the key independent variable. In observational studies, by contrast, analysts observe the values that the independent variables acquire through the unfolding of political and social processes. For the purpose of evaluating rival explanations, the most fundamental divide in methodology is neither between qualitative and quantitative approaches, nor between small-N and large-N research. Rather, it is between experimental and observational data. All researchers know this, but they often do not give adequate attention to the severe inferential problems that arise with observational data. In addition to differing on the explanatory variables of interest, such real-world cases may also differ in many other ways that the researcher cannot measure and control for, and that can distort causal inference.

 

Concern with these severe inferential problems has led the econometrician Edward Leamer to underscore ‘‘the truly sharp distinction between inference from experimental and inference from non-experimental data. . . .’’ He points out that with the latter, ‘‘there is no formal way to know what inferential monsters lurk beyond our immediate field of vision’’ (Leamer 1983: 39).

 

Given this apparently sharp dichotomy between experimental and observational data, what are we to make of the intermediate, or hybrid category, the ‘‘quasi-experiment,'' popularized by Campbell and Stanley (1963: 34–64)? A quasi-experimental design is typically based on time-series data, involving a sequence of observations focused on the outcome being explained. At some point within this time series, an event, policy innovation, or other change occurs, and the analyst examines prior and subsequent values of the dependent variable in an effort to infer the impact of this event. This design is sometimes called an ‘‘interrupted time-series.'' A stunning exemplar is Campbell and Ross’s (1968) study of the crackdown on speeding in the state of Connecticut. They explore the surprising difficulties of causal inference encountered in assessing the impact of this crackdown on death rates in automobile accidents. Many of the obstacles to good causal inference they consider are parallel to those confronted in experiments, which reinforces the idea that this design is in many ways like an experiment; hence, quasi-experimental. Although the idea of quasi-experiments is strongly identified with Campbell, he subsequently had misgivings about this hybrid category. He recognized that the studies he had included in this category were actually observational studies, and that it had been misleading to suggest that there is an intermediate type between observational and experimental research. With characteristic humor and irony, Campbell suggests that:

 

It may be that Campbell and Stanley (1966) should feel guilty for having contributed to giving quasi-experimental designs a good name. There are program evaluations in which the authors say proudly, ‘‘We used a quasi-experimental design.’’ If responsible, Campbell and Stanley should do penance, because in most social settings, there are many equally or more plausible rival hypotheses. . . . (Campbell and Boruch 1975: 202)

 

The central legacy of Campbell’s work on these issues, as both Brady (76–77 this volume) and Caporaso (1995: 459) emphasize, is Campbell’s insightful inventory of threats to validity in observational studies (Campbell and Stanley 1966: 5–6; Cook and Campbell 1979: 51–55). This inventory points to the surprisingly large number of things that can go wrong in making causal inferences from what may initially appear to be relatively straightforward observational data.

 

These words of caution from both Leamer and Campbell are crucial in assessing KKV’s methodological framework. KKV provides recommendations for researchers engaged in observational studies, yet the book’s discussion of causation takes as a point of departure an experimental model. KKV employs the counterfactual definition of causation, grounded in the model of experiments introduced by Neyman (1990 [1923]), Rubin (1974, 1978), and Holland (1986). We think that this definition is indeed valuable in helping scholars to reason about causation as an abstract concept. However, Neyman, Rubin, and Holland intended their definition primarily for application to experimental research. They express skepticism about causal inference based on observational data (Rubin 1978; Holland 1986: 949), and their initial discussions of causation were only secondarily concerned with the challenges faced by researchers who use such data.4 An account of causal inference in the social sciences must explicitly consider obstacles to causal inference in observational studies and address their practical implications for research. Yet Brady (73–75 this volume) is concerned that KKV does not adequately address these issues.

 

As Brady observes, KKV could have been more careful about distinguishing between the methodological strengths of experiments and those of quantitative observational studies. In fact, KKV sometimes seems to confound the tools relevant to experiments and those relevant to conventional quantitative research. For example, KKV is not clear enough in distinguishing between the independence assumption and conditional independence (Brady 74–75 this volume; see also the discussion later in this chapter), the former being relevant to experiments, and the latter applying primarily to observational studies.

 

Relatedly, KKV offers a somewhat confusing statement about the relationship between randomization and the quantitative/large-N versus qualitative/small-N distinction.5 The book argues that:

 

Randomness in selection of units and in assigning values to explanatory variables is a common procedure used by some quantitative researchers working with large numbers of observations to ensure that the conditional independence assumption is met. . . . Unfortunately, random selection and assignment have serious limitations in small-n research. (KKV 115; see also 94)

 

In this statement, KKV overstates the role of random assignment in conventional quantitative research and in effect lumps together random selection and random assignment, thereby merging the characteristic strengths of experimental design and of quantitative analysis. The book thus comes too close to making it appear as if the main divide is between these two approaches, on the one hand, and small-N, qualitative studies, on the other.

 

Caporaso’s (1995: 459) commentary on KKV, by contrast, emphasizes the importance of sharply separating these two types of randomization: the random assignment carried out in most experiments, versus the random sampling that is often used in quantitative observational studies. Caporaso emphasizes that, while random assignment does indeed eliminate several challenges to causal inference, ‘‘[r]andom sampling does not solve the problems of drawing inferences when numerous causal factors are associated with outcomes'' (1995: 459). Thus, large-N quantitative studies, which rarely employ random assignment, are still left with the basic inferential problem faced by small-N studies.

 

In sum, experimental and observational studies are profoundly different. The traditions of scholarship discussed in the present volume are based on observational data; quantitative and qualitative researchers therefore face the same fundamental problems of inference. KKV’s effort to address the major inferential challenges of small-N, qualitative research, based on the norms and practices of large-N, quantitative research, thus faces a major obstacle: Large-N, quantitative methods confront many of the same inferential challenges as qualitative observational studies. In important respects, quantitative researchers do not have strong tools for solving these dilemmas, as Bartels (84–87 this volume) emphasizes above.

 

Mainstream Quantitative Methods versus Statistical Theory

 

Given that our basic concern is with challenges to causal inference that arise in analyzing observational data, where can we turn for help in identifying and dealing with these inferential monsters discussed by Leamer? This question points to the need to distinguish two alternative views of how effective quantitative analysis can be in achieving valid inference: first, the perspective of mainstream quantitative methods in political science, which is at times insufficiently attentive to the difficulty of using quantitative tools; and second, perspectives drawn from statistical theory, which sometimes express serious warnings about these tools.

 

Mainstream quantitative methods are a subset of applied statistics. In the years before the publication of KKV, a central focus in political science methodology was the refinement and application of regression analysis and related econometric techniques. This body of work has been influential across several social science disciplines, and it is a major source of KKV’s methodological advice. When commentators argue that KKV adopts a quantitative perspective, they should be understood as referring to mainstream quantitative methods in this sense. Chapter 2 above (e.g., table 2.1) seeks to provide a summary of KKV’s quantitative tools.

 

The main point, for present purposes, is that mainstream quantitative methods and important currents of thinking in statistical theory have adopted quite different perspectives on the feasibility of effectively eliminating rival hypotheses in observational studies through regression-based tools. Within political science, mainstream quantitative methods have been associated with the advocacy of quantitative approaches, treating them as a set of research tools that provide superior leverage in both descriptive and causal inference. We view KKV as a clear expression of such advocacy, a form of advocacy that is also strongly reflected in the standards for ‘‘good research'' applied by many political science departments and disciplinary journals.

 

By contrast, according to arguments that can be made from the standpoint of statistical theory, the superiority of quantitative methods is less clear. Such statistical arguments place far greater emphasis on the many assumptions and preconditions required to justify the use of specific quantitative tools, suggesting that these tools may often be inapplicable in observational research. As emphasized above, more skeptical norms about inference are also fundamental to the work of Campbell.

 

Statistical ideas quite distinct from those presented in KKV are also found in psychometrics and mathematical measurement theory (20, 129–31 this volume). These fields offer valuable insights into concepts, the foundations of measurement, the complex assumptions required in justifying higher levels of measurement, and the contextual specificity of measurement claims; these insights present a different picture than that offered by mainstream quantitative methodology.

 

Bayesian statistical analysis is likewise a relevant branch of statistical theory largely neglected by KKV,8 as McKeown (chap. 4 online) emphasizes. Ideas drawn from Bayesian analysis, which have recently come to be more widely used in political science methodology, provide tools for estimating uncertainty that are relevant for several problems of research design that KKV discusses.

 

For example, KKV argues that qualitative researchers are often better off not working with random samples. Yet many of the book’s statements in favor of estimating uncertainty would seem to rely on procedures for testing statistical significance originally designed for use with inferences from a random sample to a universe of cases. Unfortunately, extending significance tests to situations where the data are not a random sample from a larger universe may not be justified. As Freedman, Pisani, and Purves (2007: 556) put it, ‘‘[i]f a test of significance is based on a sample of convenience, watch out.’’ While significance tests can be an appropriate way to handle forms of randomness other than sampling error, Greene (2000: 147) argues that standard interpretations of statistical significance tests in such situations require that the test statistic be random. When the data are a random sample, this requirement is automatically satisfied; it may not be met under other circumstances. Overall, scholars should heed Freedman and Lane’s (1983) warning against using conventional significance tests as a general tool for estimating uncertainty.
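One situation where a significance test remains defensible without a random sample is when the relevant randomness comes from assignment rather than from sampling. The following sketch, a hypothetical Python illustration not drawn from the chapter, implements a simple randomization (permutation) test on invented scores for small treated and control groups; the assignment, not the sampling, is assumed to be random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcome scores for a small, non-random set of cases:
# six "treated" cases and six "control" cases (assignment assumed random).
treated = np.array([7.1, 6.4, 8.0, 5.9, 7.5, 6.8])
control = np.array([5.2, 6.0, 4.9, 5.5, 6.3, 5.1])

observed_diff = treated.mean() - control.mean()

# Randomization test: re-randomize the assignment labels many times and
# ask how often a difference this large arises by chance alone.
pooled = np.concatenate([treated, control])
n_treated = len(treated)
count = 0
n_perm = 10_000
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[:n_treated].mean() - perm[n_treated:].mean()
    if abs(diff) >= abs(observed_diff):
        count += 1

p_value = count / n_perm
print(f"observed difference = {observed_diff:.2f}, permutation p-value = {p_value:.3f}")
```

Here the p-value describes how surprising the observed difference would be under re-randomization of the same cases, with no appeal to a larger sampled universe.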

 

Bayesian statistics definitely cannot solve all the problems of making descriptive and causal inferences with a nonrandom sample. Yet these tools do provide a framework for evaluating uncertainty that may sometimes allow researchers to incorporate more kinds of uncertainty, and more detailed information about the sampling process, than do traditional significance tests. Thus, while KKV’s emphasis on estimating uncertainty is laudable, this goal might be better accomplished using insights based on a Bayesian perspective.

 

Another reason a Bayesian perspective may be relevant for thinking about small-N research is that it systematizes a research strategy noted briefly by KKV (211): overcoming the small-N problem by situating small-N findings within a larger research program. Bayesian ideas help in reasoning about the relation between the findings of prior research and the insights generated by any given small-N study. As we have argued above, Bayesian analysis also provides tools for evaluating arguments about necessary and sufficient causation (148–49 this volume), and thus specifically for improving the practice of qualitative research. In some of these situations, a full Bayesian framework, including formalization of prior beliefs about all parameters, may be quite useful. More generally, however, informal applications of the central Bayesian insight (that is, that inferences should be evaluated in light of the data and of prior knowledge) can provide a useful corrective to the sometimes inappropriate use of significance tests in causal inference.
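As a minimal sketch of this informal Bayesian logic, the following Python example combines a hypothetical prior drawn from earlier research with a hypothetical small-N finding through a conjugate Beta-Binomial update; the Beta(7, 3) prior and the four-of-five result are assumptions chosen purely for illustration.

```python
from scipy import stats

# Hypothetical small-N finding: the hypothesized mechanism is present in 4 of 5 cases.
successes, n = 4, 5

# Hypothetical prior knowledge from earlier research: roughly 70% of comparable
# cases exhibit the mechanism; Beta(7, 3) encodes that belief with modest confidence.
prior_a, prior_b = 7, 3

# Conjugate updating: the posterior is Beta(prior_a + successes, prior_b + failures).
post = stats.beta(prior_a + successes, prior_b + (n - successes))

print(f"posterior mean = {post.mean():.2f}")
print(f"90% credible interval = ({post.ppf(0.05):.2f}, {post.ppf(0.95):.2f})")
```

The point is not the particular numbers but the structure of the inference: the conclusion is evaluated in light of both the new cases and prior knowledge, rather than against a significance threshold alone.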

 

Overall, from this wider perspective of statistical theory, the tools emphasized by KKV are properly seen as just one option, an option that perhaps needs to be approached with greater recognition of its limitations and of available alternatives. In order to further illustrate why such caution is needed, we now discuss two additional distinctions: between determinate and indeterminate research designs, and between data mining and specification searches.

 

Determinate versus Indeterminate Research Designs

 

In discussing the challenge of eliminating rival explanations, KKV distinguishes between ‘‘determinate'' and ‘‘indeterminate'' research designs. The book designates as ‘‘determinate'' those designs that meet the standards of: (a) having a sufficient N in relation to the number of explanatory parameters being estimated, and (b) avoiding the problem that two or more explanatory variables are perfectly correlated, that is, perfect multicollinearity (KKV 119, 150; see also 120). Meeting these standards gives the researcher stronger tools for adjudicating among rival hypotheses. By contrast, designs that fail to meet these standards are called ‘‘indeterminate'' (118–24, 145, 228). Such designs do not consider enough data to distinguish the causal impact of alternative independent variables, which is one aspect of the problem of unidentifiability (KKV 118 n. 1). As a consequence, the data under consideration are compatible with numerous interpretations. KKV goes so far as to state: ‘‘[a] determinate research design is the sine qua non of causal inference'' (116). By contrast, for research designs that are indeterminate, ‘‘virtually nothing can be learned about the causal hypotheses'' (118).
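Assuming a regression setting, the two standards can be given concrete form in a short Python sketch (all matrices here are simulated for illustration): a design matrix with fewer cases than parameters, or with perfectly collinear columns, is rank-deficient, and the separate coefficients are then not identified.

```python
import numpy as np

rng = np.random.default_rng(1)

def identified(X):
    """A regression can separate the variables' effects only if X has full column rank."""
    return np.linalg.matrix_rank(X) == X.shape[1]

# (a) Too few cases relative to parameters: 3 cases but 5 explanatory variables.
X_small_n = rng.normal(size=(3, 5))
print("enough N for 5 parameters?", identified(X_small_n))        # False

# (b) Perfect multicollinearity: the third column is an exact sum of the first two.
X = rng.normal(size=(50, 2))
X_collinear = np.column_stack([X, X[:, 0] + X[:, 1]])
print("collinear design identified?", identified(X_collinear))    # False

# (c) A design meeting both standards.
X_ok = rng.normal(size=(50, 3))
print("well-conditioned design identified?", identified(X_ok))    # True
```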

 

The distinction between a determinate and an indeterminate research design relies on the standard idea of the power of statistical tests. Discussions about the power of a test are useful for focusing on the degree to which the analysis is capable of rejecting the null hypothesis when that hypothesis is in fact false, under the assumption that the model is correct and only random error is at stake. This is a useful, but narrow, idea.

 

Correspondingly, we find the distinction between determinate and indeterminate research designs somewhat misleading. It is true that researchers must think carefully about the size of the N, given that it is the principal source of leverage in dealing with the issue of sampling error. Yet the size of the N is hardly the only source of inferential leverage, and sampling error is certainly not the only challenge to causal inference. Consequently, KKV’s distinction gives these specific concerns too much weight.

 

Further, it seems particularly inappropriate to argue that a determinate research design in this sense is the sine qua non of causal inference, whereas an indeterminate design contributes little. This claim can be seen as reifying the small-N problem, in the specific sense that it establishes a vivid dichotomy, in relation to which the small-N researcher is always on the wrong side.

 

The strong contrast that KKV draws between determinate and indeterminate research designs runs the risk of obscuring the broader, and much more important, contrast between experimental and observational studies discussed above. From this broader point of view, all inferences drawn from observational data share fundamental problems of alternative explanations and misspecified models. These problems pose a much greater challenge to the validity of causal inference than the problem of insufficient data (above all the small-N problem) emphasized by the idea of a determinate research design. In the realm of observational studies, the conclusions drawn from research are always partial, uncertain, and dependent on meeting underlying analytical assumptions, as KKV (passim) acknowledges.

 

To put this another way, we find it problematic to suggest that any observational study can ever be ‘‘determinate,’’ given this term’s questionable implication that the ‘‘inferential monsters’’ to which Leamer refers can definitively be ruled out. We doubt they can. Further, if no observational research design is ever really determinate, then the concept of an indeterminate research design is also misleading when applied to observational studies. All such studies can be understood as involving indeterminate research designs. For this reason, we suggest avoiding the distinction between determinate and indeterminate research designs, while recognizing the issues raised as an unavoidable aspect of the larger problem of identifiability in research design.

 

In addition, we are concerned that KKV’s use of the label ‘‘determinate research design'' focuses attention on issues of identifiability to an extent that implicitly advocates an inversion of what we see as the most productive relationship between theory and testing. Avoiding multicollinearity and keeping the number of explanatory variables small relative to the N are obviously important for regression analysis, and such issues should be a concern in small-N analysis as well. However, an excessive focus on these objectives may push analysts toward redesigning theory to be conveniently testable, instead of searching for more rigorous tests of the theories that scholars actually care about.

 

We would argue that, in situations where researchers are trying to test well-developed theories against clear alternative explanations, adopting an approach to testing that first requires modifications of the theories in question gives up a lot. In such circumstances, it is usually best to establish the testing requirements in light of the theory and the relevant alternative explanations: only in this way can we effectively adjudicate among these alternatives. If a hypothesis is difficult to test against the relevant alternative hypotheses with the existing data, then the best approach is to find new data and new approaches to testing, not to modify the hypotheses until it is easy to test them. Hence, to reiterate, the term ‘‘determinate'' emphasizes the standards of identifiability and statistical power in a way that can distract analysts from testing the theories that often motivate research to begin with.

 

Rather than evaluating research designs as being determinate or indeterminate, it may be more productive to ask a broader question: Are the findings and inferences yielded by a given research design interpretable, in that they can plausibly be defended? The interpretability of findings and inferences can be increased by many factors, including a larger N, a particularly revealing comparative design, a rich knowledge of cases and context, well-executed conceptualization and measurement, or an insightful theoretical model. If the research question has been modified in order to make it more testable, then the findings may be less interpretable in relation to the original research question, and inferential leverage has probably been lost, not gained. This focus on interpretable findings broadens KKV’s idea of a determinate research design by recognizing multiple sources of inferential leverage.

 

Data Mining versus Specification Searches

 

Many researchers seek to evaluate competing explanations through intensive analysis of their data; however, this practice often raises the concern that researchers have engaged in ‘‘data mining'' (KKV 174) or ‘‘data snooping'' (Freedman, Pisani, and Purves 2007: 547) and have thereby exhausted the inferential leverage provided by the data. If researchers try out enough different combinations of explanatory variables, they will eventually find one that fits the data, even if the data are random. Data mining is therefore seen as an undesirable research practice that weakens causal inference. Concerns about different forms of this problem recur in the guidelines, presented in chapter 2 above, that summarize KKV’s framework. Guideline no. 27 is concerned with the problem that researchers run ‘‘regressions or qualitative analyses with whatever explanatory variables [they] can think of'' (KKV 174). No. 34, the injunction to test theory with data other than that used to generate the theory, and no. 35, the recommendation that theory should generally not be reformulated after analyzing the data, also address concerns related to data mining.

 

We find it striking that the related, partially inductive, econometric practice of ‘‘specification searches’’ is, by contrast, viewed favorably by methodologists as an unavoidable step in making causal inferences from observational data. The literature on specification searches has proposed systematic approaches to the iterated process of fitting what are inevitably incomplete models to data. The main ideas in this literature implicitly point to the dilemma that treating these inductive practices as a problem can be misleading, if not counterproductive, in establishing criteria for good research. Such a dilemma can be seen, first of all, in quantitative research that uses complex explanatory models. In the social sciences, such models are virtually never sufficiently detailed to tell us exactly what should be in the regression equation. Scholars who wish to test these models are forced to make decisions about the underspecified elements of the model and, in actual practice, they almost never stop after running the first regression that seems reasonable to them. It is the myth that these multiple tests do not occur that leads Leamer to worry about ‘‘the fumes which leak from our computer labs’’ (1983: 43). Rather than pretending that they do not occur, Freedman, Pisani, and Purves specifically urge analysts to report ‘‘how many tests they ran before statistically significant [results] turned up’’ (2007: 547).

 

Because we usually do not know the correct specification of a model, stopping with the first specification is methodologically problematic, just as it would be unjustified to stop with the specification that most favors the working hypothesis. The methodology of specification searches is concerned with systematic procedures for deciding where to start, when to stop, how to report the steps in between, and when we should believe the results of this overall process. Some scholars present elaborate justifications for beginning with the simplest plausible model and then engaging in ‘‘fragility testing'' or ‘‘sensitivity analysis'' by adding variables that may change the coefficients of interest (Leamer 1983: 40–42; 1994 [1986]; Levine and Renelt 1992). Other scholars work from the other side: they begin with the most elaborate plausible model and eliminate elements of the model that prove to have little explanatory power (Hendry 1980; Hendry and Richard 1982; White 1994; see Granger 1990 for statements from both sides of this debate). These two approaches both use induction to test the plausibility of findings under divergent sets of methodological assumptions. The specification searches literature thus takes a position on induction that is radically different from the simple mandate not to reformulate theory after looking at the data.
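A minimal sketch of this kind of sensitivity analysis, using simulated data and invented coefficients in Python, re-estimates the coefficient of interest under several specifications and reports how much it moves; it is the pattern across specifications, not any single regression, that the analyst would present.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Simulated observational data (hypothetical): z confounds the x -> y relationship.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
w = rng.normal(size=n)                       # an irrelevant candidate control
y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true effect of x is 1.0

def ols_coef_of_x(*controls):
    """OLS with an intercept; returns the estimated coefficient on x."""
    X = np.column_stack([np.ones(n), x, *controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# A simple sensitivity analysis: how fragile is the estimate across specifications?
print("x only:    ", round(ols_coef_of_x(), 2))       # biased upward (roughly 2)
print("x + w:     ", round(ols_coef_of_x(w), 2))      # still biased
print("x + z:     ", round(ols_coef_of_x(z), 2))      # close to 1.0
print("x + z + w: ", round(ols_coef_of_x(z, w), 2))   # close to 1.0
```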

 

The idea of specification searches is, of course, just one facet of a much larger concern with the inductive component of research. Both quantitative and qualitative researchers routinely adjust their theories in light of the data, often without taking the further step of moving to new data sets in order to test the modified theory. Whether this inductive component involves completely overturning previous models or refining them in the margins, such inductive practices are widely recognized as an essential part of research. For example, Ragin and Munck (chaps. 2 and 3 online) devote extensive attention to procedures for inductive analysis.

 

To conclude, data mining can certainly be a problem. Yet the misleading pretense that inductive procedures are not routinely employed, and, even worse, the indiscriminate injunction against such procedures, are at least as big a problem in social research.

 

Conditional Independence versus the Specification Assumption

 

Two alternative formulations of key assumptions underlying causal inference are the assumption of conditional independence and the specification assumption. The issue here is how to conceptualize and label the set of assumptions used to justify causal inference based on observational data. Rather than conceptualizing the most important of these several assumptions in terms of conditional independence (the concept employed by KKV), we find it productive to frame these issues in terms of the specification assumption. In discussing the choice between these alternative overarching concepts, it is essential to recognize that they are fundamentally similar. Given this similarity, this section conveys a suggestion, and simultaneously sounds a note of caution, about the focus and emphasis entailed in these alternative assumptions.

 

Our basic point in the discussion that follows is that, while the assumption of conditional independence is rooted in an analogy to experiments, the specification assumption more directly reflects the situation of a researcher seeking to analyze observational data. For this reason, we find the specification assumption to be more helpful, at the same time that we recognize the underlying similarities between the two assumptions.

 

As discussed in greater depth in chapter 2, the assumption of conditional independence builds on an analogy involving a counterfactual understanding of causation and treats every causal inference as a partial approximation of an ideal experiment. For the purpose of explicating the contrast with the specification assumption, in this section we briefly summarize conditional independence. We begin by discussing the basic thought experiment behind the idea of conditional independence, which serves as the foundation for introducing the assumption of ‘‘independence of assignment and (potential) outcomes.’’ We use this assumption in defining conditional independence, and we then discuss why it is particularly relevant for observational studies. In comparison with the discussion in chapter 2, our goal here is particularly to discuss the range of issues that are highlighted by these conceptualizations, rather than to present the more general framework they represent.

 

The assumption of conditional independence posits that each case can be understood as having a value (which may or may not actually be observed; hence, this is in effect a hypothetical variable) on an outcome variable, Yt, that reflects the outcome that case would experience if given an experimental treatment; and likewise a value (which, again, may or may not be observed) on a second variable, Yc, that reflects the outcome the case would experience if it were the control in an experiment. The causal effect of the treatment relative to the control for this case is the (hypothetical) difference between its values on these two variables.

 

In the real world, even in randomized experiments, the value of only one of these variables can actually be observed for each case at any point in time. Through some process (i.e., through randomization in experiments, or, in an observational study, through a real-world process that may or may not be known to the researcher), any given case is, in effect, assigned either the treatment, or the control. A given case cannot simultaneously be assigned to both. For example, an individual can either be exposed to a political message, or not be exposed to it; or a democratic country can either use proportional representation to elect its officials, or use some other electoral method.

 

Because we cannot empirically observe what would have happened to the same individual or country at any one point in time both with and without the treatment, causal inference routinely relies on real-world comparisons of cases that receive the treatment with other cases that do not receive the treatment. The comparison of these observed treated cases with the observed control cases substitutes for the hypothetical comparison of each case with and without the treatment. Comparing two real-world groups of cases that do and do not receive the treatment yields a good causal inference, provided that these two groups are similar in the sense that both have the same mean values of the (hypothetical) variable Yt, and also the same mean values of the (hypothetical) variable Yc. With a large enough sample, randomization of assignment, as in a well-designed experiment, ensures that this condition will be met.

 

With observational data, however, this standard, which is called independence of assignment and outcome, is usually not met. Furthermore, there is no way to test whether independence is satisfied, because only Yt or Yc, but not both, is observed for each case. Although we can calculate the mean value of Yt for the cases that are actually assigned to the treatment, we cannot do so for the cases assigned to the control. Similarly, although we can calculate the mean value of Yc for the cases assigned to the control, we cannot do so for the cases that are assigned to the treatment. Consequently, we cannot know if the treatment cases would have had the same average on Yc (if they had been assigned to the control) as the cases that were actually assigned to the control. Further, we cannot establish whether the control cases would have had the same average on Yt (if they had been assigned to the treatment) as the cases that were actually assigned to the treatment. In short, no test will allow us to establish whether the standard of independence holds for a given set of cases.

 

The assumption of conditional independence becomes relevant if this criterion of independence is not met. Conditional independence means that there is another variable or set of variables, which serve as ‘‘statistical controls,'' such that by controlling for, or conditioning on, these variables, the treatment group and the control group come to have the same mean values on both Yt and Yc. If the researcher uses quantitative techniques that control for these variables, such as stratification, conditional independence is thereby satisfied and an important criterion for good causal inference has been met. In effect, by introducing statistical controls into the analysis and then assuming conditional independence, the researcher turns the observational study into something akin to an experiment. However, it is obviously vital to remember that the assumption of conditional independence, like the assumption of independence, is hard to test.
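A small simulation may help fix ideas. In the Python sketch below, the potential outcomes Yt and Yc and the background trait w are invented for illustration; only one potential outcome is ‘‘observed'' per case. Random assignment recovers the true average effect, a confounded observational assignment does not, and stratifying on w (the move licensed by assuming conditional independence given w) restores the estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Hypothetical potential outcomes: Yc under control, Yt under treatment.
# A background trait w raises both potential outcomes; the true average effect is 2.0.
w = rng.binomial(1, 0.5, size=n)
y_c = 1.0 + 3.0 * w + rng.normal(size=n)
y_t = y_c + 2.0

def diff_in_means(yt, yc, treated):
    """Observed treated mean minus observed control mean (only one outcome seen per case)."""
    return yt[treated == 1].mean() - yc[treated == 0].mean()

# Random assignment: independence of assignment and potential outcomes holds.
t_random = rng.binomial(1, 0.5, size=n)
print("randomized:", round(diff_in_means(y_t, y_c, t_random), 2))        # ~2.0

# Observational assignment: cases with w = 1 are far more likely to get the treatment.
t_obs = rng.binomial(1, np.where(w == 1, 0.8, 0.2))
print("naive observational:", round(diff_in_means(y_t, y_c, t_obs), 2))  # ~3.8, biased

# Conditioning on w (stratification), which assumes conditional independence given w.
effects, shares = [], []
for v in (0, 1):
    m = (w == v)
    effects.append(diff_in_means(y_t[m], y_c[m], t_obs[m]))
    shares.append(m.mean())
print("stratified:", round(float(np.average(effects, weights=shares)), 2))  # ~2.0
```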

 

Unlike conditional independence, which is rarely mentioned in econometrics textbooks, the specification assumption is frequently discussed in econometric and statistical work on regression analysis. The specification assumption has the major advantage that it starts with what is typically the actual situation of the researcher (that is, having an explanatory model of unknown usefulness) and then specifies the criteria that must be met to move in the direction of causal inference. The name of this assumption refers directly to this process of specification.

 

Thus, the starting point for the specification assumption is not the metaphor of an experiment, but rather the model that researchers use to organize their hypotheses. In the simplest case, this model consists of a dependent variable and a set of independent variables in a single regression equation. More generally, it may explicitly include an equation for the process of assignment to treatment, as well as for the outcome variable. The specification assumption focuses attention on what must be true, concerning the relationships between the included explanatory variables and the unobserved error terms in the model, in order to make unbiased inferences about the strength of the associations predicted by these relationships.

 

In the context of a regression model, the specification assumption is the claim that the included independent variables are statistically unrelated to the error term that derives from a (hypothetical) comparison between the regression model and the true causal equation. One major threat to the specification assumption is omitting variables that ought to be included, thereby relegating the effects of those variables to the error term and sometimes producing missing variable bias (the central, direct concern of conditional independence). A second major threat is including variables that are endogenous, that is, are statistically related to the part of the dependent variable that is not caused by the included variables. Including such variables that have a direct connection with the error term yields endogeneity bias. When a model has either of these problems, the estimated causal effects of the included variables will be biased because the included variables will stand in for (or proxy for) either missing variables or the error term.
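Both threats can be illustrated with simulated data. In the hypothetical Python sketch below (the variables and coefficients are invented), omitting the confounder z relegates its effect to the error term and biases the estimate for x upward, while adding m, a variable partly produced by the outcome itself, biases it downward; only the correct specification recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

def coef_on_x(y, *regressors):
    """OLS with an intercept; returns the estimated coefficient on the first regressor (x)."""
    X = np.column_stack([np.ones(n), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# True model (hypothetical): y = 1.0*x + 2.0*z + error, with z also driving x.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.0 * x + 2.0 * z + rng.normal(size=n)

# Threat 1: omitting z pushes its effect into the error term (missing variable bias).
print("omit z:    ", round(coef_on_x(y, x), 2))          # well above 1.0

# Threat 2: including an endogenous variable, here one partly caused by y itself.
m = y + rng.normal(size=n)
print("include m: ", round(coef_on_x(y, x, z, m), 2))    # well below 1.0

# Correct specification.
print("x and z:   ", round(coef_on_x(y, x, z), 2))       # close to 1.0
```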

 

A further benefit of discussing these issues in terms of the specification assumption (in addition, as noted above, to focusing attention more directly on the actual situation of the researcher) is that this term is directly linked to other standard methodological labels: model specification, specification error, specification analysis, the specification problem, misspecification, and specification searches.

 

While we believe that the framework of the specification assumption brings basic issues of causal inference into sharper focus, it also has a major limitation, which it shares with the assumption of conditional independence. Both assumptions are hard to test, and no analyst can ever prove that an observational study meets either assumption. Leamer’s inferential monsters may always be lurking beyond the researcher’s immediate field of vision. This is one of the reasons why, in order to supplement correlation-based causal inference, scholars turn to alternative sources of inferential leverage such as experiments or causal-process observations.

 

To reiterate the point made at the start of this section, our argument here is neither that the assumption of conditional independence is misleading in any fundamental sense, nor that meeting the specification assumption solves all problems of causal inference. Rather, we believe that the analogy behind conditional independence may focus too much attention on control variables as a solution to problems of causal inference based on observational data. By contrast, the specification assumption focuses more directly on problems of endogeneity and misspecified relationships among measured variables, as well as other inadequacies of our causal models.

 

Taken together, our observations about these five distinctions considered in this section help to spell out the perspective on causal inference that we have adopted, which clearly differs from that of KKV. We now turn to some additional distinctions that help to develop further our overall argument about sources of leverage in causal inference: qualitative versus quantitative research, cases versus observations, and data-set observations versus causal-process observations.

 

FOUR APPROACHES TO THE QUALITATIVE VERSUS QUANTITATIVE DISTINCTION

 

Debates about sources of leverage for eliminating rival explanations in causal inference (and obviously also about tools for descriptive inference) are routinely framed in terms of the relative strengths and weaknesses of qualitative and quantitative research. Yet this distinction needs to be disaggregated if it is to play a useful role in thinking about research design. In conjunction with this distinction, we do not find two neatly bounded categories, but rather four overlapping categories (see table 9.1). However, notwithstanding this complexity, it is still useful for many purposes to use the dichotomous labels of qualitative versus quantitative.

 

Level of Measurement

 

One distinction concerns the level of measurement. Here we find ambiguity regarding the cut-point between qualitative and quantitative, and also contrasting views of the leverage achieved by different levels of measurement. Some scholars label data as qualitative if it is organized at a nominal level of measurement and as quantitative if it is organized at an ordinal, interval, ratio, or other ‘‘higher'' level of measurement (Vogt 1999: 230). Alternatively, scholars sometimes place the qualitative-quantitative threshold between ordinal and interval data (Porkess 1991: 179). This latter cut-point is certainly congruent with the intuition of many qualitative researchers that ordinal reasoning is central to their enterprise (Mahoney 1999: 1160–64). With either cut-point, however, quantitative research is routinely associated with higher levels of measurement.

 

Higher levels of measurement are frequently viewed as yielding more analytic leverage, because they provide more fine-grained descriptive differentiation among cases. However, higher levels of measurement depend on complex assumptions about logical relationships (for example, about order, units of measurement, and zero points) that are sometimes hard to meet. If these assumptions are not met, such fine-grained differentiation can be illusory, and qualitative categorization based on close knowledge of cases and context may in fact provide more leverage. In any case, careful categorization is a valuable, indeed essential, analytic tool.

 

Size of the N

 

A second approach is to identify the qualitative-quantitative distinction with the contrast between small-N and large-N research. Here we will treat the question of the ‘‘N’’ as a relatively straightforward matter involving the number of observations on the main dependent variable that the researcher seeks to explain, understood at the level of analysis that is the principal focus of the research. In a subsequent section, we will explore the complex issues that can arise in establishing the N.

 

The N involved in a paired comparison of Japan and Sweden, or in an analysis of six military coups, would routinely be identified with the qualitative tradition. By contrast, an N involving hundreds or thousands of observations would routinely be identified with the quantitative tradition. Although there is no well-established cut-point between qualitative and quantitative in terms of the N, such a cut-point might be located somewhere between ten and twenty.

 

However, some studies definitely break the methodological stereotypes: that is, those with a larger N that in other respects adopt a qualitative approach; as well as those with a relatively small N that in other respects adopt a quantitative approach. Examples of qualitative studies which have a relatively large N include Rueschemeyer, Stephens, and Stephens’s (1992) Capitalist Development and Democracy (N=36), Tilly’s (1993) European Revolutions, 1492–1992 (hundreds of cases), and R. Collier’s (1999) Paths toward Democracy (N=27). Wickham-Crowley’s (1992) Guerrillas and Revolution in Latin America focuses on twenty-six cases: he carries out a qualitative/narrative analysis, based on detailed discussion of thirteen cases, and he analyzes thirteen additional cases using dichotomous/categorical variables and Boolean methods.

 

Some studies that rely heavily on statistical tests in fact have a smaller N than these qualitative studies. Examples are found in the literature on advanced industrial countries: a study with an N of eleven focused on the impact of partisan control of government on labor conflict (Hibbs 1987); and studies with an N of fifteen focused on the influence of corporatism and partisan control on economic growth (Lange and Garrett 1985, 1987; Jackman 1987, 1989; Hicks 1988; and Hicks and Patterson 1989; Garrett 1998). Likewise, quantitative research that seeks to forecast U.S. presidential and congressional elections routinely employs an N of eleven to thirteen (e.g., Lewis-Beck and Rice 1992; J. Campbell 2000; Bartels and Zaller 2001). Choices about the N are thus at least partially independent from choices about other aspects of a qualitative or quantitative approach.

 

 

Table 9.1. Four Approaches to the Qualitative-Quantitative Distinction

 

1. Level of Measurement. Defining distinction: the cut-point for qualitative vs. quantitative is nominal vs. ordinal scales and above; alternatively, nominal and ordinal scales vs. interval scales and above. Comment: lower levels of measurement require fewer assumptions about underlying logical relationships; higher levels yield sharper differentiation among cases, provided these assumptions are met.

2. Size of the N. Defining distinction: the cut-point between small N vs. large N might be somewhere between 10 and 20. Comment: a small N and a large N are commonly associated with contrasting sources of analytic leverage, which correspond to the third and fourth criteria below.

3. Statistical Tests. Defining distinction: in contrast to much qualitative research, quantitative analysis employs formal statistical tests. Comment: statistical tests provide explicit, carefully formulated criteria for descriptive and causal inference; a characteristic strength of quantitative research. Yet this again raises the question of meeting relevant assumptions.

4. Thick vs. Thin Analysis (a). Defining distinction: central reliance on detailed knowledge of cases vs. more limited knowledge of cases. Comment: detailed knowledge associated with thick analysis is likewise a major source of leverage for inference; a characteristic strength of qualitative research.

a. This distinction draws on Coppedge’s (1999) discussion of thick versus thin concepts. See also note 22 in the text below.

 

 

 

Scholars decide on the N according to many different criteria, including the availability of analytically relevant data and a concern with the alternative sources of inferential leverage associated with a small N and a large N. The third and fourth criteria for qualitative versus quantitative, presented below, address these alternative sources of leverage.

 

Statistical Tests

 

The third approach focuses on the use of statistical tests. An analysis is routinely considered quantitative if it employs statistical tests in reaching its descriptive and explanatory conclusions. By contrast, qualitative research typically does not employ such tests. While the use of statistical tests is generally identified with higher levels of measurement, the two are not inextricably linked. Quantitative researchers frequently apply statistical tests to nominal variables. Conversely, qualitative researchers often analyze data at higher levels of measurement without utilizing statistical tests. For example, in the area studies tradition, a qualitative country study may make extensive reference to ratio-level economic data.

 

Statistical tests are a powerful analytic tool for evaluating the strength of relationships and important aspects of the uncertainty of findings in a way that is more difficult in qualitative research. Yet, as with higher levels of measurement, statistical tests are only meaningful if complex underlying assumptions are met. If the assumptions are not met, alternative sources of analytic leverage employed by qualitative researchers may in fact be more powerful.

 

Thick versus Thin Analysis

 

Finally, we distinguish between ‘‘thick’’ and ‘‘thin’’ analysis. Qualitative research routinely utilizes thick analysis, in the sense that analysts place great reliance on a detailed knowledge of cases. Indeed, some scholars consider thick analysis the single most important tool of the qualitative tradition. One type of thick analysis is what Geertz (1973) calls ‘‘thick description,’’ that is, interpretive work that focuses on the meaning of human behavior to the actors involved. In addition to thick description, many forms of detailed knowledge, if utilized effectively, can greatly strengthen description and causal assessment. By contrast, quantitative researchers routinely rely on thin analysis, in that their knowledge of each case is typically far less complete. However, to the extent that this thin analysis permits them to focus on a much larger N, they may benefit from a broader comparative perspective, as well as from the possibility of using statistical tests. Whereas the precision and specificity of statistical tests are a distinctive strength of quantitative research, the leverage gained from thick analysis is a characteristic strength of qualitative research.

 

The distinction between thick and thin analysis is closely related to Ragin’s (1987) discussion of case-oriented versus variable-oriented research. Of course, qualitative researchers do think in terms of variables, and quantitative researchers do deal with cases. The point is simply that qualitative researchers are more often immersed in the details of cases, and they build their concepts, their variables, and their causal understanding in part on the basis of this detailed knowledge. Such researchers seek, through their in-depth knowledge of cases, to carefully rule out alternative explanations until they come to one that stands up to scrutiny. Detailed knowledge of cases does sometimes play a role in quantitative research. Indeed, some quantitative research employs thick analysis. However, in-depth knowledge is far more common in qualitative research and much less common among quantitative researchers, who tend to rely on statistical tests.

 

Drawing Together the Four Criteria

 

As this section illustrates, there is no single, sharp distinction that consistently differentiates qualitative and quantitative research, and that unambiguously sorts out the most important sources of inferential leverage. We would certainly classify as qualitative a study that places central reliance on nominal categories, focuses on relatively few cases, makes little or no use of statistical tests, and places substantial reliance on thick analysis. By contrast, a study based primarily on interval- or ratio-level measures, a large N, statistical tests, and a predominant use of thin analysis is certainly quantitative. Both types of study are common, which is why it makes sense, for many purposes, to maintain the overall qualitative-quantitative distinction.

 

However, an adequate discussion of inferential leverage requires careful consideration not only of these polar types, but also of the intermediate alternatives. For example, a particularly strong form of inferential leverage may be gained by combining statistical tests with thick analysis, bringing together their complementary logics in what may be called ‘‘nested inference.’’ This relationship between qualitative and quantitative methods is very different from that proposed by KKV, because with nested inference the characteristic strengths of each approach supplement and enhance research based on the other approach.

 

CASES VERSUS OBSERVATIONS

 

Well-understood definitions of ‘‘case’’ and ‘‘observation’’ are essential in discussing sources of inferential leverage in qualitative and quantitative research, yet finding adequate definitions of these terms is a serious challenge. Indeed, the question ‘‘what is a case?’’ is the title of an entire book (Ragin and Becker 1992).

 

Cases

 

We understand a case as one instance of the unit of analysis employed in a given study. Cases correspond to the political, social, institutional, or individual entities or processes about which information is collected. For example, the cases in a given study may be particular nation-states, social movements, political parties, trade union members, or episodes of policy implementation. The number of cases is conventionally called the ‘‘N.’’

 

It is productive to think about cases in relation to a ‘‘rectangular data set,'' that is, a matrix or uniform array of data in which the rows correspond to cases and the columns correspond to variables. The pieces of data aligned in a single row in the data set pertain to a particular case, and the number of rows corresponds to the number of cases (the N). The pieces of data aligned in a single column in the data set pertain to a particular variable, and the number of columns corresponds to the number of variables. The information in a rectangular data set may be either quantitative or qualitative; that is, it may consist of scores on variables at any level of measurement.

 

Observations

 

We now present two definitions of the term observation; the second serves to underscore the importance of the horizontal slice of the data set, that is, the row. ‘‘Observation,'' of course, has a commonsense meaning: it is an insight or piece of information recorded by the researcher about a specific feature of the phenomenon or process being studied. This usage is widespread, and it is found, for example, in KKV (57). In the language of variables, an observation in this sense is a single piece of data that constitutes the value of a variable for a given case. The commonsense meaning also includes other kinds of information that might not conventionally be thought of as a score on a variable; for example, information about context that makes the phenomenon under study intelligible and that helps the researcher avoid basic mistakes in interpreting it.

 

A fundamentally different meaning of observation, which is standard in quantitative analysis, refers to a row in a rectangular data set. According to this meaning, an observation is the collection of scores for a given case, on the dependent variable and all the independent variables (KKV 117; also 53, 209). In other words, an observation is ‘‘all the numbers for one case,’’ that is, all the scores within any given row of the data set. In relation to this definition of observation, a ‘‘case,’’ which also corresponds to a row in the data set, should be understood as the larger setting from which the numbers in each row are drawn.

 

The second definition may initially seem counterintuitive for scholars not oriented toward thinking about rectangular data sets and matrix algebra. Whereas the commonsense meaning of observation refers only to one score, this second meaning involves two or more scores. A useful way of clarifying this second usage is to think about it as a ‘‘data point,’’ which in a two-dimensional scatterplot corresponds to the scores of the independent and dependent variables. The data point is an observation whose meaning depends on simultaneously considering the scores for both variables. The cluster of information contained in a data point plays a central role in causal inference by focusing our attention simultaneously on the scores for the independent and dependent variables. This same idea can be extended to the analysis of more than two variables (as in scatterplots with three or more dimensions), and the purpose of this second definition of observation is to highlight that central inferential role. As with the rectangular data set, the data entailed in an observation of this type may be either quantitative or qualitative.

 

This second meaning of observation serves a useful methodological purpose. For example, it can clarify the meaning of the well-known ‘‘many-variables, small-N problem'' (Lijphart 1971: 685–91). In debates on methodology, increasing the number of observations is routinely understood as a basic solution to this problem. Obviously, the content of this recommendation depends on our definition of an observation. For instance, if we score the cases on an additional variable, we add observations in the sense of the ordinary language usage noted above, that is, we introduce one new piece of data for each case. However, adding a variable generally makes the many-variables, small-N problem worse, because it reduces the degrees of freedom. In this sense, increasing the number of observations does not help the problem concerning the degrees of freedom.

 

By contrast, using the second definition of observation, it makes sense to say that increasing the number of observations addresses the many-variables, small-N problem. Adding observations, in the sense of adding ‘‘all the numbers'' for one or more new cases, increases the number of rows in the matrix.
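The arithmetic behind this contrast can be sketched directly; the numbers below are hypothetical, and the formula assumes a regression with an intercept.

```python
def residual_df(n_cases, n_explanatory_vars):
    """Residual degrees of freedom for a regression with an intercept."""
    return n_cases - n_explanatory_vars - 1

# A hypothetical small-N study: 8 cases, 3 explanatory variables.
print(residual_df(8, 3))   # 4

# Adding a variable (one new piece of data per case) makes the problem worse ...
print(residual_df(8, 4))   # 3

# ... while adding data-set observations (new rows, that is, new cases) eases it.
print(residual_df(12, 3))  # 8
```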

 

This usage thus clarifies a basic piece of methodological advice. At various points in the present volume, we argue that ‘‘increasing the number of observations,’’ as KKV frequently recommends, may not always be a good idea. However, taking one position or the other on this issue makes little sense as long as there is ambiguity about whether one is referring to adding ‘‘pieces of data’’ or adding cases to the analysis.

 

Given that it is confusing when the same term carries two meanings, we adopt the following usage. When we mean observation in the first, commonsense usage discussed above, we refer to a score, or to a piece of data or information. To highlight the second meaning of observation, we propose the expression ‘‘data-set observation.’’

 

DATA-SET OBSERVATIONS VERSUS CAUSAL-PROCESS OBSERVATIONS

 

We thus introduce the label ‘‘data-set observation’’ to refer to observation in the sense of a row in a rectangular data set. At the same time, we do not want to lose sight of the critical role played in causal inference by information that is not part of a row in a data set. We therefore introduce the expression ‘‘causal-process observation’’ to emphasize the role such pieces of information play in causal inference (table 9.2). Whereas data-set observations lend themselves to statistical tests within the framework of what we have called ‘‘thin analysis,’’ causal-process observations offer an alternative source of inferential leverage through ‘‘thick analysis,’’ as discussed above.

 

A causal-process observation is an insight or piece of data that provides information about context or mechanism and contributes a different kind of leverage in causal inference. It does not necessarily do so as part of a larger, systematized array of observations. Thus, a causal-process observation might be generated in isolation or in conjunction with many other causal-process observations, or it might also be taken out of a larger data set. In the latter case, it yields inferential leverage on its own. In doing so, a causal-process observation may be like a ‘‘smoking gun.’’ It gives insight into causal mechanisms, insight that is essential to causal assessment and is an indispensable alternative and/or supplement to correlation-based causal inference.

 

Table 9.2. Data-Set Observation versus Causal-Process Observation

Corresponding Root Meaning of ‘‘Observation’’
- Data-set observation: Standard quantitative/statistical meaning. Thus, all the scores for a given case; a row in a rectangular data set.
- Causal-process observation: Ordinary language meaning. Thus, a piece of data or information; a datum.

Contribution to Causal Inference
- Data-set observation: The foundation for correlation-based causal inference. Provides the basis for tests of overall relationships among variables.
- Causal-process observation: The foundation for process-oriented causal inference. Provides information about mechanism and context.

 

 

Part of the contrast between data-set observations and causal-process observations is that these two expressions utilize different root meanings of the term ‘‘observation’’ (table 9.2). Because the idea of ‘‘observation’’ is so closely tied in the minds of many quantitatively oriented scholars to data in a rectangular matrix, we might have chosen the expression ‘‘causal-process information.’’ However, we deliberately introduce the expression ‘‘causal-process observation’’ to emphasize that this kind of evidence merits the same level of analytic and methodological attention as do ‘‘data-set observations.’’

 

While we can distinguish these two types of observations, we also find connections between them. For example, a scholar who has discovered a fruitful causal-process observation in one case (involving, for example, a causal mechanism that links two variables) might then proceed to systematically score many cases on this same analytic feature and add the new scores to an existing collection of data-set observations. Thus, the discovery of a causal-process observation can motivate the systematic collection of new data. Alternatively, a researcher who has done an analysis based on data-set observations may turn to causal-process observations to provide evidence about causal mechanisms. Thus, inference may be strengthened by movement in either direction.

 

The idea of causal-process observations is intended to make explicit the source of leverage in causal inference that lies at the heart of a long tradition of within-case analysis in qualitative research, a tradition discussed above by Rogowski and Tarrow and also in the online chapters by Collier, Mahoney, and Seawright, Munck, and McKeown. As discussed in the first of the online chapters, this tradition dates back at least to the 1940s and has, over the years, employed a number of different labels in the effort to pinpoint the distinctive analytic leverage offered by this approach. Recent writing on ‘‘mechanisms’’ is a valuable extension of this tradition.

 

Although the role of causal-process observations in qualitative research may be fairly obvious, their contribution to quantitative work should be underscored. Goldthorpe (2001), developing a line of argument that explicitly builds on the work of statisticians, pinpoints this contribution in his important article ‘‘Causation, Statistics, and Sociology.’’ He uses the label ‘‘generative process’’ in referring to the linkage mechanisms that play an essential role in giving causal interpretations to quantitative associations. Goldthorpe contrasts this focus on generative processes with attempts to demonstrate causation through experiments or regression models.

 

This idea of causation [that] has been advanced by statisticians does not . . . reflect specifically [quantitative] thinking. It would appear to derive, rather, from an attempt to specify what must be added to any [quantitative] criteria before an argument about causation can convincingly be made. (Goldthorpe 2001: 8)

 

This procedure assumes that in quantitative analysis, an association is created by some ‘‘mechanism’’ operating ‘‘at a more microscopic level’’ than that at which the association is established. In other words, these authors would alike insist . . . on tying the concept of causation to some process existing in time and space, even if not perhaps directly observable, that actually generates the causal effect of X on Y and, in so doing, produces the [quantitative] relationship that is empirically in evidence. . . . [This mechanism can] illuminate the ‘‘black boxes’’ left by purely [quantitative] analysis. . . . (Goldthorpe 2001: 9)

 

We see a sharp contrast between (a) Goldthorpe’s assertion that inference based on causal-process observations does not involve the approach of what we are calling mainstream quantitative methods; and (b) KKV’s approach, which explicitly seeks to subordinate this form of causal inference to its quantitative framework. KKV argues, in discussing the inferences drawn from ‘‘process tracing’’ (226), ‘‘historical analysis,’’ and ‘‘detailed case studies’’ (86), that these inferences must be treated through the framework for inference discussed throughout their book (85–87; see also 226–28). King, Keohane, and Verba reemphasize this point in chapter 7 above (111, 121–22 this volume). Yet KKV’s framework is designed for analyzing data-set observations and not causal-process observations, and the book’s recommendations therefore effectively treat causal-process observations as if they were data-set observations.

 

Our point, by contrast, is that causal-process observations offer a different approach to inference. Causal-process observations are valuable, in part, because they can fill gaps in conventional quantitative research. They are also valuable because they are an essential foundation for qualitative research. One goal of the present discussion is to strengthen the methodological justification for that foundation. Because inferences based on data-set and causal-process observations are fundamentally different, one promising direction of research is to combine the strengths of both types of observation within a given study. In the present volume, Tarrow presents an invaluable inventory of practical suggestions for how this may be accomplished. We would call attention to two of Tarrow’s techniques, which he labels ‘‘sequencing qualitative and quantitative research’’ and ‘‘triangulation.’’ These utilize the distinctive strengths of alternative tools for data collection and inference. Tarrow (107 this volume) cites research on Poland’s Solidarity Movement as an example of the kind of fruitful exchange that may take place between analysts using data-set observations and others relying on causal-process observations. Tarrow also points to the complementarities that result when elements of both approaches are combined in a given study.

 

In sum, both data-set observations and causal-process observations can play a role in both qualitative and quantitative research. The rich causal insights that qualitative researchers may gain from thick analysis can often be supplemented by systematic cross-case comparison using data-set observations, statistical tests, and thin analysis. Similarly, the correlation-based inferences that quantitative researchers derive from data-set observations can often be enhanced by causal-process observations.

 

Examples of Causal-Process Observations

 

Three brief, schematic illustrations of causal-process observations will help to clarify their contribution to causal inference. Because we seek to underscore the contrast with data-set observations, we present examples of studies in which both data-set and causal-process observations are employed.

 

The first example focuses on the use of causal-process observations to discredit the findings of a time-series cross-sectional regression analysis, based on data-set observations. In an article that became an important part of the political debate after the 2000 U.S. presidential election, John R. Lott (2000) used regression to conclude that at least 10,000 votes for Bush were lost in the Florida panhandle because the media declared Gore the winner in Florida shortly before the polls had closed in this region, which, unlike the rest of the state, is on Central Standard Time. Brady (chap. 12, this volume) employs causal-process observations, focused on the actual events of election day, to demonstrate that this inference is implausible. Brady shows that the maximum number of votes that Bush could have lost was 224, and that the actual loss was probably just a few dozen votes. Brady’s causal-process observations draw on diverse sources of data to establish several pertinent facts: the number of last-minute voters, the proportion of this group of voters exposed to the media, the further proportion who would specifically have heard media predictions of the outcome, and the likely impact of this prediction on their vote. Although he could have addressed this question through a broader analysis based on data-set observations, Brady is convinced that he got better answers using causal-process observations focused sharply on what actually happened that day in the Florida panhandle.
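The structure of this kind of bound can be conveyed with a schematic calculation. Every figure below is a hypothetical placeholder rather than one of Brady’s actual estimates; the sketch only illustrates the form of the inference, in which an upper bound on lost votes is obtained by multiplying a pool of last-minute voters by a series of successively restrictive proportions established through causal-process observations.

```python
# Schematic sketch only; every number below is a hypothetical placeholder,
# not an estimate from Brady's analysis. The logic: multiply the pool of
# last-minute voters by successively restrictive proportions.
late_voters = 4000            # assumed pool of voters still expected in the final minutes
p_following_media = 0.20      # assumed share exposed to election-night coverage
p_heard_early_call = 0.60     # of those, assumed share who heard the early call for Gore
p_bush_supporters = 0.65      # assumed share of affected voters who favored Bush
p_actually_deterred = 0.50    # assumed share who then abstained or switched

upper_bound = (late_voters * p_following_media * p_heard_early_call
               * p_bush_supporters * p_actually_deterred)
print(round(upper_bound))     # on the order of a couple of hundred votes, not 10,000
```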

 

Another example is Susan Stokes’s (2001) analysis of the dramatic economic policy shifts toward neoliberalism initiated by several Latin American presidents between 1982 and 1995. These presidents had campaigned strongly against neoliberalism. Yet, shortly after being elected, they abruptly embraced neoliberalism. Stokes’s question is whether the presidents opted for neoliberalism on the basis of (a) considered views about the consequences for the economy and the functioning of the state in their countries if they failed to implement neoliberal reform, or (b) a narrower rent-seeking calculation regarding short-term economic or social payoffs from powerful market actors. Stokes systematically compares thirty-eight Latin American presidents, some of whom switched and some of whom did not. She scores them on a series of explanatory variables, as well as on the outcome variable, that is, the adoption of neoliberal policies, thus using data-set observations. This approach, employing both a probit model (93–101) and more informal comparative analysis, yields evidence favoring the first explanation, that is, that the choice was based on the conviction that neoliberalism would solve a series of fundamental national problems.
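For readers less familiar with this kind of cross-case test, the sketch below shows how a probit model on data-set observations of this general sort might be fit. The variable names and the data are invented for illustration; this is not Stokes’s actual specification or data set.

```python
# Hypothetical sketch of a probit analysis on 38 data-set observations.
# Variable names and data are invented; this is not Stokes's model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 38
beliefs_about_crisis = rng.normal(size=n)   # hypothetical "considered views" measure
rent_seeking_ties = rng.normal(size=n)      # hypothetical rent-seeking measure
latent = 1.0 * beliefs_about_crisis + 0.1 * rent_seeking_ties + rng.normal(size=n)
switched_to_neoliberalism = (latent > 0).astype(int)   # outcome: policy switch

X = sm.add_constant(np.column_stack([beliefs_about_crisis, rent_seeking_ties]))
model = sm.Probit(switched_to_neoliberalism, X).fit(disp=False)
print(model.summary())
```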

 

Stokes supplements this large-N analysis by examining a series of causal-process observations concerning three of the presidents, who abruptly switched from populist campaign rhetoric to neoliberal policies after winning the election. In this small-N analysis, her inferential leverage derives from the direct observation of causal links. In one of these analyses, Stokes offers an intriguing step-by-step account of how Peruvian President Fujimori decided to abandon the more populist rhetoric of his campaign and adopt a package of neoliberal reforms (2001: 69–73). Stokes shows that, just after Fujimori’s electoral victory, a sequence of encounters with major international and domestic leaders exposed him to certain macroeconomic arguments, and these arguments convinced him that Peru’s economy was headed for disaster if neoliberal reforms were not adopted. Causal-process observations thus provide valuable evidence for the argument that Fujimori’s decision was driven by this conviction, rather than by the rent-seeking concerns identified in the rival hypothesis.

 

A final example of the distinctive contribution of causal-process observations comes from Nina Tannenwald’s (1999) analysis of the role played by normative concerns in U.S. decisions about the use of nuclear weapons. Tannenwald hypothesizes that decisions about nuclear weapons have been guided by a ‘‘nuclear taboo,’’ that is, a normative stigma against nuclear weapon use, which she argues has been a powerful influence on U.S. decision making during the decades since the invention of nuclear weapons. She frames her discussion around the important competing hypothesis that decisions about nuclear weapons were guided exclusively by considerations associated with deterrence theory.

 

Tannenwald uses a small-N, qualitative test based on data-set observations to evaluate the hypothesis that the nuclear taboo has had a causal impact on U.S. decision making. In comparing U.S. decisions about nuclear weapons during World War II, the Korean War, the Vietnam War, and the Gulf War, Tannenwald controls for deterrence, since none of these conflicts involved an opponent with the capacity for nuclear retaliation. Because nuclear weapons were only used during World War II, when the broad tradition of negative world public opinion about such weapons had not yet formed, Tannenwald’s data-set observations are compatible with the nuclear taboo hypothesis. This comparison of four different wars thus provides some initial evidence in favor of Tannenwald’s argument. However, the N is only four, so the comparison yields relatively little analytic leverage.

 

To gain additional leverage, Tannenwald devotes most of her analysis to the historical record, in search of evidence regarding the actual priorities of key political leaders during decisions about nuclear weapon use in each crisis. Since the nuclear taboo hypothesis implies that decision makers would be both aware of and explicitly concerned about such a taboo, causal-process observations focused on decision-making processes during each war can provide a useful test of the hypothesis. If the historical record shows that decision makers actually discussed constraining effects of a nuclear taboo, then Tannenwald has found important evidence in favor of the hypothesis.

 

In fact, Tannenwald finds many such statements in accounts of the relevant decision-making processes. To cite a few representative examples, when discussing the Korean War, Tannenwald presents documentary evidence that key U.S. decision makers thought the use of nuclear weapons would be a disaster in terms of world public opinion (1999: 444) and, in the words of one prominent decision maker, ‘‘offensive to all morality’’ (1999: 445). In parallel top-level debates on the potential use of nuclear weapons during the Vietnam War, one key meeting reached the conclusion that ‘‘use of atomic weapons is unthinkable’’ (1999: 454) for normative reasons.

 

Of course, this evidence could be accounted for in other ways than by the nuclear taboo hypothesis. For example, the statements she quotes might be strategic misrepresentations of political leaders’ real agendas, or the beliefs and priorities of these leaders may in some way have been irrelevant to the decisions that they ultimately adopted. However, to the extent that researchers find alternative accounts such as strategic misrepresentation less plausible, Tannenwald’s causal-process observations provide valuable support for her argument.

 

In discussing these three examples, we certainly do not claim to have discovered a new type of evidence for use in political and social research. Such evidence is obviously familiar to scholars who use process tracing, within-case analysis, and related techniques. Our goals in this discussion are, first, to argue that these many forms of analysis employ a similar kind of evidence; and, second, to give this type of evidence, based on causal-process observations, a methodological status parallel to that of data-set observations.

 

Further, these three examples illustrate an important complementarity between data-set observations and causal-process observations. In all three examples, the causal-process observations focus on ideas or priorities that must be held by actors in order for the hypothesis associated with the data-set observations to be correct. They identify indispensable steps in the causal process, without which the hypothesis does not make sense.

 

In the following section, we explore the analytic leverage that derives from these two types of observations.

 

Implications of Contrasting Types of Observations

 

The distinction between data-set observations and causal-process observations helps to clarify several methodological issues. These include differences between qualitative and quantitative research; the implications of adding different kinds of data for the N, for degrees of freedom, and for inferential leverage; the consequences of missing data; the tools of causal inference employed in quantitative analysis; and advice about increasing the number of observations. These issues will now be explored in turn.

 

Qualitative versus Quantitative

 

Large-N quantitative researchers may routinely use large numbers of data-set observations and many fewer causal-process observations. By contrast, small-N qualitative researchers may use few data-set observations and a great many causal-process observations. These qualitative researchers use causal-process observations, as we put it above, to slowly but surely rule out alternative explanations until they come to one that stands up to scrutiny. This is a style of causal inference focused on mechanisms and processes, rather than on covariation among variables.

 

At the same time, we do not wish to narrowly identify the qualitative versus quantitative distinction with the causal-process versus data-set distinction. The two types of observations, used together, can provide strong inferential leverage in both traditions of research. For example, within the framework of Alexander George’s ‘‘method of structured, focused comparison,’’ which has played a central role in defining the comparative case-study tradition, researchers ask ‘‘a set of standardized, general questions of each case’’ (1979a: e.g., 62), producing a uniform collection of data-set observations based on qualitative data. Conversely, as Goldthorpe and others have argued (see above), causal-process observations can make a valuable contribution to mainstream quantitative research. The label ‘‘nested inference,’’ noted above, is intended to highlight this two-way contribution.

 

Table 9.3. Adding Different Forms of Data: Consequences for Causal Inference

Adding Data-Set Observations
- For the N: Increases the N.
- For degrees of freedom: Increases degrees of freedom.
- For inferential leverage: Greater degrees of freedom increase leverage; yet leverage may be reduced if the addition of new observations violates measurement and causal assumptions.

Adding Causal-Process Observations
- For the N: Usually does not affect the N.
- For degrees of freedom: Usually does not affect degrees of freedom.a
- For inferential leverage: New information about causal patterns may increase leverage; and if observations are drawn from the original set of cases, there is less risk of violating assumptions underlying measurement and causal inference.

Adding Variables
- For the N: Does not affect the N.
- For degrees of freedom: Decreases degrees of freedom.
- For inferential leverage: Fewer degrees of freedom reduce leverage; yet leverage is increased if key missing variables are added.

a. There is no effect, unless focusing on causal-process observations leads the analyst to modify either the model being estimated, or the data set.

 

 

 

Adding Observations and Adding Variables: Consequences for the N, Degrees of Freedom, and Inferential Leverage

 

The distinctions offered above may help refine the frequently repeated advice to add observations as a means of strengthening causal inference. We would frame this topic more generically as ‘‘adding data,’’ which can include adding data-set observations, causal-process observations, and new variables. These three alternative ways of adding data have different consequences for the N, for degrees of freedom, and for inferential leverage (table 9.3).

 

Consequences for the N are summarized in the left-hand column of table 9.3. The N is the number of cases, which corresponds to the number of data-set observations, that is, the number of rows in a rectangular data set. As noted, this idea applies equally to quantitative and qualitative data. The key distinction here is that increasing the number of data-set observations increases the N, whereas adding causal-process observations often does not affect the N. Given the extensive discussion of ‘‘increasing the number of observations,’’ this distinction is helpful. Finally, adding variables, which may incorporate many additional pieces of data into the analysis, adds columns to the rectangular data set but does not increase the N.

 

The second issue concerns the consequences of adding data for the degrees of freedom (see middle column in the table). Degrees of freedom merit attention because, to the extent that they are greater, the researcher has more capacity to adjudicate among rival explanations within the framework of analyzing data-set observations. Other things being equal, the more data-set observations (i.e., the larger the N) vis-à-vis the number of parameters to be estimated (which usually corresponds to the number of explanatory variables), the greater the degrees of freedom. Adding causal-process observations does not usually increase the N or affect the degrees of freedom. If the researcher adds data in the sense of adding variables, this typically reduces the degrees of freedom. This is because the N remains unchanged, while the number of parameters about which inferences are to be made has increased.

 

Another question concerns the overall consequence for inferential leverage of adding different forms of data (right column in table 9.3). Degrees of freedom is a useful concept, but it does not capture all relevant aspects of inferential leverage. For example, it is true that adding data-set observations (that is, adding cases) can often increase inferential leverage by increasing degrees of freedom. However, a loss of inferential leverage may occur if adding cases extends the analysis to new domains where prior conceptualizations are inappropriate, measurement procedures are invalid, or causal homogeneity is lacking.

 

Moving down the right column in the table, we see that if the researcher makes insightful use of causal-process observations, this can increase inferential leverage. Finally, adding variables decreases the degrees of freedom and can therefore decrease inferential leverage. However, if relevant missing variables are added to the model, inferential leverage thereby increases because missing-variable bias decreases.

 

As an example of how adding different forms of data affects inferential leverage, let us consider a comparative study with an N of twenty-four, focused on explaining change in electoral systems. One hypothesis is that such change occurs when (a) public protest over political corruption increases sharply, (b) electoral reform is seen as a salient response, and (c) legislators have the constitutional authority to rapidly introduce electoral reform (Shugart, Moreno, and Fajardo 2001: 3–5, 23–34). From this starting point, the researcher might add data-set observations to the study by finding additional episodes of potential electoral change. The N and the degrees of freedom are thereby increased; other things being equal, the scholar has gained inferential leverage. However, other things are not equal if concepts and indicators do not fit the new cases, or if causal homogeneity is violated. To the extent that these problems arise, leverage for causal inference may actually be reduced.

 

Alternatively, the researcher might add causal-process observations to strengthen causal inferences about the original twenty-four episodes of potential electoral reform. For example, the researcher might carefully examine critical moments in the crystallization or collapse of public protest, or turning points in the electoral reform process. Nonetheless, in terms of data-set observations, the N is still twenty-four. The degrees of freedom have not changed, yet inferential leverage may have increased.

 

Finally, the investigator might add data by introducing new explanatory variables (for example, the structure of the party system) as part of the uniform array of scores on the dependent and independent variables. Clearly, the N has not increased, and, with more explanatory variables, the degrees of freedom will typically be reduced. On the other hand, if the original model was underspecified and the structure of the party system is, in fact, a key missing variable, then inferential leverage is strengthened by adding this variable, which may counteract the effect of the reduced degrees of freedom.

 

This example illustrates how adding data to an analysis can mean three different things, and that degrees of freedom, although a valuable concept, captures only one aspect of inferential leverage. This conclusion stands out clearly in table 9.3, where for all three rows the consequences for overall inferential leverage are different, and often more ambiguous, than they are for the degrees of freedom. In order to evaluate advice to ‘‘increase the number of observations’’ as a means of strengthening research design, we must adopt a multifaceted view of the types of data that may be added and of their varied contribution to improving inference.

 

Implications for Research Design

 

These arguments, as summarized in table 9.3, have implications for research design. KKV repeatedly makes a case for increasing the N, but we should recognize that researchers often have good reasons for focusing on a small N. Therefore, advice to increase the N may be misplaced. For instance, the researcher may have made an enormous investment in gaining expertise on a few cases. This expertise can provide the researcher with access to a broad array of causal-process observations, which in turn can sometimes yield greater leverage for valid inference than additional cases about which the investigator knows far less. Alternatively, this scholar may have serious doubts about whether the causal patterns in these cases will be found in other cases, that is, doubts about causal homogeneity and the generality of findings. In discussions of method and theory, the problem of generalization, and specifically of overextending findings, is both an old theme (Weber 1949: 72–76; Bendix 1963; Walker and Cohen 1985) and a recently renewed concern (Elster 1999: chap. 1). Given this potential problem, along with the issues of measurement validity that can arise in moving to new contexts, the analyst might be well advised to stick to a small N.

 

By contrast, adding causal-process observations does not pose this problem of overextending the analysis, because the focus typically remains on the original cases. Such research seeks to deepen the knowledge of causal processes and mechanisms in these cases, rather than extend the study to additional cases. The challenge a researcher faces when adding causal-process observations is to know which details to collect, when enough details have been collected to make an inference, and how to increase the likelihood that this inference is valid. The literature on case studies and within-case analysis would do well to address these issues in greater depth.

 

To conclude, although the advice to increase the number of data-set observations is sometimes valuable, it may simply be distracting for researchers who have deliberately focused on explaining a small number of important outcomes. These researchers may find that collecting relevant causal-process observations is more helpful. Further, for quantitative researchers, causal-process observations can be a valuable supplement to large-scale data sets.

 

Missing Data

 

A distinction should also be made about the implications of missing data for these two types of observations. With data-set observations, missing data can be a serious issue. Indeed, the idea that data-set observations involve a uniform array should be understood as encompassing the norm that the data set should preferably be complete, and that a problem of missing data requires close attention (Griliches 1986; Greene 2000: 259–63).

 

Almost by definition, the issue of missing data does not arise in the same way for causal-process observations. The inferential leverage derived from causal-process observations does not depend on having complete data across a given range of cases and variables. Thus, one or a few causal-process observations may provide great leverage in making inferences. For example, Stokes’s analysis of presidential policy switches, discussed above, derives analytic leverage from observations of the decision-making processes involved in only three of the thirty-eight cases that she considers. Her close analysis of these three cases obviously does not ‘‘prove’’ her hypothesis for all thirty-eight episodes, but it does increase the plausibility of her overall conclusions by offering telling evidence about three episodes. Likewise, data-set observations can potentially compensate for gaps or inadequacies in causal-process observations.

 

Standard Quantitative Tools versus Careful Analysis of Causal-Process Observations

 

The distinction between data-set observations and causal-process observations offers a new basis for thinking about the application of standard quantitative tools to different kinds of research. We have elaborate quantitative procedures for evaluating inferences made with data-set observations. By contrast, causal-process observations force us to make complex judgments about inference and probability without explicit guidance from quantitative tools. It is precisely the emphasis on standard quantitative tests that leads KKV to make what we view as a major mistake: subordinating causal-process observations to a conventional quantitative framework (see again 85–87, 226–28).

 

A small number of causal-process observations that seek to uncover critical turning points or moments of decision making can play a valuable role in causal inference. Making an inference from a smoking gun does not require a large N in any traditional sense. However, it does require careful thinking about the logic of inference and a rich knowledge of context, which may in turn depend on many additional causal-process observations. The several chapters in the present volume that discuss tools for qualitative analysis have suggested points of departure for reasoning about how these inferences take place.

 

CONCLUSION: DRAWING TOGETHER THE ARGUMENT

 

In chapters 8 and 9, we have expressed reservations about KKV’s positions on causal inference, descriptive inference, and related methodological questions. KKV in effect treats causal inference as fairly straightforward, provided the researcher follows the quantitative template. We would instead argue that adequate causal inference is difficult. To the extent that KKV addresses challenges to causal inference, it treats these issues as in effect depending on the power of quantitative tests. Thus, the book focuses on increasing the number of observations, estimating uncertainty, and the closely related and misleading idea that, as KKV puts it, determinate research designs (i.e., designs with a sufficiently large N and a lack of perfect multicollinearity) are the ‘‘sine qua non’’ of causal inference.

 

This emphasis on determinate research designs obscures basic challenges in making what we prefer to call ‘‘interpretable’’ causal inferences: the challenges of ruling out an unknown number of alternative explanations and dealing with hard-to-test assumptions. Effective causal inference requires bringing to bear as many different kinds of evidence as possible, including evidence from qualitative research. Yet in KKV’s approach, the contribution of qualitative evidence is undervalued because it is inappropriately assessed in terms of the size of the N and quantitative tests, which misrepresents its distinctive contributions.

 

With regard to descriptive inference, KKV devotes a chapter to this topic. However, the book’s discussion focuses primarily on relatively straightforward questions, such as how to generalize from a sample to a population and how to productively organize and summarize descriptive detail. Yet descriptive inference raises broader, more complex issues that require far more attention. Causal inferences are only reasonable if measurement is valid. Measurement validity, in turn, depends on careful attention to conceptualization (a topic for which KKV’s advice points in the wrong direction) and on the plausibility of each decision taken in the measurement process. Issues of conceptualization and measurement are more fundamental than the conventional problem of generalizing from a sample to a population; indeed, such issues must be addressed even if researchers make no attempt to generalize their claims.

 

For many other methodological questions, we are again convinced that KKV adopts positions that are somewhat simplistic: for example, the book’s arguments about appropriate techniques for case selection and against testing deterministic causal models, along with the failure to recognize that techniques of within-case analysis yield a different kind of evidence than do conventional quantitative data. These are complex issues and must be addressed within a methodological framework that extends well beyond that of KKV and of mainstream quantitative methods.

 

In the present volume, we have sought to develop this broader framework and have argued that it yields a more positive perspective on qualitative tools for descriptive and causal inference. Part of this argument derives from what we have called the statistical rationale for qualitative research. Specifically, we have invoked the statistical idea that important gaps in causal inference based on the quantitative analysis of data-set observations can be filled by evidence derived from qualitative, causal-process observations. Inference based on qualitative data routinely employs different assumptions than quantitative inference, and correspondingly it provides an alternative source of analytic leverage. Such leverage can serve to improve not only qualitative research, but also quantitative research.

 

Similarly, with regard to descriptive inference, we have argued that reasoning about measurement found in psychometrics and mathematical measurement theory points to concerns to which qualitative researchers are routinely more attentive, such as the foundational role of paired comparisons in the logic of measurement, as well as concern with issues of domain and context. The present volume has sought to show how these qualitative and statistical traditions can help lay a stronger methodological foundation for progress in the social sciences.

 

Running through this discussion have been the themes of diverse tools and shared standards. From one perspective, these ideas might seem contradictory: a strong set of shared standards might rule out all but a single, best package of tools. We are convinced that this contradiction does not arise in the social sciences for a simple reason. In light of the current state of methodological knowledge, scholars face many trade-offs in pursuing good descriptive and causal inference. Given these trade-offs, there is no such thing as a universally best set of tools. Rather, the existence of trade-offs requires a sustained recognition that diverse analytic tools are needed in social research.

 

BALANCING METHODOLOGICAL PRIORITIES: TECHNIFICATION AND THE QUEST FOR SHARED STANDARDS

 

In concluding this volume, we would like to reflect on the overall mix of concerns and priorities that are most productive in advancing both methodology and substantive research. We find ourselves in a period when increasingly technical approaches to methodology and theory have growing influence in the social sciences. Whether they involve new procedures for statistical estimation or new tools for deductive inference, these innovations unquestionably help us to understand political and social reality.

 

Yet this trend toward technification can impose substantial costs. It can lead to replacing a simple and appropriate tool with an unnecessarily complex one. It can sometimes distance analysts from the detailed knowledge of cases and contexts that is an invaluable underpinning for any inference, whether derived through complex research procedures or simpler tools. Technification can also devolve into a form of intellectual obscurantism in which research ceases to be driven by important substantive questions and interesting intellectual agendas.

 

In some circumstances a sophisticated, technical solution is indeed more powerful. However, at other times it is better to adopt an alternative solution based on simpler tools. As qualitative methodologists routinely emphasize, these simpler tools can place scholars in closer contact with the cases being studied, sometimes enabling analysts to discover unanticipated causal patterns. Further, when highly technical tools are employed, they cannot be a substitute either for careful thinking about the process that produced the data, or for crafting good (and often elegantly simple) research designs that allow one to rule out alternative explanations. This careful thinking often relies on simple forms of data analysis (employing, perhaps, a scatterplot or a two-by-two table) and on crafting a parsimonious model that undergirds the research design.

 

Scholars should recognize that simpler analytic tools can sometimes contribute more to achieving the shared standards of valid descriptive and causal inference and refining theory. We believe that the greatest promise for progress in social science lies in an eclectic view of methodology that recognizes the potential contributions of diverse tools to meeting these shared standards.

 

------------------------------------

1. Snyder (1984/85: 91–92), in contrast to KKV (79), explicitly makes the elimination of rival explanations one of his criteria for the scientific method.

2. Important problems of causal inference also arise in experiments. External validity is a recurring issue, and obstacles to internal validity can arise as well. Nonetheless, problems of causal inference are far more severe in observational studies.

3. A second legacy of Campbell’s work has been the emergence, under the broad heading of quasi-experiments, of a renewed emphasis on ‘‘natural experiments,’’ in which the mechanisms through which cases receive a value on the main explanatory variable are demonstrably unrelated to the error term. Hence, some of Campbell’s threats to validity (see below) are at least partially averted. Unfortunately, it is often hard to find research contexts in which this criterion is met.

4. Rubin (1980) developed the ‘‘stable-unit-treatment-value assumption’’ (SUTVA) as a formalization of one situation in which observational studies can be analyzed as if they were experiments. This initial move in the direction of discussing causal inference in observational studies is perhaps especially valuable as a statement of the difficulties involved in such inference.

5. In this sentence, we refer to quantitative/large-N versus qualitative/small-N to accommodate the combined usage in the following quotation from KKV. For further discussion of these distinctions, see 178–80 in this chapter.

6. Important examples include Liu (1960), Leamer (1983), Dijkstra (1988), Manski (1995), McKim and Turner (1997), and Berk (2004). See also Lucas (1976); Cox (1977); Copas and Li (1997); Lang, Rothman, and Cann (1998); and Scharfstein, Rotnitzky, and Robins (1999). Within political science, work that reflects this broader statistical perspective includes Achen (1986, 2000, 2002), Bartels (1991), and Wallerstein (2000). Within sociology, relevant examples are Lieberson (1985), Goldthorpe (2001), and Ní Bhrolcháin (2001).

7. Campbell and Stanley (1963); Campbell and Ross (1968); Cook and Campbell (1979).

8. The authors of KKV (102 n. 13) state that they adopt a ‘‘philosophical Bayesian’’ approach; yet Bayesian analysis plays no discernible role in the book’s recommendations.

9. This distinction, of course, involves quite different issues from the contrast between deterministic and probabilistic causation discussed in chapter 8.

10. KKV (122) uses the term ‘‘multicollinearity’’ in discussing this problem. The definition of multicollinearity that KKV offers is, however, stronger than most definitions of the term in statistics (see, e.g., Vogt 1999: 180). Therefore, we have used the term ‘‘perfect multicollinearity’’ in discussing this issue.

11. Perfect multicollinearity is a problem of insufficient data, in the sense that the analyst lacks data that can distinguish between the effects of two (or more) explanatory variables. Adding such data by finding cases in which the explanatory variables are not perfectly correlated would, of course, eliminate the perfect multicollinearity.

12. Unidentifiability also involves other important issues that KKV does not discuss. In structural equation modeling, problems of unidentifiability arise in several different situations. This problem arises if all the variables are endogenous because they appear as both independent and dependent variables within the same system of equations. In this case everything affects everything else, and there is no way of finding a ‘‘prime mover’’ to pin down causal relationships. It also arises if, for a particular endogenous variable of interest, there is no exogenous (i.e., truly independent) variable that affects only the endogenous variable directly (and there is no other identifying information). In this case the researcher has no way to isolate the endogenous variable’s impact on the other endogenous variables. These aspects of unidentifiability are key challenges in using statistical tools to address endogeneity and selection bias (Achen 1986: 38–39; Greene 2000: 663–76).

13. At a later point, KKV (150) does soften this statement by discussing ways in which a determinate research design can produce invalid inferences.

14. See, for example, Stone’s (1985: 689) discussion of interpretability as a central characteristic of statistical models.

15. Thus, if a researcher who is running bivariate regressions successively regresses a purely random dependent variable on each of one hundred purely random independent variables, on average five of the resulting bivariate relationships will be statistically significant at the .05 level. This is true by the definition of significance tests.
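The arithmetic in note 15 can be checked with a quick simulation; the following is an illustrative sketch added here, not part of the original text, and the sample size of 200 is an arbitrary assumption.

```python
# Simulation sketch of note 15: regress a purely random dependent variable
# on each of one hundred purely random independent variables and count how
# many bivariate relationships are "significant" at the .05 level by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_obs, n_vars = 200, 100
y = rng.normal(size=n_obs)                 # purely random dependent variable

significant = 0
for _ in range(n_vars):
    x = rng.normal(size=n_obs)             # purely random independent variable
    result = stats.linregress(x, y)
    significant += result.pvalue < 0.05

print(significant)   # on average about 5 of the 100 bivariate regressions
```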

16. More precisely, as noted in chapter 2, this standard in fact involves mean independence of assignment and outcome, and the standard of conditional independence of concern here is mean conditional independence of assignment and outcome.

17. Regression analysis employs assumptions that some readers may view as similar to the assumption of conditional independence, in that these assumptions stress the importance of control variables in causal inference. At a general level, this understanding is probably adequate; however, it is important to remember that analytic techniques (e.g., stratification versus regression) differ, sometimes substantially, in the details of the assumptions they depend on.

18. See, e.g., Greene (2000: 219–20); Kennedy (1998: chaps. 3 and 5); Mirer (1995); Darnell (1994: 369–73); Gujarati (1988: 57–60, 166, 178–82); and Wonnacott and Wonnacott (1979: 413–19). Treatments by political scientists include Achen (1982: chap. 5; 1986: 12, 27); and Hanushek and Jackson (1977: 79–86). For a highly accessible statement, see Vogt (1999: 271–72). Stone (1993) discusses the relationships among the specification assumption (which he calls ‘‘no confounding’’), conditional independence, and mean conditional independence (which he calls ‘‘no mean effect’’).

19. The specification assumption as defined here is sometimes confused with the much weaker assumption that the expectation of the residuals in a regression analysis is zero, conditional on the included variables. This second assumption, which is not the specification assumption, focuses on whether the included right-hand side variables successfully capture all predictive information that these variables provide about the dependent variable. For example, the heights of sisters can provide an excellent prediction of their brothers’ heights even though the correlation is causally spurious. Because no causal connection is implied by this assumption, researchers can always meet this standard without introducing additional right-hand side variables (although they may have to add nonlinear transformations of the included variables).

By contrast, the specification assumption means that there is no statistical relationship between the included independent variables and any excluded variables that causally affect the dependent variable. Often, meeting this assumption would require analysts to include more independent variables. Thus, in a regression equation that predicts brothers’ heights from sisters’ heights, the specification assumption fails because there is a correlation between the sisters’ heights (the included independent variable) and the parents’ heights, excluded variables that causally affect brothers’ heights. Only by including these missing variables can the researcher meet the specification assumption.
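A small simulation, with invented parameters, can illustrate the heights example in note 19: sisters’ heights predict brothers’ heights well, yet the specification assumption fails because the included regressor is correlated with the omitted common cause, the parents’ heights. This is a sketch under assumed numbers, not data from any study.

```python
# Illustrative simulation of note 19's heights example; all parameters are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
parents = rng.normal(170, 7, size=n)               # excluded common cause
sisters = 80 + 0.5 * parents + rng.normal(0, 4, size=n)
brothers = 95 + 0.5 * parents + rng.normal(0, 4, size=n)

# Sisters' heights are a good *predictor* of brothers' heights ...
print(np.corrcoef(sisters, brothers)[0, 1])
# ... but the included regressor is correlated with the excluded causal variable,
# so the specification assumption fails: the association is causally spurious.
print(np.corrcoef(sisters, parents)[0, 1])
```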

20. Obviously, the unit of analysis, as well as the number of cases being studied, may change in the course of research.

21. We intend the present usage of ‘‘statistical tests’’ somewhat broadly, including techniques of parameter estimation as well as tools of statistical inference.

22. This distinction draws on Coppedge’s (1999) discussion of thick versus thin concepts. Neither our distinction nor that of Coppedge should be confused with Geertz’s (1973) distinction between ‘‘thick description,’’ which focuses on the meaning of human behavior to the actors involved, as opposed to ‘‘thin description,’’ which is not concerned with this meaning. With the expression ‘‘thick analysis,’’ we mean research that focuses closely on the details of cases. These details may or may not encompass subjective meaning. In this sense, Geertz’s thick description, and also constructionism, is a specific type of what we call thick analysis.

23. This should not be taken to imply that researchers pursuing the goal of thick description must always use tools of thick analysis. For example, survey researchers may seek to gain insights into the subjective meaning of respondents’ behavior, at the same time that they may have a selective and in some ways superficial overall level of knowledge about each respondent.

24. This term is adapted from Coppedge’s (2001) ‘‘nested induction’’ and from Lieberman’s (2003a) ‘‘nested analysis.’’

25. KKV (52–53, 117–18, 217–18) makes a parallel distinction between case and observation. While the book mainly uses observation in the sense of data-set observation, see also KKV (57), which refers to observation as a score.

26. The term ‘‘data point’’ is also sometimes used informally to mean the score for a given variable on a given case (Vogt 1999: 71). However, for any scholar who has worked with scatterplots, the meaning given in the text above more directly conveys the intuitive idea of a data point in a scatterplot.

27. Knowledge about the place of a causal-process observation within a larger data set can certainly influence how a scholar interprets this observation. Yet that is a different matter from relying on covariation within the data set to make causal inferences. And of course, causal-process observations are routinely studied in conjunction with an analysis of data-set observations based on such covariation.

28. Among many authors, see Elster 1999: chap. 1; McAdam, Tarrow, and Tilly 2001: chaps. 1–3; Tilly 2001.

29. Goldthorpe (2001: 8–9) cites various authors who have embraced this perspective, including Hill (1991 [1937]); Simon and Iwasaki (1988); Freedman (1991, 1992a, b); Cox (1992); and Cox and Wermuth (1996). See also Rosenbaum (1984).

30. In this and the following block quotation, the word ‘‘statistical’’ has been replaced (in brackets) by the word ‘‘quantitative.’’ The goal is to make clear the extent to which Goldthorpe’s argument converges with the argument of the present volume. Specifically, Goldthorpe is using ideas from statistical theory to argue that findings from the branch of applied statistics that we are calling mainstream quantitative methods must be supplemented by qualitative insights.

31. Goldthorpe goes on to point out that these efforts to establish causation ‘‘can never be taken as definitive’’ and must always be open to further empirical testing. ‘‘[F]iner-grained accounts, at some yet deeper level, will in principle always be possible’’ (2001: 9). Note that the quotation in the text above is in part Goldthorpe’s summary of arguments made by these statisticians, but Goldthorpe clearly intends this as a statement of his own position.

32. See also Bennett and George (1997a); Wallerstein (2001); and APSA-CP (2003).

33. King, Keohane, and Verba (121–22 this volume) conclude their chapter by endorsing a related concept of triangulation.

34. For other examples in which the contributions of these two kinds of observations are juxtaposed, see Tarrow’s chapter above (especially 105–10 this volume). Of course, many case-study researchers carry out extended analyses based on causal-process observations without relying in any substantial way on data-set observations.

35. It is important to note that degrees of freedom, and also inferential leverage in general, are not properties of the data, but rather of the researcher’s model in relation to the data. Adding a variable to an analysis decreases the degrees of freedom if the rest of the model is not changed. Yet, it could, for example, increase the degrees of freedom if it leads to a reconceptualization of the model as a sequence of causal steps in which the number of parameters estimated is smaller at each step.

36. However, the degrees of freedom could, once again, change if these causal-process observations lead the researcher to modify the statistical model being tested.

37. Except under the conditions specified in note 36 above.

38. See again the cautionary observation in chapter 1 above (21 n. 3 this volume).

39. See Achen (2000, 2002) and also Diaconis (1998). For a broader statement on these tensions in the discipline of political science, see Keohane 2003.

 
