Week 7: Scale & Causal Relations

Given the break from abstraction provided by week 6’s review of exemplars, we dove back into abstraction for the final two weeks of the Theory section of the course.  For week seven we first covered the concept of scale (usually called “levels” in political science), and then discussed causal relationships.  A strong case can be made for discussing both of these topics prior to, or simultaneously with, conceptualization.  Such is the Gordian Knot problem.

Scale: Micro, Meso, Structural Levels of Conceptual Aggregation

The first task for seminar in week seven was to have them pull up the video represented in the screenshot below.  It’s old, and thus not visually cool, but it is a very cool illustration of the point that “reality,” and the concepts we use to describe it and explain causal relations at, and across, that scale, change depending on the scale at which we choose to work.  Note the word choose.  It implies that there is no given scale.  Indeed, we must select a scale at which to conceptualize, and thus we ought to declare the scale at which our conceptualization occurs.  Our present practice is to leave the scale implicit, and this weakens our theorizing rather considerably.


Contemporary practice in our field involves the usage of vague terms such as “level of analysis,” “unit of analysis,” and similar phrases.  First of all, I have no idea what “analysis” is.  Conceptual?  Theoretical?  Empirical?  Any/all of the above?  Second, while some political scientists distinguish “level” as conceptual and “unit” as empirical, many treat them as interchangeable.  I use the term “unit of observation” for empirical (measurement) work, and we’ll get to that in the fourth section of the course.  But we are still talking about theory here–the empirical will wait.  This week we are trying to get them to abandon “level of analysis” for “conceptual scale.”

We have stressed that conceptualization should occur over two dimensions in the social sciences: space and time.  Both space and time have well developed conceptual scales, but we do not take advantage of them in our conceptual and theoretical work.  As such, we are leaving some very valuable tools in the shed.



Metric Spatial Units

Now, just because we can choose to use continuous conceptualizations of dimensions in space and time does not mean that we must adopt such precision.  Indeed, it is far from obvious to me in the social sciences why anyone might want to conceptualize at a milli- or micro-meter spatial scale.  We can probably do quite well with three to five or six ordinal distinctions, one of which is likely to be what we might call the “human” or individual scale (deca-meter in the figure above).  A meso (group scale) is higher up, and we may want to define that spatially over villages, counties, states, or other spatial units defined by political or other socially defined boundaries.  To be sure, thinking of space as distance conceived over the metric scale would be an unhelpfully limiting convention to adopt.  I thus raise it not as a proposal for the way to proceed, but instead as an illustration of one way we can proceed and emulate.


Temporal Units

For time, on the other hand, we can likely work with the established units.  In addition to those listed in the figure above we have fortnights, quarters, decades, scores, centuries, millennia, and so on.  However, we might also think about time as defined by sequential interactions (i.e., abandon fixed units of time, and define the temporal units over socially constructed interactions such as “moves” or “turns” as I do in my 1998 AJPS article).  The well defined conventional units for conceptualizing space and time, then, are options to emulate.

This is a hobby horse of mine: every article and book should declare the spatial and temporal scale at which each concept is conceived.  Doing so is a joint conceptual and theoretical task.  Our failure to do so weakens our (individual and collective) theorizing, and leads to dumb mistakes that follow from poor habits.  I return to this below, but let me now turn to the reading, which focuses on how to construct theories that work across multiple scales (and makes a case for the value of doing so).

I assigned chapter one from Schelling’s (1978) Micromotives and Macrobehavior, pp. 1-23 of James Coleman’s (1990) Foundations of Social Theory, Randall Collins’s (1988) article “On the Micro Contribution to Macro Sociology,” and pp. 1-3 of Achen & Shively’s (1995) Cross-Level Inference.  I have them read three pages written by political scientists, a chapter by a Nobel prize winning economist, and two pieces by sociologists.  Why?  Outside of Achen & Shively I am unfamiliar with similarly strong presentations of theorizing across scales by political scientists.

Schelling explains that his book is about

a kind of analysis that is characteristic of a large part of the social sciences, especially the more theoretical part.

That kind of analysis explores the relation between the behavior characteristics of the individuals who comprise some social aggregate, and the characteristics of the aggregate.  This analysis sometimes uses what is known about individual intentions to predict the aggregates…

There are easy cases, of course, where the aggregate is merely an extrapolation from the individual… (p. 13).

Put another way, some causal relationships operate at multiple scales, or perhaps are even “scale free.”[2]   But interesting cross-scale theorizing tells a causal story about why concepts and assumptions at various scales combine to logically imply observable phenomena at a given scale.

Of greater interest are models where we assume people

are responding to each other’s behavior and influencing each other’s behavior.  People are responding to an environment that consists of other people responding to their environment, which consists of people responding to an environment of people’s responses…

These situations, in which people’s behavior or people’s choices depend on the behavior or the choices of other people, are the ones that usually don’t permit any simple extrapolation to the aggregates.  To make that connection we usually have to look at the system of interaction between individuals and their environment, that is, between individuals or between individuals and the collectivity (pp. 13-4).

Such stories are the stock in trade of works such as Dahly (1958), Gurr (1970), Tilly (1978) and the della Porta (1995) book I assigned in week six.  Note that those are not formal game theoretic works.  The Kiewiet (1991) and Cox & McCubbins (2005) books are constructed via game theory, whereas the Roy (1950) article, Ostrom (1991) book, and Liu (2011) article fit the Schelling description but are not formal.

If you are familiar with the drum I have been beating in my work (from 1991 through today) that the literatures in which I work will improve should they shift attention from thinking about the impact of structures on outcomes (e.g., deaths over a given threshold) to studying the impact of structures on the behavioral choices of dissidents and states, you can appreciate why I am such a fan of Schelling.

Of course, the students in the course are in their first semester, so they struggle to make efficient use of such abstract material.  To address this I illustrated with some thought experiments and discussion.  I began by claiming that most political scientists work at what we might call the human spatial scale.[1]  We tend to refer to that as the individual scale (by which we mean a single human being).  At the meso (or group) scale we think about collections of people.  And at the structural (or macro) scale we think about norms, institutions, and aggregate outcomes (electoral outcomes, economic output, people killed, bills passed, etc.).  These three “levels of analysis” are often distinguished, but we have an incomprehensible tendency to treat them as mutually exclusive “levels” at which to theorize.  Noooooooooooooooooooooo!


Mr. Bill (SNL).

OK, to get conversation started I asked them to name concepts that might influence the likelihood that a sitting judge on a court in the US might retire rather than stand for re-election, and when naming that concept, tell us the scale at which they were conceiving that concept.  And I observed that we were conceiving of the decision (retire, run) at the individual scale.   Somebody said: age, individual.  Another offered: partisan composition of the electoral district, structure.  A third suggested: family (e.g., children whose lives the judge was missing out on), group.  Excellent: from the perspective of the individual whose choice we are modeling her age is an individual scale concept, her family is a group scale concept, and the partisan composition of her district is a structural scale concept.

I then asked them to think about the reading from Gurr (1970) the week prior and asked what scale he was working at for the object of explanation.  Somebody said: civil strife.  That’s a macro scale concept.  Bingo.  And I noted that, though we would not be discussing measurement for several weeks, Gurr’s initial empirical work collected data at the “country–semi-decade” spatial–temporal unit of observation (it turned out several of them had read that in a Comparative core).

I then asked them to identify the scale at which he begins his conceptualization and assumptions.  Somebody ventured: the individual, psychological scale.  Great, Gurr assumes that aggression is an innately satisfying response to frustration among homo sapiens.

Then I asked: so how the hell does Gurr start with an assumption about innately satisfying responses to produce hypotheses about the amount of civil strife we’ll observe in Sri Lanka during the late 1970s v Italy during the early 1960s (and so on)?  And something seemed to click for several students.  I then asked: what about national income?  Does Gurr make a case for why national income might impact civil strife via the innately satisfying response of aggression to frustration?

Someone suggested that the distribution of income well might.  I then reminded them of the Roy model and talked about the 1% v 99% slogan of the Occupy Movement.  I then asked them to imagine that Gurr had explicitly assumed that people’s responses to information about an increase in the proportion of income/wealth accruing to the 1% were normally distributed over a dimension ranging from no change in their perceived gap between expected and realized income for the 99% to a huge spike in the perceived gap between expected and realized income for the 99%.

How might Gurr then deduce an implication about an increase in the concentration of income/wealth upon the level of Relative Deprivation in a country?  When nobody spoke I prompted: how about the mean response?  Given the assumption that people’s responses are normally distributed, might the mean be a good representation of the most common response?  They agreed it was, and agreed that the mean given an increase in concentration was an increase in RD, and that an increase in RD would, ceteris paribus, raise the expected level of civil strife.  We had thus clarified Gurr’s theory and made more explicit the cross-scale theorizing in his book.

I closed that discussion by observing that assumptions about probability distributions are a generically available tool for theorizing across scales in the social sciences.  It is not the only tool, but it is one to consider.
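To make the move from individual responses to a country-scale implication concrete, here is a minimal Python sketch of the thought experiment.  It assumes, as we did in seminar, that individual responses are normally distributed.  The function name, the response scale, and every number are hypothetical and chosen only for illustration; none of them comes from Gurr.

```python
import random
import statistics


def mean_rd_change(n_people, mean_response, sd_response, seed=0):
    """Country-scale change in RD implied by individual-scale responses.

    Each person's change in perceived deprivation is drawn from a normal
    distribution (an illustrative assumption, not Gurr's own specification).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    responses = [rng.gauss(mean_response, sd_response) for _ in range(n_people)]
    # Aggregation step: the macro-scale quantity is the mean individual response.
    return statistics.fmean(responses)


# Hypothetical scenarios: a small vs. a large rise in income concentration.
small_rise = mean_rd_change(n_people=10_000, mean_response=0.2, sd_response=0.5)
large_rise = mean_rd_change(n_people=10_000, mean_response=1.0, sd_response=0.5)

# Same seed in both calls, so the only difference is the shift in the mean.
print(f"mean RD change, small rise: {small_rise:.2f}")
print(f"mean RD change, large rise: {large_rise:.2f}")
```

The individual responses vary widely, but the distributional assumption is what licenses a statement about the country as a whole: a larger shift in the mean individual response implies a larger change in aggregate RD.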

I returned to the judge deciding between retirement and running for re-election and asked whether we had done any of the cross-level theorizing that Schelling, Coleman, Collins and Achen & Shively had discussed.  Were the concepts we identified at the three scales operating with one another, or in isolation?  I then explained how what I like to call our “OLS regression hangover” helps explain so much of the ad hoc theorizing that was hegemonic during the 1980s through the early 2000s.  The linear representation permits us to “throw another shrimp on the barbie,” as it were: we add another X to the equation by stating (or citing) some verbal account of why the object of explanation (Y) should co-vary with X.  While Blalock invested most of his career in this problem, it has taken Judea Pearl’s recharacterization of the problem via directed acyclic graphs (and overclaims about causal “identification”) to get us to pay attention to what Blalock and macroeconomists were discussing during the late 1950s, 1960s and 1970s.  But I digress…[3]

The relevant problem is our poor habits when it comes to thinking explicitly about scale and conceptualization, and thus about cross-scale theorizing.  Using levels rather than scale, Achen & Shively write:

Theoretical disjunction poses two kinds of problems for social analysis.  One problem is that of theoretical consistency.  Hypotheses valid at the microlevel often have readily apparent and intuitively plausible macrolevel analogues, and yet the macropropositions may be shown to be incoherent nonsense (Green 1964).  Avoiding false analogies across level is obviously prerequisite…  Thus theoretical consistency in aggregation is of fundamental importance to social sciences (pp. 2-3).

If you are familiar with one of the conventional critiques of Gurr’s Why Men Rebel, this passage may strike you.  I underscored this passage to my students, and then told them how happy reading it makes me.  Despite the fact that Gurr (1970) does an excellent job of starting with individual level assumptions and deducing macro-level hypotheses, the book is routinely criticized for doing precisely that.  People frequently critique it for not testing hypotheses at the individual level.   Sure, Gurr could have developed individual level implications and put forward such hypotheses.  But that was not the project he pursued.  He was interested in explaining cross-national differences in civil strife.  To do so he constructed a theory with psychological microfoundations and made a cross-level theoretical case (which, yes, could have been strengthened, as I note above).  I have never understood why people find that confusing.

Achen & Shively continue:

Most of the literature on aggregation bias has been concerned, however, not with theoretical consistency but with statistical issues – “cross-level inference” or “aggregation bias” or “ecological inference.”  Here the concern is typically with using macrolevel data to infer microlevel relationships.

A common boneheaded error that occurs in our discipline is an inaccurate charge of “ecological fallacy” for a study that uses macro-level measures to study macro-level hypothesized relationships that were produced from cross-scale theorizing where the author “explicitly derive[d] the macrolevel models from microlevel counterparts… demonstrat[ing] theoretical consistency across levels” (Achen & Shively, p. 3).  I have repeatedly experienced this fallacious charge, sometimes when presenting my work, and many more times as an audience member at a presentation.  Indeed, were one to swing a dead cat in the lobby of The Palmer House during an APSA or MPSA meeting, she would strike at least one person who has made, and is likely to again make, this error.  We need to read and internalize Schelling (1978), etc. and stop the madness.

The only explanations I can offer are the poverty of our engagement with scale, and with conceptualization more generally, and the fact that we tend to teach students about the “ecological inference problem” not from the context of Schelling’s or Achen & Shively’s discussion of theoretical consistency, but from the context of ecological regression building off of Robinson’s classic 1950 article.

Finally, the discussion about scale permitted me to walk them through an issue recently debated in the discipline: the recent work on genetics and partisan political attitudes (e.g., Alford, Funk & Hibbing 2005).  The object of explanation–attitudes–is manifest at the human scale, but the explanatory concept–a genetic allele, if I follow properly–is conceived at a much finer scale (I have no idea how that would be well described).  Given the discussion to date, I asked the students, how might we well describe that work, and should the early efforts be published in influential general journals like APSR?

They had not read the article, so I suggested the work makes no effort to construct a cross-scale argument about why certain allele structures are probabilistically associated with particular partisan attitudes, but merely demonstrates that they are, using a design (twin studies) that gives us considerably greater confidence that the relationship would replicate than we would have had they found the relationship in a random sample containing measures of people’s alleles and their partisan attitudes.  I reminded them that, according to this course, that gives the finding a credible claim to being considered a stylized fact.  As Clarke & Primo remind us, there is a model underpinning that relationship’s claim to what we call a stylized fact, and importantly, it is both novel and unexpected given existing theory.  A novel stylized fact that is unexpected definitely warrants a claim to the pages of a widely read journal as long as there are people who care about the object of explanation in that stylized fact.  A non-trivial portion of political scientists do care about partisan attitudes.  On those grounds, I argued, the work belonged in a journal like APSR.  Critiques about the known likelihood of finding spurious relationships in large datasets are accurate, but to judge the value of such a study on that criterion alone makes no sense from the perspective of the course we are offering.  This becomes even more clear in week eight, where we discuss how to assess theory.

Types of Causal Relations: Deterministic, Probabilistic

In addition to scale, in week seven we also discuss causal relations.  I assign the “Causal Analysis” chapter of Daniel Little’s (1991)  Varieties of Social Explanation (pp. 13-38) and a brief primer I wrote back in 2006: “Necessary, Sufficient, and Probabilistic Causal Claims.”  The key here is to help them understand the difference across these types of causal claims: deterministic (whether bivariate or conjunctural via Boolean logic) and probabilistic.  They are simply two distinct types of causal claims, and neither should be privileged ex ante.

I recognize that my views on this are unorthodox, and explain that to them.  I note that the seminar focuses on probabilistic causal claims, but encourage them to explore necessary and sufficient deterministic claims if they are drawn to them.  I recommend Charles Ragin’s books (The Comparative Method and Fuzzy Set Social Science), but tell them that–jarringly–Ragin, who understands probabilistic causal claims and statistical inference, fails to recognize this crucial distinction, as does all the work on the so-called “comparative method.”  I explain that Ragin’s empirical tools are super valuable for exploring necessary and/or sufficient condition deterministic relationships, or the less deterministic, somewhat probabilistic “fuzzy set” variant.  Indeed, I taught those books and software years ago at Florida State, and used them prior when working on my dissertation and again when doing the early empirical work that led to my 1998 AJPS article.

I have neither the energy nor the space to defend these assertions (nor, I suspect, does the reader likely have the patience to read such a defense).  I have been planning to write on the topic since my initial frustrations engaging the “comparative method” literature in considerable depth in the summer of 1987.  My frustration has never abated, but the defense requires elaboration of the entire integrated approach to scientific inquiry that we develop in this course.  Perhaps some readers can piece it together on their own from the disparate parts in these posts.  But I told my students I would be happy to meet with them over a coffee or a beer and discuss it, should they wish.

That said, I did share this with them.  Nec/Suff causal claims (and Mill’s Method of Difference) cannot generalize to explain an outcome conceptualized as whole numbers, real numbers, integers, or over a continuous space.  This type of causal process can only be used to account for assignment to values of nominal or ordinal concepts that have a limited number of values (perhaps, 4, 6, 8…).
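A small illustration may help here.  The Python sketch below checks necessity and sufficiency over a handful of made-up cases.  The helper names `is_necessary` and `is_sufficient`, the condition X, the outcome Y, and the cases themselves are all hypothetical; this is a bare-bones rendering of the logic, not Ragin’s software or anyone’s published procedure.

```python
def is_necessary(cases, condition, outcome):
    """True if every case showing the outcome also shows the condition."""
    return all(case[condition] for case in cases if case[outcome])


def is_sufficient(cases, condition, outcome):
    """True if every case showing the condition also shows the outcome."""
    return all(case[outcome] for case in cases if case[condition])


# Hypothetical cases with a binary condition X and a binary outcome Y.
cases = [
    {"X": True, "Y": True},
    {"X": True, "Y": False},
    {"X": False, "Y": False},
]

x_is_necessary = is_necessary(cases, "X", "Y")    # Y never occurs without X
x_is_sufficient = is_sufficient(cases, "X", "Y")  # but X occurs once without Y
print(x_is_necessary, x_is_sufficient)  # True False
```

Both checks presuppose that the condition and the outcome are each either present or absent.  An outcome conceived as a count or over a continuous space would have to be cut into a few categories before questions like these could even be posed.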

Why did Mill, and those who have followed his lead, fail to appreciate this large limitation?  First, the study of probability distributions and statistics was not yet well developed when Mill was writing.  That theory provides a denotative vocabulary needed to generalize about research design as well as to understand probabilistic causation with falsifiability (e.g., null hypothesis testing).  In the absence of such a vocabulary Mill was left to connotation.  Naturally, he began with the simplest situation–binary conceptualization of an outcome–and developed his Method for that limited set of nominal / ordinal outcomes with a limited number of values.


Update [12:02 pm (MST; 19:02 GMT)]: I added the final two paragraphs of the post, which I had in notes, but had left out of the initial post.

[1] While spatial scale is well defined in the metric measurement scale, I do not anticipate our discipline is likely to make strong usage of the fine grained distinctions that scale permits, and thus do not encourage its adoption at that level.  Much more crude, ordinal scale values that are verbal (yes, I confess, vague) are likely to be serviceable.

[2] Yay, perhaps another physicist will publish a paper on the scale free power law distribution of deaths from war, terror attacks, or whatever in Nature or Science.  

[3] I will return to this issue several times during the research design weeks of the course.

About Will H. Moore

I am a political science professor who also contributes to Political Violence @ a Glance and sometimes to Mobilizing Ideas . Twitter: @WilHMoo
This entry was posted in Theory & Inference Course.
