In yesterday’s post I identified an implicit conceptual assumption that mars the vast majority of work using events data to study contentious politics, and pointed out that one way forward is the use of latent measurement models. Today I offer a different salvation: a set of assumptions that not only delivers us from the morass implied by abandoning the assumption that burdens contemporary analysis, but also ties very nicely to most any theory of contentious politics that explicitly works with actors making choices.
The Conceptual Move to Information Sets
Given that counting the number of [fill in your favorite event here] is not possible, latent measurement models are one option. However, a reasonably simple, and powerful, conceptual move is also readily available to us. I have been aware of the problem I described yesterday since the late 1980s, and through the late 1990s just lived with it, publishing several studies that pretended the problem did not exist. Why? I could not think of any solutions. Indeed, I have only become aware of latent measurement models in the last 4-5 years. But in the early 2000s what I call the “information set conceptual move” occurred to me, and I put it to use in Poe, Davenport & Moore (2003), Moore & Shellman (2004, 2006, 2007) and several working papers since. The move should have occurred to me during my work on Moore (1995), which builds explicitly upon work by Mike McGinnis and John Williams (1988, 1989) on the US-Soviet superpower rivalry, but did not.
In brief, the move consists of invoking the following assumptions (or some similar set):
1. Actors select levels (or types, etc.) of cooperation and conflict to direct toward a target.
2. The actors select that level (or type, etc.) from a finite set of options.
3. To make the choice in assumption 1 from the options in assumption 2, they rely upon an information set, which they use to form beliefs about the (past or expected) behavior of other actors, as well as their own past behavior.
4. That information set can be partitioned into private and public subsets [the former containing information available only to some subset of actors, and the latter available to all actors].
5. The public information set is contained in media reports.
6. The private information available to each actor is normally distributed over any dimension of interest [and can thus be modeled as a portion of the error term in any multiple regression].
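One way to read assumption 6 in regression terms is sketched below. The notation is illustrative and does not appear in the studies cited above. The sketch also adds two things the list does not state: that private information has mean zero, and that it is uncorrelated with the public measures. Both would be needed for it to sit harmlessly in the error term. The linear form is a simplification, since assumption 2 implies a finite choice set, for which an ordered or discrete choice model would be the closer fit.

```latex
% Illustrative notation only; these symbols are assumptions, not the author's.
% y_{it}: level of conflict or cooperation actor i directs at its target at time t
% x_{it}: measures built from media reports (the public information set)
% p_{it}: actor i's private information, unobserved by the analyst
% u_{it}: all remaining unmodeled variation
\begin{align}
y_{it} &= \alpha + \boldsymbol{\beta}^{\top}\mathbf{x}_{it} + \varepsilon_{it}, \\
\varepsilon_{it} &= p_{it} + u_{it}, \qquad p_{it} \sim \mathcal{N}(0, \sigma_p^2), \\
\operatorname{Cov}(\mathbf{x}_{it}, p_{it}) &= \mathbf{0} \quad \text{(added assumption, not stated in the list above)}.
\end{align}
```

Under this reading the analyst never needs to observe p, only to measure x well, which is why the validity question shifts from "ground truth" counts to the content of media reports.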
The most important of these assumptions, for our purposes, is number 5. Indeed, others may be able to think of ways to invoke little more than number 5 and make the “absence of ground truth” problem disappear. But I have found it useful to invoke the full set of six.
Why is this move powerful? There are two reasons. First, once it is invoked the questions about validity change markedly. If we are trying to assess the validity of a count of event X based on specific sources and a particular coding scheme and our concept is the “ground truth” value of that count, we are stuck in latent (unobservable) land. However, if we are interested in the choices that dissidents and states make about, say, tactics in their competition with one another, then we can turn our attention to conceptualizing the information set upon which they will draw to make those choices. Media reports are a publicly available source of information. We can readily embrace them as containing the information we need to create unbiased, valid measures of the public portions of information sets available to different actors. That’s a big deal!
Second, this move links your conceptualization of events directly to any theoretical framework that has actors making decisions based on information sets. I will assume that the reader is familiar with a plethora of such frameworks, from rationalist theories of utility maximization through psychological theories of decision making.
Naturally, no conceptual move, no matter how cool, solves all problems. Assumption 6 is what one would call, ahem, a strong assumption. But I contend that it is not as strong an assumption as the one we have implicitly been invoking as we cast about for the combination of sources that will produce “ground truth” event counts. Further, explicit strong assumptions point us rather directly at opportunities for theory development. And just because I have yet to think of helpful ways in which to relax assumption 6 does not mean you cannot.
To summarize, from where I sit we have two very viable options for moving forward as we think about events data and the study of contentious politics. One embraces the challenge of latent (unobservable) data. The other is a theory-driven solution that makes the problem irrelevant. Here’s to hoping that lots of folks embrace both options, for we have much to learn.
 I have since learned of two working papers that are directly relevant: Lowe (2013) “Measurement Models for Event Data” and Fariss (2014) “Uncertain Events: A Dynamic Latent Variable Model of Human Rights Respect and Government Killing with Binary, Ordered, and Count Outcomes.”
 It turns out that this is a set of assumptions that I have been making throughout my career (e.g., see ), but have yet to catch on.
 See, also, their 2001 book. One can re-read my pre-2003 work that uses events data as if I had invoked these assumptions. That is, the solution was there for the taking, I just failed to appreciate it. Invoking the assumptions would not change my hypotheses. Doing so would only put those studies on a more solid foundation.
 Naturally, one can add any number of additional assumptions to build one’s specific theory. For example, any given source of reports (e.g., The New York Times or Amnesty International publications) is available to an audience that will consume the information contained therein with some probability. I recommend assuming that the probability of consumption of a randomly selected news item is normally distributed across the potential readership of the source, but you are free to make any assumption you find appropriate.
 You will presumably also want to make an assumption about goals pursued (whether utility maximization, minimizing psychological stress, or what have you), but I leave that to you.