Economists studying spatial connections are excited about a growing body of increasingly fine spatial data. We’re no longer studying country- or city-level aggregates. For example, many folks now leverage satellite data, so that their unit of observation is a pixel, which can be as small as 30 meters wide. See Donaldson and Storeygard’s “The View from Above: Applications of Satellite Data in Economics”. Standard administrative data sources like the LEHD publish neighborhood-to-neighborhood commuting matrices. And now “digital exhaust” extracted from the web and smartphones offers a glimpse of behavior not even measured in traditional data sources. Dave Donaldson’s keynote address on “The benefits of new data for measuring the benefits of new transportation infrastructure” at the Urban Economics Association meetings in October highlighted a number of such exciting developments (ship-level port flows, ride-level taxi data, credit-card transactions, etc.).
But finer and finer data are not a free lunch. Big datasets bring computational burdens, of course, but more importantly our theoretical tools need to keep up with the data we’re leveraging. Most models of the spatial distribution of economic activity assume that the number of people per place is reasonably large. For example, theoretical results describing space as continuous formally assume a “regular” geography so that every location has positive population. But the US isn’t regular, in that it has plenty of “empty” land: more than 80% of the US population lives on only 3% of its land area. Conventional estimation procedures aren’t necessarily designed for sparse data sets. It’s an open question how well these tools will do when applied to empirical settings that don’t quite satisfy their assumptions.
Felix Tintelnot and I examine one aspect of this challenge in our new paper, “Spatial Economics for Granular Settings”. We look at commuting flows, which are described by a gravity equation in quantitative spatial models. It turns out that the empirical settings we often study are granular: the number of decision-makers is small relative to the number of economic outcomes. For example, there are 4.6 million possible residence-workplace pairings in New York City, but only 2.5 million people who live and work in the city. Applying the law of large numbers may not work well when a model has more parameters than people.
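To get a feel for what those New York City numbers imply, here is a back-of-the-envelope sketch in Python. The two counts come from the paragraph above; the uniform spread of commuters across pairs and the Poisson approximation are assumptions made purely for illustration, not a description of the data or of the paper's model.

```python
import math

pairs = 4.6e6      # possible residence-workplace pairs (figure from the text)
commuters = 2.5e6  # people who live and work in the city (figure from the text)

# Average number of commuters per residence-workplace pair.
mean_per_pair = commuters / pairs

# Illustrative assumption: commuters are spread uniformly over pairs, so the
# count in each pair is approximately Poisson with mean `mean_per_pair`.
# The probability of a zero count is then exp(-mean).
share_empty = math.exp(-mean_per_pair)

print(f"average commuters per pair: {mean_per_pair:.2f}")
print(f"share of empty pairs under uniformity: {share_empty:.2f}")
```

Even in this evenly spread benchmark, the average pair has well under one commuter and a majority of pairs are empty. Under the same Poisson approximation, unequal probabilities across pairs would leave at least as many pairs empty.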
Felix and I introduce a model of a “granular” spatial economy. “Granular” just means that we assume that there are a finite number of individuals rather than an uncountably infinite continuum. This distinction may seem minor, but it turns out that estimated parameters and counterfactual predictions are pretty sensitive to how one handles the granular features of the data. We contrast the conventional approach and granular approach by examining these models’ predictions for changes in commuting flows associated with tract-level employment booms in New York City. When we regress observed changes on predicted changes, our granular model does pretty well (slope about one, intercept about zero). The calibrated-shares approach (trade folks may know this as “exact hat algebra”), which perfectly fits the pre-event data, does not do very well. In more than half of the 78 employment-boom events, its predicted changes are negatively correlated with the observed changes in commuting flows.
The calibrated-shares procedure’s failure to perform well out of sample despite perfectly fitting the in-sample observations may not surprise those who have played around with machine learning. The fundamental concern with applying a continuum model to a granular setting can be illustrated by the finite-sample properties of the multinomial distribution. Suppose that a lottery allocates I independently and identically distributed balls across N urns. An econometrician wants to infer the probability that any ball i is allocated to urn n from observed data. With infinitely many balls, the observed share of balls in urn n would reveal this probability. In a finite sample, the realized share may differ greatly from the underlying probability. The figure below depicts the ratio of the realized share to the underlying probability for one urn when I balls are distributed uniformly across 10 urns. A procedure that equates observed shares and modeled probabilities needs this ratio to be one. As the histograms reveal, the realized ratio can be far from one even when there are two orders of magnitude more balls than urns. Unfortunately, in many empirical settings in which spatial models are calibrated to match observed shares, the number of balls (commuters) and the number of urns (residence-workplace pairs) are roughly the same. The red histogram suggests that shares and probabilities will often differ substantially in these settings.
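The urn lottery is easy to simulate. The sketch below uses NumPy to draw the count for a single urn many times and report how far the realized share strays from the true probability. The function name, the number of simulation draws, and the three values of I are choices made here for illustration; they are not necessarily the ones behind the histograms.

```python
import numpy as np

def share_to_probability_ratio(n_balls, n_urns=10, n_draws=10_000, seed=0):
    """Simulate (realized share) / (true probability) for one urn."""
    rng = np.random.default_rng(seed)
    p = 1.0 / n_urns  # uniform lottery: every urn is equally likely
    # The count in a single urn is Binomial(n_balls, p), which is the
    # marginal distribution of one cell of the multinomial.
    counts = rng.binomial(n_balls, p, size=n_draws)
    return (counts / n_balls) / p

# Illustrative values of I: equal to, 10x, and 100x the number of urns.
for n_balls in (10, 100, 1000):
    ratio = share_to_probability_ratio(n_balls)
    lo, hi = np.percentile(ratio, [5, 95])
    print(f"I = {n_balls:>4}: mean ratio = {ratio.mean():.2f}, "
          f"5th-95th percentile = [{lo:.2f}, {hi:.2f}]")
```

The ratio is centered on one in every case, but its spread shrinks only at rate one over the square root of I. Its standard deviation is sqrt((1 - p) / (I p)), which is roughly 0.95 when I equals the number of urns and still roughly 0.09 when there are 100 times as many balls as urns.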
Granularity is also a reason for economists to be cautious about their counterfactual exercises. In a granular world, equilibrium outcomes depend in part on the idiosyncratic components of individuals’ choices. Thus, the confidence intervals reported for counterfactual outcomes ought to incorporate uncertainty due to granularity in addition to the usual statistical uncertainty that accompanies estimated parameter values.
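A toy simulation shows how granularity alone generates dispersion in outcomes. The sketch below treats a set of choice probabilities as known exactly, so there is no parameter uncertainty, and repeatedly draws the choices of a finite population. The probabilities, the population size, and the five workplaces are all hypothetical, and the sketch ignores equilibrium feedback, so it is not the procedure used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical counterfactual choice probabilities over 5 workplaces,
# treated as known exactly (no parameter uncertainty).
probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
n_people = 200   # hypothetical finite number of commuters
n_sims = 5_000   # number of simulated realizations

# Each row is one realization of the granular economy: the idiosyncratic
# draws differ across rows even though the probabilities are fixed.
counts = rng.multinomial(n_people, probs, size=n_sims)

expected = n_people * probs  # the single number a continuum model reports
lower, upper = np.percentile(counts, [2.5, 97.5], axis=0)

for k in range(len(probs)):
    print(f"workplace {k}: continuum = {expected[k]:.0f}, "
          f"granular 95% range = [{lower[k]:.0f}, {upper[k]:.0f}]")
```

The range around each expected count comes entirely from the finite number of individuals. In an application, this dispersion would sit on top of the uncertainty in the estimated parameters.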
See the paper for more details on the theoretical model, estimation procedure, and event-study results. We’re excited about the growing body of fine spatial data used to study economic outcomes for regions, cities, and neighborhoods. Our quantitative model is designed precisely for these applications.