How does city density affect the proportion of workers who commute by public transportation or walking?

Matt Yglesias links to a chart that purports to show a strong association between density and public transportation use. I don't think the chart is evidence of anything, though. I can't tell how the cities were chosen (Memphis and Milwaukee are in, Houston and Miami are out), so the cities may have been cherry-picked. Also, the authors define cities by their political boundaries rather than their urbanized areas or metropolitan areas, making density comparisons meaningless.

But this piqued my curiosity. I decided to run an analysis using the 32 largest urbanized areas in the U.S. (those with more than 1.3 million residents). I regressed the percentage of workers commuting by public transportation or walking on standard density and on weighted density. For good measure, I also threw in a regression on the ratio of weighted density to standard density.

Conclusions:

- There is only a very weak association between standard density and the percentage of workers commuting by public transportation or walking.
- There is a robust association between weighted density and the percentage of workers commuting by public transportation or walking, although the association is weaker when New York is omitted.
- There is an even stronger association between the ratio of weighted density to standard density and the percentage of workers commuting by public transportation or walking. Again, the association is weaker when we omit New York.

I used 2000 Census figures for the urbanized area densities. I used the Census Bureau's 2005 American Community Survey (which is a sample) for the data on mode of commuting. The survey asked workers to identify their "principal" method of commuting.

Standard density

The scatterplot below tells the story. The association between standard density and the percentage of workers who commute by public transportation or walking is very weak, although statistically significant. R^{2} = .14, which means an urbanized area's standard density explains only 14% of the variation in the percentage of workers commuting by public transportation or walking. I've shown the trend line (generated by Excel), but its slope (.002) is poorly determined: the 95% confidence interval is (.00014, .0039), a range that excludes zero but spans more than an order of magnitude. The data therefore say very little about how much commuting behavior changes with standard density.
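For readers who want to reproduce figures of this kind outside Excel, here is a minimal sketch of a one-variable least-squares fit that returns the slope, R^{2}, and a 95% confidence interval for the slope. The function name `simple_ols` and the default critical value are my assumptions, not part of the original analysis: 2.042 is the two-sided 95% value of Student's t for 30 degrees of freedom, which matches a sample of 32 urbanized areas.

```python
import math


def simple_ols(x, y, t_crit=2.042):
    """Fit y = intercept + slope * x by ordinary least squares.

    t_crit is the two-sided 95% critical value of Student's t.
    The default (2.042) assumes 30 degrees of freedom, i.e. n = 32.
    Pass a different value for other sample sizes.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares, R^2, and the standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy
    se_slope = math.sqrt(sse / (n - 2) / sxx)

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "ci95": (slope - t_crit * se_slope, slope + t_crit * se_slope),
    }
```

Here `x` would be the list of densities (people per square mile) and `y` the list of commuting percentages, one entry per urbanized area.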

Weighted density

Weighted density is the density at which the average person lives. (See this string of posts for a detailed explanation.) Technically, it is the average density of the urbanized area's census tracts, with each tract weighted by its percentage of total population.
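The two definitions can be written side by side. The symbols are my own shorthand, not notation from the original posts: p_i and a_i are the population and land area of census tract i, and P and A are the totals for the urbanized area.

```latex
D_{\text{standard}} = \frac{P}{A} = \frac{\sum_i p_i}{\sum_i a_i},
\qquad
D_{\text{weighted}} = \sum_i \frac{p_i}{P} \cdot \frac{p_i}{a_i}
```

Standard density weights each tract's density by its share of land area; weighted density weights it by its share of population, so heavily populated tracts count for more.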

Unlike standard density, weighted density has a strong (and statistically significant) association with the proportion of workers commuting by public transportation or walking. R^{2} is 0.73, which means that weighted density explains 73% of the variation in the proportion of commutes by public transportation/walking. The slope of the regression line is 0.0011 (95% CI = (.00084, 0.0013)), which means that an increase in weighted density of 1,000 ppsm is associated with an increase of 1.1 percentage points in the share of workers commuting by public transportation or walking.

I should note that New York, which is an outlier in population, in weighted density, and in the proportion of commutes by public transportation, has a disproportionate influence on this regression. When New York is omitted, the R^{2} drops to 0.42. This is a much weaker association (although still statistically significant).
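A sensitivity check of this sort amounts to refitting with one observation removed. The sketch below shows the idea; the function names and the sample numbers are hypothetical and are not the data behind this post.

```python
def r_squared(x, y):
    """R^2 of a one-variable least-squares fit (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def r_squared_without(x, y, index):
    """R^2 after dropping the observation at position `index`."""
    x, y = list(x), list(y)
    return r_squared(x[:index] + x[index + 1:], y[:index] + y[index + 1:])


# Hypothetical data: the last point is far from the rest on both axes.
density = [2000, 2500, 3000, 3500, 4000, 30000]
transit_share = [3.0, 6.0, 4.0, 7.0, 5.0, 35.0]

with_outlier = r_squared(density, transit_share)
without_outlier = r_squared_without(density, transit_share, len(density) - 1)
```

A single extreme point can pull R^{2} up sharply because it contributes most of the variance on both axes, which is why the fit is worth reporting both ways.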

Ratio of weighted density to standard density

The ratio of weighted density to standard density is an interesting statistic. Standard density treats the population as uniformly spread throughout the urbanized area. Weighted density recognizes that population is "clumpy." The ratio of the two gives an indication of the degree of clumpiness. A high ratio suggests the city has a dense core and sparse suburbs. A low ratio suggests that the city's population is uniformly distributed. For example, Miami has a relatively high standard density and a relatively high weighted density, but the ratio of the two is only 1.55, which means the population is more or less uniformly dense, with few very dense concentrations. Boston has a low standard density and a high weighted density; its ratio is 3.32, which indicates that much of its population is concentrated in high-density areas.
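To make the "clumpiness" idea concrete, here is a small sketch that computes both densities and their ratio from tract-level data. The tract figures and the name `clumpiness_ratio` are invented for illustration; each tract is a (population, land area in square miles) pair.

```python
def standard_density(tracts):
    """Total population divided by total land area."""
    total_pop = sum(pop for pop, _ in tracts)
    total_area = sum(area for _, area in tracts)
    return total_pop / total_area


def weighted_density(tracts):
    """Average tract density, each tract weighted by its share of population."""
    total_pop = sum(pop for pop, _ in tracts)
    return sum((pop / total_pop) * (pop / area) for pop, area in tracts)


def clumpiness_ratio(tracts):
    """Ratio of weighted density to standard density."""
    return weighted_density(tracts) / standard_density(tracts)


# Two hypothetical cities with the same population and the same land area.
uniform_city = [(2000, 1.0), (2000, 1.0), (2000, 1.0)]
clumpy_city = [(5000, 1.0), (500, 1.0), (500, 1.0)]

uniform_ratio = clumpiness_ratio(uniform_city)
clumpy_ratio = clumpiness_ratio(clumpy_city)
```

Both cities have a standard density of 2,000 ppsm, but the uniform city's ratio is 1.0 while the clumpy city's is about 2.1, because most of its residents live in the one dense tract.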

We should expect this ratio to be a good indicator of mass transit use. Older, northeastern cities developed dense cores, which were well-served by mass transit before the automobile era. Newer cities, with a more uniformly distributed population, were built around the automobile.

And, indeed, this is the case. R^{2} = 0.77. The slope of the regression line is 6.5, which means that a 1-point increase in the ratio is associated with an increase of 6.5 percentage points in the share of workers commuting by public transportation or walking. The 95% confidence interval is (5.2, 7.9). The association, again, is highly statistically significant.

New York's effect on the association, again, is substantial. When New York is omitted, the R^{2} drops to 0.45.

Correlation versus causation

Correlation does not imply causation. These associations are interesting (at least to me), but I suspect they are being driven by the underlying city form -- older cities with dense, urban cores versus younger, auto-centric cities.

Perhaps the best use of the data is to spot outliers. Portland, for example, is not particularly dense under either the standard or weighted metric. But it has a relatively high proportion of mass-transit users. D.C. is likewise an outlier. This suggests that investments in rail systems increase mass transit use relative to cities of similar densities, although this is hardly conclusive (and LA is a possible counter-example).

I have posted an Excel table with the weighted densities, standard densities, ratios, and public transportation/walking data below the jump.

I have uploaded the underlying data in Excel format to Google Docs.

All charts and tables are licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.