Political polarization of the U.S. Senate: Is the data fooling us?

Earlier this month, The Economist, Yahoo! News, and several other respectable news outlets ran articles talking about some great network visualizations apparently showing the "political polarization of the U.S. Senate." There, they argued that these visualizations show how the Senate has evolved from a fairly cohesive unit in 1989 into a dysfunctional group divided along party lines in 2013. If you look at the visualizations of voting behavior in the U.S. Senate below, you'd probably agree with their conclusions.

The supposed "political polarization" of the U.S. Senate
Visualization c/o Renzo Lucioni

I was a little skeptical. There were a couple of issues with how the visualizations were presented:

The divide over time wasn't as clear as the news articles made it appear, which made me suspect that they were cherry picking.
All connections between Senators who voted on fewer than 100 bills together were arbitrarily removed, which could have produced the sudden "polarization" effect we were seeing.

Renzo was kind enough to point me to the script he used to collect the data for these visualizations, so it was easy for me to collect the data myself and run my own analyses. I've made the data freely available for download on figshare.

Were the news outlets cherry picking to prove their point?

To address the first concern, I measured the modularity of the networks over time. Modularity measures how strongly a network splits into groups that are densely connected internally but only sparsely connected to each other; here, the groups are the political parties. In the graph below, I called this measure "divisiveness."

Divisiveness of the U.S. Senate, quantified
The x-axis is time, and the y-axis is the modularity score
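
For readers who want to see what a modularity score involves, here is a minimal sketch in plain Python. It is an illustration, not the script behind the graph above. It assumes the standard Newman definition of modularity, takes each Senator's party as the group label, and uses made-up Senators and edges; the function name modularity and the toy data are hypothetical.

```python
from collections import defaultdict


def modularity(edges, group_of):
    """Newman modularity Q of an undirected weighted graph for a given partition.

    edges: iterable of (u, v, weight) tuples, each undirected edge listed once.
    group_of: dict mapping every node to its group label (assumed here: party).
    """
    total_weight = 0.0            # m: sum of all edge weights
    within = defaultdict(float)   # edge weight with both ends in the same group
    degree = defaultdict(float)   # summed weighted degree of each group's nodes
    for u, v, w in edges:
        total_weight += w
        degree[group_of[u]] += w
        degree[group_of[v]] += w
        if group_of[u] == group_of[v]:
            within[group_of[u]] += w
    if total_weight == 0:
        return 0.0
    # Q = sum over groups of (share of weight inside the group)
    #     minus (share expected if edges were placed at random)
    return sum(
        within[g] / total_weight - (degree[g] / (2 * total_weight)) ** 2
        for g in degree
    )


# Toy data (made up): two Senators per party.
party = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

# Only same-party connections: a fully divided chamber.
divided = [("A1", "A2", 1), ("B1", "B2", 1)]

# Everyone connected to everyone: no party structure at all.
united = divided + [
    ("A1", "B1", 1), ("A1", "B2", 1), ("A2", "B1", 1), ("A2", "B2", 1),
]

print(modularity(divided, party))
print(modularity(united, party))
```

The divided toy network scores higher than the united one, which is the sense in which a higher score means a more "divisive" Senate.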

It's quite amusing to see that the first two network visualizations that The Economist showed, which were meant to show the Senate as a fairly united body, came from the two points of lowest divisiveness in the entire time period (1989 and 2002). Similarly, the final network visualization that they showed, which was meant to show a divided Senate, came from the point of second-highest divisiveness (2013).

Coincidence? For the choice of the 1989 and 2002 visualizations, I don't think so.

It's fairly clear that the news outlets were cherry picking the visualizations to prove their point.

Does the data really show a divided Senate?

My second concern was that, by removing all connections between Senators who voted on fewer than 100 bills together, Renzo could have produced the sudden appearance of "divisiveness" in the Senate when in fact the Senate has long been divided. If you look at the "divisiveness, quantified" graph above, the Senate has been fairly divided since the early 1990s. If that's the case, why do Renzo's visualizations look fairly cohesive early on, but suddenly divided around 2013?

This interactive visualization in a Yahoo! News article illustrates my concern best: if you cut many connections, the Senators form clusters based on political affiliation. If you don't cut any connections, the divide is much less clear.

Cutting many connections produces a divided Senate in 2013
Screenshot taken from this visualization

Cutting fewer connections produces a much less divided Senate in 2013
Screenshot taken from this visualization

Moving the cutoff threshold around easily changes whether it looks like the Senate is divided or not. In fact, if you play around with that interactive visualization enough, you'll see that you can eventually produce a politically divided Senate for every year by removing enough connections. Or, if you want, you can produce a politically united Senate by leaving more connections.
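
The effect of the cutoff is easy to reproduce on a toy example. The sketch below uses made-up numbers, not the real Senate data, and assumes each connection carries a count of bills the two Senators voted on together. The helper names cut_weak_ties and cross_party_share are hypothetical.

```python
def cut_weak_ties(edges, min_shared_votes):
    """Keep only connections with at least min_shared_votes shared votes."""
    return [(u, v, w) for u, v, w in edges if w >= min_shared_votes]


def cross_party_share(edges, party):
    """Fraction of the given connections that link Senators of different parties."""
    if not edges:
        return 0.0
    cross = sum(1 for u, v, _ in edges if party[u] != party[v])
    return cross / len(edges)


# Toy data (made up): two Senators per party.
party = {"D1": "D", "D2": "D", "R1": "R", "R2": "R"}

# (Senator, Senator, number of bills they voted on together)
votes_together = [
    ("D1", "D2", 180), ("R1", "R2", 170),   # same-party pairs
    ("D1", "R1", 110), ("D1", "R2", 90),    # cross-party pairs
    ("D2", "R1", 95), ("D2", "R2", 80),
]

# Same underlying data, three different cutoffs.
for cutoff in (0, 100, 150):
    kept = cut_weak_ties(votes_together, cutoff)
    share = cross_party_share(kept, party)
    print(f"cutoff={cutoff}: {len(kept)} connections kept, "
          f"{share:.0%} of them cross-party")
```

With no cutoff, most of the connections in this toy network cross party lines. With a cutoff of 150, none do, even though the underlying voting records are identical.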

So what does this tell us? Politicians have always been more likely to vote along party lines, even in 1989. Perhaps they've been slightly more inclined to do so this year than in 1989, but it's not as extreme as Renzo's visualizations suggest. The political parties only appear fairly cohesive in 1989 and completely divided in 2013 because of an arbitrary cutoff, and we could tell a completely different story if we chose a different arbitrary cutoff.

Always be skeptical of data visualizations

The point of this post wasn't to rail against Renzo or any of the news outlets that hyped his network visualizations. Instead, I hope you found this to be a cautionary tale: always question how data visualizations are made. Data visualizations can often be manipulated to demonstrate any point the author pleases, and it's easy to accept what a visualization claims when it agrees with our intuitions.

Mark Twain famously popularized the saying, which he attributed to Benjamin Disraeli:

"There are three kinds of lies: lies, damned lies, and statistics."

I'd like to expand on the quote by adding one more kind of lie:

"There are four kinds of lies: lies, damned lies, statistics, and data visualizations."