Predicting and Dissecting the Seats-Votes Curve in the 2006 U.S. House Election (with Andrew Gelman and Jamie Chandler). 2008. PS: Political Science & Politics. 41(1):139-145.
Abstract: The Democrats' victory in the 2006 election has been compared to the Republicans' in 1994. But the Democrats actually did considerably better in terms of votes: they received 54.8% of the average district vote for the two parties in 2006, whereas the Republicans averaged only 51.6% in 1994. The Democrats' 2006 result is comparable to their typical vote shares as the majority party in the decades preceding the 1994 realignment. Nevertheless, the size of the Democrats' victory in the 2006 House elections has obscured the sizable structural disadvantages they faced heading into the elections. In this paper we document the advantages the Republicans had, examine how and to what extent the Democrats overcame them, and offer predictions as to whether the results of the 2006 election leveled the electoral playing field for 2008. Our calculations showed that the Democrats needed at least 52% of the vote to have an even chance of taking control of the House of Representatives.
More formally, before the election we estimated the seats-votes curve for 2006 by constructing a model to predict the 2006 election from 2004, and then validated the method by applying it to previous elections (predicting 2004 from 2002, and so forth). We found that the Democrats in 2006 were always destined to receive a smaller share of seats than their average vote share. They were able to gain control of the House by winning the largest average district vote by either party since 1990. Has the 2006 election removed the Republicans' structural advantages? While Republicans continue to win more close races, a preliminary analysis of the 2008 election suggests that the switch in incumbency advantage from the Republicans to the Democrats may nevertheless level the electoral playing field.
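The prediction step described above can be illustrated with a simplified sketch. This is not the authors' R code; it assumes a linear regression of each district's Democratic two-party vote share on its lagged share and an incumbency code (+1 Democratic incumbent running, -1 Republican incumbent running, 0 open seat), fit on one pair of elections ("2004 given 2002") and applied to the next ("2006 given 2004"). It then traces a seats-votes curve by shifting every district by the same amount (a uniform partisan swing) and adding independent normal district errors. The function names, the coding scheme, and the synthetic data are all hypothetical.

```python
import random
from statistics import mean


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        rest = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - rest) / M[i][i]
    return beta


def design(lagged, incumbency):
    """Rows [intercept, lagged Democratic share, incumbency code]."""
    return [[1.0, v, float(inc)] for v, inc in zip(lagged, incumbency)]


def fit(lagged, incumbency, outcome):
    """Fit outcome ~ lagged vote + incumbency; return coefficients and residual sd."""
    X = design(lagged, incumbency)
    beta = ols(X, outcome)
    resid = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(X, outcome)]
    sigma = (sum(e * e for e in resid) / (len(outcome) - len(beta))) ** 0.5
    return beta, sigma


def predict(beta, lagged, incumbency):
    """Point predictions of the Democratic share for the next election."""
    return [sum(b * x for b, x in zip(beta, row)) for row in design(lagged, incumbency)]


def seats_votes_curve(pred, sigma, targets, n_sims=500, seed=0):
    """For each target average district vote v, apply a uniform swing so the
    predictions average v, add independent N(0, sigma) district noise, and
    return (v, expected Democratic seat share, P(Democratic majority))."""
    rng = random.Random(seed)
    n, base = len(pred), mean(pred)
    curve = []
    for v in targets:
        shifted = [p + (v - base) for p in pred]
        shares, majorities = [], 0
        for _ in range(n_sims):
            wins = sum(p + rng.gauss(0, sigma) > 0.5 for p in shifted)
            shares.append(wins / n)
            majorities += wins > n / 2
        curve.append((v, mean(shares), majorities / n_sims))
    return curve


# Synthetic example (made-up data): fit "2004 given 2002", then predict "2006".
rng = random.Random(1)
n = 300
v2002 = [rng.uniform(0.3, 0.7) for _ in range(n)]
inc2004 = [rng.choice([-1, 0, 1]) for _ in range(n)]
v2004 = [0.1 + 0.8 * v + 0.05 * i + rng.gauss(0, 0.03) for v, i in zip(v2002, inc2004)]
beta, sigma = fit(v2002, inc2004, v2004)
inc2006 = [rng.choice([-1, 0, 1]) for _ in range(n)]
pred2006 = predict(beta, v2004, inc2006)
for v, seats, p_major in seats_votes_curve(pred2006, sigma, [0.46, 0.50, 0.54]):
    print(f"avg vote {v:.2f}: seat share {seats:.3f}, P(majority) {p_major:.2f}")
```

Validating the method in this framework would mean repeating the same fit on earlier pairs of elections and comparing the predictions with the actual outcomes. The sketch ignores uncertainty in the national swing and any correlation among district errors, which a fuller model would need to address.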
A PDF copy of the paper is available here.
A pre-election version of the paper is available here.
Replication Information
Introduction
With the data and code described below, researchers can replicate our results and use the data for further study. Note that all the files referenced below, including CSV versions of the datasets, can be found in this zip file.
Datasets
We used three datasets in the paper: a district-level dataset containing information on every House race from 1946 to 2004; an aggregate-level dataset containing the total number of votes and seats won by each party in the same elections; and a dataset containing information on each district that we used to make predictions for the 2006 election.
a) Individual House Race Data, 1946-2004
This dataset, which was given to us by Gary Jacobson, contains information on every House race from 1946 to 2004, such as the vote share of the Democratic candidate and incumbency status; complete coding information is available here. We modified and recoded these data using this Stata do-file. Coding information for the updated dataset, which we use for the analysis that appears in the paper, is available here.
b) Aggregate House Data, 1946-2004
This dataset, which was compiled from data available from the Clerk of the House, contains aggregate information (in terms of seats and votes) for every House election from 1946 to 2004. Coding information is available here.
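The aggregate dataset records each party's total seats and votes, while the abstract's figures refer to the average district vote. The toy sketch below, with made-up numbers and hypothetical function names, shows how three quantities can diverge when one party's votes are concentrated in a few lopsided, low-turnout districts: the unweighted average of district-level two-party shares, the pooled national two-party share, and the seat share.

```python
from statistics import mean

# Toy district results (Democratic votes, Republican votes); made-up numbers.
# The two lopsided Democratic districts also have lower turnout.
districts = [
    (75_000, 25_000),
    (80_000, 20_000),
    (95_000, 105_000),
    (96_000, 104_000),
    (97_000, 103_000),
]


def dem_share(dem, rep):
    """Democratic share of the two-party vote in one district."""
    return dem / (dem + rep)


def average_district_vote(results):
    """Unweighted mean of the district-level two-party shares."""
    return mean(dem_share(d, r) for d, r in results)


def pooled_vote(results):
    """Democratic share of all two-party votes cast, pooled across districts."""
    dem_total = sum(d for d, _ in results)
    return dem_total / sum(d + r for d, r in results)


def dem_seat_share(results):
    """Fraction of districts won by the Democratic candidate."""
    return sum(d > r for d, r in results) / len(results)


print(f"average district vote: {average_district_vote(districts):.3f}")
print(f"pooled vote:           {pooled_vote(districts):.3f}")
print(f"seat share:            {dem_seat_share(districts):.3f}")
```

In this toy example the Democrats average about 59.8% of the district vote and about 55.4% of the pooled vote, yet win only 2 of 5 seats.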
c) Individual House Race Data for Predicting 2006
This dataset contains information about the 2006 election, including incumbency status and lagged vote going into the election, along with information about the winner and vote margins in the 2006 election. Data on 2004 vote shares and incumbency status were based on Jacobson's data. Data on incumbency status and retirements were taken from various news sources in the months leading up to the election (see the paper for references). The 2006 election results were supplied to us by Walt Borges, who gathered the official certified results from every state; we then confirmed them independently. Coding information is available here.
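As a hedged illustration of how the incumbency, retirement, and lagged-vote fields described above might be turned into inputs for a prediction model, the sketch below codes each district as +1 (Democratic incumbent running), -1 (Republican incumbent running), or 0 (open seat after a retirement). The record layout, the field names, and this coding scheme are assumptions for illustration only; the dataset's actual variables are documented in its coding file.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DistrictRecord:
    # Hypothetical field names, not the dataset's real variable names.
    district: str
    dem_share_2004: float  # Democratic share of the 2004 two-party vote
    incumbent_party: str   # party holding the seat going into 2006: "D" or "R"
    retiring: bool         # True if the incumbent is not running in 2006


def incumbency_code(rec: DistrictRecord) -> int:
    """Assumed coding: +1 Democratic incumbent running,
    -1 Republican incumbent running, 0 open seat."""
    if rec.retiring:
        return 0
    return 1 if rec.incumbent_party == "D" else -1


def predictor_row(rec: DistrictRecord) -> Tuple[float, int]:
    """(lagged vote, incumbency code) pair for one district."""
    return rec.dem_share_2004, incumbency_code(rec)


# Made-up example records
records = [
    DistrictRecord("AA-01", 0.62, "D", False),
    DistrictRecord("AA-02", 0.44, "R", False),
    DistrictRecord("AA-03", 0.47, "R", True),
]
rows = [predictor_row(r) for r in records]
print(rows)
```

This simplification treats every retirement as an open seat and does not handle party switches or special elections, which real coding would need to address.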
Statistical Code
All statistical analysis that appears in the paper was conducted using R. Complete, annotated code is available here.