Electoral Security Project

OUTLIER DETECTION

___________________________________________

2004 Election Outcome

We have run our algorithm on the data from 2004. Broadscale multi-county multi-state
statistical anomalies above rates of previous elections addressable by our algorithm were not detected.
On the other hand, open questions remain about manipulations directed at specific counties
in specific states. Two plots for Florida below illustrate these questions.

In the first figure, we plot for each county the change in percentage Republican presidential vote
from 2004 compared with 2000, the "swing", against the percentage Republican vote in 2004
(counting only democratic and republican votes in each election). The symbols are
size coded for total county vote (symbol size ~ log10(vote) ), and color coded for
voting technology: red for electronic voting, black for optical scan precinct based.
A somewhat surprising positive swing, more Republican leaning vote, is seen in the
large most democratic-leaning electronic-voting counties (Broward and Palm Beach).
The statistical sample is too small to make any statements, but this does warrant further investigation.

The second figure, with symbols coded the same as in the first plot, show percent change in turnout
from 2004 cmpared with 2000, again versus Republican voting percentage. We see a downward
trend in turnout with democratic leaning in the electronic-voting red circles, a trend not seen
with the black optical scan circles. Again, small statistical samples preclude making any
strong conclusions. These anomalies do, however, suggest more thorough investigations be made.

swing florida 2004

At this point, more sophisticated and information intensive analysis of the results are needed, something which
goes beyond the scope of our work. Please see other sites concerned with these issues for further information.

Other Websites concerned with election monitoring, outlier detection, and voting technology:

http://elections.fas.harvard.edu

http://www.verifiedvoting.org

_____________________________________________________________________________
PRE-ELECTION BACKGROUND:

Introduction

What is Electoral Security?

We are concerned with electoral security- how protected is our electoral process from manipulation?
Our elections reflect our democracy in action.
Any possibility that they will be undermined threatens our society and its values at its very core.
Yet have we done enough to make our electoral process secure? The answer is, no!

What Can Be Done?

A number of analysts with different political party affiliations have raised concern about
the integrity and security or our electoral process. The potential vulnerability of our electoral process
is such that even external actors could potentially interfere in the outcome of our election.
This is intolerable and a profound threat to our democracy. This concern has deepened
with the use of electronic voting machines which may be particularly prone to external manipulation.
In light of this threat a group of concerned scholars have developed a series of statistical methods
for detecting possible anomalies on election night.
These methods would then allow us to pinpoint which counties require further investigation
into reasons for the irregularity.

Our group here at Columbia University has developed a method
based on looking at changes in voting patterns on a county by county basis over time.
The variable we track is the fraction of Republican votes over the total number of votes
for Democrats and Republicans (We could have just as easily used Democratic votes-this is an arbitrary choice).
Past election data reveal a pattern of changes in this variable over time, a kind of electoral signature for each county.
This change is then compared to the voting patterns of a set of counties with similar electoral signatures.
This allows us to see whether changes in any specific county are anomalous relative to others.
We have developed a program that allows for easy detection and visual representation
of potentially anomalous voting by county in a state. Our hope is that this will help us pinpoint
potential cases where the integrity of our electoral system may be at risk.
It should be stressed that this program does not verify whether any irregularity has actually taken place.
Instead it merely gives a fast way to detect where attention needs to be focused.

**Not all types of manipulation are detectable with this algorithm, and a lack of a signal does not indicate a lack of any problem!**

An example of this kind of methodology, applied to the past three elections in a state, Florida is this case, is shown below.
Further explanation of what is plotted, and the algorithm which goes into the plot, can be found below.

The plot below shows Florida for the 1992, 1996 and 2000 elections.
On election night, we will make a similar plot for 2004 for this and other states as the election results are reported.

What is plotted?

For each election we measure the difference in the fraction of Republican vote
in each county relative to a prediction of a matching county set,
where the matching county set is chosen from counties which have behaved
similarly in previous elections (the database used here goes back to 1968).
We plot a histogram of the distribution of these differences in the bottom panel,
and a map of the state in the top panel color-coded by
the anomaly, the absolue value of the differences.
In the map,
less than 6% anomaly is colored green,
between 6% and 9% anomaly is colored dark green,
between 9% and 12% anomaly is colored yellow,
above 12% anomaly is colored red;
if there is missing data for a county in a state, it is colored grey.
Note that on this color scale, the bright colors are rare. We have chosen
the colors such that in the historical record they are rarely triggered.
The intervals in the distribution represent standard deviations,
with a national average of approximately 3% standard deviation.
A map unprecedented in it brightness would indicate an outlier relative
to the historical previous elections, and be cause for further investigation.

Other states:

Link to other states

Contact Information:

http://www.columbia.edu/~jk2002

How the analysis works

The best approach for checking the accuracy of electoral results is to use a variety of
methodologies. The Electoral Security Project involves a consortium of researchers looking
from a variety of points of view at the results. Here, we present results from one particular
algorithm, which can detect some types of manipulation. Not all types of manipulation
are detectable, and a lack of a signal does not indicate a lack of a problem. A detection of
a signal, on the other hand, does raise the warning flag to look in further more specific detail.

Technical Aspects

The particular algorithm used here by our group at Columbia University
involves looking at changes in votes from one election to the next at the county level.
The measure is the fraction of the Republican vote relative to the Republican plus Democratic vote
(this is a more stable measure to elections when there were larger shares for third parties).
These changes are compared against the changes in other counties in a state.
Counties which have behaved similarly in the past to a given county are found from
historical data, and grouped as a matching set for each county. The results from a given election
of the matching set for each county is used to predict the expected change in vote for each county
for that same election, and the difference between the reported vote and this predicted vote
is then tabulated for each county.
This gives a distribution of differences for each county in a state between predicted and reported results.
Past elections are analyzed with the same algorithm to give a baseline distribution to compare against.
Increases significantly above this baseline variability would be cause for further detailed examination
by other means.

The fraction of the two-party measure we use is particularly sensitive to 'flipping' a vote
from one party to another, a method of cheating which preserves the total number of votes cast.
A different algorithm a group in our consortium is running is sensitive to nullifying a vote,
an operation this algorithm would not pick up. This points to the virtue of running many
different algorithms. It also points out that passing this particular test does not mean that
other forms of cheating, or cheating below our detection threshold, have not occurred.

We are setting the algorithms and all parameters used before the election, so that the results are indeed
a legitimate prediction, rather than postdiction. The exact algorithm and parameters will be distributed
before the election to independent parties to formalize the prediction aspect of this analysis.
This does not imply other analysis made afterward in a retrospective fashion do not have
validity. It instead emphasizes the statistical forecasting nature of the analysis:
the results are not 'tuned' to the current election in any way in this predictive mode.

We have set our algorithm to use easily available data so that it can run election night.
This has the advantage of allowing rapid response. Other algorithms using more
detailed data are being used by other groups in the consortium. They have the advantage
of being more able to accommodate mitigating factors which might explain anomalies.

We have intentionally devised an algorithm which is simple and easy to follow,
and neglected many adjustments which could be made, in order to keep it simple
and not needing additional information. If outiers are detected with this algorithm,
additional more sophisticated analysis and studies which take into account other pertinent
information should then be applied.

One important aspect of our algorithm is that we have the ability to look at distributions of
anomalies taking into account different technologies and different machines. This enables a
technology, manufacturer, and machine specific check on the results. Given the inherent
vulnerabilities in the computer software involved in the machines, this is a key area
we will examine.

We have run our algorithm on presidential election results from 1968-2000.
Results for the same analysis we will be using on the 2004 data are shown
for the 2000, 1996, and 1992 elections in the example shown above.
(The use of historical data to construct the matching set limits how far back we
can go in analyzing past elections before we run in to the earliest election
in 1968 in our dataset. Each analysis uses the same number of prior
elections, so a constant number are used and any prior ones not needed ignored).

Other Websites concerned with election monitoring, outlier detection, and voting technology:

http://elections.fas.harvard.edu

http://www.verifiedvoting.org