An Introduction to Spatial Epidemiology

First off, credit where credit is due: these notes are taken in large measure from some wonderful original sources, in particular the excellent "Applied Spatial Data Analysis with R" by Roger Bivand, Edzer Pebesma and Virgilio Gómez-Rubio. Buy the book. It is worth it. I stole even more code from one of Dr. Bivand's course websites.

Here is a complete set of notes on the nuts and bolts of conducting spatial epidemiological analyses in R. The first chapter deals with reasons why an epidemiologist may or may not want to incorporate spatial analyses into their work. If spatial analysis makes sense, I make the case for using R to conduct it, and spend a little time going over what spatial data are and the tools R provides for dealing with them. The next set of notes details how to install the tools you'll need and read in spatial data so that you can analyze them in R, with some detailed instructions on getting the open-source GIS program GRASS installed on a Mac or Linux system. (I stopped using PCs a few years ago. Sorry.) I then have a set of notes on how areal neighbors are defined, using contiguity-based, graph-based or distance-based definitions. After that, there is a set of notes about defining weights for the neighbors you've defined and testing for spatial autocorrelation. There are then some notes on modeling areal data and, finally, on applying Bayesian hierarchical approaches to spatial data. Throughout, I demonstrate the methods using a set of data on traumatic brain injury among children in New York City. I finish up with a brief description of spatial point processes, drawing heavily on Adrian Baddeley's online notes.

My intent is to present a relatively brief, hopefully non-jargony overview of how practicing epidemiologists can apply some open-source but extremely powerful spatial analytic tools. In the short term, the material is directed toward an epidemiology methods class that I will be teaching starting in the spring 2012 semester. In the longer term, I hope to provide some guidance to practicing epidemiologists who may not have had much opportunity to incorporate spatial methods into their usual practice. I try to define terms and concepts as simply as possible, or at least simply enough that I understand them. I try to provide enough examples and code that someone with at least master's-level training in epidemiology can start applying these tools almost immediately to their own data and problems.

The material is unapologetically based on the kinds of issues and topics that interest me and on which I've spent some time working. You will find a lot of references to trauma and injury. You will find a lot of ecological or areal analyses. You will not find much, if anything, about geostatistics, the application of spatial statistics to interpolating and predicting values, which includes topics like kriging. It is simply not something I've had to deal with in my epidemiological work. Similarly, I consider point process analysis another highly specialized topic that I've not had to apply in my practice. If confronted with such data, I would likely seek help and collaboration from a 'real' medical geographer or spatial analyst. It is, though, a topic to which epidemiologists are likely to see references, so I've tried to perform due diligence and have included some information and an example later in these notes.

Spatial Analysis Book Chapter

If you are interested in a one-stop overview of spatial analyses, here is a single draft book chapter on the subject intended for public health and substance use researchers. It will help you become familiar with some of the available data analytic techniques, each of which comes with advantages and drawbacks. In the chapter, my co-authors and I (Angela Bucciarelli really did most of the heavy lifting) discuss three cluster detection tools and their associated software applications. We then present a Bayesian hierarchical approach, briefly reviewing its theoretical underpinnings, commonly used models, and how inferences may be drawn from a sample-based posterior distribution. We demonstrate the use of each approach on a set of substance abuse mortality data, comparing the results across all four tools. Our empirical illustration considers the role of neighborhood-level socioeconomic status (SES) in explaining opiate-related overdose deaths in New York City. We end with a discussion of how the choice of technique and software affects the interpretation of spatial analyses of substance abuse, and conclude that the choice of method will be driven by the question to be answered, data and software availability, and the intended audience or context in which the research is conducted.