VB and VBA Code
Below are three different methods for querying a website or online database for the purpose of returning information to a spreadsheet. These codes are particularly useful when you need to collect thousands or 10s or 100s of thousands of observations
The first function (GetPrisoners) returns to an Excel spreadsheet the total number of prisoners for each of the 65,255 census tracts in the United States. The code uses the Census Bureau’s unique identifier for each tract, which is the state and county FIPS number flowed by the four digit tract number, to return the data. These codes are available in a text file which can be easily opened in Excel.
Next I used the U.S. Census American Fact Finder tool to produce a web table for any one census tract with the variable I’m looking for: population of Group Quarters by type. I copy the web address into my code and look for the unique identifier in the insanely long web address. To test, I sub in the unique identifier for other census tracts, paste the new web addresses in my browser, and voila – tables of populations of group quarter by type for the correct census tracts pop up.
Now all I need to do is write a code that will loop through each of the 65,255 cells, take the unique identifier, insert it into the web address, go get the table from that web address, return it to a new spreadsheet, extract only the number of prisoners in that census tract from the table, paste that number in the row for the appropriate census tract, delete the sheet created to paste the table and repeat. The code below does all that.
The one other thing that the code below does is empty Internet Explorer’s “Temporary Internet Files” after every 21 web queries. This is a rather odd – but effective – work around for a Microsoft glitch that causes Explorer to crash after 50 queries of sites with ridiculously long web addresses. Here’s a link to a forum with a better explanation from several developers who encountered the same problem along with a more in-depth explanation of my fix for the problem.
Sub GetPrisoners()
MyNum = 0
MyNum2 = 0
Dim myRange As Range
Set myRange = Worksheets("CensusTracts").Range("D7856:D7884")
For Each C In myRange ' Iterate through each Cell
C.Select 'Select each cell
Run (Retrieve) 'Runs the Retrieve Function below
MyNum = MyNum + 1 'Keeps a count of the number of iterations
MyNum2 = MyNum2 + 1
If MyNum = 21 Then 'On the 21st iteration, the function triggers a shortcut that deletes internet files
MyNum = 1
SendKeys "^%(d)" 'This line will cause you problems unless you have a desktop shortcut that deletes IE files whenever you hit ctrl+d
End If
If MyNum2 = 250 Then 'This saves my progress every 250 iterations
MyNum2 = 1
ActiveWorkbook.Save
End If
Next
Beep 'This makes a beeping noise when the code is done running
End Sub
MyNum = 0
MyNum2 = 0
Dim myRange As Range
Set myRange = Worksheets("CensusTracts").Range("D7856:D7884")
For Each C In myRange ' Iterate through each Cell
C.Select 'Select each cell
Run (Retrieve) 'Runs the Retrieve Function below
MyNum = MyNum + 1 'Keeps a count of the number of iterations
MyNum2 = MyNum2 + 1
If MyNum = 21 Then 'On the 21st iteration, the function triggers a shortcut that deletes internet files
MyNum = 1
SendKeys "^%(d)" 'This line will cause you problems unless you have a desktop shortcut that deletes IE files whenever you hit ctrl+d
End If
If MyNum2 = 250 Then 'This saves my progress every 250 iterations
MyNum2 = 1
ActiveWorkbook.Save
End If
Next
Beep 'This makes a beeping noise when the code is done running
End Sub
Function Retrieve()
Set CountyCell = ActiveCell
Set TractCell = CountyCell.Offset(ColumnOffset:=-1)
MyTract = TractCell.Value
Set PrisonCell = CountyCell.Offset(ColumnOffset:=1)
MyAddress = 'This is the address; note the MyTract variable which changes the address for each itteration
"http://factfinder.census.gov/servlet/CTTable?_bm=y&-context=ct&-ds_name=DEC_2000_SF1_U&-mt_name=DEC_2000_SF1_U_P037&-tree_id=4001&-redoLog=true&-all_geo_types=N&-_caller=geoselect&-geo_id=14000US" & MyTract & "&-dataitem=DEC_2000_SF1_U_P037.P037003&-search_results=01000US&-format=&-_lang=en"
ActiveWorkbook.Worksheets.Add
ActiveSheet.Name = "Census"
Set shTable = Workbooks(1).Worksheets("Census")
Set TblResults = shTable.QueryTables _
.Add(Connection:="URL;" & MyAddress, _
Destination:=shTable.Cells(1, 1))
With TblResults
.WebFormatting = xlNone
.WebSelectionType = xlSpecifiedTables
.WebTables = "4" 'This returns table 4, which is the one i need
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.Refresh BackgroundQuery:=False
End With
County = Worksheets("Census").Range("A3")
Prisoners = Worksheets("Census").Range("B3")
CountyCell.Value = County
PrisonCell.Value = Prisoners
Application.DisplayAlerts = False
Worksheets("Census").Delete
Application.DisplayAlerts = True
End Function
Set CountyCell = ActiveCell
Set TractCell = CountyCell.Offset(ColumnOffset:=-1)
MyTract = TractCell.Value
Set PrisonCell = CountyCell.Offset(ColumnOffset:=1)
MyAddress = 'This is the address; note the MyTract variable which changes the address for each itteration
"http://factfinder.census.gov/servlet/CTTable?_bm=y&-context=ct&-ds_name=DEC_2000_SF1_U&-mt_name=DEC_2000_SF1_U_P037&-tree_id=4001&-redoLog=true&-all_geo_types=N&-_caller=geoselect&-geo_id=14000US" & MyTract & "&-dataitem=DEC_2000_SF1_U_P037.P037003&-search_results=01000US&-format=&-_lang=en"
ActiveWorkbook.Worksheets.Add
ActiveSheet.Name = "Census"
Set shTable = Workbooks(1).Worksheets("Census")
Set TblResults = shTable.QueryTables _
.Add(Connection:="URL;" & MyAddress, _
Destination:=shTable.Cells(1, 1))
With TblResults
.WebFormatting = xlNone
.WebSelectionType = xlSpecifiedTables
.WebTables = "4" 'This returns table 4, which is the one i need
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.Refresh BackgroundQuery:=False
End With
County = Worksheets("Census").Range("A3")
Prisoners = Worksheets("Census").Range("B3")
CountyCell.Value = County
PrisonCell.Value = Prisoners
Application.DisplayAlerts = False
Worksheets("Census").Delete
Application.DisplayAlerts = True
End Function
Note: the other two methods and additional code will be up when i get some free time.