/* SHORT TITLE:  Description of Files with 2001 Census counts */

This study reports the final counts for India Census 2001.
It contains sets of files for each of the following geographies:

  State files 
  District files 
  Administrative level 3 files

plus an additional set of files, based on the administrative level 3 files, 
modified to match up with the corresponding spatial data file in GIS applications.

The text below gives details on the following:
 -Missing values that occur in 38 of the 5610 administrative level 3 areas and
  affect counts for 3 of 35 states and 9 of 593 districts.
 -The geography and the 3 sets of files (admin_level3, district, state).
 -Administrative level 3 files that are an alternative to the main set when mapping.


Missing values on these files
-----------------------------
For a few administrative level 3 geographies either no data or only limited 
data was available, and these missing values do affect the district and 
state files.  The counts affected, relative to all of India, are small. 
Below are details of what is missing and how the missing data affects the files.

1. There are three subdistricts where the Registrar's office was unable to 
produce census counts: subdistricts 0001, 0002 and 0003 in Senapati district (01) 
Manipur state (14). There are complete counts for 3 other subdistricts in 
Senapati.  After the final counts were published, the following estimates were 
made for these subdistricts (all three are rural areas).

                     total    male    female
         Paomata     27,065  13,649  13,416  
         Purul       30,912  15,589  15,323 
         Mao Maram   69,131  37,080  32,051

These estimates are NOT incorporated into the counts on any files in this study.  
The impact of their exclusion is:
 -on the 3 subdistrict files, all counts for these subdistricts are recorded as missing values
 -on the district files, 
  urban file: counts are complete (Senapati is all rural)
  total and rural files: Senapati counts exclude data for these subdistricts 
 -on the state files,
  urban file: counts are complete (Senapati is all rural)
  total and rural files: Manipur counts exclude data for these subdistricts
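
Because the estimates are not on any file, anyone who wants totals that 
include them has to add them by hand.  The Python sketch below uses only the 
figures from the table above; the variable and function names are 
illustrative and are not part of the study files.

```python
# Estimates made after the final counts were published (rural areas only).
# Tuple order: (total, male, female).
SENAPATI_ESTIMATES = {
    "Paomata": (27065, 13649, 13416),
    "Purul": (30912, 15589, 15323),
    "Mao Maram": (69131, 37080, 32051),
}


def excluded_population(estimates):
    """Sum the estimates column by column: (total, male, female)."""
    totals = [0, 0, 0]
    for row in estimates.values():
        for i, value in enumerate(row):
            totals[i] += value
    return tuple(totals)


# Sanity check: male + female equals total for each subdistrict.
for name, (total, male, female) in SENAPATI_ESTIMATES.items():
    assert male + female == total, name

print(excluded_population(SENAPATI_ESTIMATES))  # (127108, 66318, 60790)
```

Adding these sums to the total or rural counts gives figures that include 
the estimates; the urban counts need no adjustment.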


2. For all subdivisions in Meghalaya state (17) counts were available only 
for selected variables.  The variables that are missing are the counts 
broken down by male-female and all counts (except total population) broken 
down by urban-rural.  The total population of Meghalaya state is 2,318,822.
 -on the 3 subdistrict files, when data are not available, counts are recorded as missing values
 -on the district files and on the state files, 
  total: for variables where data was missing for subdistricts, counts are recorded as missing
  urban & rural: only population totals are reported; all other variables are recorded as missing

3. For all 3 subdistricts in Chandauli district (66) Uttar Pradesh state (09) counts 
were available only for selected variables. The total population of Chandauli district is 1,643,251.
  -on the 3 subdistrict files, when data was not available, counts are recorded as missing values
  -on the district files, 
   total: for variables where data was missing for subdistricts, counts are recorded as missing
   urban & rural: for Chandauli, only population totals are reported; all other variables are recorded as missing
  -on the state files, 
   total: for variables with missing subdistrict data, there is an undercount equal to the missing Chandauli data
   urban & rural: all variables (except total population) have an undercount equal to the missing Chandauli data
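
The two outcomes described above (a total recorded as missing versus a 
total that is an undercount) can be illustrated with a small Python sketch.  
It assumes missing values have been read in as None; how missing values are 
actually coded on the files is not stated here, and the function names and 
sample counts are hypothetical.

```python
def total_or_missing(counts):
    """Return the sum, or None (missing) if any component is missing."""
    if any(c is None for c in counts):
        return None
    return sum(counts)


def total_with_undercount(counts):
    """Sum only the available components.

    The result is an undercount whenever at least one component is missing.
    """
    return sum(c for c in counts if c is not None)


# Made-up counts for one variable across three subdistricts.
counts = [1200, None, 800]
print(total_or_missing(counts))       # None
print(total_with_undercount(counts))  # 2000
```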



About the admin_level3 files
----------------------------

In the hierarchy of reporting India Census, entries on this file are for the 
third level down (below the state and district) used for reporting counts 
from the 2001 Census.  Administrative areas at this level are either 
subdistricts or towns.  [Note: cities (more than 1,000,000) are usually made 
up of a number of subdistricts/towns.]   The designation "subdistrict" 
includes a number of different types of administrative entities and in 
each entry the type is noted in the admtype variable. 

The geographic codes assigned to each entry are:
      state
      district    
      town   (set to zero for subdistrict entries)
      subdistrict  (set to zero for towns)
      state-district (a combination  of the two codes)
      subdistrict2  (the subdistrict code shortened from 4 to 2 digits)

There is also a geokey that is a constructed code created for each entry that 
matches to an entry on the India subdistrict spatial data boundary file that is 
in Columbia's spatial data collection.
      geokey = state+district+subdistrict(4) when there is a boundary
             = spaces when there is no matching boundary (145 entries)

Town codes were not used when constructing geokey.  When there is a boundary 
for a town the zero values assigned to subdistrict code are picked up in the 
geokey and in all but one case this creates a unique geokey for the entry.  
The exception is in Pondicherry District (Pondicherry state) where there are 
two towns for which we have boundaries. To generate a unique geokey for the 
town of Ozhukarai, the value 0011 was used in the geokey instead of 0000.
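
The construction of the admin level 3 geokey can be sketched in Python as 
follows.  The sketch assumes 2-digit state and district codes and a 4-digit 
subdistrict code, all zero-padded, and assumes a blank geokey is as wide as a 
populated one; the function name is hypothetical.

```python
def build_geokey(state, district, subdistrict, has_boundary=True):
    """Build a geokey as state(2) + district(2) + subdistrict(4).

    Entries with no matching boundary get a key made of spaces.
    Town entries pass subdistrict=0, since town codes are not used.
    """
    if not has_boundary:
        return " " * 8  # assumed width, same as a populated key
    return f"{int(state):02d}{int(district):02d}{int(subdistrict):04d}"


# Subdistrict 0001 in Senapati district (01), Manipur state (14).
print(build_geokey(14, 1, 1))  # 14010001

# A town with a boundary: the zero subdistrict code is picked up.
print(build_geokey(14, 1, 0))  # 14010000

# A town with no boundary: blank key.
print(repr(build_geokey(14, 1, 0, has_boundary=False)))
```

For the one exception, Ozhukarai, the caller would pass 11 as the 
subdistrict value so that the key ends in 0011 instead of 0000.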

For all levels (admin_level3, district and state), you can use geokey to 
join the data in the file to spatial files in Columbia's spatial data 
collection.  At administrative level 3, our spatial data file does not exactly 
match the entries in the admin_level3 files. Using the admin_level3 files will 
result in unmatched records on both the spatial file and the census file 
(142 towns with a population of 19,006,571).  A set of files at 
administrative level 3 has been created for mapping applications.  See 
below for information about the admin_level3_bound file.
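
To see which records fail to match before mapping, a join on geokey can 
report the leftovers on both sides.  The Python sketch below assumes census 
rows have been loaded as dictionaries with a "geokey" field and that the 
boundary file's keys are available as a list; the function name and the 
sample rows are made up for illustration.

```python
def join_on_geokey(census_rows, boundary_keys):
    """Match census rows to boundary geokeys.

    Returns (matched_rows, unmatched_census_rows, unmatched_boundary_keys).
    Rows with a blank geokey are always treated as unmatched.
    """
    boundary_keys = set(boundary_keys)
    matched, unmatched_census = [], []
    seen = set()
    for row in census_rows:
        key = row["geokey"]
        if key.strip() and key in boundary_keys:
            matched.append(row)
            seen.add(key)
        else:
            unmatched_census.append(row)
    unmatched_boundaries = sorted(boundary_keys - seen)
    return matched, unmatched_census, unmatched_boundaries


# Made-up sample: one matching entry, one town with a blank geokey,
# and one boundary that has no census entry.
census = [
    {"geokey": "14010001", "population": 100},
    {"geokey": "        ", "population": 250},
]
boundaries = ["14010001", "99990000"]
matched, no_boundary, no_census = join_on_geokey(census, boundaries)
print(len(matched), len(no_boundary), no_census)  # 1 1 ['99990000']
```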



About the district files
------------------------
The district level files contain counts for the 593 districts used for 
reporting 2001 Census data.  

The geographic codes assigned to each entry are:
      state
      district    
      state-district  (a combination  of the two codes)

There is also a geokey that is a constructed code created for each entry 
that matches to an entry on the corresponding India district spatial 
data boundary file that is in Columbia's spatial data collection.
      geokey = state+district


About the state files
------------------------
The state level files contain counts for the 35 states used for reporting 
2001 Census data.  

Each state is assigned a unique two digit code.

There is also a geokey that is a constructed code created for each entry 
that matches to an entry on the corresponding India state spatial 
data boundary file that is in Columbia's spatial data collection.
      geokey = state
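
Since each geokey is built from the codes of the levels above it, the 
district and state keys should be recoverable as prefixes of an admin level 
3 key.  The Python sketch below assumes 2-digit, zero-padded state and 
district codes at every level, which is not confirmed here; the function 
name is hypothetical.

```python
def parent_geokeys(level3_geokey):
    """Split an admin level 3 geokey into (state_geokey, district_geokey).

    Returns (None, None) for a blank key (no matching boundary).
    """
    if not level3_geokey.strip():
        return None, None
    return level3_geokey[:2], level3_geokey[:4]


# Subdistrict 0001 in Senapati district (01), Manipur state (14).
print(parent_geokeys("14010001"))  # ('14', '1401')
```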




About the admin_level3_bound file (ask in EDS before using this file)
--------------------------------
You can use the admin_level3 files for mapping but, for some entries on these 
files, there are no corresponding entries on the boundary file. The mismatches 
occur because admin_level3 includes both towns and subdistricts but the 
boundary file has only subdistricts.  Also, there are boundaries for 3 areas that 
appear to have no population and therefore are not on the admin_level3 files.

Starting with the admin_level3 file, the following updates and additions 
were made to create the admin_level3_bound file.

1. There are 3 entries for areas listed only on the boundary files 
and not on the Census files (and therefore not on the admin_level3 files).  
Zeros have been assigned to the count variables for these three areas.
   -subdistrict 00 in Kachchh district (01) Gujarat state (23)
   -subdistrict 00 in district 00 in Jammu & Kashmir state (01)
   -Nagari Hills (99) in Chittor district (23) Andhra Pradesh state (28)

2. The 3 town entries, which comprise all of Hyderabad district (05) 
Andhra Pradesh state (28), were not picked up from admin_level3 because 
there were no boundaries for these towns. Instead the entry for Hyderabad 
district was used.  The town and subdistrict codes are both set to zero.

3. In four states (West Bengal, Tripura, Nagaland, and Karnataka) the
admin_level3 entries can be either towns or subdistricts.  No boundaries 
are available for the towns.  These towns (142 towns with a population of 
19,006,571) are not on the main admin_level3_bound file.  Since the counts
for these towns will not be picked up for mapping, the entries were removed 
from admin_level3_bound and are instead listed separately in 
file admin_level3_no_bound.xls.
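
As a rough illustration of steps 1 and 3 only, the Python sketch below 
splits admin_level3 rows by whether they have a geokey and adds zero-count 
rows for areas found only on the boundary file.  It is not the procedure 
that was actually used to build the files: the Hyderabad substitution in 
step 2 is left out (so those town entries would land in the no-boundary 
list here), and the function name, field names and sample rows are all 
hypothetical.

```python
def build_bound(admin_level3, boundary_only_keys, count_fields):
    """Return (bound_rows, no_bound_rows) derived from admin_level3 rows.

    Rows with a blank geokey have no boundary and go to no_bound_rows.
    Areas that exist only on the boundary file are appended to bound_rows
    with every count field set to zero.
    """
    bound = [row for row in admin_level3 if row["geokey"].strip()]
    no_bound = [row for row in admin_level3 if not row["geokey"].strip()]
    for key in boundary_only_keys:
        zero_row = {"geokey": key}
        zero_row.update({field: 0 for field in count_fields})
        bound.append(zero_row)
    return bound, no_bound


# Made-up sample rows.
rows = [
    {"geokey": "14010001", "population": 100},
    {"geokey": "        ", "population": 250},  # town with no boundary
]
bound, no_bound = build_bound(rows, ["99990000"], ["population"])
print([r["geokey"] for r in bound])  # ['14010001', '99990000']
print(len(no_bound))                 # 1
```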