13 July 2008
Note: this is an incomplete document. I have yet to confirm that this works 100%, but here are the things I noted while installing PostGIS' TIGER geocoder. First and foremost, the source code is available via:

svn co http://svn.refractions.net/postgis/trunk/extras/tiger_geocoder

A lot of this information is an incomplete copy of what's available from http://postgis.refractions.net/support/wiki/index.php?PostgisOnUbuntu
What follows are the instructions as they appear in the source code. I'll interpolate the bits that were unclear and required outside help in my case, so you can benefit from them.
sudo apt-get install postgresql-8.3 postgresql-contrib-8.3 postgresql-8.3-postgis

Invariably, you'll need some extra tool support, so:
sudo apt-get install gdal-bin python2.5-gdal

(Obviously, you should have Python installed, too.) Interestingly, my installation has thus far been a failure because I chose to use Postgres 8.3. If you're on Gutsy or earlier, you may have better luck (in fact, I'm almost certain you will) with Postgres 8.2, against which the geocoder code seems to have been tested. I can't guarantee these instructions are identical for 8.2, but they should be: I can't imagine any of these dependencies missing from a release of the distro that's just six months old :-), though the versions may differ. The takeaway is that you should install 8.2, but be cautious. The rest of these instructions reference my 8.3 installation, so change them as necessary.
\i /usr/share/postgresql-8.3-postgis/lwpostgis.sql
\i /usr/share/postgresql-8.3-postgis/spatial_ref_sys.sql
This looks something like:

psql -U postgres your_database_name < /usr/share/postgresql/8.3/contrib/fuzzystrmatch.sql

There are all sorts of worthy things in the contrib modules, so take your time, do some research, and install anything you think useful.
wget -r http://www2.census.gov/geo/tiger/tiger2006se
import/load_tiger.sh

What this will do is load all the TIGER data from http://www2.census.gov/geo/tiger/tiger2006se/ into the database in a raw, unfettered way, so that the processing to set up the geocoder works. Here are the modifications I made before running it:
#!/bin/bash
for file in /home/jlong/Desktop/gis/www2.census.gov/geo/tiger/tiger2006se/*/*.ZIP; do
  ./tigerimport.sh append MYDATABASE $file $FIRST
  if [ -z "$FIRST" ]; then
    FIRST=done
  fi
done

You see? All I changed were the value of MYDATABASE and the path pointing to the directory where I had downloaded all the data from the census site. I also modified tigerimport.sh itself: whenever I saw lines starting with ogr2ogr, I changed the connection parameters as appropriate:
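My reading of that $FIRST logic: an empty $FIRST marks the first ZIP (which creates the tables), and every later ZIP appends, so the loop sets $FIRST to a non-empty value after the first pass. Here's a runnable sketch of that pattern; the scratch directory, file names, and the echo stub standing in for tigerimport.sh are all made up:

```shell
# Stand-in demo for the $FIRST flag in load_tiger.sh. The echo stub replaces
# tigerimport.sh, and /tmp/first_demo replaces the census download directory.
rm -rf /tmp/first_demo && mkdir -p /tmp/first_demo
touch /tmp/first_demo/a.ZIP /tmp/first_demo/b.ZIP /tmp/first_demo/c.ZIP
FIRST=
for file in /tmp/first_demo/*.ZIP; do
  # tigerimport.sh would run here; empty $FIRST means "first (creating) pass"
  echo "import $file first=${FIRST:-yes}"
  if [ -z "$FIRST" ]; then
    FIRST=done
  fi
done
```

Only the first iteration sees an empty flag; the value 'done' is arbitrary, since the script only tests whether $FIRST is non-empty.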
ogr2ogr $PRE $OGROPTS $SRS "PG:dbname=$DATABASE user=MYUSER password=MYPW host=127.0.0.1" $TIGER CompleteChain $POST

For my setup, I needed to specify the host, which I guessed was done with a 'host=127.0.0.1' parameter. That seemed to work. Similarly, before you start importing everything, consider the client encoding: you may run into encoding issues down the line after this step. This was necessary because the clients talking to my database were trying to send non-UTF8 data to a database that was UTF8. You can 'undo' it afterwards.
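Editing every ogr2ogr line in tigerimport.sh by hand is tedious; the same connection-parameter change could be scripted with sed. This is only a sketch: the sample file and its single line below are made up, and MYUSER/MYPW are placeholders for your own credentials.

```shell
# Create a made-up one-line stand-in for tigerimport.sh.
cat > /tmp/tigerimport_sample.sh <<'EOF'
ogr2ogr $PRE $OGROPTS $SRS "PG:dbname=$DATABASE" $TIGER CompleteChain $POST
EOF
# Splice user/password/host into every PG: connection string (GNU sed -i).
sed -i 's/PG:dbname=\$DATABASE/PG:dbname=\$DATABASE user=MYUSER password=MYPW host=127.0.0.1/' /tmp/tigerimport_sample.sh
cat /tmp/tigerimport_sample.sh
```

The quoted 'EOF' heredoc keeps the $VARIABLE placeholders literal, so sed sees exactly what the script file contains.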
ALTER DATABASE lemanger SET client_encoding=latin1;

Alright, finally, run the script above once you're confident everything's changed the way it needs to be! Warning: on my personal workstation with a 3GHz Pentium 4 (not dual core, but still no slouch) and 2 GB of RAM, this process took 28 hours and 30 GB or so of space. This is not a 'grab a cup of coffee' situation. This is a 'go visit the family ... in the next state ... and stop all other programs running concurrently on the workstation for a day' situation. In total -- by the way -- so you don't get in over your head, I needed upwards of 100 GB of free space to complete this process. Some of that space is eventually recoverable, because it's temporary. Still, if you have a small hard disk, it may be advisable to preflight this process with just one state. Another thing to note is that a 'state', in this case, is a state of the United States, and the definition includes 59 entities beyond the 50 proper US states -- Puerto Rico, for example. Very cool.
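Given that ~100 GB figure, it's worth a quick preflight check on free space before kicking things off. A small sketch: the threshold comes from my experience above, and the volume being checked ('.') is assumed to be wherever the data and database will live.

```shell
# Warn if the current volume has less than ~100 GB free (threshold from above).
NEEDED_KB=$((100 * 1024 * 1024))               # 100 GB expressed in 1K blocks
AVAIL_KB=$(df -Pk . | awk 'NR==2 {print $4}')  # POSIX df -P: column 4 is "Available"
if [ "$AVAIL_KB" -lt "$NEEDED_KB" ]; then
  echo "only ${AVAIL_KB} KB free; the full TIGER load wants ~${NEEDED_KB} KB"
else
  echo "ok: ${AVAIL_KB} KB free"
fi
```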
That got me a listing of the zip files that I wanted. The next trick was to download those zip files:
cat cs_listing.html | grep -i shp | grep geo | cut -f2 -d\" | while read url;
do echo $url;
done > base_of_zip_files.txt
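You can sanity-check that extraction pipeline on a synthetic listing before pointing it at the real one. The file and hrefs below are made up; only the grep/cut plumbing matches the real command:

```shell
# A tiny made-up stand-in for the Census listing page.
cat > /tmp/cs_listing_sample.html <<'EOF'
<a href="http://example.invalid/geo/cs99_d00_shp.zip">cs99_d00_shp.zip</a>
<a href="http://example.invalid/geo/readme.txt">readme</a>
<a href="http://example.invalid/geo/co99_d00_shp.zip">co99_d00_shp.zip</a>
EOF
# Same plumbing as above: keep lines mentioning shp and geo, take the href
# (field 2 when the line is split on double quotes).
cat /tmp/cs_listing_sample.html | grep -i shp | grep geo | cut -f2 -d\" | while read url;
do echo $url;
done > /tmp/base_of_zip_files.txt
cat /tmp/base_of_zip_files.txt
```

The readme line survives the geo filter but not the shp one, so only the two zip URLs come out the other end.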
That will take a few minutes. I couldn't get the data all in one file for anything but the counties dataset, so I had to download all the pieces and merge them into one aggregate dataset. It helps to be meticulous and create a 'boundary' folder, inside of which you create separate 'cs', 'co', and 'pl' folders for these datasets.
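That layout is just three subdirectories; something like the following, with the parent path being your choice:

```shell
# Create the boundary/ tree with one folder per dataset (names from the text:
# cs, co, and pl). /tmp is used here only as a scratch location.
mkdir -p /tmp/boundary/cs /tmp/boundary/co /tmp/boundary/pl
ls /tmp/boundary
```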
cat base_of_zip_files.txt | while read url;
do wget $url;
done
Basically, TABLE_NAME is the name of the table (one of the three I mentioned above, depending on which dataset you're dealing with), and the rest is just the information needed to get into your database. By the way, if you didn't do the client-encoding step I mentioned above, this is where it may bite you in the butt.
find . -iname "*ZIP" | while read l;
do
  mkdir -p x ;
  rm -rf x ;
  mkdir x ;
  cd x ;
  unzip ../"$l" ;
  BASE=`echo *.shp | cut -f1 -d.` ;
  ogr2ogr -append -f PostgreSQL -a_srs EPSG:4269 "PG:dbname=DATABASE host=127.0.0.1 \
  user=USER password=PASSWORD" $BASE.shp $BASE -nln TABLE_NAME -lco DIM=2 ;
  cd .. ;
done
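The $BASE derivation in that loop leans on the working directory holding exactly one .shp file. Here's how it behaves, demonstrated in a scratch directory with made-up file names:

```shell
# Demonstrate BASE=`echo *.shp | cut -f1 -d.` on a scratch directory.
rm -rf /tmp/base_demo && mkdir -p /tmp/base_demo
cd /tmp/base_demo
touch tgr01001lkA.shp tgr01001lkA.dbf tgr01001lkA.shx
BASE=`echo *.shp | cut -f1 -d.`
echo "$BASE"     # the glob expands to the lone .shp; cut strips the extension
cd - > /dev/null
```

If more than one .shp were extracted into x, the glob would expand to several names and the cut would misbehave, which is one reason the loop wipes and recreates x for every ZIP.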
CREATE INDEX completechain_geom_idx ON completechain
USING GIST (wkb_geometry public.gist_geometry_ops);
roads_local.sql

I think this is the one that derives from a dataset produced by a join on one of the bigger tables, so it too takes an incredibly long time: five hours or so.
create_geocode.sql to load the functions.