
I've got hundreds of shapefiles from various GIS projects that I want to start consolidating into a single database platform; I'm currently attempting this with Postgres/PostGIS.

Hardly any of the data is standardized: much of it covers the same kinds of features, but the attribute names and types don't match from one shapefile to the next.

Where should I start tackling this? Should I develop a standard model to migrate each shapefile into first (e.g. Hydro_line, transport_line, Hydro_poly standards, etc.)?
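
For illustration, this is roughly what I mean by a standard target table (the attribute names are just invented examples, not an established standard):

-- Hypothetical standard model for hydro lines
CREATE TABLE hydro_line (
    gid        serial PRIMARY KEY,
    name       varchar(100),
    hydro_type varchar(30),    -- e.g. 'stream', 'canal', 'ditch'
    source_shp varchar(100)    -- which original shapefile the row came from
);
SELECT AddGeometryColumn('hydro_line', 'the_geom', 4326, 'LINESTRING', 2);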

An alternative is to just import each shapefile into Postgres individually, so each shapefile becomes its own table in the database, but I'm not sure about that in terms of performance and organization. It feels like delaying the inevitable...

Any advice on dealing with this daunting task?

3 Answers


Have a look at spatial ETL (Extract, Transform, Load) software; these tools are dedicated to exactly this kind of task. The best known is FME from Safe Software, but some (partial) open-source alternatives are now available, such as SDI (Spatial Data Integrator) and GeoKettle.

  • I asked for a comparison in a previous question, so if you go this route, please do a write-up: gis.stackexchange.com/questions/5049/spatial-etl-comparisons – Commented Jan 15, 2011 at 20:37
  • I grabbed a trial version of FME and installed both SDI and GeoKettle. I'll try them out and see if I can make sense of them. FME looks like a soup-to-nuts solution, but I'll have to get over the learning curve first :) – colemanm, Jan 17, 2011 at 18:39
  • @colemanm What did you end up doing on this? Which product did you find most useful? – Commented May 10, 2011 at 20:19

Hello

I would import everything into PostGIS first. There are tools for loading multiple shapefiles into individual tables: the QGIS SPIT plugin is one, the new graphical shp2pgsql loader in PostGIS trunk (or the experimental binaries) is another, or you could just write a batch script around shp2pgsql.

I would start there: import everything into a schema called "original" or something like that. Then, from there, I would structure the data, merging tables together where suitable and so on.
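
A minimal sketch of that layout, for example (the "cleaned" schema name is just a placeholder; big_roads is one of the example tables used further down):

-- Keep the untouched imports together in their own schema
CREATE SCHEMA original;

-- If a shapefile was loaded into public by mistake, move it over
ALTER TABLE public.big_roads SET SCHEMA original;

-- Do the restructuring work somewhere separate
CREATE SCHEMA cleaned;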

The nice thing about doing it like that is that if you save all the queries you use for those transformations, you get great documentation of the history of your data, and it is very easy to redo if needed. Once you are done with the organizing work, you dump a backup of the "original" schema and put it away somewhere.

I think this is a structured and clean way of doing it. And, as said before, you get very solid documentation of which field changed name to which new name, which original tables were merged into that big new one, and so on.

In FME and similar software you can of course also save what you have done, but besides being very slow compared to queries run inside the database, it is not as universal a way of documenting what was done as SQL queries are. Those will stay usable and readable for as long as text files and relational databases exist.

I tend to end up with text files looking something like this:

-- A query to merge all roads in Norway into one table
CREATE TABLE road_tables.all_roads AS
  SELECT id AS roadid, status, the_geom FROM original.big_roads
  UNION ALL
  SELECT rid AS roadid, condition AS status, the_geom FROM original.small_roads;

and so on. Saved as a text file, this has great value after a few years.
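
The renames and type fixes can live in the same file, for example (the new column name and the status values here are invented for illustration):

-- Standardize column names on the merged table
ALTER TABLE road_tables.all_roads RENAME COLUMN roadid TO road_id;

-- Unify status codes that differed between the source layers
UPDATE road_tables.all_roads SET status = 'good' WHERE status IN ('g', '1');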

Regards Nicklas

  • +1 I think this is a very good approach. Everything is done within Postgres, very transparent and easily reproducible if needed. – underdark, Jan 15, 2011 at 16:15
  • Not a good recommendation for an ESRI-based GIS; for an open-source-only shop this would be acceptable. ESRI has many more dependencies that would not be accessible through this method: a direct connection to PostGIS is not allowed without either an interop extension, a GIS server, or ArcSDE. – Brad Nesom, Jan 15, 2011 at 18:02
  • @Brad Hmm, I wonder if that is an argument against doing things in a transparent, reproducible and fast way, or an argument against getting locked in by putting SDE between me and my data... ;-) – Commented Jan 15, 2011 at 18:57
  • @Brad: colemanm didn't mention ESRI, so the answer seems to be good. – underdark, Jan 15, 2011 at 20:09
  • I've done similar work with ESRI SDE feature classes and SQL Server 2008 (with native geometry): I created the feature class first, then loaded it with a series of INSERT statements. IIRC, I had to export it to a new feature class at the end because I couldn't generate new ObjectIDs correctly; once I did that, business as usual. – Commented Jan 15, 2011 at 20:48

My suggestion would be to pick 2-5 of your most heavily used data layers (shapefiles) and migrate them to an RDBMS.
Investigate and implement workflows for those data, getting used to the limitations and requirements of an RDBMS versus file-based data.

Limitations include: required exports, a landing zone, coordinate systems, and file types for collaboration.

There are many benefits to what you are proposing.
Side note: my grandpa told my parents to spend six months looking for a house before buying. Consider that you are looking for a long-term house for your data; you don't want to be paying 30 years from now for something you don't like.

My recommendation would be to write down (digitally or on paper) a tree listing of your data sources and view them as a big picture; this should allow you to organize the data into more concise groupings.

There are methods within ArcGIS (my assumption, since you haven't specified your preferred system) to integrate disparate data.

You might find this information useful if you are interested in learning good design practices:

  • Geodatabase design overview
  • Geodatabase documentation

There are some similar links for ArcGIS 10 as well:

  • Resource Center
  • ArcGIS 10 geodatabase
