
I'm new to Postgres and PostGIS, but not to geospatial applications.

I have a table loaded with graph data in the form of links (edges). The link table has about 60,000,000 rows. I am trying to extract the nodes into their own table to allow for quicker searching. For some of my proof-of-concept work I was able to run the searches against the link table directly, but there are lots of duplicates, and there's no guarantee that either the source column or the target column contains all of the nodes.

This is Postgres & PostGIS, so I am also using the table to cache a geometry->geography conversion. Yes, I do need to use geography fields. I'm also copying the geometry information "just in case".

Table creation SQL:

-- Recreate table and index
DROP TABLE IF EXISTS nodes;

CREATE TABLE nodes (node integer PRIMARY KEY, geog geography(POINT,4326) );

CREATE INDEX geogIndex ON nodes USING GIST(geog);

SELECT AddGeometryColumn('nodes', 'geom', 4326, 'POINT', 2);

-- Insert all unique nodes from the source column
INSERT INTO nodes (node,geog,geom)
       SELECT DISTINCT ON (source) source,geography( ST_Transform(geom_source,4326)),geom_source 
       FROM view_topo;

-- Insert any nodes in the target column that we don't have already
INSERT INTO nodes (node,geog,geom)
       SELECT DISTINCT ON (target) target,geography( ST_Transform(geom_target,4326)),geom_target 
       FROM view_topo
       WHERE NOT EXISTS( SELECT 1 FROM nodes WHERE nodes.node = view_topo.target);

VACUUM ANALYZE;

I left the first INSERT running overnight, and it took about 2-3 hours to run. This resulted in about 40,000,000 unique nodes being added.

I have just started the second INSERT and the VACUUM ANALYZE. I expect it to take until at least lunchtime.

Luckily this is only a batch job that has to be executed once after I've loaded a new link table, but is there a better way? Is there a faster way?

  • Yes, Postgres doesn't like the VACUUM ANALYZE in a script like that. However, the rest appears to have worked okay.
    – winwaed
    Commented Jan 21, 2011 at 0:40

2 Answers


Check out PostgreSQL's tips on adding a lot of data into a table. In particular, are you sure that you need that index before INSERTing all that data? It might speed things up if you create the index after all the data has been added to the table.
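As a minimal sketch of that ordering (assuming the same nodes table, GiST index name, and INSERT statements from the question), the spatial index is only built once all rows are in place:

-- Create the table without the spatial index
DROP TABLE IF EXISTS nodes;

CREATE TABLE nodes (node integer PRIMARY KEY, geog geography(POINT,4326) );

SELECT AddGeometryColumn('nodes', 'geom', 4326, 'POINT', 2);

-- ... run the two INSERT ... SELECT statements from the question here ...

-- Build the spatial index once, after the bulk load, then refresh statistics
CREATE INDEX geogIndex ON nodes USING GIST(geog);

VACUUM ANALYZE nodes;

Note that the PRIMARY KEY constraint still builds a b-tree index up front; if that also proves to be a bottleneck, it can be added after the load with ALTER TABLE nodes ADD PRIMARY KEY (node).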

  • Thanks. Yesterday I found I needed an index on the geometry field, and retroactively adding it took quite a while to execute. This initially made me doubtful, but in conventional programming a create-store-access-destroy pattern on sorted data is usually much faster when implemented with an unsorted container as create-store-sort-access-destroy, i.e. essentially what you are suggesting: build it, then sort it in one big sort.
    – winwaed
    Commented Jan 27, 2011 at 14:48

WHERE NOT EXISTS is not fast SQL. How about a union of the two selects, inside a SELECT DISTINCT? I haven't got a PostgreSQL instance to test this on, but maybe something like this:

INSERT INTO nodes (node, geog, geom)
       SELECT DISTINCT ON (node) node, geom1, geom2
       FROM (SELECT source AS node,
                    geography(ST_Transform(geom_source,4326)) AS geom1,
                    geom_source AS geom2
             FROM view_topo
             UNION
             SELECT target AS node,
                    geography(ST_Transform(geom_target,4326)) AS geom1,
                    geom_target AS geom2
             FROM view_topo) AS combined;
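A note on the design: UNION (as opposed to UNION ALL) already removes rows that are exact duplicates across the two selects, at the cost of sorting the combined set, and the outer DISTINCT ON (node) then keeps a single row per node even where the source and target geometries differ.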

  • Thanks, I'm trying a UNION. I guess the NOT EXISTS relies on a nested SQL statement which would be executed for each row.
    – winwaed
    Commented Jan 27, 2011 at 15:17
  • Out of memory! It sounds like your approach should be faster for smaller datasets, but with 60M rows in my view_topo view and about 8-9M unique rows to be inserted, my PC can't hack it. (4 GB installed, but booted as 32-bit because PostGIS is currently only available for 32-bit.)
    – winwaed
    Commented Jan 27, 2011 at 15:35
