27

I'm currently building an EC2 instance onto which I want to import the entire Planet.osm snapshot (the whole Earth's worth of data) for some projects we're working on. I've spun up a Large Ubuntu x64 instance, attached a separate EBS volume with plenty of storage, and reconfigured Postgres to keep its data directory on that volume.

Now the server is having trouble using osm2pgsql to import the snapshot. After a couple of attempts with different memory settings, the process keeps printing "Killed" after getting most of the way through. Once it was killed while "going over pending ways"; the next time, after I slightly adjusted the slim cache, it reached "processing ways" before crashing out. From what I've read, this is generally due to memory issues.

Here's my latest attempt to run the import:

osm2pgsql -v -U osm -s -C 4096 -S default.style -d osm /data/osm/planet-latest.osm.bz2

And here are the specs for a Large instance on EC2:

Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of local instance storage, 64-bit platform

My question is: are there good benchmark resources for determining the tuning requirements for osm2pgsql and Postgres? Import speed isn't that important to me; I'd just like to make sure the process completes safely, even if it takes 4 or 5 days. I've read through Frederik Ramm's "Optimising the rendering chain" (PDF) document from last year's SOTM, but are there other good opinions or resources?

4
  • Wouldn't it be very expensive to do that on EC2?
    – Pablo
    Commented Feb 11, 2011 at 22:03
  • It's not cheap to keep it running, but the interim plan is to spin it up, generate a tileset then shut it down and use that set for a while until we need to apply updates. It's still a lot cheaper than buying a massive server...
    – colemanm
    Commented Feb 14, 2011 at 15:25
  • 1
    Interesting! I've never tried this on my old XP-Home-Box. Does it really work? I'm asking because it was written to convert extracts from Geofabrik or Cloudmade not for the entire planet. The planet seems to be invalid XML. How did you solve this problem?
    – user2325
    Commented Mar 14, 2011 at 17:51
  • @Carsten In migrating your response to a comment form, I inadvertently deleted a comment by @jvangeld. Here it is: Hi Carsten, welcome to GIS.se. It is awesome when developers come here to help people with their programs. But your answer here would probably have been better as a comment to @winwaed's post. Again, it is great to have you here!
    – whuber
    Commented May 15, 2011 at 19:10

5 Answers

8

As the documentation says, you may need more than 256 GB of RAM to do that.

I don't know much about EC2, but you can try the slim (--slim) mode or try Osmosis.

There is an interesting post at http://weait.com/content/build-your-own-openstreetmap-server which says, 'you must use slim mode'.

1
  • Yeah, I also understand that slim mode is required to apply diffs for updates.
    – colemanm
    Commented Feb 14, 2011 at 15:10
4

Due to the memory constraints I didn't even try to use osm2pgsql to load the planet.osm's routing data. Instead I used osm2po:

http://osm2po.de/

Most of the documentation is in German, but with a bit of experimentation I managed to get it to work. It takes a few days on a dedicated Core 2 Quad (though it only uses one thread).

2

I came across the following while looking for something else http://aws.amazon.com/datasets/2844 - I'm not sure if it will help you out or not but it might be a starting point.

1
  • That could definitely work for right now, even though it's from 2009...
    – colemanm
    Commented Feb 14, 2011 at 20:53
2

Did you find a solution to your issue, other than using the old pre-generated package? I seem to have a very similar issue on an EC2 instance. I'm using the PBF planet from http://download.bbbike.org/osm/

time ./osm2pgsql -S default.style --slim -d gis -C 7000 --hstore /mnt/planet/planet-latest.osm.pbf
osm2pgsql SVN version 0.70.5
...(creating db tables)
Reading in file: /mnt/planet/planet-latest.osm.pbf
Processing: Node(741920k) Way(0k) Relation(0)Killed

real    276m47.695s

Update: it seems I found a solution. After reducing the requested cache to 6 GB (parameter -C 6000), the process works (at least it has been running for several days now and will finish today, I hope).

It seems that the m1.large instance, with 7.5 GB of memory, is slightly too small to fit all nodes in memory (which should require about 11 GB nowadays). osm2pgsql seems to need a little under 700 MB on top of the cache size, so with -C 7000 it runs just short of memory, but with -C 6000 (or possibly also -C 6500) it works.
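The arithmetic behind that can be sketched in a few lines of Python. This is only a rough budget check, assuming the roughly 700 MB overhead estimated above (an observation from this import, not a documented figure); the function names are made up for illustration, and the check ignores whatever Postgres and the OS need, so the real margin should be larger.

```python
# Rough memory budget for choosing osm2pgsql's -C value (cache size in MB).
# ASSUMPTION: osm2pgsql needs about 700 MB on top of the cache, as observed
# above. Memory used by Postgres and the OS is NOT accounted for here.

def fits_in_ram(cache_mb, total_ram_mb, overhead_mb=700):
    """Return True if the cache plus the estimated overhead fits in RAM."""
    return cache_mb + overhead_mb <= total_ram_mb


def largest_safe_cache(total_ram_mb, overhead_mb=700, step_mb=500):
    """Largest multiple of step_mb that still fits under the same estimate."""
    usable = total_ram_mb - overhead_mb
    return max(0, (usable // step_mb) * step_mb)


M1_LARGE_MB = int(7.5 * 1024)  # 7.5 GB expressed in MB

for cache in (7000, 6500, 6000):
    print(cache, fits_in_ram(cache, M1_LARGE_MB))

print(largest_safe_cache(M1_LARGE_MB))
```

Under these assumptions -C 7000 does not fit on a 7.5 GB instance, while -C 6500 and -C 6000 do, which matches what I saw.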

I'd also suggest using a higher-memory instance with at least 15 GB of RAM, which should make the import much faster. Or even a high-memory double extra large instance, which would cost double but should be able to do a full planet import in non-slim mode in under 5 hours (about 3-4 times faster than slim mode). So it would actually be cheaper.

1

I got osm2pgsql to work on EC2 by using less CPU and more RAM. It kept failing with memory problems until I moved up to a high-memory extra large instance with 17 GB of RAM.
