Friday, February 11, 2011

Extract POIs from OSM PBF ("Protocolbuffer Binary Format") dumps with Python

  • I am a GIS n00b - this is my first attempt at handling OSM data.
  • This code would never have been written if it weren't for Chris Hill's excellent parsepbf.py - blog post here.


I looked at OSM to obtain railway station locations in the country for an in-house project we are running. Parsing their data dumps sounded like an easy job. I grabbed india.osm.bz2 and india.osm.pbf from Geofabrik. Uncompressing the bz2 file produced a 614MB XML file, whereas the PBF was just 26MB. Intrigued by the small size of the PBF file (I had never read up on Google Protocol Buffers before), I went to the OSM wiki to read up on the format and to see if any Python libraries were available for it. I found Chris' parsepbf script and ran it on the PBF I had. It turns out that running the script without asking it to write out OSM XML was a bad idea: it ended up eating all the memory on my machine (no swap enabled) and crashing the system.

I modified the parsepbf file into a somewhat generic class for picking out nodes with specified tags.
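Most of the work in picking out tagged nodes is decoding the DenseNodes groups that most PBF writers use for nodes. Below is a minimal sketch of that step, not the code linked from this post. It assumes a PrimitiveBlock has already been parsed (for example with classes generated from OSM's osmformat.proto) and that the DenseNodes `id`, `lat`, `lon` and `keys_vals` fields, plus the block's string table decoded to str, are passed in as plain Python sequences. `decode_dense_nodes` and `select_nodes` are hypothetical names. Per the PBF format, ids and coordinates are delta-coded, coordinates are scaled by `granularity` (default 100 nanodegrees), and `keys_vals` lists key/value string-table indices per node, with a 0 closing each node's tags.

```python
from typing import Dict, Iterable, Iterator, Sequence, Tuple

# (id, latitude, longitude, tags)
Node = Tuple[int, float, float, Dict[str, str]]


def decode_dense_nodes(
    ids: Sequence[int],
    lats: Sequence[int],
    lons: Sequence[int],
    keys_vals: Sequence[int],
    stringtable: Sequence[str],
    granularity: int = 100,
    lat_offset: int = 0,
    lon_offset: int = 0,
) -> Iterator[Node]:
    """Hypothetical helper: yield (id, lat, lon, tags) for one DenseNodes group."""
    node_id = lat = lon = 0
    kv = 0  # cursor into keys_vals
    for d_id, d_lat, d_lon in zip(ids, lats, lons):
        # ids, lats and lons are stored as deltas from the previous node.
        node_id += d_id
        lat += d_lat
        lon += d_lon
        tags: Dict[str, str] = {}
        # Read key/value index pairs until the 0 that ends this node's tags.
        # If keys_vals is empty (no node in the group has tags), the loop is skipped.
        while kv < len(keys_vals) and keys_vals[kv] != 0:
            tags[stringtable[keys_vals[kv]]] = stringtable[keys_vals[kv + 1]]
            kv += 2
        kv += 1  # step over the 0 delimiter
        yield (
            node_id,
            1e-9 * (lat_offset + granularity * lat),  # nanodegrees -> degrees
            1e-9 * (lon_offset + granularity * lon),
            tags,
        )


def select_nodes(nodes: Iterable[Node], wanted: Dict[str, str]) -> Iterator[Node]:
    """Hypothetical helper: keep nodes whose tags contain every wanted key=value."""
    for node in nodes:
        tags = node[3]
        if all(tags.get(k) == v for k, v in wanted.items()):
            yield node
```

For example, `select_nodes(decode_dense_nodes(...), {"railway": "station"})` keeps only nodes tagged railway=station. Both functions are generators, so tag dicts are built one node at a time rather than for a whole block, which helps on a small VPS.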


Stats:

  • It took about 5 minutes to pick out all the railway stations on my Linode (512MB) VPS.
  • I think a speed-up could be achieved using multiprocessing, but I'm not sure.


Example usage:



Current code can be found here.

Comments:

Chris Hill said...

Thanks for the mention.

I think I need to put out a warning on the raw code about gobbling memory. The main reason I created the code was to do exactly as you have done - customise it to extract data you are interested in.

Tao said...

Hi


I used your code to see if it works on the map of Philadelphia. However, it always returns the following error message:

Traceback (most recent call last):
  File "", line 1, in
    tags = foo.return_tags(refresh=True)
  File "osmnodepbf.py", line 216, in return_tags
    self.parse()
  File "osmnodepbf.py", line 113, in parse
    self.processDense(pg.dense,tag)
  File "osmnodepbf.py", line 199, in processDense
    self.tags[node["sky"]] = [node["svl"]]
KeyError: 'svl'



I tried it on some other maps and it keeps raising the same error.
