- I am a GIS n00b - this is my first attempt at handling OSM data.
- This code would never have been written if it weren't for Chris Hill's excellent parsepbf.py - blog post here.
I looked at OSM to obtain railway station locations in the country for an in-house project we are running. Parsing through their data dumps sounded like an easy job. I grabbed india.osm.bz2 and india.osm.pbf from Geofabrik. Uncompressing the bz2 file resulted in a 614 MB XML file, whereas the pbf was just 26 MB. Intrigued by the small size of the pbf file (I had never read up on Google Protocol Buffers before), I went to the OSM wiki to read up on the format and see if any Python libraries were available for it. I found Chris' parsepbf script and ran it on the pbf I had. Turns out running the script without asking it to spit out OSM XML was a bad idea - it ended up eating all the memory on my machine (no swap enabled) and crashing the system.
I modified the parsepbf file to make a somewhat generic class for picking out nodes with specified tags.
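To illustrate the idea of picking out tagged nodes without holding the whole dump in memory, here is a minimal sketch. It is not the class described above and does not read pbf at all: it assumes the XML dump (india.osm.bz2) and uses only the Python standard library. The function names `iter_tagged_nodes` and `stations_from_dump` are made up for this example, and `railway=station` is the OSM tag assumed for railway stations.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_tagged_nodes(source, wanted):
    """Yield (id, lat, lon, tags) for every node carrying all key/value pairs in `wanted`.

    `source` is a binary file-like object containing OSM XML.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <osm> root element
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag == "node":
            # Collect the node's <tag k="..." v="..."/> children into a dict.
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            if all(tags.get(k) == v for k, v in wanted.items()):
                yield elem.get("id"), float(elem.get("lat")), float(elem.get("lon")), tags
        if elem.tag in ("node", "way", "relation"):
            # Drop finished top-level elements so the tree does not keep growing.
            root.clear()


def stations_from_dump(path):
    """Hypothetical helper: read a .osm.bz2 dump and return its railway stations."""
    with bz2.open(path, "rb") as f:
        return list(iter_tagged_nodes(f, {"railway": "station"}))
```

Calling `stations_from_dump("india.osm.bz2")` would decompress the dump on the fly, so the 614 MB XML never has to be written to disk or loaded whole. The trade-off is that XML parsing is slow compared with reading the pbf.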
Stats:
- It took about 5 minutes to pick out all the railway stations on my Linode (512 MB) VPS.
- I think a speed up can be achieved by using multiprocessing (?)
Example usage:
Current code can be found here.
2 comments:
Thanks for the mention.
I think I need to put out a warning on the raw code about gobbling memory. The main reason I created the code was to do exactly as you have done - customise it to extract data you are interested in.
Hi
I used your code to see if it works on the map of Philadelphia. However, it always returns the following error message:
Traceback (most recent call last):
File "", line 1, in
tags = foo.return_tags(refresh=True)
File "osmnodepbf.py", line 216, in return_tags
self.parse()
File "osmnodepbf.py", line 113, in parse
self.processDense(pg.dense,tag)
File "osmnodepbf.py", line 199, in processDense
self.tags[node["sky"]] = [node["svl"]]
KeyError: 'svl'
I tried it on some other maps and it keeps giving the same error.