- I am a GIS n00b - this is my first attempt at handling OSM data.
- This code would never have been written if it weren't for Chris Hill's excellent parsepbf.py - blog post here.
I looked at OSM to obtain railway station locations in the country for an in-house project we are running. Parsing through their data dumps sounded like an easy job. I grabbed india.osm.bz2 and india.osm.pbf from Geofabrik. Uncompressing the bz2 file resulted in a 614MB XML file, whereas the pbf was just 26MB. Intrigued by the small size of the pbf file (I had never read up on Google Protocol Buffers before), I went to the OSM wiki to read up on the format and see if any Python libraries were available for it. I found Chris' parsepbf script and ran it on the pbf I had. It turns out that running the script without asking it to spit out OSM XML was a bad idea: it ended up eating all the memory on my machine (no swap enabled) and crashing the system.
I modified the parsepbf file to make a somewhat generic class for picking out nodes with specified tags.
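To illustrate the idea of picking out nodes by tag without holding the whole file in memory, here is a minimal sketch. It is not the modified parsepbf class: it reads plain OSM XML (the uncompressed form of the .osm.bz2 dump) with the standard library instead of decoding the pbf. The function name `iter_tagged_nodes`, its `wanted` parameter, and the sample data are all made up for this example.

```python
import io
import xml.etree.ElementTree as ET


def iter_tagged_nodes(source, wanted):
    """Yield (id, lat, lon, tags) for every node whose tags match `wanted`.

    `source` is a filename or binary file object holding OSM XML.
    `wanted` is a dict of tag key -> value that must all be present.
    (Hypothetical helper, not part of parsepbf.)
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the <osm> root
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            if all(tags.get(k) == v for k, v in wanted.items()):
                yield elem.get("id"), float(elem.get("lat")), float(elem.get("lon")), tags
        if elem.tag in ("node", "way", "relation"):
            # Drop everything parsed so far so memory use stays roughly flat.
            root.clear()


# Tiny made-up sample, only to show the shape of the data.
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
  <node id="1" lat="12.9781" lon="77.5695">
    <tag k="railway" v="station"/>
    <tag k="name" v="Example Junction"/>
  </node>
  <node id="2" lat="13.0" lon="77.6">
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="3" lat="13.1" lon="77.7"/>
</osm>
"""

stations = list(iter_tagged_nodes(io.BytesIO(SAMPLE), {"railway": "station"}))
for node_id, lat, lon, tags in stations:
    print(node_id, lat, lon, tags.get("name"))
```

Because `iterparse` accepts any binary file object, the same function should also work on the compressed dump directly, for example by passing `bz2.open("india.osm.bz2", "rb")` as `source`, though I would expect that to be a good deal slower than reading the pbf.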
Stats:
- It took about 5 minutes to pick out all the railway stations on my Linode (512MB) VPS.
- I think a speedup can be achieved by using multiprocessing (?)
Example usage:
Current code can be found here.