Open Spatial Data Formats
The goal of this series of articles on spatial data is to help you deal with spatial data in your open data app or website. In order to do this, all of your spatial data must be in a common spatial reference system (SRS). In my last post I discussed the last piece of the SRS puzzle, the geographic coordinate system. In this post I’m going to talk about the different file formats you may encounter when dealing with open spatial data sources. Understanding these formats will help you determine what kind of SRSs you’re dealing with. Only then do you know if you have the green light to map your data or if you need to convert some datasets to achieve a common SRS.
Much of the spatial data that you encounter will come from Geographic Information System (GIS) software. The heavyweight in this arena is ESRI. As such, their open data format, called a shapefile, is one you are quite likely to run into. (Bias alert: most of my career was spent working at ESRI, so I am quite fond of good ol’ shapefiles!) Now, ‘shapefile’ is a bit of a misnomer, in the sense that a shapefile is actually made up of a series of files. For example, if you have a shapefile of land parcels called ‘parcels’, you will have files named parcels.dbf, parcels.shp, parcels.prj, etc. For our purposes, what we want to focus on is the .prj file, which is the projection file. This contains the projection and coordinate system information. Here is an example of a .prj file:
PROJCS["City of Ottawa",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.01745329251994328]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",304800.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-76.5],PARAMETER["Scale_Factor",0.9999],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]
Lucky for us, it’s in plain text! So right away we can see the datum (North American 1983, or NAD83) and the projection (Transverse Mercator), followed by the parameters of that projection. We can compare these values with the other data sources we have to make sure that they all match. And of course the map we want to draw has to have the same coordinate system as well! More on drawing maps in a later post…
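Because the .prj file is plain text, the comparison described above can be roughed out in a few lines of Python. What follows is only a sketch: the function names summarize_prj and same_srs are made up for this illustration, and the regular expressions assume the well-known text is laid out like the Ottawa example (they are not a full parser for the format).

```python
import re

# The .prj text from the example above, split across lines for readability.
OTTAWA_PRJ = (
    'PROJCS["City of Ottawa",GEOGCS["GCS_North_American_1983",'
    'DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.01745329251994328]],'
    'PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",304800.0],'
    'PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-76.5],'
    'PARAMETER["Scale_Factor",0.9999],PARAMETER["Latitude_Of_Origin",0.0],'
    'UNIT["Meter",1.0]]'
)


def summarize_prj(wkt):
    """Pull the datum, projection and projection parameters out of .prj text.

    Hypothetical helper: it only looks for the DATUM, PROJECTION and
    PARAMETER entries and ignores everything else in the file.
    """
    datum = re.search(r'DATUM\["([^"]+)"', wkt)
    projection = re.search(r'PROJECTION\["([^"]+)"', wkt)
    parameters = {
        name: float(value)
        for name, value in re.findall(
            r'PARAMETER\["([^"]+)",\s*([-+0-9.eE]+)\]', wkt
        )
    }
    return {
        "datum": datum.group(1) if datum else None,
        # A purely geographic (unprojected) file has no PROJECTION entry.
        "projection": projection.group(1) if projection else None,
        "parameters": parameters,
    }


def same_srs(wkt_a, wkt_b):
    """Crude check: do two .prj texts list the same datum, projection and parameters?"""
    return summarize_prj(wkt_a) == summarize_prj(wkt_b)


summary = summarize_prj(OTTAWA_PRJ)
print(summary["datum"])
print(summary["projection"])
print(summary["parameters"])
```

Keep in mind that this compares names and numbers literally. Two files that describe the same SRS with different labels would be reported as different, so treat a mismatch as a prompt to look closer rather than as proof.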
ESRI .prj files follow the Open Geospatial Consortium (OGC) Simple Feature Access standard. The gory details can be found here.
A newer contender in the spatial data field is Google, which of course has become familiar to everyone thanks to Google Maps and Google Earth. Their spatial data file format is KML (don’t mix up the last two letters or you’ll end up dealing with a Dutch airline). KLM, oops, KML files don’t explicitly specify a spatial reference system because – they are all in the same one! A KML data set will always use the latitude/longitude coordinate system and the WGS84 datum. So no mystery to be solved there: if you have KML files, you now know their SRS. KML file documentation can be found here.
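Since the SRS is fixed, reading point locations out of a KML file needs no SRS lookup at all. Here is a minimal sketch using only the Python standard library. The sample document, its coordinates and the function name placemark_points are invented for illustration; the one detail worth knowing is that KML writes each coordinate as longitude,latitude (with an optional altitude), not latitude first.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# A made-up sample document with a single point placemark.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Sample point</name>
    <Point><coordinates>-75.7009,45.4236,0</coordinates></Point>
  </Placemark>
</kml>"""


def placemark_points(kml_text):
    """Return (name, latitude, longitude) for every point placemark.

    Hypothetical helper: it ignores lines, polygons and folders' metadata.
    """
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            ".//kml:Point/kml:coordinates", namespaces=KML_NS
        )
        if coords is None:
            continue  # not a point placemark
        # KML order is longitude, latitude, then an optional altitude.
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        points.append((name, lat, lon))
    return points


print(placemark_points(SAMPLE_KML))
```

Every value that comes back is already WGS84 latitude/longitude, so the only thing left to decide is whether your other datasets match it.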
If you’re dealing with web apps, chances are you’re familiar with the JSON open data interchange format. Well, GeoJSON is a JSON-based format for representing spatial data. SRS information is optional in this specification. If it is missing, then you are to assume the default SRS, which is the same as KML files: latitude & longitude, WGS84 datum. If it does have a specification, that’s where things get a little dicey. The SRS tag is called ‘crs’, for Coordinate Reference System. The attributes of the crs tag can be either a link or a name. If it’s a link, you need to follow that link for the definition of the SRS. If it’s a name, well, there’s no standard on what that name can be. Creators of these files are encouraged to use OGC standard names and a uniform resource name (URN) convention. With this URN, you can then use a lookup tool to see the SRS details (too many TLAs if you ask me). For example, say you have this name:
Take the last part of that, CRS84, and go to the URN resolver tool pointed to on this page, and enter it in the ‘Text Within label’ search field. Click ‘Find’ and you discover that lo and behold, it’s referring to the WGS84 latitude/longitude SRS!
Unfortunately you may run into names that are not using this convention, in which case internet search engines are your friend.
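The GeoJSON rules above (no crs means the default, otherwise a name or a link) can be sketched as a small check. This is an illustration under assumptions, not a validator: the function names describe_crs and urn_code are made up, the layout of the crs member (a "type" of "name" or "link" with a "properties" object holding "name" or "href") follows the GeoJSON specification as I understand it, and the URN in the sample is the CRS84 one used in that specification's own example.

```python
import json

DEFAULT_SRS = "WGS84 latitude/longitude (default)"

# Made-up sample files: one without a crs member, one with a named crs.
PLAIN = '{"type": "FeatureCollection", "features": []}'
NAMED = json.dumps({
    "type": "FeatureCollection",
    "crs": {
        "type": "name",
        "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"},
    },
    "features": [],
})


def describe_crs(geojson_text):
    """Return (kind, value) describing how the file states its SRS."""
    data = json.loads(geojson_text)
    if "crs" not in data:
        # No crs member at all: assume the default SRS.
        return ("default", DEFAULT_SRS)
    crs = data["crs"]
    if not isinstance(crs, dict):
        # For example an explicit null: nothing can be assumed.
        return ("unknown", None)
    properties = crs.get("properties", {})
    if crs.get("type") == "name":
        return ("name", properties.get("name"))
    if crs.get("type") == "link":
        # The SRS definition lives at this address and must be fetched.
        return ("link", properties.get("href"))
    return ("unknown", None)


def urn_code(name):
    """Return the last colon-separated part of a URN-style name."""
    return name.rsplit(":", 1)[-1]


print(describe_crs(PLAIN))
kind, value = describe_crs(NAMED)
print(kind, value, urn_code(value))
```

The last part of the name is what you would type into a lookup tool. For names that do not follow the URN convention, urn_code simply hands back whatever follows the last colon (or the whole name), so a search engine is still the fallback.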
To recap, we now not only know what an SRS is, we can also examine our various spatial data sources to see what SRS they are using. If they are all the same, then we are good to go. But if not, then what? Well then we will need to convert some of them to a common SRS. In my next post I’ll cover how to do this conversion. Using open source software of course!