- Python Geospatial Analysis Cookbook
- Michael Diener
- 1601字
- 2021-07-30 10:13:23
Discovering projection(s) of a Shapefile or GeoJSON dataset
Remember that all data is stored in a coordinate system, no matter what the data source is. It is your job to figure this out using a simple approach outlined in this section. We will take a look at two different data storage types: a Shapefile and a GeoJSON file. These two formats contain geometries, such as points, lines, or polygons, and their associated attributes. For example, a tree would be stored as a point geometry with attributes, such as height, age, and species, Each of these data types store their projection data differently and, therefore, require different methods to discover their projection information.
Now a quick introduction to what a Shapefile is: a Shapefile is not a single file but a minimum of three files, such as .shp
, .shx
, and, .dbf
, all of which have the same name. For example, world_borders.shp
, world_borders.shx
and world_borders.dbf
make up one file. The .shp
file stores geometry, .dbf
stores a table of attribute values, and .shx
is the index table that connects geometry to an attribute value as a lookup table.
A Shapefile should come with a very important fourth text file called world_borders.prj
. The .prj stands for projection information and contains the projection definition of the Shapefile in a plain text format. As crazy as it sounds, you can still find and download tons of data being delivered today without this .prj
file. You can do this simply by opening this .prj
file in a text editor, such as Sublime Text or Notepad++, where you can read about the projection definition in order to determine the files coordinate system.
Note
The .prj
file is a plain text file that can easily be generated for the wrong coordinate system if you are not careful. The wrong projection definition can cause problems with your analysis and transformations. We will see how to correctly assess a Shapefile's projection information.
GeoJSON is a single file stored in plain text. The GeoJSON standard (http://www.geojson.org) is based on the JSON standard. The coordinate reference information is, in my experience, often not included and the default is WGS 84 and EPSG:4326, where the coordinates are stored in the x
, y
, z
format and in this exact order.
Note
Mixing x and y for y, x can happen and when it does, your data will most likely end up in the ocean, so always remember that order matters:
x = longitude
y = latitude
z = height
The following is how the GeoJSON CRS information looks if it is presented in FeatureCollection
:
{ "type": "FeatureCollection", "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, …
Getting ready
The first thing is to head over to download that's a little over 150 MB. Inside the repository, you will find each chapter with the following three folders: /geodata/
to store data, /code/
to store code scripts that have been completed, and an empty folder called /working/
for you to create your self-written code scripts. The structure looks like this:
/ch01/ –------/code –------/geodata –------/working /ch02/ –------/code –------/geodata –------/working ...
The source of the data used in this recipe comes from the City of Vancouver, BC, Canada, which is located at http://data.vancouver.ca/datacatalogue/index.htm (Vancouver schools).
Tip
When downloading data from an Internet source, always look around for metadata descriptions about the projection information so that you know a little about your data's history and its source before you begin working with it. Most data today is publicly available in EPSG:4326 WGS 84 or EPSG:3857 Web Pseudo-Mercator. Data stemming from a government resource is most likely stored in local coordinate systems.
How to do it...
We are going to start with our Shapefile and identify the coordinate system it is stored in using the GDAL library that imports the OGR module:
Note
Note that we have assumed that your Shapefile has a .prj
file. If not, this process will not work.
- Create a new Python file named as
ch02_01_show_shp_srs.py
in your/ch02/working/
directory, and add the following code:#!/usr/bin/env python # -*- coding: utf-8 -*- from osgeo import ogr shp_driver = ogr.GetDriverByName('ESRI Shapefile') shp_dataset = shp_driver.Open(r'../geodata/schools.shp') shp_layer = shp_dataset.GetLayer() shp_srs = shp_layer.GetSpatialRef() print shp_srs
- Now save the file and run the
ch02_01_show_shp_srs.py
script from the command line:$ python ch02-01-show_shp_srs.py PROJCS["NAD_1983_UTM_Zone_10N", GEOGCS["GCS_North_American_1983", DATUM["North_American_Datum_1983", SPHEROID["GRS_1980",6378137,298.257222101]], PRIMEM["Greenwich",0], UNIT["Degree",0.017453292519943295]], PROJECTION["Transverse_Mercator"], PARAMETER["latitude_of_origin",0], PARAMETER["central_meridian",-123], PARAMETER["scale_factor",0.9996], PARAMETER["false_easting",500000], PARAMETER["false_northing",0], UNIT["Meter",1]]
You should see the preceding text print out on your screen, showing information on the
.prj
projection.Note
Note that we could also simply open the
.prj
file with a text editor and view this information as well.Now, we will take a look at a GeoJSON file to see the projection information if it's available.
- Determining the coordinate system of a GeoJSON file is a little harder since we must make one of two assumptions where the first case is the standard case and most common:
- No CRS is explicitly defined inside the GeoJSON, so we assume EPSG:4326 WGS 84 is the coordinate system.
- CRS is defined explicitly and is correct.
- Create a new Python file named
ch02_02_show_geojson_srs.py
in your/ch02/working/
directory, and add the following code:#!/usr/bin/env python # -*- coding: utf-8 -*- import json geojson_yes_crs = '../geodata/schools.geojson' geojson_no_crs = '../geodata/golfcourses_bc.geojson' with open(geojson_no_crs) as my_geojson: data = json.load(my_geojson) # check if crs is in the data python dictionary data # if yes print the crs to screen # else print NO to screen and print geojson data type if 'crs' in data: print "the crs is : " + data['crs']['properties']['name'] else: print "++++++ no crs tag in file+++++" print "++++++ assume EPSG:4326 ++++++" if "type" in data: print "current GeoJSON data type is :" + data['type']
- The script is set up to run on the
geojson_no_crs
variable set in the GeoJSONgolfcourses_bc.geojson
file. The source of this data is OpenStreetMap, which is exported using the Overpass API that's located at http://overpass-turbo.eu/. Now, run thech02_02_show_geojson_srs.py
script and you should see this output for our first file:$ python ch02_02_show_geojson_crs.py ++++++ no crs tag in file+++++ ++++++ assume EPSG:4326 ++++++ current GeoJSON data type is :FeatureCollection
Tip
If no CRS is inside our GeoJSON file, we'll assume it has a projection of EPSG:4326. To check this, you will need to look at the coordinates listed inside the file and see if they fall within bounds, such as
-180.0000
,-90.0000
,180.0000
, and90.0000
. If so, we'll assume that the dataset is truly EPSG:4326 and open the data in QGIS to check this. - Now, go into the code and edit line 10, change the variable from
geojson_no_crs
togeojson_yes_crs
, and rerun thech02_02_show_geojson_srs.py
code file:$ python ch02_02_show_geojson_crs.py the crs is : urn:ogc:def:crs:EPSG::26910
You should now see the preceding output printed on screen.
How it works...
Beginning with the Shapefile, we've used the OGR library to help us quickly discover the EPSG code information of our Shapefile as follows:
- Begin with importing the OGR module as follows:
from osgeo import ogr
- Activate the OGR Shapefile driver:
shp_driver = ogr.GetDriverByName('ESRI Shapefile')
- Open the Shapefile with OGR:
shp_dataset = shp_driver.Open(r'../geodata/schools.shp')
- Access the layer information with
GetLayer()
:shp_layer = shp_dataset.GetLayer()
- Now we can get the coordinate information using the
GetSpatialRef()
function:shp_srs = shp_layer.GetSpatialRef()
- Finally, print the spatial reference system on screen:
print shp_srs
The GeoJSON file was a little harder to tackle when we used the Python JSON module to look for the crs
key and print out its value on the screen, if it existed.
Note
We could have simply replaced the first example code with the GeoJSON driver and we would get the same result. However, not all GeoJSON files include projection information. The OGR driver will always output WGS 84 as the coordinate system by default which, in our no_geojson_crs.geojson
example file, is wrong. This can lead to confusion for new users. The important thing to note is to check your data, have a look at the coordinate values, and see if they fit in a defined coordinate range of values. To explore codes, or if you enter a code that you have and want to see the area it covers on a live web map, refer to http://epsg.io.
First, we'll import the standard Python JSON module, and then set two variables to store both our GeoJSON files. Next, we'll open one file, the golfcourses_bc.geojson
file, and load the GeoJSON file into a Python object. Then, all we need to do is check to see whether the crs
key is in the GeoJSON; if it is, we'll print out its value. If not, we'll simply print to screen that crs
is not available and the GeoJSON data type.
The GeoJSON default CRS is WGS 84 EPSG:4326, which means that we are dealing with latitude and longitude values. The values must fall within the bounds of -180.0000
, -90.0000
, 180.0000
, and 90.0000
to qualify.
There's more...
Here are some projection definition examples for your reference:
- The code for the ESRI Well-Known Text (stored with Shapefile as
ShapefileName.prj
) is as follows:GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
- The code for the OGC Well-Known Text of the same coordinate system as EPSG:4326 is as follows:
GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]
- The code for the Proj4 format, which also shows
EPSG:4326
, is as follows:+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
See also
The web page at http://www.spatialreference.org is a great place to get coordinates for any projection you desire by simply selecting the destination coordinate system that you like, zooming in on the map, and then copying and pasting the coordinates. Later on, we will use the http://spatialreference.org/ API to get the EPSG definition to create our own .prj
file for a Shapefile.
- UNIX編程藝術
- 從零開始:數字圖像處理的編程基礎與應用
- Apache Spark 2.x Machine Learning Cookbook
- Mastering Selenium WebDriver
- Learning C++ Functional Programming
- 跟老齊學Python:輕松入門
- Java Web程序設計
- Python自然語言處理(微課版)
- Learning Laravel 4 Application Development
- Java Web開發技術教程
- Python 3破冰人工智能:從入門到實戰
- Responsive Web Design by Example
- 精通MySQL 8(視頻教學版)
- Web App Testing Using Knockout.JS
- Emotional Intelligence for IT Professionals