官术网_书友最值得收藏!

Discovering projection(s) of a Shapefile or GeoJSON dataset

Remember that all data is stored in a coordinate system, no matter what the data source is. It is your job to figure this out using a simple approach outlined in this section. We will take a look at two different data storage types: a Shapefile and a GeoJSON file. These two formats contain geometries, such as points, lines, or polygons, and their associated attributes. For example, a tree would be stored as a point geometry with attributes, such as height, age, and species, Each of these data types store their projection data differently and, therefore, require different methods to discover their projection information.

Now a quick introduction to what a Shapefile is: a Shapefile is not a single file but a minimum of three files, such as .shp, .shx, and, .dbf, all of which have the same name. For example, world_borders.shp, world_borders.shx and world_borders.dbf make up one file. The .shp file stores geometry, .dbf stores a table of attribute values, and .shx is the index table that connects geometry to an attribute value as a lookup table.

A Shapefile should come with a very important fourth text file called world_borders.prj. The .prj stands for projection information and contains the projection definition of the Shapefile in a plain text format. As crazy as it sounds, you can still find and download tons of data being delivered today without this .prj file. You can do this simply by opening this .prj file in a text editor, such as Sublime Text or Notepad++, where you can read about the projection definition in order to determine the files coordinate system.

Note

The .prj file is a plain text file that can easily be generated for the wrong coordinate system if you are not careful. The wrong projection definition can cause problems with your analysis and transformations. We will see how to correctly assess a Shapefile's projection information.

GeoJSON is a single file stored in plain text. The GeoJSON standard (http://www.geojson.org) is based on the JSON standard. The coordinate reference information is, in my experience, often not included and the default is WGS 84 and EPSG:4326, where the coordinates are stored in the x, y, z format and in this exact order.

Note

Mixing x and y for y, x can happen and when it does, your data will most likely end up in the ocean, so always remember that order matters:

x = longitude

y = latitude

z = height

The following is how the GeoJSON CRS information looks if it is presented in FeatureCollection:

{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
…

Getting ready

The first thing is to head over to download that's a little over 150 MB. Inside the repository, you will find each chapter with the following three folders: /geodata/ to store data, /code/ to store code scripts that have been completed, and an empty folder called /working/ for you to create your self-written code scripts. The structure looks like this:

/ch01/
–------/code
–------/geodata
–------/working
/ch02/
–------/code
–------/geodata
–------/working
...

The source of the data used in this recipe comes from the City of Vancouver, BC, Canada, which is located at http://data.vancouver.ca/datacatalogue/index.htm (Vancouver schools).

Tip

When downloading data from an Internet source, always look around for metadata descriptions about the projection information so that you know a little about your data's history and its source before you begin working with it. Most data today is publicly available in EPSG:4326 WGS 84 or EPSG:3857 Web Pseudo-Mercator. Data stemming from a government resource is most likely stored in local coordinate systems.

How to do it...

We are going to start with our Shapefile and identify the coordinate system it is stored in using the GDAL library that imports the OGR module:

Note

Note that we have assumed that your Shapefile has a .prj file. If not, this process will not work.

  1. Create a new Python file named as ch02_01_show_shp_srs.py in your /ch02/working/ directory, and add the following code:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    from osgeo import ogr
    shp_driver = ogr.GetDriverByName('ESRI Shapefile')
    shp_dataset = shp_driver.Open(r'../geodata/schools.shp')
    shp_layer = shp_dataset.GetLayer()
    shp_srs = shp_layer.GetSpatialRef()
    print shp_srs
  2. Now save the file and run the ch02_01_show_shp_srs.py script from the command line:
    $ python ch02-01-show_shp_srs.py
    PROJCS["NAD_1983_UTM_Zone_10N",
     GEOGCS["GCS_North_American_1983",
     DATUM["North_American_Datum_1983",
     SPHEROID["GRS_1980",6378137,298.257222101]],
     PRIMEM["Greenwich",0],
     UNIT["Degree",0.017453292519943295]],
     PROJECTION["Transverse_Mercator"],
     PARAMETER["latitude_of_origin",0],
     PARAMETER["central_meridian",-123],
     PARAMETER["scale_factor",0.9996],
     PARAMETER["false_easting",500000],
     PARAMETER["false_northing",0],
     UNIT["Meter",1]]
    

    You should see the preceding text print out on your screen, showing information on the .prj projection.

    Note

    Note that we could also simply open the .prj file with a text editor and view this information as well.

    Now, we will take a look at a GeoJSON file to see the projection information if it's available.

  3. Determining the coordinate system of a GeoJSON file is a little harder since we must make one of two assumptions where the first case is the standard case and most common:
    1. No CRS is explicitly defined inside the GeoJSON, so we assume EPSG:4326 WGS 84 is the coordinate system.
    2. CRS is defined explicitly and is correct.
  4. Create a new Python file named ch02_02_show_geojson_srs.py in your /ch02/working/ directory, and add the following code:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    import json
    
    geojson_yes_crs = '../geodata/schools.geojson'
    geojson_no_crs = '../geodata/golfcourses_bc.geojson'
    
    with open(geojson_no_crs) as my_geojson:    
        data = json.load(my_geojson)
    
    # check if crs is in the data python dictionary data
    # if yes print the crs to screen
    # else print NO to screen and print geojson data type
    if 'crs' in data:
        print "the crs is : " + data['crs']['properties']['name']
    else:
        print "++++++ no crs tag in file+++++"
        print "++++++ assume EPSG:4326 ++++++"
        if "type" in data:
            print "current GeoJSON data type is :" + data['type']
  5. The script is set up to run on the geojson_no_crs variable set in the GeoJSON golfcourses_bc.geojson file. The source of this data is OpenStreetMap, which is exported using the Overpass API that's located at http://overpass-turbo.eu/. Now, run the ch02_02_show_geojson_srs.py script and you should see this output for our first file:
    $ python ch02_02_show_geojson_crs.py
    ++++++ no crs tag in file+++++
     ++++++ assume EPSG:4326 ++++++
    current GeoJSON data type is :FeatureCollection
    

    Tip

    If no CRS is inside our GeoJSON file, we'll assume it has a projection of EPSG:4326. To check this, you will need to look at the coordinates listed inside the file and see if they fall within bounds, such as -180.0000, -90.0000, 180.0000, and 90.0000. If so, we'll assume that the dataset is truly EPSG:4326 and open the data in QGIS to check this.

  6. Now, go into the code and edit line 10, change the variable from geojson_no_crs to geojson_yes_crs, and rerun the ch02_02_show_geojson_srs.py code file:
    $ python ch02_02_show_geojson_crs.py
    the crs is : urn:ogc:def:crs:EPSG::26910
    

    You should now see the preceding output printed on screen.

How it works...

Beginning with the Shapefile, we've used the OGR library to help us quickly discover the EPSG code information of our Shapefile as follows:

  1. Begin with importing the OGR module as follows:
    from osgeo import ogr
    
  2. Activate the OGR Shapefile driver:
    shp_driver = ogr.GetDriverByName('ESRI Shapefile')
    
  3. Open the Shapefile with OGR:
    shp_dataset = shp_driver.Open(r'../geodata/schools.shp')
    
  4. Access the layer information with GetLayer():
    shp_layer = shp_dataset.GetLayer()
    
  5. Now we can get the coordinate information using the GetSpatialRef() function:
    shp_srs = shp_layer.GetSpatialRef()
    
  6. Finally, print the spatial reference system on screen:
    print shp_srs
    

The GeoJSON file was a little harder to tackle when we used the Python JSON module to look for the crs key and print out its value on the screen, if it existed.

Note

We could have simply replaced the first example code with the GeoJSON driver and we would get the same result. However, not all GeoJSON files include projection information. The OGR driver will always output WGS 84 as the coordinate system by default which, in our no_geojson_crs.geojson example file, is wrong. This can lead to confusion for new users. The important thing to note is to check your data, have a look at the coordinate values, and see if they fit in a defined coordinate range of values. To explore codes, or if you enter a code that you have and want to see the area it covers on a live web map, refer to http://epsg.io.

First, we'll import the standard Python JSON module, and then set two variables to store both our GeoJSON files. Next, we'll open one file, the golfcourses_bc.geojson file, and load the GeoJSON file into a Python object. Then, all we need to do is check to see whether the crs key is in the GeoJSON; if it is, we'll print out its value. If not, we'll simply print to screen that crs is not available and the GeoJSON data type.

The GeoJSON default CRS is WGS 84 EPSG:4326, which means that we are dealing with latitude and longitude values. The values must fall within the bounds of -180.0000, -90.0000, 180.0000, and 90.0000 to qualify.

There's more...

Here are some projection definition examples for your reference:

  1. The code for the ESRI Well-Known Text (stored with Shapefile as ShapefileName.prj) is as follows:
    GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
  2. The code for the OGC Well-Known Text of the same coordinate system as EPSG:4326 is as follows:
    GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]
  3. The code for the Proj4 format, which also shows EPSG:4326, is as follows:
    +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
    

See also

The web page at http://www.spatialreference.org is a great place to get coordinates for any projection you desire by simply selecting the destination coordinate system that you like, zooming in on the map, and then copying and pasting the coordinates. Later on, we will use the http://spatialreference.org/ API to get the EPSG definition to create our own .prj file for a Shapefile.

主站蜘蛛池模板: 金阳县| 永昌县| 无棣县| 灯塔市| 樟树市| 星子县| 赤水市| 女性| 延庆县| 阿克| 太湖县| 涡阳县| 遂昌县| 抚松县| 明星| 江口县| 建宁县| 沭阳县| 宜兰县| 察隅县| 寻乌县| 伽师县| 五大连池市| 广州市| 长丰县| 谢通门县| 栾川县| 托克逊县| 浏阳市| 神池县| 大邑县| 车险| 福安市| 盐源县| 通渭县| 时尚| 阳曲县| 玉龙| 内黄县| 西充县| 巍山|