SHP, Shapefiles

The most ubiquitous of living fossil GIS formats, ESRI's SHP format, also known as "shape format" or "shapefiles," has been used since the early 1990s with ESRI's ArcView, a popular GIS package. ESRI published SHP format in a written standard, and over the years it has become widely used for data interchange in GIS. Despite its antique limitations, SHP format has in recent years gained a second life as the native format used by some popular open source packages, most notably QGIS.

 

To import a drawing from SHP format:

 

  1. Choose File-Import from the main menu.

  2. In the Import dialog browse to the location of the file and double-click on the .shp file desired.  

  3. If the shapefile ensemble did not include a .prj file, manually specify the projection (coordinate system) used by the shapefile by opening the drawing and by launching Assign Initial Coordinate System in the Contents pane.

 

Shapefiles created by older systems that did not automatically add a .prj file to the shapefile ensemble have no way of telling Manifold what projection they are in, so immediately upon import we must specify the projection manually by launching Assign Initial Coordinate System in the Contents pane.

 

Shapefiles unaccompanied by a .prj will often be in Latitude / Longitude projection.    To specify Latitude / Longitude projection as the initial projection we open the drawing and launch Assign Initial Coordinate System in the Contents pane.   Choose Latitude / Longitude projection and we are done.

 

To export a drawing to SHP format:

 

  1. Open the drawing in a drawing window.

  2. Choose File - Export from the main menu.  We can also right click on a drawing in the Project pane and choose Export in the context menu.

  3. In the Export dialog choose SHP Files in the Save as type box and specify a File name to use.

  4. Press Save.

 

When Manifold exports a shapefile it will always add a .prj file to the ensemble that specifies the projection the shapefile uses.   Manifold also creates a .mapmeta file that provides precise coordinate system information in JSON format.

About Shapefiles

A "shapefile" is not just one file but usually consists of three similarly named files with differing extensions: a .shp, a .shx and a .dbf file.  Even though there are three files involved almost all GIS people will refer to the set of three files using the singular term shapefile.   The .dbf file is a dBase database system format file that stores data attributes for the drawing.   The .dbf part of shapefiles is even older than ArcView and dates back to 1979.  

 

For all the limitations of shapefiles, the format remains ubiquitous in GIS. SHP is not a bad choice for a least common denominator method of exchanging data if the data is simple enough to fit within the limitations of shapefiles. On the plus side, SHP is widely supported and it is a reasonably fast format, faster for "in place" editing than other old vector formats such as DXF, MapInfo MID/MIF or GML/KML.

 

Manifold therefore reads and writes shapefiles, using a variety of strategies when exporting data into shapefiles to dumb down modern data to fit into the limitations imposed by SHP format.    

 

The main limitations of shapefiles concern file size, data types, file and field names, and object types.

 

 

Many applications fail to honor the above limitations so the world is full of nonstandard "shapefiles" which cannot be read correctly by applications which adhere to the standard. Manifold honors the shapefile standard and deals with the above limitations as follows:

 

File size - Manifold exports shapefiles up to 4 GB in size.

 

Data types - On export, Manifold automatically will convert modern types into simplified representations that can be stored in a shapefile.   For example,  variable-length text data is exported as fixed-length text with 254 characters, since various third party programs do not seem to be able to handle memo fields.  Floating point types will be converted into text, Unicode into ANSI and so on.  The conversion can involve data loss, for example as will happen when truncating a long, variable-length text value into a fixed 254 character field.
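The kind of data loss described above can be sketched in a few lines of Python. This is only an illustration of truncating text into a fixed 254 character field and replacing characters an ANSI codepage cannot represent. It is not Manifold's exporter: the function name to_dbf_text and the choice of the cp1252 codepage are assumptions made for the example.

```python
def to_dbf_text(value: str, codepage: str = "cp1252", width: int = 254) -> bytes:
    """Encode text for a fixed-width DBF character field (illustrative sketch)."""
    # Assumption: cp1252 stands in for whatever ANSI codepage is in use.
    # Characters the codepage cannot represent are replaced with "?".
    encoded = value.encode(codepage, errors="replace")
    # Anything beyond the fixed field width is lost.
    truncated = encoded[:width]
    # DBF character fields are padded with spaces to the fixed width.
    return truncated.ljust(width, b" ")


long_value = "x" * 300
print(len(to_dbf_text(long_value)))  # 254
```

A 300 character value comes back as 254 bytes, so the last 46 characters are gone, and a character outside the codepage comes back as a question mark.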

 

File and Field Names - Manifold will automatically truncate field names into the limited forms allowed by shapefiles and will eliminate spaces and other disallowed characters.  For example, a field name called Highest Z-value (meter) in a Manifold drawing's table will be converted into a field called HighestZva in the shapefile's DBF.  Manifold allows longer file names.
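A minimal sketch of the field name conversion, assuming that only ASCII letters, digits and underscores are kept and that the result is cut to ten characters. The exact rules Manifold applies are not spelled out in this topic, so the function dbf_field_name and its character set are hypothetical.

```python
import re


def dbf_field_name(name: str, max_len: int = 10) -> str:
    """Shorten a field name to a DBF-friendly form (illustrative sketch)."""
    # Assumption: keep only ASCII letters, digits and underscores.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", name)
    # Keep at most max_len characters.
    return cleaned[:max_len]


print(dbf_field_name("Highest Z-value (meter)"))  # HighestZva
```

The sketch does not deal with two different field names that shorten to the same ten characters, which a real exporter must also resolve.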

 

Object types -  Manifold drawings can contain a mixture of areas, lines and points along with curvilinear objects.   When a Manifold drawing that contains a mix of areas, lines and points is exported to shapefile format Manifold creates three sets of shapefiles:  shapefiles for the areas, shapefiles for the lines and shapefiles for the points.  Curvilinear objects are interpolated into the area or line equivalents.  Multipoints are converted into single points.

 

When exporting Manifold drawings containing objects of only one type (only areas or only lines or only points) to shapefiles no postfixes will be appended to the filename. When Manifold drawings contain more than one type of object, Manifold will create a file with no postfix for the areas and will then create files with _lines and _points postfixes to indicate which shapefiles contain lines and points.
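The naming rule can be sketched as a small Python function. The function shapefile_bases is hypothetical and only mirrors the rule as described above. How a drawing with lines and points but no areas is named is not stated here, so the sketch assumes the postfixes are still used in that case.

```python
def shapefile_bases(base: str, kinds: set[str]) -> dict[str, str]:
    """Map each object type present to the base file name used for it (sketch)."""
    present = [k for k in ("areas", "lines", "points") if k in kinds]
    # Only one type of object: no postfix is appended.
    if len(present) <= 1:
        return {k: base for k in present}
    # More than one type: areas keep the plain name, others get a postfix.
    # Assumption: this also applies when no areas are present.
    postfix = {"areas": "", "lines": "_lines", "points": "_points"}
    return {k: base + postfix[k] for k in present}


print(shapefile_bases("Monaco", {"areas", "lines", "points"}))
print(shapefile_bases("Roads", {"lines"}))
```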

 

Dealing with the above limitations is not easy. Even as well-respected a package as the open source GDAL/OGR library does not deal with them automatically. As the GDAL documentation notes: "ESRI shapefiles can only store one kind of geometry per layer (shapefile). [...] Note that this can make it very difficult to translate a mixed geometry layer from another format into Shapefile format using ogr2ogr, since ogr2ogr has no support for separating out geometries from a source layer."

Incompatibilities

In addition to the fundamental limitations designed into shapefiles there are various incompatibilities that arise when shapefiles are used in modern settings. The most common involve DBF drivers, editing and projection files.

 

 

Manifold manages the above incompatibilities as follows:

 

DBF drivers - Manifold does not rely on a third party DBF driver.  Instead, Manifold uses a special, Manifold-written DBF driver within Manifold's SHP dataport that is used only for reading and writing shapefiles.  The Manifold DBF driver can work around non-standard variations of DBF to extract as much information as possible.  When writing DBF,  Manifold tries to create a least common denominator DBF that can be read by as many shapefile reading packages as possible.

 

Editing incompatibilities - Manifold allows editing shapefiles "in place," with edits managed to avoid surprises when popular GIS packages import any shapefiles created or edited by Manifold.    For example, objects deleted during "in place" editing of a shapefile with Manifold will also be considered deleted when that shapefile is opened by ESRI products or by shapefile-using packages that employ the GDAL/OGR library to interact with shapefiles.  

 

Projection incompatibilities - Manifold reads the most common PRJ variations with a focus on correctly utilizing PRJ files created by ESRI products.    When exporting, Manifold writes an ESRI-style PRJ for shapefiles and also creates a .mapmeta file for each shapefile that writes the coordinate system information for each shapefile in JSON format.

   


Tech tip: Even though the JSON metadata provides a highly precise and very "open" description of the coordinate system used, and even though Manifold PRJ files for shapefiles convey coordinate systems as well as any PRJ can, it is still wise to follow the advice that experienced shapefile users have offered for over 25 years: do not use shapefiles to publish data in coordinate systems other than Latitude / Longitude. The wise shapefile author always publishes shapefiles only in Latitude / Longitude "unprojected" form using degrees as a unit of measure with a highly generic base ("datum") such as WGS84.

 

There is no loss to doing so since any modern package that can read shapefiles can effortlessly re-project unprojected data into whatever coordinate system is desired.   There is no point introducing an interoperability risk from other coordinate systems when one can completely avoid such risk by publishing a shapefile using Latitude / Longitude projection.

Localization

Manifold text fields use Unicode, which is not supported by DBF files.  Manifold exports to a shapefile will use whatever .dbf codepage matches the Windows system language in use on the machine.

 

Importing a .dbf file (either by importing a table from a .dbf or by importing a drawing from a shapefile) will automatically translate text fields into Unicode.

Exporting Projected Shapefiles

Because SHP format does not capture projection information it is unwise to export projected drawings into SHP format. However, if for some reason we absolutely must export projected data we should keep in mind the raw nature of data in projected form and the options used to represent locations in projected coordinate systems.

 

For example, suppose we have a drawing in some metric projection that uses local offsets of 100, 100 and local scales of 10, 10. Suppose we have a point the coordinates of which are 1, 2 in this coordinate system. When exporting this drawing as a SHP, sometimes we may want the coordinate numbers locating the point in the SHP file to be 1, 2 and sometimes 110, 120.

 

The Manifold SHP exporter does not transform the coordinate numbers in any way, so Manifold will always export 1, 2 for the coordinates of the point. If desired, we can force Manifold to export 110, 120 by first re-projecting the drawing into the same coordinate system with local offsets of 0 and local scales of 1.
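The arithmetic behind the two sets of numbers can be written out as a short Python sketch. It assumes the usual relationship of stored value times local scale plus local offset, which is consistent with the 1, 2 and 110, 120 figures in the example. The function name to_absolute is hypothetical.

```python
def to_absolute(x, y, scale=(10.0, 10.0), offset=(100.0, 100.0)):
    """Apply local scales and local offsets to stored coordinate numbers (sketch)."""
    # Assumption: absolute = stored * local scale + local offset.
    return (x * scale[0] + offset[0], y * scale[1] + offset[1])


print(to_absolute(1, 2))  # (110.0, 120.0)
```

With local scales of 1 and local offsets of 0 the function returns its input unchanged, which is why re-projecting to such a coordinate system first makes the stored numbers and the exported numbers agree.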

 

Example: Export a Drawing to SHP Format  

Suppose we have a drawing called Monaco that contains a mix of points, lines and areas.   When we export the drawing to SHP we will create the following files.

 

For areas:

 

Monaco.dbf

Monaco.prj

Monaco.shp

Monaco.shp.mapmeta

Monaco.shx

 

For lines:

 

Monaco_lines.dbf

Monaco_lines.prj

Monaco_lines.shp

Monaco_lines.shp.mapmeta

Monaco_lines.shx

 

For points:

 

Monaco_points.dbf

Monaco_points.prj

Monaco_points.shp

Monaco_points.shp.mapmeta

Monaco_points.shx

 

The .prj file contains ESRI-style coordinate system information.  The .mapmeta files contain coordinate system information in JSON format.   For example, the Monaco.shp.mapmeta contains:

 

{ "CoordSystem": { "Base": "World Geodetic 1984 (WGS84)", "Eccentricity": 0.08181919084262149, "LocalScaleX": 0.0001, "LocalScaleY": 0.0001, "MajorAxis": 6378137, "Name": "Latitude \/ Longitude", "System": "Latitude \/ Longitude", "Unit": "Degree" } }
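Because the .mapmeta file is plain JSON, any JSON parser can read it. As a minimal sketch, the Python below parses the text shown above using the standard json module. The variable names are illustrative, and the text is embedded as a string only to keep the example self-contained; in practice we would read it from the .mapmeta file.

```python
import json

# The contents of Monaco.shp.mapmeta as shown above (raw string keeps the \/ escapes).
mapmeta_text = r'''{ "CoordSystem": { "Base": "World Geodetic 1984 (WGS84)", "Eccentricity": 0.08181919084262149, "LocalScaleX": 0.0001, "LocalScaleY": 0.0001, "MajorAxis": 6378137, "Name": "Latitude \/ Longitude", "System": "Latitude \/ Longitude", "Unit": "Degree" } }'''

coord = json.loads(mapmeta_text)["CoordSystem"]

# The \/ sequences are standard JSON escapes for a forward slash.
print(coord["Name"])       # Latitude / Longitude
print(coord["Unit"])       # Degree
print(coord["MajorAxis"])  # 6378137
```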

 

Notes

Longer file and field names - The original dBase package used no more than eight characters in field names and no more than eight characters in a file name plus the three letter extension. Over the years so many applications, including dBase descendants, have used slightly longer names that the current consensus is that field names should have no more than ten characters and that file names also can be longer. Manifold therefore allows ten characters for field names and significantly longer names for file names.

 

Do not send only the .shp file - Despite the singular form of the word, a "shapefile" consists of at least three files, and in modern times often four files if a PRJ file is written as well. When Manifold exports to "shp" format it creates five files: a .dbf, a .shp and a .shx file to make up the classic three files required by the ESRI shape format definition, plus a fourth .prj file specifying the coordinate system in a non-standard but customary way, plus a fifth .mapmeta file unique to Manifold products that provides a very open JSON format description of the coordinate system. When providing the result of our export to someone else, we must not forget to provide at least the four customary files and not make the beginner's blunder of providing just the .shp file.
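Before sending a shapefile to someone, a quick check that the companion files are present can prevent that blunder. The Python below is a minimal sketch using only the standard library. The function name missing_companions is hypothetical, and the check assumes lowercase extensions, which may not hold on case-sensitive file systems.

```python
from pathlib import Path

# The classic companions of a .shp, plus the customary .prj.
COMPANION_EXTENSIONS = (".shx", ".dbf", ".prj")


def missing_companions(shp_path):
    """Return the extensions of companion files not found next to the .shp (sketch)."""
    shp = Path(shp_path)
    # Assumption: companions share the .shp base name and use lowercase extensions.
    return [ext for ext in COMPANION_EXTENSIONS
            if not shp.with_suffix(ext).exists()]


print(missing_companions("Monaco.shp"))
```

An empty list means all four customary files are in place. Any extensions listed are files we still need to include.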

 

Invalid Z values - Reading a 3D geometry value in a SHP file forces invalid Z values such as NaN or Inf to 0.
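The effect on a single Z value can be sketched in Python as follows. This only illustrates the rule stated above and is not Manifold's reader. The function name clean_z is hypothetical.

```python
import math


def clean_z(z: float) -> float:
    """Force NaN and infinite Z values to 0, leaving valid values alone (sketch)."""
    # math.isfinite is False for NaN, +Inf and -Inf.
    return z if math.isfinite(z) else 0.0


print(clean_z(float("nan")))  # 0.0
print(clean_z(float("inf")))  # 0.0
print(clean_z(12.5))          # 12.5
```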

 

See Also

File - Export

 

Assign Initial Coordinate System

 

Change Coordinate System

 

Example: Import a Shapefile - ESRI shapefiles are a very popular format for publishing GIS and other spatial data.  Unfortunately, shapefiles often will not specify what projection should be used.  This example shows how to deal with that quickly and easily.

 

Latitude and Longitude are Not Enough

 

Shapefiles Strangely Out of Shape

 

Three Letter Extensions