SQL Example: Extract Airport Runways from an OpenStreetMap PBF

We write a simple SQL query using INNER JOIN to extract runway lines from an OpenStreetMap PBF of Cyprus, and to save those lines to a new drawing and table.

 

OpenStreetMap (OSM) has the tremendous virtue of providing vast amounts of free data worldwide, but it has the disadvantage of using  native data structures designed primarily to support the server-side, distributed internal functioning of OSM.  When published in what has become the de facto native export format for OSM, PBF, the data relationships can be difficult to understand in comparison to typical GIS data sets, and often require some processing to extract features of interest.

 

In this example we download a PBF with complete OSM data for the island of Cyprus.  We then extract runway lines from the tables.   We use Cyprus because the PBF for Cyprus is small, easily downloaded by anyone who wants to repeat this example, and the workflow shown in this topic can rapidly be done even on older and weaker computers.   

 

Faster workflow encourages experimentation, such as adapting the workflow in this topic to extract other features of interest.  Once we feel confident in our abilities and in the results of tests on small PBF files like that for Cyprus, we can tackle larger countries with multi-hundred MB PBF files that expand into projects that are many gigabytes in size, and which can take hours to import and many minutes, or even hours, to process.

Download PBF File

We visit the geofabrik.de website for European downloads, at https://download.geofabrik.de/europe.html

 

From that site we download the PBF file for Cyprus, from https://download.geofabrik.de/europe/cyprus-latest.osm.pbf

 

Websites and links change frequently.  If the links above no longer work, any good search engine will find resources for downloading OSM data as PBF files.

 

The Cyprus PBF file, called cyprus-latest.osm.pbf, is just over 15 MB and downloads quickly.   Thank you, Geofabrik, for providing this service!

Import

Drag and drop the cyprus-latest.osm.pbf file from Windows Explorer into the Project pane in Manifold.   Alternately, import using File - Import.

 

 

The main drawing is the cyprus-latest Drawing, using the table cyprus-latest.   Other tables provide supplementary information associated with the objects in that drawing.   

 

Current Manifold builds have a limit of 2 billion records per table, which can be exceeded when importing PBF files that provide OSM data for the entire planet at once.   To avoid that limit, Manifold imports the PBF with separate tags tables, with tags tables for addresses, buildings, sources, and tags table for everything else, called cyprus-latest Tags.  

 

Our task is to use the information in the cyprus-latest Tags table to find OSM id values for records for where the key is aeroway and the value is runway, and to then use a simple JOIN to get the object records for those id values from the cyprus-latest table.

Create a Query

In the main menu, we choose File - Create - New Query.

 

 

 

In the New Query dialog we enter Extract Runways as the name for our new query.   Press Create Query.

 

 

The new query appears in the Project pane.   Double-click the new query to open it in a Command window.  

 

The query opens with default, stub text.   We replace that text with the query text we would like to use.

 

 

Enter the following SQL text:

 

CREATE TABLE [cyprus runways Table] (

  [mfd_id] INT64,

  [id] INT64,

  [type] INT8,

  [version] INT32,

  [name] NVARCHAR,

  [Geom] GEOM,

  [value] NVARCHAR,

  INDEX [mfd_id_x] BTREE ([mfd_id]),

  INDEX [Geom_x] RTREE ([Geom]),

  PROPERTY 'FieldCoordSystem.Geom' ComponentCoordSystem([cyprus-latest Drawing])

);

 

CREATE DRAWING [cyprus runways] (

  PROPERTY 'Table' '[cyprus runways Table]',

  PROPERTY 'FieldGeom' 'Geom'

);

 

PRAGMA ('progress.percentnext' = '100');

 

INSERT INTO [cyprus runways Table] (

  [id], [type], [version], [name], [Geom], [value]

)

 

SELECT [cyprus-latest].id,  [cyprus-latest].type,

   [cyprus-latest].version, [cyprus-latest].name,

   [cyprus-latest].Geom, [cyprus-latest Tags].value

   FROM [cyprus-latest]

   INNER JOIN [cyprus-latest Tags]

     ON [cyprus-latest].[id] = [cyprus-latest Tags].[id]

   WHERE [cyprus-latest Tags].[key] = 'aeroway'

     AND [cyprus-latest Tags].[value] = 'runway'

     AND [cyprus-latest].[type] = 1;

 

The query generates a simple drawing and table, with only one additional field added from the tags table, the value field.  For discussion on how the query functions, see the How the Query Works section at the end of this topic.

 

We run the query by pressing the ! Run button in the main toolbar.

 

 

The results table reported in the Command Window is a single value of 20, meaning there were 20 records inserted into the new table and drawing the query created.

 

 

The new table and drawing appear in the Project pane.    To better see the drawing in context we create a map that uses the drawing as a layer.   

 

In the Project pane, we right-click the cyprus runways drawing and we choose Create - New Map from the context menu.

 

 

The New Map dialog shows layers available for use in a new map.  We enter Runways Map as a Name, we choose Bing streets as a base layer, and we check the boxes for the cyprus-runways and cyprus-latest Drawing layers.

 

Press Create Map.

 

 

A new Runways Map appears in the Project pane.  We double-click it to open it.

 

 

Zooming in to the airport by Nicosia, we see it has two runways.  We use the Style pane to style the runway lines as thick green lines with black borders on the lines.

 

 

If we want, we can drag and drop the cyprus-latest Drawing into the map to see the mass of objects from which we extracted the runway lines.

 

 

Turning off the cyprus-latest Drawing layer we can zoom out to see the various airport runways picked out from the entire PBF for Cyprus.

 

 

The runway lines at the Nicosia airport might have been more or less obvious by visual inspection, but the many other small airport lines scattered throughout Cyprus would have been very hard to find manually.   Above we see a small airport in a mostly unpopulated part of the island, that would have been very difficult to find manually.  We have added a Bing satellite image server layer for a satellite image background.

 

How the Query Works

Consider the query we wrote, to extract runway lines:

 

CREATE TABLE [cyprus runways Table] (

  [mfd_id] INT64,

  [id] INT64,

  [type] INT8,

  [version] INT32,

  [name] NVARCHAR,

  [Geom] GEOM,

  [value] NVARCHAR,

  INDEX [mfd_id_x] BTREE ([mfd_id]),

  INDEX [Geom_x] RTREE ([Geom]),

  PROPERTY 'FieldCoordSystem.Geom' ComponentCoordSystem([cyprus-latest Drawing])

);

 

CREATE DRAWING [cyprus runways] (

  PROPERTY 'Table' '[cyprus runways Table]',

  PROPERTY 'FieldGeom' 'Geom'

);

 

PRAGMA ('progress.percentnext' = '100');

 

INSERT INTO [cyprus runways Table] (

  [id], [type], [version], [name], [Geom], [value]

)

 

SELECT [cyprus-latest].id,  [cyprus-latest].type,

   [cyprus-latest].version, [cyprus-latest].name,

   [cyprus-latest].Geom, [cyprus-latest Tags].value

   FROM [cyprus-latest]

   INNER JOIN [cyprus-latest Tags]

     ON [cyprus-latest].[id] = [cyprus-latest Tags].[id]

   WHERE [cyprus-latest Tags].[key] = 'aeroway'

     AND [cyprus-latest Tags].[value] = 'runway'

     AND [cyprus-latest].[type] = 1;

 

 

A Manifold query can contain more than one statement ending in a semicolon ; character, with each statement being executed in sequence.  The query above contains four statements: two CREATE statements, a PRAGMA, and then finally an INSERT INTO.

 

The query has two main parts:  the first part creates a table and a drawing, and the second part populates the table (which automatically, of course, populates the drawing).    

 

Let us unpack the query to see how it works.   The first part is simple:

 

CREATE TABLE [cyprus runways Table] (

  [mfd_id] INT64,

  [id] INT64,

  [type] INT8,

  [version] INT32,

  [name] NVARCHAR,

  [Geom] GEOM,

  [value] NVARCHAR,

  INDEX [mfd_id_x] BTREE ([mfd_id]),

  INDEX [Geom_x] RTREE ([Geom]),

  PROPERTY 'FieldCoordSystem.Geom' ComponentCoordSystem([cyprus-latest Drawing])

);

 

CREATE DRAWING [cyprus runways] (

  PROPERTY 'Table' '[cyprus runways Table]',

  PROPERTY 'FieldGeom' 'Geom'

);

 

PRAGMA ('progress.percentnext' = '100');

 

The first two CREATE statements create a table and then create a drawing based on that table.   The table is created with six fields.   Two indexes are created, one on id and one on Geom.   The table is created using the same coordinate system as the cyprus-latest Drawing drawing.

 

Drawings in Manifold contain no data but just consist of a minimum of two properties that say which table they should use for their geometry and which field in that table contains the geometry.   Creating a drawing in Manifold in SQL is thus trivial.

 

The PRAGMA statement creates a progress bar that shows progress as the rest of the query proceeds.

 

The second, main part of the query finds records in the tags table that correspond to runways, joins them into the table with geometry to create a results table of objects that are runways, and then inserts those into the newly-created table:

 

INSERT INTO [cyprus runways Table] (

  [id], [type], [version], [name], [Geom], [value]

)

 

SELECT [cyprus-latest].id,  [cyprus-latest].type,

   [cyprus-latest].version, [cyprus-latest].name,

   [cyprus-latest].Geom, [cyprus-latest Tags].value

   FROM [cyprus-latest]

   INNER JOIN [cyprus-latest Tags]

     ON [cyprus-latest].[id] = [cyprus-latest Tags].[id]

   WHERE [cyprus-latest Tags].[key] = 'aeroway'

     AND [cyprus-latest Tags].[value] = 'runway'

     AND [cyprus-latest].[type] = 1;  

 

The core part of the query is a SELECT statement that accomplishes an INNER JOIN.   Within the INNER JOIN is the join condition where the id fields from each of the two joined tables must be the same, and using a WHERE clause to find only those tag records that are aeroways and are runways, and also only those records in the cyprus-latest table that have a type of 1, meaning that they are lines.  

 

OSM encodes the type of object usually,  but not always, using a type of 1 for lines. In some cases a type of 1 can also be an area.   We want only airport runway lines, not objects which have the same id and which are tagged as runways, but which may be points or areas.  Using a using a type of 1 is a good compromise.

 

The result of the SELECT with INNER JOIN fees the INSERT INTO, populating the new table (and thus the new drawing) that the query creates.

Optional:  Configure Indexes

As imported from the PBF, the tables we will be using do not have indexes on fields we will use in the join.   When doing a join without indexes on fields in the join, the internal Manifold query optimizer will build temporary indexes and will use those automatically.   That usually is fast enough that it is not worth the time to manually create indexes.

 

However, if we are working with large PBF data, or if we are doing repeated joins with that data, it may be worth it to take a few minutes to add indexes so that Manifold does not have to create temporary indexes with every join that we might do.    It is easy to add indexes with a few clicks in the Schema dialog.

 

We first will adjust the cyprus-latest table, and then we will adjust the cyprus-latest Tags table.

 

 

In the Project pane, we double-click the cyprus-latest table to open it.    Since this table is used to create a drawing, on import Manifold automatically adds a guaranteed unique mfd_id key field and an index on that.   However, the table does not have an index on the id field that we will use in the join operation later on in this topic.

 

We will add an index on the id field.   With the focus on the opened table, in the main menu we choose Edit - Schema.  

 

 

In the Schema dialog, we press the Add command button and choose Index.  

 

 

In the Index dialog, we specify id_x as the Name (a memorable name for an index on the id field), we choose regular index (btree) as the Type of index, and we check Allow duplicate values. In the pull down menu for the first Field box we choose id.

 

Press OK.

 

 

The new index is shown in provisional blue color, as something that will be added if we press Save Changes.   We double-check our work and press Save Changes.

 

Building an index on a very large data set can take a while, possibly even a minute or more.  Be patient.

 

Next, in the Project pane we double-click open the cyprus-latest Tags table.    We will add indexes on the id field, on the key field, and on the value field.   The id field will be used in the join, and the key field and the value field will be used in a WHERE clause.

 

 

With the focus on the open cyprus-latest Tags table, in the main menu we choose Edit - Schema.

 

 

In the Schema dialog, we press the Add command button and choose Index.  

 

 

As we did for the prior table, in the Index dialog, we specify id_x as the Name, we choose regular index (btree) as the Type of index, and we check Allow duplicate values. In the pull down menu for the first Field box we choose id.

 

Press OK.

 

 

The new index is shown in provisional blue color, as something that will be added if we press Save Changes.   In most cases, adding an index on the id field in the Tags table will be enough.  However, if we like we can make the query run just a little bit faster by adding two more indexes, for the key field and for the value field.  

 

First we will add an index for the key field.

 

 

We press the Add command button and choose Index.  

 

 

In the Index dialog, we specify key_x as the Name, we choose regular index (btree) as the Type of index, we check Allow duplicate values, and we also check Allow null values.  In the pull down menu for the first Field box we choose key.

 

Press OK.

 

 

A new row appears for the proposed key_x index.

 

We repeat the procedure above, pressing Add, choosing Index and so on, remembering to check Allow duplicate values and Allow null values, to add an index called value_x on the value field.

 

 

We now have three indexes we propose to add to the table.     We double-check our work and press Save Changes.

 

Creating three indexes at once with a very large table, especially with text fields, can take a long while.   Be patient.    Having the indexes will make the JOIN faster.

 

Videos

Join Videos

 

Find Percentages of Open Space in ZIP Code Area - Find the percentage of open space in each ZIP code area given a layer of polygons representing ZIP codes and a layer of polygons showing open spaces like parks and green spaces. This video shows how to do that start to finish in a few simple steps, from initial importing of shape files to final results, in just five minutes, with an additional few minutes of explanation what each step does. Works in Manifold Release 9 or using the free Manifold Viewer.

 

See Also

Tables

 

Queries

 

Drawings

 

Maps

 

Style: Drawings

 

Schema

 

Join

 

Join Examples

 

Command Window

 

PBF .pbf, OSM, O5M

 

Example: Find Percentages of Open Space in ZIP Code Areas - Given a drawing showing ZIP codes as areas (polygons) and another drawing showing open spaces like parks and nature preserves, we add a field to each ZIP code area that gives the percentage of open space in that ZIP code area.  The workflow we show handles situations where some open space regions overlap multiple ZIP code areas, correctly reckoning only that part of the open space within each ZIP code area.

 

Example: Create a Map Showing OSM Use by Country - A start-to-finish real life example of map creation that combines various Manifold capabilities.  Copying a table of numbers from a web site, we create a map that is thematically colored to show usage of OpenStreetMap by country in proportion to the population of that country.

 

Example: Style Pane Quickstart - A tutorial introduction to using the Style pane to apply color, symbology, size and rotation to areas, lines and points in drawings.

 

Example: Format a Drawing using the Style Pane - In this example we provide a first, step by step look at how to format areas in a drawing using the Style pane.  We can specify the same formatting for all areas or use a field to automatically set formatting, a process usually known as thematic formatting.

 

Example: Connect to an OSM Vector Server - We connect to an OSM Server that provides a vector layer containing points and lines in the OpenStreetMap database.  We then show how to scrape (copy) data from the OpenStreetMap server into local storage.  We extract building footprints from the local copy.