Street Address Geocoding

Street address geocoding is the process of determining an estimated latitude and longitude position for the location of a street address.  Reverse geocoding produces a list of street addresses near a given latitude and longitude position.   Given a geocoding data source, Manifold can do street address geocoding.  If the data source in use supports reverse geocoding, Manifold can do reverse geocoding as well.

 

A very common task when working with spatial data is to take a table where each row has a street address and to add latitude and longitude fields for the estimated location of that address.

 

For example, suppose we have a table of ten records where each record gives the restaurant number and the street address of an In-n-Out fast food restaurant in California, Utah or Texas.  This is the same table used in the Example: Street Address Geocoding topic.

 

eg_street_geocode01_01.png

 

The classic GIS way to street address geocode that table would be to add the estimated latitude and longitude of each restaurant.

 

eg_street_geocode01_13.png

 

In Manifold we normally jump straight to creating a geom field that contains a point at the right location for each record.

 

eg_street_geocode01_04.png

 

We can then see the location of each restaurant in a drawing.

 

eg_street_geocode01_05.png

 

The map above shows the geocoded restaurants drawing as a layer above a Google Street Map layer.

The Geocoding Process in Manifold

Manifold geocodes a table of addresses using geocoding data sources, which are normally web-based geocoding servers like the Google geocoding server.     We use SQL within Manifold to use a Manifold SQL geocoding function such as GeocodeAddress to geocode records in the table, either to update the table or to SELECT ... INTO to create a new table.   

 

The GeocodeAddress function takes as arguments a data source in the project and the name of an address field that contains an address as a string such as a varchar data type.   The function returns the longitude/latitude location of that address as given by the data source, providing it in Latitude / Longitude projection (WGS 84 base) as decimal coordinates in a float64x2 value.

 

At the present writing Manifold has nine web-based geocoding data sources built in, from Bing to Yandex.   Geocoding servers vary wildly in their characteristics with some being free to use, some requiring an API key, some allowing a limited number of free geocodes per day or per second with additional geocodes requiring a payment.   Google, for example, used to provide free access very generously.  These days, you probably will need an API key from Google to use their geocoding server.

 

These examples use Microsoft's Bing geocoder.  We can get a free developer key from Bing by visiting the Bing Maps Dev Center, creating an account (free), logging in, and then under My account drilling down to get a free API key, suitable for testing and development, but not allowing large scale geocoding for production use.

Create a Geocoding Data Source

We must have a geocoding data source in our project to do any geocoding.   We will create a geocoding data server using Microsoft's Bing geocoder.   

 

In the main Manifold menu, choose File - Create - New Data Source

 

dlg_create_bing_geocoder.png

 

Enter a short, memorable name for the geocoding data source.  We will be keyboarding this into queries that do geocoding, so keep it short.  In the example, we use bing.  Choose Web Server: geocodeserver as the Type and then in the long list available in the box below, choose Bing Geocoder.

 

When we visited Microsoft's Bing Maps Dev Center, signed up and obtained a free API key, we copied and pasted that API key to some safe storage location on our computer.  We now Copy that API key, and then we Paste it into the API key box.    Press Create Data Source and a new data source called bing will appear in our project.

 

To try out the data source, launch a Command Window and enter the expression

 

VALUES (GeocodeAddress([bing], 'Chicago'));

 

eg_test_geocoder01_01.png

 

Press the ! button in the main menu to run that query.  

 

eg_test_geocoder01_02.png

 

Bing will be happy to return the result

 

[ -87.6324462890625, 41.88425064086914 ]

 

which gives the longitude and latitude of where Bing considers the center of Chicago to be.  

 

Geocoding servers also vary in their ability to digest what we might think is a reasonable street address.  Because most were designed to handle geocoding for web sites where users tend to specify addresses in a wildly inconsistent manner, most are reasonably good at being able to parse a wide range of addresses.   For example, most geocoders can parse an address that consists of the name of a big city, like our example above.   

 

Why do geocoders like Bing think that the string 'Chicago' fed to them means the big city of Chicago in Illinois and not the locations also named Chicago in South Africa, Zimbabwe, Guatemala or Mexico?  Most likely because web searches for Chicago seek the big city in Illinois over the others by a wide margin.

 

If we provide a more detailed street address most geocoding servers can do even better.  Bing is happy to recognize an address such as

 

VALUES (GeocodeAddress([bing], '1600 Pennsylvania Avenue NW, Washington, DC 20500'));

 

eg_test_geocoder01_03.png

 

In the illustration above we have arranged the query to use two lines.  SQL is not sensitive to white space, so we can arrange what we write on multiple lines, with indented lines as we like to improve legibility.    Press ! and Bing will return the result of

 

[ -77.03655, 38.89768 ]

 

eg_test_geocoder01_04.png

 

...meaning that Bing thinks the White House (at that address) is located at latitude 38.89768 and longitude -77.03655.     

 

The above examples show how we can quickly test that a geocoding server we have added as a data source is working correctly.  We do not actually geocode one address at a time, but instead we use Manifold to geocode many addresses at once, as shown in the Example: Street Address Geocoding topic.

Geocoding Functions

The following Manifold SQL functions assume a geocoding server as a data source:

 

GeocodeAddress

Given a data source and an address, return the longitude/latitude coordinates as a float64x2 value.
 
Example: using a Google geocoding data source called GoogleG, return the longitude,latitude coordinates for 'Chicago'.
 

VALUES (GeocodeAddress([GoogleG], 'Chicago'));

 

GeocodeAddressMatches

Given a data source and an address, returns a table of matches. Each match is a string with the format of the string depending on the geocoding server in use.   Most geocoding servers return JSON.

GeocodeLocationMatches

Reverse geocoding.  Given a data source and a longitude/latitude location as a float64x2 value, returns a table of matches around that location.

GeocodeAddressSupported

Given a data source, returns true if the data source supports GeocodeAddress and GeocodeAddressMatches functions.

GeocodeLocationSupported

Given a data source, returns true if the data source supports the GeocodeLocationMatches function.

 

See the Example: Street Address Geocoding topic for examples using the GeocodeAddress function.

 

Reverse Geocoding

Reverse geocoding is the process of finding street addresses near a given latitude/longitude location.   To do reverse geocoding, we use the SQL query function GeocodeLocationMatches in an SQL query.

 

We first create a reverse geocoder data source.  Note that the Bing geocoding server is a different data source than the reverse geocoder.  We name our reverse geocoder bing_r.

 

dlg_create_bing_reverse_geocoder.png

 

Press Create Data Source and a new bing_r data source appears in the Project pane.

 

We can now launch the Command Window and try out the reverse geocoder.  The syntax of the  GeocodeLocationMatches function is:

 

GeocodeLocationMatches(<dataSource>, <valuex2>)

 

We enter the expression:

 

? CALL GeocodeLocationMatches([bing_r], VectorMakeX2(-77.03655, 38.89768))

 

We use a CALL (no semicolon at the end) because the function we use returns a table.   We use the VectorMakeX2 function to create a valuex2 type vector value from the two numbers we provide, which are the longitude and latitude values we obtained from our test for geocoding the 1600 Pennsylvania Avenue address.

 

eg_test_geocoder01_05.png

 

The illustration above formats the query using two lines, so it will fit into the illustration.   Press ! to evaluate the expression.

 

eg_test_geocoder01_06.png

 

The result is a table with an index, zero meaning the first record, and a very long Value that is what Bing returned in JSON notation.

 

{"__type":"Location:http:\/\/schemas.microsoft.com\/search\/local\/ws\/rest\/v1","bbox":[38.893817282429325,-77.043167265744714,38.901542717570678,-77.0299327342553],"name":"1600 Pennsylvania Ave NW, Washington, DC 20006","point":{"type":"Point","coordinates":[38.89768,-77.03655]},"address":{"addressLine":"1600 Pennsylvania Ave NW","adminDistrict":"DC","adminDistrict2":"City of Washington","countryRegion":"United States","formattedAddress":"1600 Pennsylvania Ave NW, Washington, DC 20006","intersection":{"baseStreet":"Pennsylvania Ave NW","secondaryStreet1":"W Executive Ave NW","secondaryStreet2":"Madison Pl NW","intersectionType":"Between","displayName":"Pennsylvania Ave NW, between W Executive Ave NW and Madison Pl NW"},"locality":"Washington Mall","postalCode":"20006"},"confidence":"High","entityType":"Address","geocodePoints":[{"type":"Point","coordinates":[38.89768,-77.03655],"calculationMethod":"Rooftop","usageTypes":["Display"]}],"matchCodes":["Good"]}

 

We have to read Bing's documentation to understand what Bing returned, but even without doing that we can guess that Bing found the 1600 Pennsylvania Avenue address at that location.

Geocoders Provide Approximate Locations

As anyone knows who has followed a web-based mapping application to a location provided for a given street address, such addresses are usually only approximate and can be wildly inaccurate.   Geocoders usually do not know where a specific address is located; instead, they maintain a database of street segments with a range of addresses for each particular street segment.   If a Main Street segment along a particular city block is stored with a range of addresses from 20 to 40, an address at 30 Main Street will usually be interpolated as halfway down the block without knowing exactly where the address is located.   In regions of sparse inhabitation, such as rural areas, geocoders can be wildly inaccurate and may interpolate address locations that are kilometers away from the actual location.

 

As the largest web providers become better at data mining their users, triangulating travel using the locations of surveilled cell phones and at blending information from multiple data sources, more refined geocoding strategies have become possible.   Geocoders will often no longer do interpolations within well-known urban areas but instead will utilize databases of structures maintained by cities or private providers to correlate addresses to specific locations.   Geocoders are also getting better in rural locations as sweeping efforts to digitize the exact locations of specific addresses (down to an access road or gate in the case of ranch and farm properties) in service of better emergency response have resulted in the accumulation of databases of exact address coordinates in many rural areas.

 

But even in such cases the exact location where to place a dot to denote the location of a particular address is often a matter of choice.   Should a dot be placed at a centroid of the real estate parcel associated with a given street address?  For large parcels, such as the one at 1600 Pennsylvania Avenue in Washington, a dot placed at the centroid would be far from any access road or entry gate.   Should the location dot be placed at the main entry gate?   At the postal box for the parcel?  At the center of the main building?   At the center of the facade of the main building that fronts the road used for the address?  Should the point be on the boundary of the parcel or along the edge of any sidewalk?

 

Such factors may seem like minor details, but they can have a big effect on how locations derived from street addresses are used for emergency service response, parcel delivery, computations involving real estate parcels or even simply finding a restaurant.  In the examples above several of the In-n-Out restaurants are located in shopping centers with the address of the restaurant being the address of the shopping center.   Find a dot at the centroid of a shopping center may be easy enough but then locating a restaurant that could be hundreds of meters away might not be so easy if it is not immediately in sight.

 

See the Example: Street Address Geocoding topic for an example of how two different geocoders may place the same address at two significantly different locations.

 

Getting Full Information from a Geocoding Server

Most geocoding data servers return a JSON string when asked to geocode a given address.   Some servers provide additional information within that JSON string, which we can acquire using the GeocodeAddressMatches function and then extract those parts of the JSON string of interest using the StringJsonValue function.

 

dlg_create_bing_geocoder.png

 

Suppose we have created a data source called bing which is a geocodeserver data source using the Bing geocoder web server.    Instead of "API key goes here" we would enter the Bing API key we have obtained from Microsoft (use Google to learn how to get a free Bing API key).  

 

The following query geocodes an address and returns a record for each match with the name and confidence returned by Bing for each:

 

eg_geocode_add_matches01_01.png

 

The query text is:

 

SELECT

  StringJsonValue([value], 'name', true) AS [name],

  StringJsonValue([value], 'confidence', true) AS [confidence]

FROM CALL GeocodeAddressMatches([bing], 'New York, Park Ave, 4');

 

eg_geocode_add_matches01_02.png

 

The result shows three matches with the confidence for each, which we have arranged into an understandable table using Manifold's StringJsonValue function.

 

The following query expands the above by adding the coordinates returned for each match, plus the latitude and longitude values, plus the calculation method returned by Bing.

 

eg_geocode_add_matches01_03.png

 

The query text is:

 

FUNCTION decodeMatches(@matches NVARCHAR) TABLE AS (

  SELECT

    [index],

    CAST (StringJsonValue([value], 'coordinates', false) AS FLOAT64X2)

      AS [coord],

    StringJsonValue([value], 'calculationMethod', true) AS [method]

  FROM CALL StringToJsonArrayValues(@matches)

) END;

SELECT

  StringJsonValue([value], 'name', true) AS [name],

  StringJsonValue([value], 'confidence', true) AS [confidence],

  SPLIT CALL decodeMatches(StringJsonArray([value], 'geocodePoints'))

FROM CALL GeocodeAddressMatches([bing], 'New York, Park Ave, 4');

 

The query defines and uses a function, to compartmentalize the process of decoding the JSON values returned by Bing.   When we run the query, we get more information:

 

eg_geocode_add_matches01_04.png

 

See Microsoft's documentation for Bing for a current list of what information Bing returns when geocoding addresses.

Notes

Got keys?  -  Most of the many geocoding servers Manifold lists provide free API keys for developer use.  Some provide some "reasonable" number of free geocodes per day with no key required.  Policies change quickly, and some geocoding server providers may without warning ban a user they feel has exceeded the guidelines for free use.  It is best to get an API key which allows some specific number of free or paid geocodes and to then follow the rules.   If we are in search of a free geocoder and Bing and Google do not work for us, try one of the others.  

 

NULLs - When a geocoding server returns a NULL for an address that usually means one of several  things:

 

 

 

Manifold Geocoding Database - A data source could also be the Manifold dataport for connecting to the legacy Manifold Geocoding Database (GCDB) used with Release 8 and built on the US Census Bureau's TIGER database.  That data source should be considered deprecated given the greater accuracy of more contemporary data sources.   When using the GCDB as a data source, the full GCDB data source must be linked.

 

See the Example: Street Address Geocoding topic for an example using GCDB as a geocoding data source.

 

See Also

Tables

 

Queries

 

Data Types

 

Web Servers and Image Servers

 

Street Address Geocoding

 

Command Window

 

Example: Add a Spatial Index to a Table - A typical use of an index is to provide a spatial index on a geom field in a table, so the geom data can be visualized in a drawing. This topic provides the step by step procedure for adding a spatial index.

 

Example: Create a Geocoded Table from a Drawing - A partner example to this topic.  A geocoded table has records with a latitude and longitude for each record.   This example starts with a table for a drawing of points where the geom field in the table contains geometry information for each point.   We extract the Y and X locations for each point  from the geom field to create latitude and longitude fields in the table for each record.

 

Example: Street Address Geocoding -  Geocode a table of street addresses using the Google Geocoder.

 

Example: Create a Drawing from a Geocoded Table - A partner example this topic.  A geocoded table has records with a latitude and longitude for each record.   This example starts with a table containing a list of cities with a latitude and longitude field for the location of each city.   We create a geom from the latitude and longitude fields using a template in the Transform panel and then we create a drawing that shows the cities as points.  This example shows all the infrastructure steps involved.