That YX Thing

Satan has a special place in the Inferno for slackers who say they use standards but who then ignore what those standards require.   When working with spatial data a recurrent problem is data that advertises itself as using the EPSG:4326 coordinate system, the so-called Latitude / Longitude coordinate system frequently used for unprojected data, but which, ignoring the EPSG requirement that EPSG:4326 data be in YX coordinate order, instead presents the data in XY coordinate order.   Lucky for us, Manifold has a check box to deal with such errors.  

 

This topic discusses what such errors are and how they have become so widespread that Manifold must provide defensive means against them.

 

Nomenclature: In geometry whether X comes first and Y second or the other way around is referred to as axis ordering because the coordinates are measured on the X axis or the Y axis.    In spatial data whether longitude (the X measure) or latitude (the Y measure) comes first or the other way around usually is referred to as coordinate ordering.  In this documentation we use axis ordering and coordinate ordering as interchangeable synonyms for the same thing.

 

The Universe Does Not Care

 

In lists of coordinates that specify locations, whether X comes first and Y second or the other way around doesn't really matter in terms of how the Universe is built.  There is no cosmic rule that says it must be one way or the other.  Just like deciding to drive on the left hand side of the road, as some countries do, or on the right hand side of the road, it is perfectly OK to choose either way and everything will work out just fine so long as everybody agrees to follow the standard.  

 

Other examples where which standard is picked does not matter so long as everybody agrees to use the chosen standard are whether processors and memory are Big Endian or Little Endian, or whether written text is read from left to right or from right to left.   In computing you can even have multiple standards, like software that can handle Big Endian or Little Endian so long as there is some way to keep track of which is which so it knows when to use Big Endian and when to use Little Endian.   None of this is rocket science.

 

But just as slackers who drive on the right side of the road will cause chaos in a country where the standard is driving on the left side of the road, people who ignore a standard that requires YX order and not XY order will cause chaos in spatial data if they use XY order.   In a perfect world such slackers would be immediately transported to that special place in the Inferno Satan has prepared for them.   In our real, imperfect world those slackers are praised by their friends for publishing flawed code on GitHub while the rest of us must deal with the chaos they cause.

 

Dealing with such chaos is why Manifold has a check box captioned Use vector data as either XY or YX according to coordinate system and also why that check box is not checked by default.  How did this become necessary?


The Wonderful World of EPSG Standards

 

EPSG has been highly successful as a standard because it is rigorous, it is exact and it covers over 5000 coordinate systems with short, unambiguous numeric codes that are easy to specify.   That, of course, has made it a lure for people who think the apparent simplicity of a single code number will spare them the need to learn what those codes actually mean.

 

EPSG codes capture within a single code many important aspects of a coordinate system, such as which ellipsoid is used.   One of those important aspects captured by EPSG codes is the specification of axis ordering for a given coordinate system.  The axis order specified is as key a part of an EPSG code's definition of a coordinate system as the ellipsoid and the units of measure that are specified.   That allows the specification of coordinate systems that have been used at different times in different countries with different software, where in some cases XY ordering has been used and in other cases YX ordering has been used.   The comprehensive nature of EPSG, the result of an immense amount of diligent work by EPSG's experts, is one of its main attractions.

 

Most EPSG codes specify X, Y coordinate ordering, that is, where each pair of numbers that specifies a location gives the X axis or longitude number first and the Y axis or latitude number second.   Some EPSG codes, such as EPSG:4326, specify Y, X axis ordering where the latitude number comes first and the longitude number comes second.   
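As a minimal sketch of what honoring a code's axis order means in practice, the Python below keeps a small, hand-typed table of axis orders and uses it to turn a coordinate pair into X, Y form. The table, the function name to_xy and the sample point are illustrative assumptions; real software would take axis order from the EPSG registry rather than from a two-entry dictionary.

```python
# Hypothetical lookup of axis order per EPSG code; a real table would come
# from the EPSG registry, not be typed in by hand.
AXIS_ORDER = {
    4326: "YX",  # WGS 84 latitude / longitude: latitude first
    3857: "XY",  # Pseudo-Mercator: easting first
}

def to_xy(pair, epsg_code):
    """Return the pair as (x, y), honoring the axis order the code specifies."""
    first, second = pair
    if AXIS_ORDER[epsg_code] == "YX":
        return (second, first)
    return (first, second)

# Paris as correctly ordered EPSG:4326 data: latitude 48.8566, longitude 2.3522.
paris_4326 = (48.8566, 2.3522)
print(to_xy(paris_4326, 4326))  # (2.3522, 48.8566)
```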

 

The EPSG standard requires coordinate ordering to be honored: the axis ordering specified by an EPSG code is not an option that users can ignore if they feel lazy, any more than an American visitor to England should feel free to ignore the English standard of driving on the left hand side of the road because that is not the way it is done in the US.  In both cases ignoring the standard causes trouble.

 

In a perfect world all software and all spatial data that claimed to use EPSG would accurately honor EPSG, including the correct use of whatever axis ordering is required by the EPSG code in play.  

 

The Real World: What Standard?

 

In the real world, most data that has been published saying it uses EPSG:4326 does not honor the YX ordering specified by EPSG:4326 and instead uses XY ordering.   That is surprising given that EPSG:4326 clearly states YX ordering should be used.

 

Using Manifold we can take a look (in the Coordinate System dialog's EPSG tab) at how  EPSG:4326 is defined and we can see that YX ordering is clearly, explicitly specified, as seen below: 

 

{ "Name": "WGS 84 (EPSG:4326)", "System": "Latitude \/ Longitude", "Base": "WGS 84 (EPSG:4326)", "Axes": "YX", "MajorAxis": 6378137, "Eccentricity": 0.08181919084262149, "Unit": "Degree", "UnitLatLon": true, "UnitScale": 1, "UnitShort": "deg" }
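Because the definition above is plain JSON, the axis order can also be read programmatically. The following Python is only a sketch that parses the text exactly as shown and prints the value of "Axes"; the variable names are assumptions made for illustration.

```python
import json

# The definition text as shown above, kept verbatim in a raw string so the
# "\/" escape reaches the JSON parser unchanged.
definition = r'''{ "Name": "WGS 84 (EPSG:4326)", "System": "Latitude \/ Longitude", "Base": "WGS 84 (EPSG:4326)", "Axes": "YX", "MajorAxis": 6378137, "Eccentricity": 0.08181919084262149, "Unit": "Degree", "UnitLatLon": true, "UnitScale": 1, "UnitShort": "deg" }'''

system = json.loads(definition)
axes = system["Axes"]
print(system["Name"], "uses", axes, "ordering")
```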

 

So why such a widespread error?  There seem to be a variety of causes for why such an obvious error has gotten so much traction.

 

Metadata ignored - One such cause is that systems which consume or produce spatial data usually treat axis order as an external piece of metadata supplied by the client but which is not used for analysis.  That is especially true for databases that store spatial data and, possibly, provide some functions for spatial analysis on the data.   Such systems will typically take spatial data in whatever axis order it is supplied, perform the analysis and then return the result in the same axis order in which the original data was provided.

 

If someone commits the error of tagging XY ordered data with an EPSG code that specifies YX ordering and then uses that data within a system like the above they might never notice their error.  What they get back from the black box of the database system will be in the same form as what they put in.   Everything will seem to work correctly right up to the moment that the data is finally used on a system which correctly respects the EPSG code that has been specified and which reveals the error, perhaps by visualizing the data so anyone can see the data has been flipped and rotated.  But until the incorrectly tagged data is detected the error will keep proliferating as the wrongly-tagged data is copied and re-copied throughout the world's spatial data archives.
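The pass-through behavior described above can be illustrated with a toy example. The centroid function below is a hypothetical stand-in for a database's spatial function, not any real product's implementation: it averages the first and second numbers separately and never asks which one is X. Whatever order goes in comes back out, so wrongly tagged data looks perfectly healthy.

```python
def centroid(pairs):
    """Average each component separately; the function never asks which is X."""
    n = len(pairs)
    return (sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n)

# Two points given as (longitude, latitude), that is, XY order,
# but wrongly tagged EPSG:4326 by their hypothetical owner.
xy_points = [(2.0, 48.0), (4.0, 50.0)]
print(centroid(xy_points))  # (3.0, 49.0): same order as the input

# The same points in the YX order that EPSG:4326 requires.
yx_points = [(48.0, 2.0), (50.0, 4.0)]
print(centroid(yx_points))  # (49.0, 3.0)
```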

 

An accident of English - A second cause is apparently nothing more than the random bad luck of the English phrase "Latitude / Longitude" being adopted as the most common name to refer to "unprojected" data.    Spatial data is never really "unprojected," of course: it is always in some coordinate system with so-called "unprojected" data usually being cast within a coordinate system that treats units of angular measurement, degrees of latitude and longitude, as if they were linear units such as meters or feet.

 

Within that "Latitude / Longitude" coordinate system for "unprojected" data the latitude number is the Y axis number, that is, the vertical axis, since latitude numbers are a measurement in degrees up or down from the Equator.  The longitude number is the X axis number, that is, the horizontal axis, since longitude numbers are a measurement in degrees to the left or right of Greenwich, England.   Given that latitude means the Y axis and longitude means the X axis, if we look at the literal meaning of the words the classic "Latitude / Longitude" name for unprojected data says the coordinates should be in Y, X order.  

 

Unfortunately for everyone, although the phrase "Latitude / Longitude" has become ubiquitous in English to mean the coordinate system for unprojected data, the mathematical orientation of programmers who create software to work with GIS seems to have been mostly on the side of classic geometric habit, where coordinates are given in X, Y order.   As a result, many users have become habituated to say "Latitude / Longitude" but to think that means "X, Y" order in terms of what numbers are actually in the data.

 

Well-meaning but inattentive users may read textual descriptions of EPSG:4326 as specifying the Latitude / Longitude coordinate system everyone has come to expect for unprojected data, and so when it is time to load their unprojected data into software that asks what EPSG code should be used for the coordinate system they specify EPSG:4326, not noticing that when EPSG says "Latitude / Longitude" EPSG also says YX ordering explicitly is required.

 

Inexpert programming - In recent years the widespread availability of source code published openly on the web has made it easy for aspiring applications creators to copy and paste other people's work into functioning applications even if they do not themselves fully understand what they are doing.   That can be a very good thing and a great way to learn to be a better programmer.   But when people copy work from others without understanding the nuances of that work, or without being able to judge if the source code they are re-using is well-written or poorly-written, they can make serious errors.

 

One such serious error made by inexpert programmers is to use EPSG codes for their apparent ability to provide a simple way of specifying coordinate systems, but to make the mistake of assuming that axis ordering is always XY.  It seems never to occur to programmers who do not look at EPSG in detail that ordering could be YX for some codes.

 

Because inexpert users also frequently make that same mistake the result of two mistakes put together is often a correct result.  If a user inputs XY data with an EPSG:4326 code that says it is YX, but the application in use makes the same error and ignores the YX specification under the assumption all data is always XY, then the wrongly-tagged XY data nonetheless will be utilized as XY data and all will be well.
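A short sketch shows how two errors cancel. Both reader functions and the sample point are hypothetical: one reader ignores the EPSG code and assumes XY, the other honors EPSG:4326 and reads YX. Fed the same wrongly tagged XY pair, the careless reader lands in the right place and the correct reader does not.

```python
def read_assuming_xy(pair):
    """A reader that ignores the EPSG code and always assumes (x, y)."""
    return {"lon": pair[0], "lat": pair[1]}

def read_honoring_4326(pair):
    """A reader that honors EPSG:4326 and reads (latitude, longitude)."""
    return {"lat": pair[0], "lon": pair[1]}

# A point near Greenwich written as XY (longitude, latitude) yet tagged EPSG:4326.
mis_tagged = (-0.0015, 51.4779)

print(read_assuming_xy(mis_tagged))    # latitude 51.4779: looks right
print(read_honoring_4326(mis_tagged))  # latitude -0.0015: the wrong place
```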

 

Software which claims to use EPSG codes but fails to honor EPSG standards is easy to find, with the GeoPackage format being an example.  GeoPackage uses EPSG codes to specify coordinate systems but then always uses XY order no matter what the EPSG code requires.  The GeoPackage group is aware of that failing, but instead of correcting it the group states that the EPSG standard should be ignored if an EPSG code specifies YX axis order.

 

Grown-ups, of course, will ridicule the idea that a standard should be used except when design flaws in an application prevent it from being used correctly, in which case it should be ignored.   Responsible people will also criticize a package which knowingly publishes false information, as in the case of GeoPackage software which writes XY data using an EPSG:4326 code that explicitly states the data is YX.   Knowingly publishing false data is one way so much erroneous data finds its way into the world's spatial data archives.

 

Reasoning by Analogy -  One final reason why EPSG axis ordering errors are so widespread is that some users and some programmers are too quick to reason by analogy to what they already know instead of learning how to correctly use something new.

 

EPSG is not the only standard for specifying coordinate systems.   In particular, there are a variety of "open" standards introduced by other organizations, such as OGC.   Some such standards are hard-wired to always require XY ordering, GeoJSON being the best example since GeoJSON coordinate ordering is always XY.  But most standards, for example, WFS, WKT, WKB and even shapefiles, are simply silent on axis ordering even though they are frequently used by people who expect XY ordering all the time, apparently for no better reason than they personally have never used anything but XY ordering.   
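For comparison, here is a small sketch of what "always XY" looks like in GeoJSON: a position lists longitude first and latitude second. The helper name geojson_point and the sample location are assumptions made for illustration.

```python
import json

def geojson_point(latitude, longitude):
    """Build a GeoJSON Point; positions are longitude first, then latitude."""
    return {"type": "Point", "coordinates": [longitude, latitude]}

# Named arguments keep the caller from having to remember the order.
point = geojson_point(latitude=51.4779, longitude=-0.0015)
print(json.dumps(point))
```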

 

Complicating the picture is that applications will often use a mix of technologies that have arisen in the context of different standards.  For example, a programmer may configure a web server utilizing WFS protocol that is fed data by programs which have always used shapefiles in situations where the data has used only XY ordering.   But at the same time to take advantage of the vast, universal coverage of EPSG the programmer will use EPSG codes to specify coordinate systems.  Using EPSG codes sounds like a simple resolution to a complex problem, like adding boiling water to make noodle soup from a packet of pre-packaged dried pasta and spices.

 

But using EPSG leads to mistakes if the programmer does not realize that EPSG requires support of both XY and YX ordering depending on what a particular EPSG code specifies. It is not an option.  If a programmer creates an application like GeoPackage that always assumes XY no matter what the EPSG code specifies, trouble follows.

 

When caught out in such errors by users or by other software using EPSG codes correctly, the creators of flawed software will sometimes push back with the excuse that OGC standards say data should always use XY ordering.  They are often backed in such nonsense on forums by self-appointed experts who cite various OGC standards without really knowing all the details that are in those standards.

 

In reality, all other standards-setting bodies, including OGC, insist that when a coordinate system is defined with a specific axis ordering, XY or YX, that coordinate order must be honored.  OGC in particular explicitly says that if we use EPSG codes we are obliged to use them correctly within the EPSG standard.  None of that should be surprising since standards-setting bodies are not normally in the business of arguing that standards should be disregarded by people who cannot be bothered to honor them.

 

It also makes no sense to state that when using EPSG the rules of some other standard, such as GeoJSON, should be used instead of EPSG rules.   That is as fundamentally wrong as saying that because the standard in Louisiana is to drive on the right, a fine thing when in Louisiana, then people who normally live in Louisiana should continue to drive on the right when they visit England, a very bad thing when in England.

 

The end result of all the above is fairly negative:  most software and most data use XY ordering even when they announce that they are using EPSG:4326 and thus should be using YX ordering.   We never know in a particular case without examination whether a data source is falsely claiming YX when in fact it uses XY.
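One simple form of examination can be sketched in a few lines. The heuristic below is an assumption offered for illustration, not a Manifold feature: since latitude can never exceed 90 degrees in either direction, any pair tagged EPSG:4326 whose first number falls outside that range cannot really be in YX order. The test is one-sided, so data covering only places where both numbers fit within -90 to 90 remains inconclusive.

```python
def looks_like_xy(pairs):
    """Heuristic: in true YX data the first number is a latitude, so it can
    never fall outside -90..90. Any first value beyond that range suggests
    the pairs are really (longitude, latitude)."""
    return any(abs(first) > 90 for first, _ in pairs)

# Hypothetical pairs tagged EPSG:4326; Sydney has longitude about 151.2.
tagged_4326 = [(151.2093, -33.8688), (144.9631, -37.8136)]
print(looks_like_xy(tagged_4326))  # True

# Inconclusive case: both numbers fit in -90..90, so the test says nothing.
print(looks_like_xy([(2.3522, 48.8566)]))  # False
```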

 

It also is common to connect to web server data sources serving data using, say, WFS protocol where one server will provide some data sets claiming EPSG:4326 in the correct YX form while another server will provide other data sets also claiming EPSG:4326 in the incorrect XY form.   On occasion we can even run into a WFS server where some EPSG:4326 data sets are published in YX form while others using the very same EPSG:4326 code will be published incorrectly in XY form.  Satan smiles.

 

Accepting Reality

 

There are two tactics Manifold uses to deal with the widespread errors caused by software or incorrectly tagged data  ignoring the axis specification of EPSG codes.

 

The first is to deliberately make an explicit exception for software which is known to contain such flaws, for example, GeoPackage.   Manifold is aware GeoPackage ignores EPSG axis specifications, so the Manifold dataport for GeoPackage will always import or link data from GeoPackage .gpkg files as XY data, even if the data is tagged with an EPSG:4326 code.   That is a highly unsatisfactory way of dealing with the problem, but it is the least bad of the various possible approaches.

 

The second tactic is to provide a check box captioned Use vector data as either XY or YX according to coordinate system and also to have that box not checked by default.   That is a miserable thing to have to do, to set an option by default to ignore what a truly brilliant standard says, but the sad reality is that so much spatial data and so much software ignores the EPSG standard for coordinate ordering that checking the box by default would result in more data being brought into Manifold incorrectly than correctly.

 

Providing a check box option allows correct operation by people who know their spatial data honors the standard or who are doing interchange with a package that also honors the standard.
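The two tactics can be summarized as a small decision function. This is a sketch of the logic as described in this topic, not Manifold's actual code: the function name, the format strings and the parameter names are hypothetical, and treating an unchecked box as "assume XY" is our reading of the text.

```python
def axis_order_to_use(file_format, epsg_axes, honor_epsg_axes=False):
    """Pick the axis order for reading vector data.

    file_format      hypothetical format name, such as "gpkg"
    epsg_axes        "XY" or "YX" as specified by the EPSG code
    honor_epsg_axes  stands in for the check box, off by default
    """
    if file_format == "gpkg":
        return "XY"          # first tactic: GeoPackage is always read as XY
    if honor_epsg_axes:
        return epsg_axes     # box checked: follow the coordinate system
    return "XY"              # box unchecked: assume XY

print(axis_order_to_use("gpkg", "YX", honor_epsg_axes=True))  # XY
print(axis_order_to_use("gml", "YX", honor_epsg_axes=True))   # YX
print(axis_order_to_use("gml", "YX"))                         # XY
```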

 

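The good news is that with Manifold it is usually relatively easy to tell if data or other software is falsely stating use of YX ordering when in reality it uses XY: when such data is displayed using the axis order its EPSG code specifies, anyone can see that it has been flipped and rotated.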

 

 

A Conclusion

 

Is all the above an incredible amount of hassle for what should be a very simple thing?   Yes, of course.  But it is always that way when people kill off the usefulness of simple standards by slacker disregard for those standards.

 

Keep in mind that neither this essay nor an "honor the standard" check box would have been necessary if only people who said they honored the EPSG standard had indeed honored it.   Anyone who is unhappy at having to second-guess whether software that claims to honor EPSG actually does so should drop a note to those software developers who claim to honor EPSG but then ignore key parts of the EPSG standard.