DISTINCT

A qualifier that filters SELECT results to omit records that contain duplicate data in the selected fields, leaving only one of each instance.  il_distinct01_01.png

 

SELECT DISTINCT <field>/SPLIT <query>, ... FROM <table>

Example

Suppose we have a table called Expenses.   For each record the table has three fields: Vendor, Job and Payment.    The table records the amount of payments made to different vendors for various categories of jobs in a construction project, such as carpentry (wood work) or masonry (work with brick and stone).  Categories typically have more than one vendor who provides that job.

 

SELECT [Job] FROM [Expenses];

 

il_distinct01_02.png

A basic SELECT with no qualifier returns the Job value for all records, returning many duplicates.

 

 

SELECT DISTINCT [Job] FROM [Expenses];

 

il_distinct01_03.png

 

Adding the DISTINCT qualifier to the query returns one record for each unique value that is found within the Job field, returning no rows with duplicated values in the Job field that was selected.

 

Suppose we SELECT more than one field?

 

SELECT [Vendor], [Job] FROM [Expenses];

 

il_distinct01_04.png

 

In our example Expenses table none of the Vendor names are duplicated.   A SELECT query that returns records with two fields, the Vendor and Job field, will return a table with no duplicated records.

 

SELECT DISTINCT [Vendor], [Job] FROM [Expenses];

 

il_distinct01_05.png

If we add the DISTINCT qualifier to the query the same set of records is returned.  There are no duplicates to begin with so using DISTINCT has no duplicates to filter out.  Note that the order of records returned may be different whether DISTINCT is used or not, since sets of records in SQL are unordered by default.  

 

If we want to see records in order we use ORDER BY to order the results table:

 

SELECT [Vendor], [Job] FROM [Expenses]

  ORDER BY [VENDOR];

 

il_distinct01_06.png

 

... and ...

 

SELECT DISTINCT [Vendor], [Job] FROM [Expenses]

  ORDER BY [VENDOR];

 

il_distinct01_07.png

 

Using ORDER BY to order the results table we can see that both with and without DISTINCT the results are the same, because all records returned by the SELECT are unique with no duplicated records.

 

Notes

NULLs - All NULL values are treated as equal to each other (the behavior prescribed by the standard and implemented in other databases).  DISTINCT is one of those places where the notion of a NULL as a generic unknown value meets established practice: instead of treating all NULL values as different from all other values, including other NULL values, DISTINCT treats them as the same value. For example, using DISTINCT on a table with a single field will combine all NULL values into a single record.

 

Internal optimization - Instead of regular indexes, when processing DISTINCT and UNION / EXCEPT / INTERSECT Manifold utilizes specialized internal data structures which are optimized for separate write and read phases, with reading done without any locking whatsoever.

 

Order - Some databases implement DISTINCT so the original order of records is preserved, taking the first duplicate of a set, or such that the results table records are sorted as they would be if ORDER were applied on all fields, left to right.  Manifold Release 8, for example, does the former. This is purely an implementation detail since DISTINCT does not require the output records to be ordered in any way; DISTINCT merely requires that the results table records contain no duplicates.   The Manifold query engine makes use of that and, for maximum performance, simply removes duplicates without enforcing any further ordering for performance.    The bottom line is that the output of DISTINCT is unordered. This applies to all cases where DISTINCT is used such as within SELECT, UNION / EXCEPT / INTERSECT,  and COLLECT.

 

Long values are OK - Many databases do not allow using DISTINCT or ORDER with long values such as LOBs ("LOB" = "large object") and / or values of complex / user-defined types such as geometry.  In databases with such limits that carries over to UNION / EXCEPT / INTERSECT, including EXCEPT ALL and INTERSECT ALL, if they are implemented. The Manifold query engine does not have such limitations.  We can use any type we want, using binary order if there is no sensible type-specific order that otherwise could be used.

 

ALL and DISTINCT - ALL and DISTINCT are opposites, and generally, where one is applicable, the other is applicable as well.   With most databases ALL or DISTINCT can be used within SELECT, UNION / EXCEPT / INTERSECT, and aggregates, in the case of Manifold the COLLECT aggregate.   

 

Some databases always allow using either ALL or DISTINCT in all places. That can lead to conceptual difficulties in that the defaults are different: a plain SELECT implies ALL, a plain UNION / EXCEPT / INTERSECT implies DISTINCT, and a plain aggregate implies ALL again.   Adding to complexities that must be kept in mind, some databases, most notably Oracle, do not allow using ALL with EXCEPT and INTERSECT but only with UNION.   What other databases refer to as EXCEPT, Oracle calls MINUS.

 

Manifold's approach to the above is to (a) implement both ALL and DISTINCT logic for each of the above cases in code, (b) to preserve the uneven defaults, according to the standard and accepted practice:  SELECT defaults to ALL, UNION / EXCEPT / INTERSECT default to DISTINCT, and COLLECT defaults to ALL, but also (c) to disallow repeating the default explicitly and only to allow writing an explicit change from the default, that is, to accept SELECT and SELECT DISTINCT but to fail on SELECT ALL in order to make it clear that SELECT ALL is the default.

 

See Also

Queries

 

Command Window

 

Query Builder

 

SQL Statements

 

UNION / EXCEPT / INTERSECT