Tables

icon_table.png Everything in Manifold is stored in a table.  Every bit of data, regardless of its nature - vector, raster, text, whatever, - is stored in a table.   Most GIS users naturally understand what a table is when they see data, such as attributes, in a classic row and column table presentation, but it may be a new notion to think of tables as also storing data that is used to create visual displays, such as geometry data for drawings or tile data for images.

 

To create a new table:

 

  1. Choose File - Create - New Table, or use the context menu in the Project pane.

  2. Specify a Name for the new table.

  3. Press the Edit Schema button and use the Schema dialog to add fields and indexes and computed fields as desired to the table.

  4. Press Create Table.

 

We can easily create tables as in the above, but most often the tables we work with are created as a result of importing or linking data, either tables or drawings or images, or they are created by SQL we write.

 

dlg_create_new_table.png

All Data is in Tables

When we open visual windows like drawings or images, in reality those are just a different way of showing data in a table somewhere, for example, geometry data for features in drawings or tile data for pixels in raster images.   Drawings and images are just a specialized way of displaying data in tables.  A drawing itself contains no data: it has only a few bytes of information saying what table to use, what field in that table contains the desired geometry data, and how geometry data from the subject table should be displayed.  All of the data shown in the drawing is in the table.

 

Drawings Show Data from Tables

il_drawing_from_table.png

The shape and locations of objects in drawings are stored in geometry fields in a table, using a Manifold data type called a geom.   Each object in a drawing is one record in that drawing's table.   To create the drawing, for each record in a table Manifold reads the geometry data in that record's geom field and draws whatever object is stored in the geom.   In the illustration above, each record in the table has a geom field that describes the shape and location of a province in Mexico.    By tradition, the geom field in a Manifold drawing's table is named "Geom," so when we look at a list of fields in a table we can tell right away from the name of the field which one is the geom field.   There is nothing special about that name.  The geom field could be called "Harry" if we like, although such an odd name would not at a glance indicate it contained geometry data.  

 

Images Show Data from Tables

il_image_from_table.png

The shape and locations of objects in images are stored in tile fields in a table, using a Manifold data type called a tile.   The default name for a tile field in a table is named "Tile," so it is obvious what data that field contains.   As with the default name of "Geom" in drawings, there is nothing special about the default name "Tile".  It just makes more sense as a default name instead of, say, "Josephine."

 

There are many different types of tiles to allow storing the different types of raster data found in images.   Manifold uses the word "images" to refer to all forms of raster data, including raster data used for photographic images, multichannel raster data from satellite sensors, and other raster data such as terrain elevation data.  Any raster image in Manifold is composed of a checkerboard of tiles, where each tile is a record in that image's table.  In the illustration above, the aerial photograph showing an airport on the coast in Italy is composed of many tiles, each one of which is 128 pixels by 128 pixels in size, with three channels of 16 bit integer data for each pixel.  The selected row in the table contains the tile that is displayed just to the left of the end of one of the runways.

 

Labels Show Data from Tables

il_labels_from_table.png

The locations of labels and the text they display also are data in a table, the table which defines the parent drawing of the labels.   The geom field in that table specifies the location of the label and a text field in the table provides the text the label displays.   Each label corresponds to a single object in the parent drawing and thus a single record in that drawing's table.  

 

Because all data in Manifold is stored in a table somewhere and is fully exposed for us to use however we prefer, that means that every bit of data can be accessed via SQL, and everything that can be done with other user interfaces can be done with SQL or with other table interfaces like the Transform panel.  That makes it very easy to automate what we do and greatly magnifies the power and convenience of Manifold as a GIS and as an incredibly powerful tool for spatial data manipulation.

 

For example, to format one drawing we can use the Style panel interactively with a point and a click.  But to format a thousand drawings we do not want to point and click on each drawing.  Suppose our organization has decided on a standard background color for all drawings and we want to apply that color to a thousand drawings in a project? That is just a simple update of records in a meta table as in this example.   A few mouse clicks in Manifold can do that for thousands of drawings at once by using the Select and Transform panels, easily automating what could take hours or days in a system that did not expose all data of interest in tables.

Table Windows

i_table_parts_labeled.png

 

i_table_parts_cursor.pngDouble-clicking open a table in Manifold will show it in a table window, as seen above.   Tables show data in a classic grid presentation, with rows and columns.  Each row is one record in the table and the various columns show the values for fields in each table.    

 

Manifold uses the words row and record as synonyms and the words column and field as synonyms as well.   If we say column handle we mean the same thing as field handle, for example, and if we say row handle we mean the same thing as record handle.    In addition, GIS people will often refer to fields as attributes.  

 

Default actions, such as clicking the Enter key to begin editing a cell, will apply to the current cell, which is marked by a dotted line box called the cursor.  

Index Required

ico_nb_arrow_blue.png To be editable, tables must have an index and they cannot be read-only.  If a table has a light gray background in all of the columns it is not editable.  Having an index is a good idea in any event because it makes work with tables faster and it enables facilities like interactive selection.   See the Add an Index to a Table topic.

 

For user interface basics for tables, see the Getting Started , User Interface Basics, and the Editing Tables topics.

Schemas

A table has one or more fields with each field being a particular data type.  A table can also have zero or more indexes and zero or more constraints. All of these aspects of a table's structure taken together are referred to as a table schema.   We can edit schemas to add, remove, or rename fields, constraints and indexes using the Edit - Schema dialog.

 

Everything in a schema has a name and each name must be unique in the schema.    For example, we may not have a field named x and an index named x in the same table, but we may have a field named x in one table and an index named x in a different table.   We can also have indexes named x within several different tables in the same database.   The names of indexes do not have to be unique within a database in Manifold.

tech_ravi_sm.png

 

Important:  A table must have a btree index in it to support selection as well as commands like Select All, Select Inverse and Select None and Copy, Cut and Delete on records.  Many tables created by Manifold, such as the table created for a drawing when drawings are imported or created, will automatically have a btree index created in them but tables that are imported or opened in external data sources might not.    

 

The mfd_id Field

Tables that contain data for drawings must have a key field with unique values and an index built on that key fields.  That is such a common requirement that Manifold has a special, built-in facility to make that easy and automatic: the mfd_id field.   When a table contains a numeric field named mfd_id, Manifold will automatically populate that field with a unique value for every record.  If a table has an mfd_id field and we add new records to that table manually, we can, if we like, provide whatever value we like for that mfd_id field, so long it is not duplicate the value in the mfd_id field for some existing record, and Manifold will be happy to use the value we provide.   But if we add a new record and we do not provide a value for the mfd_id field, Manifold will automatically put a unique value in that field for us.  

 

The mfd_id field is a special case provided to make it easier to create drawings from tables for GIS use.   No other fields in a Manifold table are like the mfd_id field.  The name (as with any name that begins with mfd_) of the field is reserved.   Although the mfd_id field can be any numeric type, the numeric data type should be large enough to handle the total number of records we expect to have in that table.   When creating tables for drawings, Manifold automatically creates the mfd_id field as an INT64 data type.  By tradition, when adding an mfd_id field to a table, Manifold users will also create it as an INT64 data type.

 

Tables must have an index to be fully operational, for example, to enable manual selection and to be editable.    An index requires a key field, so if we manually add an index to a table many Manifold users will add an mfd_id field to the table to use as a key field.    See the discussion in the Indexes topic and in the Add an Index to a Table topic.

Working with Tables

Tables can have just a few records or they can have billions of records.  Manifold provides a variety of controls to provide information and to adjust the display of table windows as we like.

 

 

 

 

 

 

 

Managing Columns in Tables

 

With the focus on the opened table, choose the Layers panel in the Contents pane to manage which fields appear in a table and the column widths used for each field.

 

 

 

 

 

il_table_layers01_01.png

 

 

The display above shows a table with 13 fields that is open on the left side of the desktop with the Layers panel open on the right side of the desktop.   

 

 

il_table_layers01_02.png

 

 

We can adjust the width of a column by either double-clicking into the numeric width of the field, for example, changing the 72 point default width of the mfd_id column to, say, 38 points, or by dragging the column's border in the table.

 

 

il_table_layers01_03.png

 

 

If we drag the border of the mfd_id column to make it a narrower column the point size readout in the Layers pane will automatically be updated.

 

 

il_table_layers01_04.png

 

 

To hide a field, double-click the small box at the right to toggle it off/on.  In the illustration above we have hidden the Country field.

 

To change the order in which fields are displayed, Ctrl-click the row for that field in the Layers panel to select it, and then use the up and down arrows to move the field up or down in the list.  In the illustration above we have moved the mfd_id field down one row, so that it displays between the Town column and the Year column in the table window.

 

As with layer characteristics when the Layers panel is used with maps, a change will apply to all selected rows.  If we select four rows and we double-click one of them off to hide it, all four of the rows will be hidden.   If we select three rows and then press the Move Down arrow button, all three rows will move down.  

 

 

il_table_layers01_05.png

 

 

For example, suppose we select all rows where the column width is 96 points.   We double-click into the first such selected row to change the width of the column to 72 points.  

 

 

il_table_layers01_06.png

 

The moment we press Enter to accept the edit, all of the selected rows will have their column widths also changed to 72 points and the width of the columns in the table window will be adjusted.

 

The usual selection moves and keyboard shortcuts work.  Ctrl-A to select all, Shift-Ctrl-A to unselect all, Ctri-I to invert the selection.   Ctrl-click on a row to toggle it selected or unselected.   Ctrl-click on a row to select it and then Shift-ctrl-click on another row and all the rows in between will also be selected.

 

Saving a project will save current Layers panel settings for a table.  

Query Results Tables

Query results tables are virtual tables in that they are constructed on the fly to show the results of a query.  To make them permanent we would use SELECT ... INTO to create a real table.  However, for more effective browsing of results tables we can use many display features such as the Layers panel and Filters with query results tables.

 

 

il_table_layers01_07.png

 

For example, in the illustration above we have altered settings in the Layers pane to hide the mfd_id field in the results table and we have adjusted widths of columns.  Such settings go away with the next run of the query but they are so quick to specify they can be very convenient when browsing a complex results table.  

 

For example, if we are only interested in two fields out of twenty in a results table, we Ctrl-click on those two fields in the Layers panel to select them, we Ctrl-I to invert the selection, and we double-click any of those selected fields OFF to hide all eighteen of the selected fields.  That is a very quick way to hide all but two fields in a results table, taking but three seconds for an experienced Manifold user.

Sorting Tables

Ctrl-click on a column head to sort the table by that column.   Shift-click additional column heads to sub-sort by those fields.   See the Sorting Columns topic for details.

Tooltips

A tooltip is a small information box that appears when the mouse cursor hovers over an item.   Many fields will have tooltips that show their contents in more expanded form.   For example,  the expression used for a computed field appears in the tooltip for that field's column handle.

i_tabletooltip01.png

Tooltips can be extensive and can display Unicode characters, as in the screenshot above of a tooltip that shows the contents of an nvarchar data type text field.

 

il_table_column_tooltip.png

 

Tooltips on column headers will show the name of the field as well as the data type:

Selection

Selection in an interactive, Manifold sense is a different concept but is related to the use of SELECT in SQL.   Selection in Manifold is an interactive user interface way of marking some records in tables or some objects in drawings as being of interest.

i_ctrlclick_row01_01.png

 

Ctrl-click anywhere in a row to select or to deselect that record.  Selected records will be shown in red selection color.

i_ctrlclick_row01_02.png

Ctrl-click another row to add that record to the selection.  

 

i_ctrlclick_row01_03.png

 

Ctrl-click and drag will draw a selection box that will select all records any part of which falls within the box.   Alt-Ctrl-click and drag will draw a deselection box that will deselect all records any part of which falls within the box.    We can also use the Select dialog to select records in the table.   See the Selection topic for more info and examples.

Show only Selected Records

The View - Filter command allows us to show only selected records in the table window.

 

i_ctrlclick_row01_04.png

With two records selected as in the above examples we choose View - Filter - Selected.

i_ctrlclick_row01_05.png

Immediately the table window re-displays to show only selected records.   If we wanted the table window to show only those records that were not selected we could have chosen View - Filter - Unselected.    To show all records again in the table window we choose View - Filter - Clear Filter.

Filters

Filters are a great way to show only desired records in a table window.  They are incredibly fast and handy because they require no keyboarding.   Right-click on a cell that has a desired value and then choose a filtering expression based on that value.  Only records which have that expression value for that field will be displayed.

 

il_filters_table01_01.png

 

To display only those records that the fclass field says are supermarkets we right-click on an fclass field cell that already has supermarket in it.

 

il_filters_table01_02.png

 

In the Add Filter menu we choose fclass = 'supermarket' as the filter we want.

 

il_filters_table01_03.png

 

The table window immediately re-displays to show only the supermarket records.    We add more filters using other fields if we like, and we can combine filters with selection.  See the Filters topic for details.

Big Tables

Working with smaller data may have accustomed us to going back and forth between views of the data as a drawing or as a table window.   When a drawing has only a few tens of thousands of features it is still realistic to switch to a table view of the objects that allows us to interactively scroll through a table.  Table windows in Manifold also allow easy interactive work with such data as well.   

 

But the situation changes for extremely large drawings and tables, which can involve billions or even trillions of records.  That is a situation we might not have experienced if we have been working with older GIS software that cannot handle really big data.  Life is different when tables have billions of records.  Even just a few million records are too many to manipulate interactively with a table display that shows only a few dozen records at a time.  Seeing a tiny sample of the data in a table window may be useful, but actually manipulating the entire table through a keyhole view of a table window?  That's like trying to paint a line on the ground across the entire United States by working through a window that allows painting only one screen at a time.    See the Note at the end of this topic on Really Big Tables for why a comparison to the size of the United States is actually an understatement.

Table Fill Strategies

Manifold employs several strategies to handle tables that may involve billions or trillions of records while still providing the interactive convenience of table windows in a form that GIS people expect.  The core idea behind these strategies is to provide the expected user experience in a way that protects inexpert users from launching a process that could take years with a really big data source.  Manifold has the ability to connect to such truly immense data sources that such safety measures are a good idea.  

 

A fill strategy is how Manifold populates records into ready status for display in a table window.   Strategies are necessary because when a large data source, such as the Google table for imagery of the entire Earth, contains trillions of records it is just plain stupid to try to fetch them all when at most only a few thousand could ever be seen by the user in the table window and only a few tens of billions would fit on the machine in any case.

 

Fill strategies include:

 

 

 

 

 

The combined effect of the above strategies results in very fast table population even with immensely large data sources.   

 

Important:  To see the total number of records in a table, open the table and in the Contents pane click on the Component panel.

 

il_table_50k_records.png

 

In the above illustration we have pressed Ctrl-End to jump to the end of a table window that shows the first 50000 records from a large table.   The icon in the next to last row indicates there are more records in the table.

 

il_table_50k_records_contents_component.png

 

The Component panel of the Contents pane reports how many records the table contains.

Placeholder Records

il_table_placeholder_records.png

 

Table windows display two special records at the very bottom of the table:

 

btn_fill_record_placeholder.pngThe fill record placeholder is a row at the bottom of the table that shows where records being filled into the table window will appear.   Small table windows fill so rapidly we will never see it.  When a table is fetching enough records from a data source that is slow enough (such as from a slow web server source or other slow data sources), the fill record placeholder will show blue preview color indicating records are still being fetched.   If the table window finishes fetching records before the display limit is reached the fill record disappears.   If the table window hits the display limit of 50000 records yet the table contains still more records the fill record remains with a background color of gray and a fill record icon in the record handle.  The table window above displays records from a table with very many records, so only the first 50000 are displayed and the fill record remains to alert us that the table contains more than 50000 records.

 

btn_new_record_placeholder.pngThe new record placeholder is a row at the bottom of the table, marked with an asterisk * icon in the record handle, that allows us to manually add new records to the table in the usual way.  See the Editing Tables topic.

How Records are Ordered for Display

How are records ordered for display?  As with most database systems, in Manifold they are not at all ordered by default.  

 

Tables in modern databases should not be presumed to display records in a fixed order the way a list in a word processing document might.  Big data sets tend to be complex with many different possible ways of sorting the data for ordered display.  Since sorting big data is computationally expensive, modern databases focus on performance and flexibility by storing data within efficient structures that can most easily be resorted as desired to show records in some particular order.  

 

Manifold does the same, storing data efficiently in unordered form.   To see records in any desired sort order we utilize a query employing constructs such as ORDER BY.   That can come as a surprise if we are not used to working with databases and expect records to always be displayed in some fixed order, perhaps in the order in which they were manually added to a table.  But part of learning to work with databases is remembering that what we see in a table window is unsorted.  Even as we scroll the order of records can change due to insertions or deletions by us or by some other process also connected to the same data source.

 

As noted in the Essay on browsing table windows, browsing a table by scrolling through records in a table window has limited utility, as in the case of big tables we see what is usually a small fraction of records among very many, and we see them in no particular order.  The order in which we see records in a table window is a function of performance enhancing mechanisms that allow us to reach into what could be very, very large tables, possibly in remote data sources to which connections have limited bandwidth.  That sort order can change.

 

For example, to improve performance Manifold table windows cache those records that have been displayed.  If we scroll a table window after inserting new records the window will seamlessly reset the display so that as the new records scroll out of cache they will reappear in the order set by the primary key.  We can see that effect with a simple experiment.

 

eg_addrec01_01.png

 

We create a new table with nvarchar type fields called Name and Comments and we add three records as seen above.  Manifold will automatically populate the mfd_id primary key ID field by auto-incrementing it starting from 1 as we add records to a new table, or we can manually populate that field with unique values.   In the screenshot above we have manually specified 8 for the value of the mfd_id primary key for the third record we have added.

 

eg_addrec01_02.png

 

We next add a fourth record, also specifying the mfd_id key manually to a value of 6.  At this point it would be a mistake to expect the records will always stay in this order when the table is displayed.  Remember, tables are not ordered: only the results of queries where we specify some order will be ordered.

 

eg_addrec01_03.png

 

 

If we close the table and then re-open it again Manifold will show the records in order set by the primary key as seen above.    Records added to a large table will also be seamlessly reordered by primary key if we just scroll about the table.  That may seem confusing if we added a record to a large table and then after scrolling about the table or jumping about in it from beginning to end we no longer see the record at the end of the table, where we think we added it.   If we added the record to the table it is still there, just ordered for display by the primary key.  

 

That records are displayed in table windows by primary key when the scrolling cache is reset may tempt us into the kludge of using the primary key mfd_id as a means of sorting tables by default.  But that is an unreliable kludge only good for small tables that are not much used or edited.  As  records are added and deleted there is no guarantee either manually-specified or auto-incremented primary keys will stay in the order desired.   So, rather than make the mistake of assuming something that should not be assumed,  in smaller tables we should sort on column heads as described above, and for bigger tables we should use the SQL tools provided to guarantee the sort order desired.   If the problem is lack of SQL skills the solution is not to thrash about with kludges but to learn to use basic SQL, which is easy to learn with a bit of focus.

 

Data Types

As with most database systems, fields in Manifold have standard data types, with very many different types available for use.   Compared to Manifold products prior to those using the Manifold engine there are new types for x2/x3/x4 vector types for each base numeric type plus new types such as uuid and tile.

 

There are two types within Manifold that get a lot of use in GIS applications and spatial data: tiles, which are used for raster data, and geoms, which store vector data.

 

Tiles store rectangular regions of pixels of the same type, plus a boolean mask for each pixel. Tiles are used for pixel-based, raster data as used in images and terrain elevation surfaces.    In a table that provides data for an image each record provides a tile for some rectangular portion of the image.   An image component knows from the properties of that image how to assemble the tiles from a particular table's records into an image by using the X and Y field values that give the intended position for each tile.   See the Example: How Images use Tiles from Tables topic for illustrations.

 

Tiles are used within Manifold primarily for performance reasons, as computers and database systems can work  faster with pixels when using batch processing to operate on many pixels instead of each operation working on a single pixel.  For example, it is quicker to store or retrieve a single record that stores a 128 x 128 tile of pixels (containing over 16000 pixels) than it is to store or to retrieve 16000 individual records each of which contains a single pixel

 

Geoms are a Manifold data type for storing vector information within spatial context.  They are used within records that store information, such as defining shapes and locations, for objects that appear in drawings and maps.   There, too, performance is a key factor in the use of geoms because it can be far more efficient to store an entire object's worth of coordinates within a single geom in a single record, potentially many thousands of coordinates, than to use a more decomposed storage where each coordinate is in a different record.

 

Geoms contain one of three basic geometry values that represent the three basic object types used in vector geometry in Manifold: points, lines or areas.  Areas are defined by their boundary lines, of which there can be multiple sets or branches within a single area to represent boundaries within boundaries, that is, islands or holes, in the same area object.    Lines and areas are composed of segments that can either be linear segments where each segment is a straight line, or curvilinear segments where each segment is a type of curve.  Curvilinear segments can be circle arcs, ellipse arcs or splines.  A line or an area can be made up of both linear segments and curved segments.   Operations that combine or analyze geometry will automatically linearize curves as necessary for geometric computations.

 

See the Data Types topic for a complete list of available types.

Vector Numeric Values

Vector numeric values , also known as composite numeric values, are pairs or triples or quads of numeric values of types XXXx2, XXXx3, XXXx4 respectively, where XXX can be either of the supported numeric types. for example, float32.   We refer to vector numeric values as x2, x3, x4, or, taken together, xN vector values. Vector values are very handy for representing points in 2D, 3D and 4D spaces, coordinates of rectangles, values in various color spaces like BGR, BGRA when adding an alpha channel to RGB color space, and so on.

 

Example:  Taking the numeric type float64 the xN types available are float64, float64x2, float64x3, and float64x4.   float64x3 is a data type consisting of a triplet of numbers each of which is a float64.

 

Indexes

Tables will often contain indexes of various types.   Tables without indexes are inconvenient: we cannot edit them and we cannot make selections in them.   Manifold therefore automatically creates an mfd_id unique identifier field along with an mfd_id_x index within tables that are brought into Manifold by dataports for most formats and most data sources.  On the rare occasion when a table is brought into Manifold or created without an index, it is easy to create an index in the table.   See the Adding an Index to a Table topic as well as the topics it in turn recommends.

 

BTREE indexes are used to store unique, ordered values. BTREEDUP indexes are similar to BTREE indexes but can also store duplicates. RTREE indexes are used to store geometry values in a form suitable for fast searches within rectangular regions.  Such indexes are known as spatial indexes and are easy to add: see the Example: Add a Spatial Index to a Table topic.

 

Manifold uses several different variations of RTREE indexes with various specific data sources.  All are similar but have different internal implementations.

 

Deleting mfd_id and mfd_id_x

 

In an ideal world all tables that are brought into Manifold would be imported with no modifications whatsoever, with no extra fields added.   In the real world we prioritize faster performance and greater reliability and ease of use.  When balancing logical rigor with ease-of-use Manifold makes the small compromise of automatically adding an identifier field that is guaranteed unique and non-NULL, mfd_id, to most tables with most dataports also creating a btree index on that field called mfd_id_x.  Adding those fields greatly simplifies life for users in many ways, for example, by greatly enhancing the efficiency with which drawings may be displayed.   

 

But doing so also changes the structure of an imported table by adding a field and possibly a field and an index to the schema of that table.   That can cause surprises and unexpected errors in things like copying and pasting between different databases if the mfd_id field is sometimes handled separately and not intuitively or sometimes not being handled separately.

 

To deal with such issues Manifold allows the mfd_id field to be writable so long as the new value is unique.  That allows, for example, specification of the mfd_id field during INSERT operations or changing the mfd_id field of an existing record during UPDATE queries.

 

Manifold also allows the mfd_id field and mfd_id_x index used by a table within the .map project to be "deleted," the word used in quotes because the deletion is a virtual deletion that just pushes the operation of such infrastructure down further into Manifold internal workings where it is hidden.   Once "deleted," that is, hidden, the mfd_id field and mfd_id_x index can be restored.

 

See the SQL Example: Using the mfd_id Field and mfd_id_x Index topic.

 

Computed Fields and Constraints

Manifold tables can contain fields that are computed on the fly.  These fields are called computed fields.  Computed fields compute values based on an expression using SQL syntax, for example, [population] / 1000  or  TileAbs([Tile]).   See the Example: Add a Computed Field to a Table topic for a step-by-step example of how to add computed fields.   

 

For example, a table may have a field that stores data as tiles and the table may also have a computed field which takes tile data from the first field and converts color values in the tiles into gray scale values. Whenever we change data in the first field, the system automatically will recompute the second field.  See the Example: An Image using Computed Fields in a Table topic for step-by-step example of an image created from a computed field.

 

Computed fields allow specifying one or more statements to prepare the execution context for an expression which computes values. This allows expressions for computed fields to use functions and scripts.   Each computed field uses its own execution context: functions and values defined in the context of a computed field are inaccessible from other fields.

 

Constraints are related to computed fields in that they also are expressions computed on the fly.  Primarily used for data integrity checks, constraints are  expressions which must evaluate to a boolean value of true for all table records.   Every time we try to insert a record into a table the system evaluates all constraints for the data about to be inserted and allows the insertion only if all constraints are satisfied. The system also checks constraints whenever we try to update a record.   If the attempted update fails one or more constraints it is rejected in full and the record is not modified.    See the Example: Create a Table with a Constraint topic for a step by step look at constraints in a table.

 

Like computed fields, constraints also allow specifying one or more statements to prepare the execution context for an expression which accepts or rejects records.  This allows expressions for constraings to use functions and scripts.   Each constraint uses its own execution context: functions and values defined in the context of a constraint for a given field are inaccessible from expressions for constraints for other fields.

 

When loading a .map file for a project that contains tables with computed fields and constraints, the system will resolve expressions for those computed fields and constraints to allow successfully loading expressions from cross-referencing tables.  When loading a .map file containing expressions in computed fields or constraints that fail to load, for example, because they refer to tables that do not exist, such as tables within linked data sources that are no longer accessible, the system will nonetheless start up and will preserve all other data.   All existing data, including data in computed fields with expressions that failed to load, will be fully and safely accessible. Computed fields and constraints with expressions that failed to load then can be safely removed.

 

See the Computed Fields and Constraints topic.

Nulls

Values stored in table records can be NULL.   To change the value of a cell to NULL, right click onto the cell in a table window and choose Set to NULL.  When adding records to tables, for a new record all editable fields are NULL by default.

 

Note that in Manifold empty strings are not NULL.   If we double-click into a text field for a new record to start editing but then we press Esc to cancel the edit box or we press Enter without making any changes, the system will leave unchanged what was there before we double-clicked into the cell to start editing: a NULL.

 

If we start editing again and keyboard some text and then press Enter that text will go into the field.  If we now edit that text by double-clicking  into the cell and then deleting all of the text and then pressing Enter the result in the field will be an empty string, not a NULL. If we want to reset the field to NULL we can do so by right-clicking onto the cell and choosing Set to NULL

 

The behavior of NULL values can be counter-intuitive.   Although different databases tend to handle NULLs differently, Manifold tries to stick to the notion of a NULL value denoting an unknown value which one day might be specified or which might remain unknown.  Doing that means NULLs just get skipped as if they weren't there to begin with. For example, most aggregates skip NULLs and NULLs are normally only handled for special cases such as First, Last or Count(*).

 

One interesting consequence is that NULL Xor TRUE returns NULL, but JoinXor() on two records with NULL and TRUE returns TRUE.  That is similar in its apparent counter-intuitiveness to how NULL + 1 returns NULL but Sum() on two records with NULL and 1 returns 1.

Some guidelines on NULLs:

Databases / Data Sources

Manifold's world view is that a project consists of databases that store entities of different types.   Those entities are called components.  

 

A component of type table has an associated table. Components of other types do not have an associated table but they may utilize data from some table. All types of components have metadata that is stored in a system table named mfd_meta. The names and types of all components are listed in a system table named mfd_root.

 

icon_data_source_cylinder.png Data sources are external files or databases.   A database can store a reference to another database. This is done by creating an entity of the type data source in the database that refers to another database.  Data sources can be nested with no limit to nesting depth.

Types of Data Sources

The system supports many types of data sources, roughly classified as:

 

 

Databases and file databases differ from all other types of data sources in that they normally support storing multiple tables (albeit usually with various limitations) and can thus store many components at once.    They are also normally updatable in that they allow the creation and deletion of components within them.  

 

Databases and file databases can also use Manifold dataports, which are pluggable pieces of logic designed to interpret raw data contained in databases. For example, the Manifold dataport for ESRI ArcSDE data interprets raw blobs and metadata within ArcSDE for structured use within Manifold as drawings.   

 

A dataport might allow one-way conversion (reading) or two-way conversion (reading and writing) with two-way conversion implemented in Manifold dataports whenever possible and sensible.  All dataports available for a specific type of database are turned on by default with provisions to turn off some dataports if desired, for example, for performance reasons.

 

File databases and files differ from all other types of data sources in that they have a recognizable physical file somewhere within the file system, a characteristic that Manifold can exploit.   

 

For example, Manifold can place additional files near such files to cache some data derived from the file for faster access.   In such cases Manifold might create a cache file containing a spatial index for faster use of a file data source that contains geometry data but which does not in the native file format support spatial indexes.  Manifold usually will provide options such as whether to allow creation of such additional cache files and so on.

 

Manifold can also cache data for data sources that are web servers, but that is done by creating tables in the parent data source, if the parent data source exists.   Although a parent data source might not exist in purely programmatic use via the script API, a parent data source always exists otherwise because there is always a project and a .map file associated with that project.  Even projects that have not yet been saved to a named .map file work on the basis of a temporary, unnamed and as yet unsaved .map file.

Nested Data Sources

Probably the most frequent use of nested data sources is when a data source has been created in a given Manifold project and then the .map file for that Manifold project is used to create a data source in another Manifold project.  

 

For example, we can create a data source in a project and then save that project as a .map file called myproject.map.   The myproject.map project has one data source in it, that is, a data source nested one level down within the project.   Suppose now in a different Manifold session we create a new project: we can create a new data source in that new project from the previously saved myproject.map file.  When we expand that data source we will see within it the data source previously created within the myproject.map project.    In our new project that data source is now two levels deep.

 

Nested data sources can be very convenient for organizing many projects and sub-groupings of data.  Suppose we have a worldwide business with tables organized by regions and subregions in each country with databases for each.  For example, in the Centre region of France we have databases for the various departments of Centre such as Cher, Eure-et-Loir, Indre and so on. (In France, sub-regional administrative divisions are called "departments," similar to how US states have "counties.")

 

To keep these organized we use .map files with nested data sources. At the lowest level for each department or subregion of a country we have a .map file that contains tables for that department, for example, a .map files that hold tables for Cher, one that holds tables for Eure-et-Loir and so on.  To group those for convenient access for regional projects we have map files for each region such as a .map file for Centre that has data sources created in it from the various departmental .map files for that region. For national projects we have a .map file for France that has data sources created in it from the various regional .map files, such as Centre, Normandie and so on. At the continental level we have a .map file for Europe that has data sources created in it for the various country .map files such as for France, Germany, Italy and so on.  

 

We can open a worldwide project that has data sources created in it from or the European, North American and other continental level .map projects and then drill down to the tables all the way within the lowest subregional level. All of the different data sources will appear nested within each other in the Project pane's hierarchy.  In the illustration below we've drilled all the way down through nested data sources to open the Customers table for the Eure-et-Loir department.

 

il_nestedsources01_01.pngil_nestedsources01_01a.png

 

To help us keep track of the origin of a component, component windows (such as a table window) will note the data source of the component listing nested names in order separated by a < character.   In the example above the Customers table is within the Eure-et-Loir data source that is in turn within the Centre data source that is within the France data source within the Europe data source.

 

There are many useful aspects to nested data sources but one of the most convenient is that we can reach out through any of them that are read/write to work with, including changing, their contents as if they were at the top level of the project.

 

il_nestedsources01_02.png

 

For example, as in the illustration above we could slow-click on the Categories table to rename it if we wanted, just as if it were a table hosted in the top level project and not something four or five levels deep in one project file and data source nested within another.

 

Nested data sources require no storage (their data is stored in whatever origin file or server to which the data source refers)  and they may be used with zero loss of performance, just as if they were part of the parent project at the top level.  When we import a large image into a Manifold project it will open instantly.   If that same large image appears in some data source nested five levels deep within Manifold projects it will still open instantly when we double-click it.

Reserved Names

All names starting with mfd_ (case not significant) are reserved for use by the system.  That is literally all names, including names for fields, indexes, constraints, tables, components other than tables, properties... everything:  Do not name anything beginning with mfd_ or MFD_ or in any upper or lower case combination of those characters.    mfd_id is a reserved name for a field in tables, to be used only for the identity field that Manifold maintains as an automatically unique value field.

 

Maximum Table Sizes

Tables in a Manifold project can be stored either within the .map project file or, if they are linked into the project, stored outside of the .map file within some external data source.  See the Importing and Linking topic.   The general rule is that Manifold can work with all such tables the same, whether they are stored within the .map or within an external data source.   But there are some differences, like possibly different maximum table sizes.

 

Different data sources will allow different maximum table sizes.  Any single table stored within a .map file is limited to 2 billion records (with effectively unlimited size of records), but that is a limitation of .map project file storage and not a limit of Manifold.   When Manifold stores tables in external data sources such as Oracle,  a table can have many more than two billion records, up to whatever is the limit imposed by that data source.

 

Some data sources, for example, .mdb file databases using Microsoft Access MDB format files, do not have a maximum number of records but instead a maximum data size for tables, such as 2 gigabytes.    Storing large records means the maximum data size for the table will be encountered with fewer records than if the records stored are small.  Such limits are not Manifold limits but are limits of the external format.

Discussion: Nested Data Sources

We would like to keep track of archaeological sites in Germany, France, Italy, Turkey and Bulgaria. We have Manifold projects for each of those countries saved to .map files called  germany.map, france.map, italy.map and so on.   Each of them has many large tables, drawings, images and maps.

 

Our new research topic involves trends in the French city of Strasbourg adjacent to the border with Germany.  For that work we will want data from our projects in Germany and in France.   We create a project and save it as strasbourg.map.

 

We can easily use our stockpile of German and French data within that new project by using the File - Create - New Data Source dialog to create new data sources called Germany and France which use the germany.map and the france.map project files as the source of their data.    Doing so will create two new data sources in the Project pane, each marked with a yellow database cylinder and a + icon button we can press to expand the data source.     If we click the + icon button next to the new France data source to expand it within the Project pane hierarchy we will see the contents of that project.   

 

Suppose now we would like to use an image server to create images of the region around Strasbourg.    To keep all of our data for France together in the same archive we would like to create that image server within the database that the france.map project file represents.  That's easy to do.   With the focus within the France data source hierarchy (just clicking the + button to expand that data source has already moved the focus into that data source's hierarchy) we create a new data source using a Manifold image server, for example, a Virtual Earth (VE) Hybrid image that shows streets overlaid on satellite photos.

 

When we do that we'll see files for an image server data source created in the usual way, all within the France parent data source hierarchy.  We can open the VE image server's image and see the visual display of satellite imagery overlaid with roads and other vector data provided by VE.   If we browse through the hierarchy of the France data source we will also see the .Tiles table Manifold created to cache tiles fetched from the VE server over the web that were used to create the image we see.  The full name of that cache table will be something ungainly, but self-documenting, such as "imageserver_VirtualEarthHybridImage.Tiles."   The tile cache table was created within the France data source because that is the parent data source of the newly created image server data source nested within.

 

To recall where we stand in the hierarchy of nested data sources, we are working with a project saved in a file called strasbourg.map.  That project in turn has two data sources in it called France and Germany which seamlessly link into the project the Manifold databases and all their components that live within the france.map and germany.map project files we had created during earlier work.

 

Within the France data source we have created yet a third data source, this one coming in from a web server in the form of a Manifold image server connection to hybrid imagery served by Virtual Earth.   When we look at the images served up by VE, panning and zooming into them to see our area of interest in Strasbourg, France, we are looking in to the third level of nested databases: the VE image server data source nested within the France project data source that is nested within our current Strasbourg project.

 

There are two things that are very cool about this situation.  First and foremost at the high end of the "cool" scale is that we encounter absolutely zero performance loss from nesting databases / data sources in this way.   The entire contents of the france.map project are seamlessly available to us within our strasbourg project just as gobsmacking fast as if we had opened the france.map project within its own Manifold session and were not using it within the strasbourg project as a nested data source.   We could nest data sources ten levels deep and still have zero performance loss.

 

The second very useful characteristic of nested data sources is that changes propagate as we would like and would expect.   If we make any changes to a nested data source, for example, adding an image server data source nested within, that change updates the parent data source as we would expect.

 

If we had nested an image server data source within a SQL Server data source the addition would take effect immediately, since any edits within SQL Server take effect immediately, as anyone who works regularly with SQL Server well expects.

 

In the case of a data source that is a .map project file, when we added the VE image server data source as a nested data source we would notice that a small asterisk * symbol appeared in the corner of the yellow database cylinder icon next to the France data source name.   That asterisk reminds us the project needs to be saved for the changes to be saved to the .map file used for that data source.

 

When we next save our strasbourg.map project the .map files used within any nested data sources will also be saved along with any changes we've made, such as adding a nested VE image server data source within their database.

 

Suppose we save our strasbourg.map file and exit Manifold.   We launch Manifold again and we open the france.map project.   We'll see that the project now has a new data source in the project, the VE image server data source.   It also has the ".Tiles" cache file still full of tiles from our previous browsing of the VE image server display.   Because the france.map project was the parent data source when we added the VE image server data source in our earlier session that nested data source and any accessory files that appeared with its use all were created in the france.map project.

 

Suppose we close that project and now we create a new, empty project that we save under the name alsace.map (Strasbourg is the capital of the Alsace region of France).   We use File - Create - New Data Source to create a data source that uses the france.map project as its database.   When we open that data source we will see the nested VE image server data source still there, ready for use.

 

Such effects make it easy to create repositories of related data and to organize additions to data so they go into the right repository.    If we are going to do work in the region around Strasbourg it could be useful to have our data sets for Germany immediately at hand as well as the data sets for France that we expect to use in our work with Strasbourg.   But we also may as well ensure that various additions involving French data for Strasbourg get created within the French compilation so that they can be used in any other projects that link in the French repository.

Discussion: Nested Data Source using SQL Server

Consider an example that shows data source nesting on a machine  that has Microsoft SQL Server available for use.    We open Manifold and create a new project.   We use File - Create - New Data Source to create a new data source for a SQL Server database which we will call SQL Server.   The new data source appears in the project pane with a yellow DBMS cylinder icon and a + sign indicating that data source can be expanded, that is opened.

 

In the Project pane we click on the + icon to open the SQL Server data source.   Next, we create a new data source for a Manifold image server, for example, the Virtual Earth Street Map image server, allowing it to cache data whenever it wants to.   A new data source for the image server appears, under the SQL Server data source hierarchy since that was opened and had the focus when we created the data source for the image serer.

 

We open the image server street map image and see it download tile data from the web to form the expected image of a street map.  If we look closely we note that it has also created a new table in the SQL Server data source with a .Tiles extension to the name as a table to hold the tiles it has downloaded.   The .Tiles table was created in the SQL Server data source because that is the parent data source to the new image server data source we created using the VE image server.

 

Suppose we now exit Manifold.   All the tile data and the created data source are still in the SQL Server database we used.     Suppose we launch Manifold again and create a new project.   Within that new project we use File - Create - New Data Source to create a new data source that once again uses the same SQL Server database we used earlier.   When we click on the + icon to expand that newly created SQL Server data source we will see the data source for the Manifold image server that we created in our earlier Manifold session and also we will see the .Tiles extension table it created with tile data it had cached.

 

This example takes advantage of the sophisticated capabilities of SQL Server.  It won't work if we try it on a data source that we created using an .mdb file database because the limitations of .mdb for the names of  tables will prevent Manifold from creating tables using the default naming scheme used for image servers.   It won't work with sqlite files either since those cannot host the indexes Manifold would like to create.  But it will work with other sophisticated DBMS environments like PostgreSQL or Oracle.

 

See additional examples of using tables in the Images topic and other topics recommended below.

 

tech_ravi_sm.png

One More Thing...

Tech tip: Have we mentioned that tables are editable available only if there is btree or similar index in a table?   See the Adding an Index to a Table topic.

 

Notes

All Data is in Tables - We are accustomed from DBMS applications to click open tables and to see them in some sort of row and column table browser, and we are accustomed from applications like graphics editors to click open images or CAD drawings and to see them as something other than tables, as photographs or other images, for example.  Classic GIS applications will also often encapsulate geometry data for the objects shown in drawings within the drawing, and will use tables only for "attribute" data for each object.  The geometry data for the object is somehow hidden within the drawing and, other than being displayed in the drawing, might be accessible only through programmatic means.

 

So it could be that when we first click open an image in Manifold and see it as a photograph we might think, "Hey... that's not a table, that's a photo."    But the data for that image really indeed is stored in a table somewhere in Manifold.   The image seen in a window is simply that table's data visualized by  the display handler of the image window, which knows how to interpret the table's data to present it for us in visual form.

 

If we doubt the data is coming from a table we can right click on the image component and choose Properties to see the name of the table that stores the data for the image.  If we open that table we can see the records which contain the data for the image, most likely in the form of tile data within each record.  It is the same with objects in drawings, where the drawing window is just a means of visualizing geometry data in the table.

 

That Manifold explicitly exposes such data as a field in a table is a bit different than how systems like an Adobe graphics editor or a classic GIS might operate, but providing explicit and easy access to that data greatly expands our ability to utilize that data for fun and profit.  For example, given a table with geometry data in the form of a Manifold geom data type we might want to change that data into some different form, such as geometry data in GeoJSON form or Well Known Binary (WKB) geometry form.  When we can see the data as a field in a table such transformations are trivially easy with a few clicks in the Transform panel.

 

When geometry data is in plain sight at our fingertips in a table we can also use SQL to manipulate it, to analyze it, to slice and dice it, to transform it and so on.   Such processes become far easier and more flexible than when data is stored in some secret form within a drawing where only those facilities thought of in advance by the product's designers can get at it.

 

Really Big Tables - Tables in Manifold can be very big data, so big that it does not make much sense to use a table window to manipulate such truly enormous table data.  A famous thread in the Forum discussed a LiDAR point cloud data set that contained 1.72 billion records in the table.   How big is a table that shows 1.72 billion records?  If we displayed the table in a series of screens where each page full of records was the height of a typical computer monitor screen the total length of the table would be over 8600 km, or about 1.35 times the radius of the Earth.   That is such a large table that no amount of interactive viewing of the table would show anything more than the tiniest fraction of the table.   Such large tables are handled with SQL or programmatically, or they display their contents in drawing or image layers, but they are not typically browsed interactively using table windows.

 

Sorting and Collation -  When sorting a table window by clicking on a column handle there is no option to specify collation.  If we would like to specify collation, we should use an SQL query using SELECT ... ORDER BY.

 

Sorting and Slow Data Sources - Table windows showing tables in fast data sources will fill before we can do anything.   However, when filling table windows from slow data sources the table window might not be filled before we manage to click a column handle to command a sort.  Clicking a column handle to sort a table window that is in the process of filling will stop filling the window.

 

Autogeneration of Queries - For an "inside look" into how components can be created from tables by using SQL we can take advantage of Manifold's query window.  Choose View - New Command Window - SQL to open a command window for queries.    The topmost pane is where we would write query text, but that topmost pane has intelligence of its own.  We can Copy a component (for example, an image or a drawing) in the Project pane and then Paste into the query text pane and Manifold will write in the query pane the SQL statements which generate the pasted component.   In fact, we can Copy  more than one component from the Project pane and Paste them into the query text pane and Manifold will generate multiple CREATE statements for us.  See the Examples in the Queries topic.

 

Connection Prerequisites - Connections to data sources may depend upon having necessary .dll files on our system or other non-Manifold considerations.   For example, like other Manifold products Manifold System utilizes Microsoft routines to connect to Microsoft formats such as .mdb, which can lead to some 32-bit / 64-bit integration issues depending on the vintage of our Windows system, what other Microsoft software we've installed and so on.  The rule of thumb when running 64-bit systems is to launch Manifold in 32-bit operation when connecting to Microsoft "Jet" formats such as .mdb, .xls and so on, to ensure that the predominantly 32-bit orientation of Microsoft routines to connect to Access-style file databases won't cause the connection to fail to work on a 64-bit system.  

 

Other database systems may also have their integration quirks to enable connections.  For example, to connect to .sqlite file databases we must have the required sqlite3.dll files either in the execution PATH for our system or in the same folder from which the manifold.exe file which we use to launch Manifold is located.   Put the 64-bit sqlite3.dll file in the ~\Bin64\ folder along with the 64-bit manifold.exe executable and the 32-bit sqlite3.dll file in the ~\Bin\ folder along with the 32-bit manifold.exe executable.

 

Compiled commands - Compiled commands, as used by the Manifold query engine in EXECUTE statements, are used not only in OLE DB and ODBC data sources but also for many others, for example, ADO.NET, MySQL, Oracle, PostgreSQL and SQLite.

 

MTA - OLE DB data sources use MTA (Multi-Threaded Apartment threading model) whenever possible for better performance.  Also greatly helping performance is that fetching large amounts of data from a remote database need not be done in pages.

 

Testing for NULLs -  Manifold tests for a null using the IS NULL operator. Many DBMS applications also have operators such as IS TRUE, IS FALSE and sometimes IS UNKNOWN, which is more or less synonymous to IS NULL.    Manifold does not use IS TRUE and IS FALSE since they are way too redundant given that Manifold's equality operator works with boolean values.    For example, instead of <something> IS TRUE we can use <something>  = TRUE or just plain <something>.   Instead of  <something> IS FALSE we can use <something> = FALSE or NOT <something>.  

 

Widths in printer's points - Why are the widths of columns in tables specified in printer's points as a unit of measure?  Tables display values using fonts that are specified in printer's points, with displays and printouts normally scaling to show those fonts in reasonably accurate real-world sizes.   Setting the width of columns using the same units of measure allows table column sizes to scale the same way as the fonts they contain.

 

Read-only data - The Layers pane recognizes when the data it displays is read-only, and disables controls and commands that cannot be used with read-only data. Temporary layouts and temporary maps are always writable. Tables and queries always appear writable with changes to tables on read-only data sources being kept in the window and being discarded after the window is closed.

 

Centre - The discussion of nested data sources refers to the Centre region of France.  More accurately the name of the region is  Centre - Val de Loire, but it is often referred to as simply Centre for short.

See Also

Getting Started

 

User Interface Basics

 

Sorting Columns

 

Filters

 

Editing Tables

 

Data Types

 

Indexes

 

Queries

 

Drawings

 

Copy and Paste between Drawings

 

Images

 

Labels

 

Computed Fields and Constraints

 

Table Windows and Big Data

 

Schema

 

File - Create - New Drawing

 

View - Filter

 

View - Order

 

Examples - Do not miss!  Browse through the many examples for step by step tutorials.

 

Example: Closing without Saving - An example that shows how File - Close without saving the project can affect local tables and components differently from those saved already into a data source, such as an .mdb file database.

 

Example: Select a Range of Records in a Table - We can select or deselect a range of records between any two records in a table window using fast and easy mouse commands.

 

Example: Drawings use Geom Fields in Tables  - An essential discussion on how drawings are created from geom fields in tables, including how the drawing knows which coordinate system to use.

 

Example: Multiple Drawings from the Same Table - Illustrates how easy it is to create multiple drawings that use the same table and same geometry by copying and pasting an existing drawing.  Each new drawing takes no additional storage space in the project, but can be formatted differently.   

 

Example: Add a Computed Field to a Table - How to add a field to a table.  We first set the values for the field in a static way and then we show how to set the values for a field dynamically with a computed field.

 

Example: Create a Table with a Constraint  - Create a simple table that includes a simple constraint upon one of its fields.  

 

SQL Example: Create a Table with a Constraint via a Query -  A simple example using SQL to create a table with a constraint.

 

SQL Example: Force an Anomaly in Constraints - Constraints are only evaluated when we insert or update records.   If a constraint refers to external data, such as the values in a different table, we can force an anomalous condition where the table with the constraint may contain data that no longer meets the requirements of the constraint.

 

SQL Example: Using the mfd_id Field and mfd_id_x Index - A sequence of SQL examples of working with the mfd_id field and mfd_id_x index.