Tables

icon_table.png Everything in Manifold is stored in a table.  Every bit of data, regardless of its nature - vector, raster, text, whatever, - is stored in a table.   Most GIS users naturally understand what a table is when they see data, such as attributes, in a classic row and column table presentation, but it may be a new notion to think of tables as also storing data that can also be shown in visual displays, such as geometry data for drawings or tile data for images.

 

When we open visual windows like drawings or images, in reality those are just a different way of showing data in a table somewhere, for example, geometry data for features in drawings or tile data for pixels in raster images.   Drawings and images are just a specialized way of displaying data in tables.  A drawing itself contains no data: it just has a few bytes of information saying how geometry data from the subject table should be displayed.

il_drawing_from_table.png

The shape and locations of objects in drawings are stored in geometry fields in a table, using a Manifold data type called a geom.   Each object in a drawing is one record in that drawing's table.   To create the drawing, for each record in a table Manifold reads the geometry data in that record's geom field and draws whatever object is stored in the geom.   In the illustration above, each record in the table has a geom field that describes the shape and location of a province in Mexico.    By tradition, the geom field in a Manifold drawing's table is named "Geom," so when we look at a list of fields in a table we can tell right away from the name of the field which one is the geom field.   There is nothing special about that name.  The geom field could be called "Harry" if we like, although that would make it a less obvious later on what that field contained.  

 

il_image_from_table.png

The shape and locations of objects in images are stored in tile fields in a table, using a Manifold data type called a tile.   The default name for a tile field in a table is named "Tile," so it is obvious what data that field contains.   As with the default name of "Geom" in drawings, there is nothing special about the default name "Tile".  It just makes more sense as a default name instead of, say, "Josephine."

 

There are many different types of tiles to allow storing the different types of raster data found in images.   Manifold uses the word "images" to refer to all forms of raster data, including raster data used for photographic images, multichannel raster data from satellite sensors, and other raster data such as terrain elevation data.  Any raster image in Manifold is composed of a checkerboard of tiles, where each tile is a record in that image's table.  In the illustration above, the aerial photograph showing an airport on the coast in Italy is composed of many tiles, each one of which is 128 pixels by 128 pixels in size, with three channels of 16 bit integer data for each pixel.  The selected row in the table contains the tile that is displayed just to the left of the end of one of the runways.

 

Because all data in Manifold is stored in a table somewhere and is fully exposed for us to use however we prefer, that means that every bit of data can be accessed via SQL, and everything that can be done with other user interfaces can be done with SQL or with other table interfaces like the Transform panel.  That makes it very easy to automate what we do and greatly magnifies the power and convenience of Manifold as a GIS and as an incredibly powerful tool for spatial data manipulation.

 

For example, to format one drawing we can use the Style panel interactively with a point and a click.  But to format a thousand drawings we do not want to point and click on each drawing.  Suppose our organization has decided on a standard background color for all drawings and we want to apply that color to a thousand drawings in a project? That is just a simple update of records in a meta table as in this example.   A few mouse clicks in Manifold can do that for thousands of drawings at once by using the Select and Transform panels, easily automating what could take hours or days in a system that did not expose all data of interest in tables.

Table Basics

i_table_parts_labeled.png

 

i_table_parts_cursor.pngDouble-clicking open a table in Manifold will show it in a table window, as seen above.   Tables show data in a classic grid presentation, with rows and columns.  Each row is one record in the table and the various columns show the values for fields in each table.    

 

Manifold uses the words row and record as synonyms and the words column and field as synonyms as well.   If we say column handle we mean the same thing as field handle, for example, and if we say row handle we mean the same thing as record handle.    In addition, GIS people will often refer to fields as attributes.  

 

Default actions, such as clicking the Enter key to begin editing a cell, will apply to the current cell, which is marked by a dotted line box called the cursor.  

 

For user interface basics for tables, see the Getting Started and the User Interface Basics topics.

Schemas

A table has one or more fields with each field being a particular data type.  A table can also have zero or more indexes and zero or more constraints. All of these aspects of a table's structure taken together are referred to as a table schema.   We can edit schemas to add, remove, or rename fields, constraints and indexes using the Edit - Schema dialog.

 

Everything in a schema has a name and each name must be unique in the schema.    For example, we may not have a field named x and an index named x in the same table, but we may have a field named x in one table and an index named x in a different table.   We can also have indexes named x within several different tables in the same database.   The names of indexes do not have to be unique within a database in Manifold.

tech_ravi_sm.png

 

Important:  A table must have a btree index in it to support selection as well as commands like Select All, Select Inverse and Select None and Copy, Cut and Delete on records.  Many tables created by Manifold, such as the table created for a drawing when drawings are imported or created, will automatically have a btree index created in them but tables that are imported or opened in external data sources might not.

Maximum Table Sizes

Tables in a Manifold project can be stored either within the .map project file or, if they are linked into the project, stored outside of the .map file within some external data source.  See the Importing and Linking topic.   The general rule is that Manifold can work with all such tables the same, whether they are stored within the .map or within an external data source.   But there are some differences, like possibly different maximum table sizes.

 

Different data sources will allow different maximum table sizes.  Any single table stored within a .map file is limited to 2 billion records (with effectively unlimited size of records), but that is a limitation of .map project file storage and not a limit of Manifold.   When Manifold stores tables in external data sources such as Oracle,  a table can have many more than two billion records, up to whatever is the limit imposed by that data source.

 

Some data sources, for example, .mdb file databases using Microsoft Access MDB format files, do not have a maximum number of records but instead a maximum data size for tables, such as 2 gigabytes.    Storing large records means the maximum data size for the table will be encountered with fewer records than if the records stored are small.  Such limits are not Manifold limits but are limits of the external format.

Data Types

As with most database systems, fields in Manifold have standard data types, with very many different types available for use.   Compared to Manifold products prior to those using the Manifold engine there are new types for x2/x3/x4 vector types for each base numeric type plus new types such as uuid and tile.

 

There are two types within Manifold that get a lot of use in GIS applications and spatial data: tiles, which are used for raster data, and geoms, which store vector data.

 

Tiles store rectangular regions of pixels of the same type, plus a boolean mask for each pixel. Tiles are used for pixel-based, raster data as used in images and terrain elevation surfaces.    In a table that provides data for an image each record provides a tile for some rectangular portion of the image.   An image component knows from the properties of that image how to assemble the tiles from a particular table's records into an image by using the X and Y field values that give the intended position for each tile.   See the Example: How Images use Tiles from Tables topic for illustrations.

 

Tiles are used within Manifold primarily for performance reasons, as computers and database systems can work  faster with pixels when using batch processing to operate on many pixels instead of each operation working on a single pixel.  For example, it is quicker to store or retrieve a single record that stores a 128 x 128 tile of pixels (containing over 16000 pixels) than it is to store or to retrieve 16000 individual records each of which contains a single pixel

 

Geoms are a Manifold data type for storing vector information within spatial context.  They are used within records that store information, such as defining shapes and locations, for objects that appear in drawings and maps.   There, too, performance is a key factor in the use of geoms because it can be far more efficient to store an entire object's worth of coordinates within a single geom in a single record, potentially many thousands of coordinates, than to use a more decomposed storage where each coordinate is in a different record.

 

Geoms contain one of three basic geometry values that represent the three basic object types used in vector geometry in Manifold: points, lines or areas.  Areas are defined by their boundary lines, of which there can be multiple sets or branches within a single area to represent boundaries within boundaries, that is, islands or holes, in the same area object.    Lines and areas are composed of segments that can either be linear segments where each segment is a straight line, or curvilinear segments where each segment is a type of curve.  Curvilinear segments can be circle arcs, ellipse arcs or splines.  A line or an area can be made up of both linear segments and curved segments.   Operations that combine or analyze geometry will automatically linearize curves as necessary for geometric computations.

 

See the Data Types topic for a complete list of available types.

Reserved Names

All field names beginning with mfd_ (case not significant) are reserved for internal use.    mfd_id is a reserved name for a field in tables, to be used only for the identity field that Manifold maintains as an automatically unique value field.

Vector Numeric Values

Vector numeric values , also known as composite numeric values, are pairs or triples or quads of numeric values of types XXXx2, XXXx3, XXXx4 respectively, where XXX can be either of the supported numeric types. for example, float32.   We refer to vector numeric values as x2, x3, x4, or, taken together, xN vector values. Vector values are very handy for representing points in 2D, 3D and 4D spaces, coordinates of rectangles, values in various color spaces like BGR, BGRA when adding an alpha channel to RGB color space, and so on.

 

Example:  Taking the numeric type float64 the xN types available are float64, float64x2, float64x3, and float64x4.   float64x3 is a data type consisting of a triplet of numbers each of which is a float64.

Computed Fields

Manifold tables can contain fields that are computed on the fly.  These fields are called computed fields.  Computed fields compute values based on an expression using SQL syntax, for example, [population] / 1000  or  TileAbs([Tile]).   See the Example: Add a Computed Field to a Table topic for a step-by-step example of how to add computed fields.   

 

For example, a table may have a field that stores data as tiles and the table may also have a computed field which takes tile data from the first field and converts color values in the tiles into gray scale values. Whenever we change data in the first field, the system automatically will recompute the second field.  See the Example: An Image using Computed Fields in a Table topic for step-by-step example of an image created from a computed field.

Indexes

Tables will often contain indexes of various types.   Tables without indexes are inconvenient: we cannot edit them and we cannot make selections in them.   Manifold therefore automatically creates an mfd_id unique identifier field along with an mfd_id_x index within tables that are brought into Manifold by dataports for most formats and most data sources.  On the rare occasion when a table is brought into Manifold or created without an index, it is easy to create an index in the table.   See the Adding an Index to a Table topic as well as the topics it in turn recommends.

 

BTREE indexes are used to store unique, ordered values. BTREEDUP indexes are similar to BTREE indexes but can also store duplicates. RTREE indexes are used to store geometry values in a form suitable for fast searches within rectangular regions.  Such indexes are known as spatial indexes and are easy to add: see the Example: Add a Spatial Index to a Table topic.

 

Manifold uses several different variations of RTREE indexes with various specific data sources.  All are similar but have different internal implementations.

 

Deleting mfd_id and mfd_id_x

 

In an ideal world all tables that are brought into Manifold would be imported with no modifications whatsoever, with no extra fields added.   In the real world we prioritize faster performance and greater reliability and ease of use.  When balancing logical rigor with ease-of-use Manifold makes the small compromise of automatically adding an identifier field that is guaranteed unique and non-NULL, mfd_id, to most tables with most dataports also creating a btree index on that field called mfd_id_x.  Adding those fields greatly simplifies life for users in many ways, for example, by greatly enhancing the efficiency with which drawings may be displayed.   

 

But doing so also changes the structure of an imported table by adding a field and possibly a field and an index to the schema of that table.   That can cause surprises and unexpected errors in things like copying and pasting between different databases if the mfd_id field is sometimes handled separately and not intuitively or sometimes not being handled separately.

 

To deal with such issues Manifold allows the mfd_id field to be writable so long as the new value is unique.  That allows, for example, specification of the mfd_id field during INSERT operations or changing the mfd_id field of an existing record during UPDATE queries.

 

Manifold also allows the mfd_id field and mfd_id_x index used by a table within the .map project to be "deleted," the word used in quotes because the deletion is a virtual deletion that just pushes the operation of such infrastructure down further into Manifold internal workings where it is hidden.   Once "deleted," that is, hidden, the mfd_id field and mfd_id_x index can be restored.

 

See the SQL Example: Using the mfd_id Field and mfd_id_x Index topic.

Constraints

Primarily used for data integrity checks, constraints are boolean expressions which evaluate to true for all table records.   Every time we try to insert a record into a table the system evaluates all constraints for the data about to be inserted and allows the insertion only if all constraints are satisfied. The system also checks constraints whenever we try to update a record.   If the attempted update fails one or more constraints it is rejected in full and the record is not modified.    See the Example: Create a Table with a Constraint topic for a step by step look at constraints in a table.

Nulls

Values stored in table records can be NULL.   To change the value of a cell to NULL, right click onto the cell in a table window and choose Set to NULL.  When adding records to tables, for a new record all editable fields are NULL by default.

 

Note that in Manifold empty strings are not NULL.   If we double-click into a text field for a new record to start editing but then we press Esc to cancel the edit box or we press Enter without making any changes, the system will leave unchanged what was there before we double-clicked into the cell to start editing: a NULL.

 

If we start editing again and keyboard some text and then press Enter that text will go into the field.  If we now edit that text by double-clicking  into the cell and then deleting all of the text and then pressing Enter the result in the field will be an empty string, not a NULL. If we want to reset the field to NULL we can do so by right-clicking onto the cell and choosing Set to NULL

 

The behavior of NULL values can be counter-intuitive.   Although different databases tend to handle NULLs differently, Manifold tries to stick to the notion of a NULL value denoting an unknown value which one day might be specified or which might remain unknown.  Doing that means NULLs just get skipped as if they weren't there to begin with. For example, most aggregates skip NULLs and NULLs are normally only handled for special cases such as First, Last or Count(*).

 

One interesting consequence is that NULL Xor TRUE returns NULL, but JoinXor() on two records with NULL and TRUE returns TRUE.  That is similar in its apparent counter-intuitiveness to how NULL + 1 returns NULL but Sum() on two records with NULL and 1 returns 1.

Some guidelines on NULLs:

Table Windows

Tables can have just a few records or they can have billions of records.  The Component panel in the Contents pane reports the total number of records in a table.

 

Working with smaller data may have accustomed us to going back and forth between views of the data as a drawing or as a table window.   When a drawing has only a few tens of thousands of features it is still realistic to switch to a table view of the objects that allows us to interactively scroll through a table.  Table windows in Manifold also allow easy interactive work with such data as well.   

 

But the situation changes for extremely large drawings and tables, which can involve billions or even trillions of records.  That is a situation we might not have experienced if we have been working with older GIS software that cannot handle really big data.  Life is different when tables have billions of records.  Even just a few million records are too many to manipulate interactively with a table display that shows only a few dozen records at a time.  Seeing a tiny sample of the data in a table window may be useful, but actually manipulating the entire table through a keyhole view of a table window?  That's like trying to paint a line on the ground across the entire United States by working through a window that allows painting only one screen at a time.    See the Note at the end of this topic on Really Big Tables for why a comparison to the size of the United States is actually an understatement.

Table Fill Strategies

Manifold employs several strategies to handle tables that may involve billions or trillions of records while still providing the interactive convenience of table windows in a form that GIS people expect.  The core idea behind these strategies is to provide the expected user experience in a way that protects inexpert users from launching a process that could take years with a really big data source.  Manifold has the ability to connect to such truly immense data sources that such safety measures are a good idea.  

 

A fill strategy is how Manifold populates records into ready status for display in a table window.   Strategies are necessary because when a large data source, such as the Google table for imagery of the entire Earth, contains trillions of records it is just plain stupid to try to fetch them all when at most only a few thousand could ever be seen by the user in the table window and only a few tens of billions would fit on the machine in any case.

 

Fill strategies include:

 

 

 

 

 

The combined effect of the above strategies results in very fast table population even with immensely large data sources.   

 

Important:  To see the total number of records in a table, open the table and in the Contents pane click on the Component panel.

 

il_table_50k_records.png

 

In the above illustration we have pressed Ctrl-End to jump to the end of a table window that shows the first 50000 records from a large table.   The icon in the next to last row indicates there are more records in the table.

 

il_table_50k_records_contents_component.png

 

The Component panel of the Contents pane reports how many records the table contains.

Placeholder Records

il_table_placeholder_records.png

 

Table windows display two special records at the very bottom of the table:

 

btn_fill_record_placeholder.pngThe fill record placeholder is a row at the bottom of the table that shows where records being filled into the table window will appear.   Small table windows fill so rapidly we will never see it.  When a table is fetching enough records from a data source that is slow enough (such as from a slow web server source or other slow data sources), the fill record placeholder will show blue preview color indicating records are still being fetched.   If the table window finishes fetching records before the display limit is reached the fill record disappears.   If the table window hits the display limit of 50000 records yet the table contains still more records the fill record remains with a background color of gray and a fill record icon in the record handle.  The table window above displays records from a table with very many records, so only the first 50000 are displayed and the fill record remains to alert us that the table contains more than 50000 records.

 

btn_new_record_placeholder.pngThe new record placeholder is a row at the bottom of the table, marked with an asterisk * icon in the record handle, that allows us to manually add new records to the table in the usual way.  See the Editing Tables topic.

Sorting Table Windows

Clicking a column handle will sort the table window display for all fetched records by that field.   Clicking the column handle again will sort the table window in reverse order by that field.   Keep in mind we are not sorting data in the table itself: as with all high end database systems table data within Manifold is always stored in unsorted form.   We are only sorting the display of the table data in the table window.

 

eg_table_columns_sort01_01.png

 

The table window above shows the one thousand largest cities in the US.   Clicking the state column handle sorts the table window by state name, in ascending order.   

 

eg_table_columns_sort01_01a.png

 

Sorting the table window by state means that all records displayed by the window with a state value of Alabama will appear before those from Alaska, which all will appear before those from Arizona and so on.   Within each state records are not ordered.   

 

eg_table_columns_sort01_02.png

 

Clicking the column handle again re-sorts the table window in descending order on that field.   For example, we can click on the state column handle again.

 

eg_table_columns_sort01_03.png

 

That re-sorts the table window so records with Wyoming as the value for the state field come first.   

 

Clicking on another column handle sorts the table window by that field.  For example, we can click on the city field.   

 

eg_table_columns_sort01_04.png

 

That resorts the table window display so records with a city value beginning with A come first.    Click on a different column handle to sort by that field.  For example, we can click on the state field to  sort the table window by state.   

Sub-sorting by a Second Field

Suppose we have sorted the table window by one field, such as state, and now we want to sort within that sort so that records are ordered first by state and then within the state from largest to smallest population.   Easy: after clicking on the state column handle to sort the display by state, then Shift-click on the population column handle to sort within each state grouping by population.  The Shift modifier tells Manifold not to abandon the previous sort but to add another field to the sort order.  

 

eg_table_columns_sort01_05.png

 

In the example above, after clicking on the state column handle to sort by state, we Shift-click on the population column handle to further sub-sort by the population field.

 

eg_table_columns_sort01_06.png

 

The default sort is ascending, within Alabama from lowest population in Phenix City to highest population in Birmingham. To change that to a descending sort so that the cities within Alabama are sub-sorted from highest population to lowest population we Shift-click again on the population column handle.

 

eg_table_columns_sort01_07.png

 

That resorts the table window so first, all records are sorted by state and then second, records within each state are sorted from highest population to lowest population.  

 

We can continue Shift-clicking additional fields to add yet more sub-sorts if we liked.   For example, suppose our table also had a Zip field giving zip code tabulation areas (like regions encompassing each postal code area) for each city in Alabama.   Suppose that there was not just one record that had Birmingham as the city and Alabama as the state, but ten such records, each one of which had a different value in the Zip field.   We could Shift-click on the Zip column handle and that would sort the table window so that all the Birmingham records were further sub-sorted by Zip.

 

We can also choose sort order for a table window by using the View - Order command.  The first field chosen using View - Order sorts the table while additional fields chosen using View - Order will add sub-sorts as we had Shift-clicked their column handles.

Tooltips

A tooltip is a small information box that appears when the mouse cursor hovers over an item.   Many fields will have tooltips that show their contents in more expanded form.   For example,  the expression used for a computed field appears in the tooltip for that field's column handle.

i_tabletooltip01.png

Tooltips can be extensive and can display Unicode characters, as in the screenshot above of a tooltip that shows the contents of a text field.

Selection

Selection in an interactive, Manifold sense is a different concept but is related to the use of SELECT in SQL.   Selection in Manifold is an interactive user interface way of marking some records in tables or some objects in drawings as being of interest.

i_ctrlclick_row01_01.png

 

Ctrl-click anywhere in a row to select or to deselect that record.  Selected records will be shown in red selection color.

i_ctrlclick_row01_02.png

Ctrl-click another row to add that record to the selection.  

 

i_ctrlclick_row01_03.png

 

Ctrl-click and drag will draw a selection box that will select all records any part of which falls within the box.   Alt-Ctrl-click and drag will draw a deselection box that will deselect all records any part of which falls within the box.    We can also use the Select dialog to select records in the table.   See the Selection topic for more info and examples.

Show only Selected Records

The View - Filter command allows us to show only selected records in the table window.

 

i_ctrlclick_row01_04.png

With two records selected as in the above examples we choose View - Filter - Selected.

i_ctrlclick_row01_05.png

Immediately the table window re-displays to show only selected records.   If we wanted the table window to show only those records that were not selected we could have chosen View - Filter - Unselected.    To show all records again in the table window we choose View - Filter - Clear Filter.

 

Filters

Filters are a great way to show only desired records in a table window.  They are incredibly fast and handy because they require no keyboarding.   Right-click on a cell that has a desired value and then choose a filtering expression based on that value.  Only records which have that expression value for that field will be displayed.

 

il_filters_table01_01.png

 

To display only those records that the fclass field says are supermarkets we right-click on an fclass field cell that already has supermarket in it.

 

 

il_filters_table01_02.png

 

In the Add Filter menu we choose fclass = 'supermarket' as the filter we want.

 

il_filters_table01_03.png

 

The table window immediately re-displays to show only the supermarket records.    We add more filters using other fields if we like, and we can combine filters with selection.  See the Filters topic for details.

Showing and Hiding Fields

When working with tables that have many fields we may want to hide some of those fields.  Right-click on the column handle and choose Hide, or use the View - Fields dialog to choose which fields are displayed and in which order.  Keyboard shortcuts make it easy to quickly choose a few fields out of many as illustrated in the View - Fields topic.

 

How Records are Ordered for Display

How are records ordered for display?  As with most database systems, in Manifold they are not at all ordered by default.  

 

Tables in modern databases should not be presumed to display records in a fixed order the way a list in a word processing document might.  Big data sets tend to be complex with many different possible ways of sorting the data for ordered display.  Since sorting big data is computationally expensive, modern databases focus on performance and flexibility by storing data within efficient structures that can most easily be resorted as desired to show records in some particular order.  

 

Manifold does the same, storing data efficiently in unordered form.   To see records in any desired sort order we utilize a query employing constructs such as ORDER BY.   That can come as a surprise if we are not used to working with databases and expect records to always be displayed in some fixed order, perhaps in the order in which they were manually added to a table.  But part of learning to work with databases is remembering that what we see in a table window is unsorted.  Even as we scroll the order of records can change due to insertions or deletions by us or by some other process also connected to the same data source.

 

As noted in the Essay on browsing table windows, browsing a table by scrolling through records in a table window has limited utility, as in the case of big tables we see what is usually a small fraction of records among very many, and we see them in no particular order.  The order in which we see records in a table window is a function of performance enhancing mechanisms that allow us to reach into what could be very, very large tables, possibly in remote data sources to which connections have limited bandwidth.  That sort order can change.

 

For example, to improve performance Manifold table windows cache those records that have been displayed.  If we scroll a table window after inserting new records the window will seamlessly reset the display so that as the new records scroll out of cache they will reappear in the order set by the primary key.  We can see that effect with a simple experiment.

 

eg_addrec01_01.png

 

We create a new table with nvarchar type fields called Name and Comments and we add three records as seen above.  Manifold will automatically populate the mfd_id primary key ID field by auto-incrementing it starting from 1 as we add records to a new table, or we can manually populate that field with unique values.   In the screenshot above we have manually specified 8 for the value of the mfd_id primary key for the third record we have added.

 

eg_addrec01_02.png

 

We next add a fourth record, also specifying the mfd_id key manually to a value of 6.  At this point it would be a mistake to expect the records will always stay in this order when the table is displayed.  Remember, tables are not ordered: only the results of queries where we specify some order will be ordered.

 

eg_addrec01_03.png

 

 

If we close the table and then re-open it again Manifold will show the records in order set by the primary key as seen above.    Records added to a large table will also be seamlessly reordered by primary key if we just scroll about the table.  That may seem confusing if we added a record to a large table and then after scrolling about the table or jumping about in it from beginning to end we no longer see the record at the end of the table, where we think we added it.   If we added the record to the table it is still there, just ordered for display by the primary key.  

 

That records are displayed in table windows by primary key when the scrolling cache is reset may tempt us into the kludge of using the primary key mfd_id as a means of sorting tables by default.  But that's a really unreliable kludge only good for small tables that are not much used or edited.  As  records are added and deleted there's no guarantee either manually-specified or auto-incremented primary keys will stay in the order desired.   So rather than make the mistake of assuming something that should not be assumed we should use the SQL tools provided to guarantee the sort order desired.   If the problem is lack of SQL skills the solution is not to thrash about with kludges but to learn to use basic SQL, which is easy to learn with a bit of focus.

 

Databases / Data Sources

Manifold's world view is that a project consists of databases that store entities of different types.   Those entities are called components.  

 

A component of type table has an associated table. Components of other types do not have an associated table but they may utilize data from some table. All types of components have metadata that is stored in a system table named mfd_meta. The names and types of all components are listed in a system table named mfd_root.

 

icon_data_source_cylinder.png Data sources are external files or databases.   A database can store a reference to another database. This is done by creating an entity of the type data source in the database that refers to another database.  Data sources can be nested with no limit to nesting depth.

Types of Data Sources

The system supports many types of data sources, roughly classified as:

 

 

Databases and file databases differ from all other types of data sources in that they normally support storing multiple tables (albeit usually with various limitations) and can thus store many components at once.    They are also normally updatable in that they allow the creation and deletion of components within them.  

 

Databases and file databases can also use Manifold dataports, which are pluggable pieces of logic designed to interpret raw data contained in databases. For example, the Manifold dataport for ESRI ArcSDE data interprets raw blobs and metadata within ArcSDE for structured use within Manifold as drawings.   

 

A dataport might allow one-way conversion (reading) or two-way conversion (reading and writing) with two-way conversion implemented in Manifold dataports whenever possible and sensible.  All dataports available for a specific type of database are turned on by default with provisions to turn off some dataports if desired, for example, for performance reasons.

 

File databases and files differ from all other types of data sources in that they have a recognizable physical file somewhere within the file system, a characteristic that Manifold can exploit.   

 

For example, Manifold can place additional files near such files to cache some data derived from the file for faster access.   In such cases Manifold might create a cache file containing a spatial index for faster use of a file data source that contains geometry data but which does not in the native file format support spatial indexes.  Manifold usually will provide options such as whether to allow creation of such additional cache files and so on.

 

Manifold can also cache data for data sources that are web servers, but that is done by creating tables in the parent data source, if the parent data source exists.   Although a parent data source might not exist in purely programmatic use via the script API, a parent data source always exists otherwise because there is always a project and a .map file associated with that project.  Even projects that have not yet been saved to a named .map file work on the basis of a temporary, unnamed and as yet unsaved .map file.

Nested Data Sources

Probably the most frequent use of nested data sources is when a data source has been created in a given Manifold project and then the .map file for that Manifold project is used to create a data source in another Manifold project.  

 

For example, we can create a data source in a project and then save that project as a .map file called myproject.map.   The myproject.map project has one data source in it, that is, a data source nested one level down within the project.   Suppose now in a different Manifold session we create a new project: we can create a new data source in that new project from the previously saved myproject.map file.  When we expand that data source we will see within it the data source previously created within the myproject.map project.    In our new project that data source is now two levels deep.

 

Nested data sources can be very convenient for organizing many projects and sub-groupings of data.  Suppose we have a worldwide business with tables organized by regions and subregions in each country with databases for each.  For example, in the Centre region of France we have databases for the various departments of Centre such as Cher, Eure-et-Loir, Indre and so on. (In France, sub-regional administrative divisions are called "departments," similar to how US states have "counties.")

 

To keep these organized we use .map files with nested data sources. At the lowest level for each department or subregion of a country we have a .map file that contains tables for that department, for example, a .map files that hold tables for Cher, one that holds tables for Eure-et-Loir and so on.  To group those for convenient access for regional projects we have map files for each region such as a .map file for Centre that has data sources created in it from the various departmental .map files for that region. For national projects we have a .map file for France that has data sources created in it from the various regional .map files, such as Centre, Normandie and so on. At the continental level we have a .map file for Europe that has data sources created in it for the various country .map files such as for France, Germany, Italy and so on.  

 

We can open a worldwide project that has data sources created in it from or the European, North American and other continental level .map projects and then drill down to the tables all the way within the lowest subregional level. All of the different data sources will appear nested within each other in the Project pane's hierarchy.  In the illustration below we've drilled all the way down through nested data sources to open the Customers table for the Eure-et-Loir department.

 

il_nestedsources01_01.png

 

To help us keep track of the origin of a component, component windows (such as a table window) will note the data source of the component listing nested names in order separated by a < character.   In the example above the Customers table is within the Eure-et-Loir data source that is in turn within the Centre data source that is within the France data source within the Europe data source.

 

There are many useful aspects to nested data sources but one of the most convenient is that we can reach out through any of them that are read/write to work with, including changing, their contents as if they were at the top level of the project.

 

il_nestedsources01_02.png

 

For example, as in the illustration above we could slow-click on the Categories table to rename it if we wanted, just as if it were a table hosted in the top level project and not something four or five levels deep in one project file and data source nested within another.

 

Nested data sources require no storage (their data is stored in whatever origin file or server to which the data source refers)  and they may be used with zero loss of performance, just as if they were part of the parent project at the top level.  When we import a large image into a Manifold project it will open instantly.   If that same large image appears in some data source nested five levels deep within Manifold projects it will still open instantly when we double-click it.

 

Examples

Example: Add a Computed Field to a Table - How to add a field to a table.  We first set the values for the field in a static way and then we show how to set the values for a field dynamically with a computed field.

 

Example: Create a Table with a Constraint  - Create a simple table that includes a simple constraint upon one of its fields.  

 

SQL Example: Create a Table with a Constraint via a Query -  A simple example using SQL to create a table with a constraint.

 

SQL Example: Force an Anomaly in Constraints - Constraints are only evaluated when we insert or update records.   If a constraint refers to external data, such as the values in a different table, we can force an anomalous condition where the table with the constraint may contain data that no longer meets the requirements of the constraint.

 

SQL Example: Using the mfd_id Field and mfd_id_x Index - A sequence of SQL examples of working with the mfd_id field and mfd_id_x index.   

 

Discussion: Nested Data Sources

We would like to keep track of archaeological sites in Germany, France, Italy, Turkey and Bulgaria. We have Manifold projects for each of those countries saved to .map files called  germany.map, france.map, italy.map and so on.   Each of them has many large tables, drawings, images and maps.

 

Our new research topic involves trends in the French city of Strasbourg adjacent to the border with Germany.  For that work we will want data from our projects in Germany and in France.   We create a project and save it as strasbourg.map.

 

We can easily use our stockpile of German and French data within that new project by using the File - Create - New Data Source dialog to create new data sources called Germany and France which use the germany.map and the france.map project files as the source of their data.    Doing so will create two new data sources in the Project pane, each marked with a yellow database cylinder and a + icon button we can press to expand the data source.     If we click the + icon button next to the new France data source to expand it within the Project pane hierarchy we will see the contents of that project.   

 

Suppose now we would like to use an image server to create images of the region around Strasbourg.    To keep all of our data for France together in the same archive we would like to create that image server within the database that the france.map project file represents.  That's easy to do.   With the focus within the France data source hierarchy (just clicking the + button to expand that data source has already moved the focus into that data source's hierarchy) we create a new data source using a Manifold image server, for example, a Virtual Earth (VE) Hybrid image that shows streets overlaid on satellite photos.

 

When we do that we'll see files for an image server data source created in the usual way, all within the France parent data source hierarchy.  We can open the VE image server's image and see the visual display of satellite imagery overlaid with roads and other vector data provided by VE.   If we browse through the hierarchy of the France data source we will also see the .Tiles table Manifold created to cache tiles fetched from the VE server over the web that were used to create the image we see.  The full name of that cache table will be something ungainly, but self-documenting, such as "imageserver_VirtualEarthHybridImage.Tiles."   The tile cache table was created within the France data source because that is the parent data source of the newly created image server data source nested within.

 

To recall where we stand in the hierarchy of nested data sources, we are working with a project saved in a file called strasbourg.map.  That project in turn has two data sources in it called France and Germany which seamlessly link into the project the Manifold databases and all their components that live within the france.map and germany.map project files we had created during earlier work.

 

Within the France data source we have created yet a third data source, this one coming in from a web server in the form of a Manifold image server connection to hybrid imagery served by Virtual Earth.   When we look at the images served up by VE, panning and zooming into them to see our area of interest in Strasbourg, France, we are looking in to the third level of nested databases: the VE image server data source nested within the France project data source that is nested within our current Strasbourg project.

 

There are two things that are very cool about this situation.  First and foremost at the high end of the "cool" scale is that we encounter absolutely zero performance loss from nesting databases / data sources in this way.   The entire contents of the france.map project are seamlessly available to us within our strasbourg project just as gobsmacking fast as if we had opened the france.map project within its own Manifold session and were not using it within the strasbourg project as a nested data source.   We could nest data sources ten levels deep and still have zero performance loss.

 

The second very useful characteristic of nested data sources is that changes propagate as we would like and would expect.   If we make any changes to a nested data source, for example, adding an image server data source nested within, that change updates the parent data source as we would expect.

 

If we had nested an image server data source within a SQL Server data source the addition would take effect immediately, since any edits within SQL Server take effect immediately, as anyone who works regularly with SQL Server well expects.

 

In the case of a data source that is a .map project file, when we added the VE image server data source as a nested data source we would notice that a small asterisk * symbol appeared in the corner of the yellow database cylinder icon next to the France data source name.   That asterisk reminds us the project needs to be saved for the changes to be saved to the .map file used for that data source.

 

When we next save our strasbourg.map project the .map files used within any nested data sources will also be saved along with any changes we've made, such as adding a nested VE image server data source within their database.

 

Suppose we save our strasbourg.map file and exit Manifold.   We launch Manifold again and we open the france.map project.   We'll see that the project now has a new data source in the project, the VE image server data source.   It also has the ".Tiles" cache file still full of tiles from our previous browsing of the VE image server display.   Because the france.map project was the parent data source when we added the VE image server data source in our earlier session that nested data source and any accessory files that appeared with its use all were created in the france.map project.

 

Suppose we close that project and now we create a new, empty project that we save under the name alsace.map (Strasbourg is the capital of the Alsace region of France).   We use File - Create - New Data Source to create a data source that uses the france.map project as its database.   When we open that data source we will see the nested VE image server data source still there, ready for use.

 

Such effects make it easy to create repositories of related data and to organize additions to data so they go into the right repository.    If we are going to do work in the region around Strasbourg it could be useful to have our data sets for Germany immediately at hand as well as the data sets for France that we expect to use in our work with Strasbourg.   But we also may as well ensure that various additions involving French data for Strasbourg get created within the French compilation so that they can be used in any other projects that link in the French repository.

Discussion: Nested Data Source using SQL Server

Consider an example that shows data source nesting on a machine  that has Microsoft SQL Server available for use.    We open Manifold and create a new project.   We use File - Create - New Data Source to create a new data source for a SQL Server database which we will call SQL Server.   The new data source appears in the project pane with a yellow DBMS cylinder icon and a + sign indicating that data source can be expanded, that is opened.

 

In the Project pane we click on the + icon to open the SQL Server data source.   Next, we create a new data source for a Manifold image server, for example, the Virtual Earth Street Map image server, allowing it to cache data whenever it wants to.   A new data source for the image server appears, under the SQL Server data source hierarchy since that was opened and had the focus when we created the data source for the image serer.

 

We open the image server street map image and see it download tile data from the web to form the expected image of a street map.  If we look closely we note that it has also created a new table in the SQL Server data source with a .Tiles extension to the name as a table to hold the tiles it has downloaded.   The .Tiles table was created in the SQL Server data source because that is the parent data source to the new image server data source we created using the VE image server.

 

Suppose we now exit Manifold.   All the tile data and the created data source are still in the SQL Server database we used.     Suppose we launch Manifold again and create a new project.   Within that new project we use File - Create - New Data Source to create a new data source that once again uses the same SQL Server database we used earlier.   When we click on the + icon to expand that newly created SQL Server data source we will see the data source for the Manifold image server that we created in our earlier Manifold session and also we will see the .Tiles extension table it created with tile data it had cached.

 

This example takes advantage of the sophisticated capabilities of SQL Server.  It won't work if we try it on a data source that we created using an .mdb file database because the limitations of .mdb for the names of  tables will prevent Manifold from creating tables using the default naming scheme used for image servers.   It won't work with sqlite files either since those cannot host the indexes Manifold would like to create.  But it will work with other sophisticated DBMS environments like PostgreSQL or Oracle.

 

See additional examples of using tables in the Images topic and other topics recommended below.

Notes

All Data is in Tables - We are accustomed from DBMS applications to click open tables and to see them in some sort of row and column table browser, and we are accustomed from applications like graphics editors to click open images or CAD drawings and to see them as something other than tables, as photographs or other images, for example.

 

So it could be that when we first click open an image in Manifold and see it as a photograph we might think, "Hey... that's not a table, that's a photo."    But the data for that image really indeed is stored in a table somewhere in Manifold and it is only the display handler of the image window that knows how to interpret the table's data in the case of an "image" component to present it for us to see in visual form.

 

If we doubt the data is coming from a table we can right click on the image component and choose Properties to see the name of the table that stores the data for the image.  If we open that table we can see the records which contain the data for the image, most likely in the form of tile data within each record.

 

In the examples we created images by copying them from existing images and then we edited the properties so they would use different tables or different fields as the source of the image data.   We also could create images from tables using SQL.

 

Really Big Tables - Tables in Manifold can be very big data, so big that it does not make much sense to use a table window to manipulate such truly enormous table data.  A famous thread in the Forum discussed a LiDAR point cloud data set that contained 1.72 billion records in the table.   How big is a table that shows 1.72 billion records?  If we displayed the table in a series of screens where each page full of records was the height of a typical computer monitor screen the total length of the table would be over 8600 km, or about 1.35 times the radius of the Earth.   That is such a large table that no amount of interactive viewing of the table would show anything more than the tiniest fraction of the table.   Such large tables are handled with SQL or programmatically, or they display their contents in drawing or image layers, but they are not typically browsed interactively using table windows.

 

Sorting and Collation -  When sorting a table window by clicking on a column handle there is no option to specify collation.  If we would like to specify collation, we should use an SQL query using SELECT ... ORDER BY.

 

Sorting and Slow Data Sources - Table windows showing tables in fast data sources will fill before we can do anything.   However, when filling table windows from slow data sources the table window might not be filled before we manage to click a column handle to command a sort.  Clicking a column handle to sort a table window that is in the process of filling will stop filling the window.

 

Autogeneration of Queries - For an "inside look" into how components can be created from tables by using SQL we can take advantage of Manifold's query window.  Choose View - New Command Window - SQL to open a command window for queries.    The topmost pane is where we would write query text, but that topmost pane has intelligence of its own.  We can Copy a component (for example, an image or a drawing) in the Project pane and then Paste into the query text pane and Manifold will write in the query pane the SQL statements which generate the pasted component.   In fact, we can Copy  more than one component from the Project pane and Paste them into the query text pane and Manifold will generate multiple CREATE statements for us.  See the Examples in the Queries topic.

 

Connection Prerequisites - Connections to data sources may depend upon having necessary .dll files on our system or other non-Manifold considerations.   For example, like other Manifold products Manifold System utilizes Microsoft routines to connect to Microsoft formats such as .mdb, which can lead to some 32-bit / 64-bit integration issues depending on the vintage of our Windows system, what other Microsoft software we've installed and so on.  The rule of thumb when running 64-bit systems is to launch Manifold in 32-bit operation when connecting to Microsoft "Jet" formats such as .mdb, .xls and so on, to ensure that the predominantly 32-bit orientation of Microsoft routines to connect to Access-style file databases won't cause the connection to fail to work on a 64-bit system.  

 

Other database systems may also have their integration quirks to enable connections.  For example, to connect to .sqlite file databases we must have the required sqlite3.dll files either in the execution PATH for our system or in the same folder from which the manifold.exe file which we use to launch Manifold is located.   Put the 64-bit sqlite3.dll file in the ~\Bin64\ folder along with the 64-bit manifold.exe executable and the 32-bit sqlite3.dll file in the ~\Bin\ folder along with the 32-bit manifold.exe executable.

 

Compiled commands - Compiled commands, as used by the Manifold query engine in EXECUTE statements, are used not only in OLE DB and ODBC data sources but also for many others, for example, ADO.NET, MySQL, Oracle, PostgreSQL and SQLite.

 

MTA - OLE DB data sources use MTA (Multi-Threaded Apartment threading model) whenever possible for better performance.  Also greatly helping performance is that fetching large amounts of data from a remote database need not be done in pages.

 

Testing for NULLs -  Manifold tests for a null using the IS NULL operator. Many DBMS applications also have operators such as IS TRUE, IS FALSE and sometimes IS UNKNOWN, which is more or less synonymous to IS NULL.    Manifold does not use IS TRUE and IS FALSE since they are way too redundant given that Manifold's equality operator works with boolean values.    For example, instead of <something> IS TRUE we can use <something>  = TRUE or just plain <something>.   Instead of  <something> IS FALSE we can use <something> = FALSE or NOT <something>.  

 

Reserved names - All names starting with mfd_ (case not significant) are reserved for use by the system.  Names for fields, indexes, constraints, tables, components other than tables, properties... everything. Do not name anything beginning with mfd_ or MFD_ or in any upper or lower case combination of those characters.

 

Centre - The discussion of nested data sources refers to the Centre region of France.  More accurately the name of the region is  Centre - Val de Loire, but it is often referred to as simply Centre for short.

See Also

Getting Started

 

User Interface Basics

 

Filters

 

Data Types

 

Indexes

 

Queries

 

Drawings

 

Images

 

Labels

 

Table Windows and Big Data

 

Editing Tables

 

Schema

 

Examples - Do not miss!  Browse through the many examples for step by step tutorials.

 

Example: Closing without Saving - An example that shows how File - Close without saving the project can affect local tables and components differently from those saved already into a data source, such as an .mdb file database.

 

Example: Select a Range of Records in a Table - We can select or deselect a range of records between any two records in a table window using fast and easy mouse commands.

 

Example: Drawings use Geom Fields in Tables  - An essential discussion on how drawings are created from geom fields in tables, including how the drawing knows which coordinate system to use.

 

Example: Multiple Drawings from the Same Table - Illustrates how easy it is to create multiple drawings that use the same table and same geometry by copying and pasting an existing drawing.  Each new drawing takes no additional storage space in the project, but can be formatted differently.