Table Windows and Big Data

If we are used to working with small data sets, we might have expectations of table windows that don't match the reality of working with big data. For example, we might expect to open a table window, see the first records at the beginning of the table, and then scroll to the very bottom of the table by dragging the vertical scroll bar at the right of the window all the way down. When we do that with a big table in Manifold and the display moves only a few dozen records, we might wonder, "Hey... What happened? Where's the bottom?"

 

When we work with spreadsheets of a few thousand rows, or with a database of a few thousand records, it makes sense to think of the table window as a view into the entire table, with the vertical scroll bar showing, at least in a relative way, where the current view sits between the beginning and the end of the table. Scroll the bar all the way down and we see the last few records. Scroll the bar all the way up and we see the first few.

 

But that is not a good mental model for tables with millions of records. A window showing a few dozen records out of millions shows such a microscopic fraction of the table that a vertical scroll bar has no sensible meaning in the context of the entire table.
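To put rough numbers on that, here is a small back-of-the-envelope sketch in Python. The table size, the number of visible rows and the height of the scroll bar track are all assumed values chosen for illustration; none of them are Manifold settings.

```python
# Illustrative numbers only; none of these figures come from Manifold.
total_records = 10_000_000   # assumed size of a "big" table
visible_records = 40         # assumed rows visible in the window at once
track_pixels = 800           # assumed height of the scroll bar track, in pixels

# Share of the table that is on screen at any one moment.
fraction_visible = visible_records / total_records

# How many records a single pixel of scroll bar travel would have to cover.
records_per_pixel = total_records / track_pixels

print(f"Visible share of the table: {fraction_visible:.6%}")
print(f"Records per pixel of scroll bar travel: {records_per_pixel:,.0f}")
```

With these assumed figures, moving the scroll bar by a single pixel would skip thousands of records, so the bar cannot position the view anywhere near one particular record.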

 

We could, of course, use Manifold controls such as Ctrl-End to jump to the end of a table and display a screen full of records from the end of the table. We could then scroll up from there. But even then we see only the few records that fit on the screens we jump to or scroll through, a tiny fraction of a large table.

 

Using the vertical scroll bar to get near a single record out of millions would be like trying to use a horizontal scroll bar on a map showing all of the United States to find a particular street address in Kansas City. That is not a realistic expectation.

 

Instead of using a scroll bar for the entire US to try to find a specific address in Kansas City, we would use a different approach: we would go to that address with an automated search tool, for example, by entering the address into a search box that zooms us into just the immediate area around that address.

 

Table windows in Manifold are like that as well. They are a view of the table intended for browsing a screen's worth of records at a time. Once we find individual records using some automated means, like Ctrl-F for Find, the table window shows the desired record in context, so we can edit that record and so on.

 

When we use table windows with small tables, it is true that we can browse the data in a window to review many of the records and thus get our heads around the data or find records that interest us. But that doesn't work when data sets have millions of records, because we could spend weeks peeking at data through windows showing a few dozen records at a time and still not see more than a fraction of the records.
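A quick estimate shows why "weeks" is no exaggeration. The sketch below uses assumed figures (table size, rows per screen, seconds spent looking at each screen, hours per working day); they are illustrative guesses, not measurements.

```python
import math

# Assumed, illustrative figures; adjust to taste.
total_records = 10_000_000   # size of the table
records_per_screen = 40      # rows visible in one window full
seconds_per_screen = 5       # time spent glancing at each screen
hours_per_day = 8            # length of a working day

# Number of screens needed to see every record once.
screens = math.ceil(total_records / records_per_screen)

# Total viewing time, expressed in working days.
total_seconds = screens * seconds_per_screen
working_days = total_seconds / (hours_per_day * 3600)

print(f"Screens to view: {screens:,}")
print(f"Working days of non-stop paging: {working_days:.1f}")
```

Under these assumptions, paging through the whole table once would take more than forty working days, with only a few seconds of attention for each screen.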

 

Table browsers are usually not an effective way of getting our heads around big data. Instead, we can get our heads around the data by using tools like SQL to write clever queries and perform insightful analyses that, like magic, slice and dice their way through millions of records to find or manipulate just those records we want.
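As a minimal sketch of that idea, the example below uses Python's built-in sqlite3 module as a stand-in database. It is not Manifold's SQL dialect, and the table name, column names and synthetic data are all hypothetical. One query finds the single odd record among a million, and a second query pulls the handful of records around it, which is the kind of small view a table window is suited to showing.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a big table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, station TEXT, value REAL)"
)

# One million synthetic records with values between 0 and 99.
rows = ((i, f"S{i % 500:03d}", float(i % 100)) for i in range(1, 1_000_001))
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Plant a single out-of-range "needle" somewhere in the haystack.
conn.execute("UPDATE readings SET value = 9999.0 WHERE id = 734512")
conn.commit()

# Let the query, not the scroll bar, find the record of interest.
needles = conn.execute(
    "SELECT id, station, value FROM readings WHERE value > 1000"
).fetchall()
print(needles)

# Fetch a few neighbouring records to see the found record in context.
needle_id = needles[0][0]
context = conn.execute(
    "SELECT id, station, value FROM readings "
    "WHERE id BETWEEN ? AND ? ORDER BY id",
    (needle_id - 3, needle_id + 3),
).fetchall()
for record in context:
    print(record)
```

The same pattern applies whatever the database: a query narrows millions of records down to the few that matter, and only then is it useful to look at them a screen at a time.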

 

Manually sifting through millions of records a window full at a time is no way to find a needle in a haystack, and that's not what table windows are for in big data. Instead, table windows are a way to browse very small glimpses of a big data set. They are convenient for editing records found by other means in the context of the records around them, for looking at a few hundred records here or there to see if some command had wildly unintended effects, and for other such specific, usually limited, purposes.

 

Nonetheless, once we have a tool like Manifold on hand, even if we procured it for big data projects, we might also use it casually, as a personal information manager or for data sets of just a few hundred or a few thousand records. In the same way, many IT professionals who use Oracle for their enterprise might also use Oracle to keep track of a hobby collection like stamps or coins or wines. In such cases the table window will be very handy for browsing tables, editing records and so on.