Home Page

Database Commander Case Study: Build a Database of Dealers

Welcome to the first Case Study for Database Commander.  The Commander launch is hot with immediate application everywhere.  One very strong sign is that we find ourselves using Commander to build manifold.net's business as well.

In addition to cool database software, manifold.net makes GIS (mapping) software as well as 3D Visualization software.  It occurred to our Marketing department that dealers selling competitive GIS systems would find our 3D Visualization software very useful and would probably want to hear about Commander as well.  GIS is very database intensive and the same sorts of jobs Manifold System users send to Commander probably occur in other GIS systems as well. We set out to harvest dealer mailing lists from our competitor's sites so we could send them information about 3D View Studio and about Commander.

There are faster ways of doing the following that might be used by an expert Commander user; however, the procedure given is still very fast and it's easier for a beginner to follow.  It's basically a copy and paste between two Commander windows that have been opened simultaneously.

The vendor's reseller locator page allows users to enter a location (such as a country) and in response a table of dealers in that region is served.  We will run a query for each region in which they have dealers and then save the resulting page to a series of files called r1.htm, r2.htm... and so on.  We will open two Commander sessions.  In one Commander we will open a new, blank database.  In the other Commander we will open each saved .htm file in turn, Copy the data it contains and then Paste into the other Commander session to build up our database. 

Once the database has been assembled, we will use Database Commander's transformation features to alter the data from web jargon into a form that we can exploit.  For example, we will extract email addresses into their normal form.

Harvest Data from Web Pages
dealers01.jpg (149537 bytes) Locate target web site and begin running queries.  This shows the results of a request for dealers in "Central Europe" as served by the web page.  Save this page as r1.htm  Click Back in your browser and ask for all dealers in "Western Europe" and save the resulting page as r2.htm ... We will do this until we have asked for all the regions we want and saved the results as a series of  .htm files on our local computer.
Assemble Harvested Pages into One Database
We now have many .htm files saved, perhaps several dozen, that we will assemble into a single database.  We do this by opening two copies of Commander. One copy of Commander has a new blank database.  The other copy of Commander we will use to open the various .htm files in turn and then copy and paste / append their contents into the database we are growing.
dealers03.jpg (39741 bytes) Launch Database Commander and create a new database. 
dealers02.jpg (85495 bytes) Launch a second copy of Commander and open r1.htm.  Note that Commander is not confused (usually) by weird Java applets... it's grabbed the table of data out of all the other junk (spam, banners, text, images, etc) served by the web page.  Note that some web pages will have more than one table containing real data.  Click on the various tables in Commander's Database tree pane to instantly see which have data and which do not.
dealers04.jpg (91355 bytes) Right-click onto the table name and choose Copy
dealers05.jpg (44406 bytes) Switch to the Commander session with the blank database, right click onto Tables and choose Paste.  This is a short hand way of pasting the table structure as well as the data.  
dealers05a.jpg (84562 bytes) Note that the pasted data appears with cells highlighted in yellow background color to show they have been changed.
dealers05b.jpg (128423 bytes) Switch to the Commander session that is being used to load the .htm files saved previously.  Close the r1.htm file and load the next file, in this case r2.htm.   
dealers06.jpg (131413 bytes) Click on Highlight All to highlight all the records. 
dealers07.jpg (122259 bytes) Right-click anywhere in the Data Pane and choose Copy.   This copies all the highlighted records.
dealers08.jpg (84765 bytes) Switch to the other Database Commander session and right-click anywhere in the Data Pane table and choose Paste.
dealers09.jpg (127214 bytes) This pastes / appends the copied records into this table.  By pasting into the Data pane table view, we've appended the records. 
Continue in this way until all of the .htm files saved from our Internet sessions have been copied and pasted into one big table in our new database.
Remove Duplicates
Because the web site we've interrogated will often provide the name of the same dealer in tables responding to different queries, we need to get rid of duplicates.  First, we will clean up the display a bit by renaming the big table to "dealers" and by deleting a few of the fields that were imported that we do not need.  Renaming the table will change the background color back to white, a sort of "starting afresh" moment.  Next, we will use an Analysis bar to find duplicates.
dealers10.jpg (111477 bytes) We've opened the Analysis pane and created an analysis bar that finds the total duplicates except the first one for the Address field.  Clicking on the highlight button will highlight the five records that are unnecessary duplicates.
dealers11.jpg (104110 bytes) Right click on any of the highlighted records and choose Delete Records.  This will get rid of all unnecessary duplicates.  Isn't this the easiest way to get rid of duplicates you've ever seen?  Note how Commander can get rid of all duplicates or only those duplicates after the first one (which is what one usually wishes to do).
Extract Email Addresses
dealers12.jpg (114006 bytes) Note that the "Contact" information is usually a name and an email address.  However, not all records have an email address.  Also, the email addresses are given in web-page style using "#" characters and a "mailto:" command.  Let's separate the names and email addresses into separate fields.
Create a new text field called Name.  This step is not illustrated because it is so easy.  Right-click on the Fields heading in the Database Tree pane and choose New....
dealers14.jpg (19958 bytes) Check that the "tokens" separator selection in Tools - Options includes a "#" character.  When we use token commands, any text that occurs between # characters will be considered a "token".  ....Can you see where we will go with this idea in the next step?
dealers13.jpg (112012 bytes) Use Contact, Move First Token To, Name in Transformer toolbar.  Press Apply.  For each record in our database, Commander will take the first "token" it finds in the Contact field and move it to the Name field.
dealers15.jpg (113063 bytes) Here is a screen shot immediately after we have pressed Apply.  Note that the "changed" cells have yellow background color.  Also note that the names have been correctly extracted using the "token" command even in cases where there are no separator characters (because there is no email address in some records).  If there are no separators, the entire text is considered the "first token."
dealers16.jpg (106881 bytes) We note that #mailto: in front of each email address consists of 8 characters, so in Transformer toolbar we use Contact, Remove Left, 8 and press Apply.  This removes the leftmost 8 characters from the Contact field.
dealers17.jpg (105919 bytes) Now we remove the trailing # character.  Again in Transformer, we use Contact, Remove Right, 1 to remove the rightmost character from the Contact field.  The result is a field with a correctly formed email address in all cases.
dealers18.jpg (95601 bytes) While browsing the database, we note that the Specialty field often contains "&nbsp", which is web page jargon for "non-breaking space."  We will replace this with an ordinary space character throughout the database using the search and Replace function.  [Isn't Commander wonderful ?!]
dealers19.jpg (92564 bytes) Browsing again, we note that some fields contain information that includes line breaks.  We can see this by "dragging" the border between records a bit to increase the height of a record row as seen in the screen shot.  Note that if desired, Database Commander's Replace function makes it easy to search and replace Line Breaks as well as other special character combinations such as Any Digit or just White Space.  Is this cool or what?

If desired, we could further transform this database.  For example, we note that Address fields contain multi-part addresses in a single field.  We could easily use Replace to insert some "token" characters not normally found in addresses (such as a vertical bar | ), and then use the various Token commands in Transformer toolbar to move address components to new fields we create such as Country, Postal Code, Province and so on.  The way we deal with records that have some of these components in their address but not others is to do a Token move command and then click on the head of the target field to sort by that field.  We can then easily see which records did have, for example, a postal code and which did not. We can select the exceptions and then apply Transformer only to the selection to move parts of the address back and forth between desired fields.

The database we have just transformed is now our own database of this vendor's dealers.  The entire operation from beginning to end took about 30 minutes, including the browsing of their web page to extract dealers.  In some areas, the vendor only had a single dealer so that many of the web page "tables" we saved from the browser as .htm files had only a single record in them.  That's no problem, because it is faster to just continue with the "assembly line" procedure than to figure out some optimization.  

We will use this database with the Manifold Send Email system to maintain an email system within Manifold and to mail to these dealers using their email addresses.  After some further processing to get the addresses in the form used by our postal software, we will also use their Address fields for a mass mailing to the dealers.

145.png (3634 bytes)A Steal at Twice the Price! - Manifold Database Commander is priced at only $145: that's the best deal ever in database software!   

btn_purchase_now.png (2836 bytes)Visit the Manifold.net Online Store anytime to purchase Manifold products. Manifold Database Commander ships from stock same day on orders received before 2:00 Pacific Time. Details regarding shipping and purchasing may be found on the Online Store's information pages as well as on the Miscellaneous Info page. Email sales@manifold.net at any time, or call 800-556-5919 during business hours if you have any questions regarding products, purchasing, or shipping information.

Home Page - Products - Search - Support - Shopping - News - Online Store
Personal Mapping - GIS - Database Commander - 3D View Studio - Maps and Data
Testimonials - Y2K - Links - Licensing - Privacy Statement - Terms Of Use


© 2001 CDA International Ltd. All Rights Reserved.
Manifold is a Registered Trademark of CDA

Prices, terms and conditions, and product specifications subject to change without notice.  Please contact Manifold Net with any special needs or requests.

Back to Manifold Home Page