Home
Page
Database Commander Case Study: Build a
Database of Dealers
Welcome
to the first Case Study for Database Commander. The Commander launch is hot
with immediate application everywhere. One very strong sign is that we
find ourselves using Commander to build manifold.net's business as well.
In addition to cool database software, manifold.net
makes GIS (mapping) software as well as 3D Visualization software. It
occurred to our Marketing department that dealers selling competitive GIS
systems would find our 3D Visualization software very useful and would probably
want to hear about Commander as well. GIS is very database intensive and
the same sorts of jobs Manifold System users send to Commander probably occur in
other GIS systems as well. We set out to harvest dealer mailing lists from our
competitor's sites so we could send them information about 3D View Studio and
about Commander.
There are faster ways of doing the following that
might be used by an expert Commander user; however, the procedure given is still
very fast and it's easier for a beginner to follow. It's basically a copy
and paste between two Commander windows that have been opened simultaneously.
The vendor's reseller locator page allows users
to enter a location (such as a country) and in response a table of dealers in
that region is served. We will run a query for each region in which they
have dealers and then save the resulting page to a series of files called r1.htm,
r2.htm... and so on. We will open two Commander sessions. In
one Commander we will open a new, blank database. In the other Commander
we will open each saved .htm file in turn, Copy the data it
contains and then Paste into the other Commander session to build up our
database.
Once the database has been assembled, we
will use Database Commander's transformation features to alter the data from web
jargon into a form that we can exploit. For example, we will extract email
addresses into their normal form.
| Harvest
Data from Web Pages |
 |
Locate
target web site and begin running queries. This shows the results
of a request for dealers in "Central Europe" as served by the web page. Save this page as r1.htm
Click Back in your browser and ask for all dealers in
"Western Europe" and save the resulting page as r2.htm
... We will do this until we have asked for all the regions we want and
saved the results as a series of .htm files on our local
computer. |
| Assemble
Harvested Pages into One Database |
| We
now have many .htm files saved, perhaps several dozen, that we
will assemble into a single database. We do this by opening two
copies of Commander. One copy of Commander has a new blank
database. The other copy of Commander we will use to open the
various .htm files in turn and then copy and paste / append their
contents into the database we are growing. |
 |
Launch
Database Commander and create a new database. |
 |
Launch
a second copy of Commander and open r1.htm. Note that
Commander is not confused (usually) by weird Java applets... it's
grabbed the table of data out of all the other junk (spam, banners,
text, images, etc) served by the web page. Note that some web
pages will have more than one table containing real data. Click on
the various tables in Commander's Database tree pane to instantly see
which have data and which do not. |
 |
Right-click
onto the table name and choose Copy |
 |
Switch
to the Commander session with the blank database, right click onto Tables
and choose Paste. This is a short hand way of pasting the
table structure as well as the data. |
 |
Note
that the pasted data appears with cells highlighted in yellow background
color to show they have been changed. |
 |
Switch
to the Commander session that is being used to load the .htm
files saved previously. Close the r1.htm file and load the
next file, in this case r2.htm. |
 |
Click
on Highlight All to highlight all the records. |
 |
Right-click
anywhere in the Data Pane and choose Copy. This
copies all the highlighted records. |
 |
Switch
to the other Database Commander session and right-click anywhere in the
Data Pane table and choose Paste. |
 |
This
pastes / appends the copied records into this table. By pasting
into the Data pane table view, we've appended the records. |
| Continue
in this way until all of the .htm files saved from our Internet
sessions have been copied and pasted into one big table in our new
database. |
| Remove
Duplicates |
| Because
the web site we've interrogated will often provide the name of the same
dealer in tables responding to different queries, we need to get rid of
duplicates. First, we will clean up the display a bit by renaming
the big table to "dealers" and by deleting a few of the
fields that were imported that we do not need. Renaming the table
will change the background color back to white, a sort of "starting
afresh" moment. Next, we will use an Analysis bar to
find duplicates. |
 |
We've
opened the Analysis pane and created an analysis bar that finds
the total duplicates except the first one for the Address
field. Clicking on the highlight button will highlight the
five records that are unnecessary duplicates. |
 |
Right
click on any of the highlighted records and choose Delete Records.
This will get rid of all unnecessary duplicates. Isn't this the
easiest way to get rid of duplicates you've ever seen? Note how
Commander can get rid of all duplicates or only those duplicates after
the first one (which is what one usually wishes to do). |
| Extract
Email Addresses |
 |
Note
that the "Contact" information is usually a name and an email
address. However, not all records have an email address.
Also, the email addresses are given in web-page style using
"#" characters and a "mailto:" command. Let's
separate the names and email addresses into separate fields. |
|
Create
a new text field called Name. This step is not illustrated
because it is so easy. Right-click on the Fields heading in
the Database Tree pane and choose New.... |
 |
Check
that the "tokens" separator selection in Tools - Options
includes a "#" character. When we use token
commands, any text that occurs between # characters will be considered a
"token". ....Can you see where we will go with this idea
in the next step? |
 |
Use Contact,
Move First Token To, Name in Transformer toolbar.
Press Apply. For each record in our database, Commander
will take the first "token" it finds in the Contact
field and move it to the Name field. |
 |
Here
is a screen shot immediately after we have pressed Apply.
Note that the "changed" cells have yellow background
color. Also note that the names have been correctly extracted
using the "token" command even in cases where there are no
separator characters (because there is no email address in some
records). If there are no separators, the entire text is
considered the "first token." |
 |
We
note that #mailto: in front of each email address consists of 8
characters, so in Transformer toolbar we use Contact, Remove
Left, 8 and press Apply. This removes the
leftmost 8 characters from the Contact field. |
 |
Now
we remove the trailing # character. Again in Transformer,
we use Contact, Remove Right, 1 to remove the
rightmost character from the Contact field. The result is a
field with a correctly formed email address in all cases. |
 |
While
browsing the database, we note that the Specialty field often
contains " ", which is web page jargon for
"non-breaking space." We will replace this with an
ordinary space character throughout the
database using the search and Replace function. [Isn't
Commander wonderful ?!] |
 |
Browsing
again, we note that some fields contain information that includes line
breaks. We can see this by "dragging" the border between
records a bit to increase the height of a record row as seen in the
screen shot. Note that if desired, Database Commander's Replace
function makes it easy to search and replace Line Breaks
as well as other special character combinations such as Any Digit
or just White Space. Is this cool or what? |
If desired, we could further transform this
database. For example, we note that Address fields contain
multi-part addresses in a single field. We could easily use Replace
to insert some "token" characters not normally found in addresses
(such as a vertical bar | ), and then use the various Token
commands in Transformer toolbar to move address components to new fields
we create such as Country, Postal Code, Province and so
on. The way we deal with records that have some of these components in
their address but not others is to do a Token move command and then click
on the head of the target field to sort by that field. We can then easily
see which records did have, for example, a postal code and which did not. We can
select the exceptions and then apply Transformer only to the selection to
move parts of the address back and forth between desired fields.
The database we have just transformed is now our
own database of this vendor's dealers. The entire operation from beginning
to end took about 30 minutes, including the browsing of their web page to
extract dealers. In some areas, the vendor only had a single dealer so
that many of the web page "tables" we saved from the browser as .htm
files had only a single record in them. That's no problem, because it is
faster to just continue with the "assembly line" procedure than to
figure out some optimization.
We will use this database with the Manifold
Send Email system to maintain an email system within Manifold and to mail to
these dealers using their email addresses. After some further processing
to get the addresses in the form used by our postal software, we will also use
their Address fields for a mass mailing to the dealers.
A
Steal at Twice the Price! - Manifold Database Commander is priced at
only $145: that's the best deal ever in database software!
Visit the Manifold.net Online Store
anytime to purchase
Manifold products.
Manifold Database Commander ships from stock same day on orders received before
2:00 Pacific Time. Details
regarding shipping and purchasing may be found on the Online Store's information pages as
well as on the Miscellaneous Info
page. Email sales@manifold.net at
any time, or call 800-556-5919 during business hours if you have any
questions regarding products, purchasing, or shipping information.
Back to
Manifold Home Page