Mata Hari -- Power Search Tool Bring the entire Web to your desktop.

[Newly Revised]

[Tutorial: Guide to Effective Searching on the Internet]

Section 1: Searching with Internet Provided Resources

Looking for that perfect condo for your ski trip? Needing specifications for a manufacturer's particular piece of equipment? Want discussion and commentary on your favorite, but obscure, author? Trying to find out what your competitors are up to? Seeking recent studies on planets in other solar systems? Needing information on special scholarships for which you might be qualified?

These, and millions of other queries covering every conceivable topic, are now posed daily to the Internet's search services. With anywhere from 300 million to 550 million or more publicly available documents, a number that remarkably doubles every 18 months, the Internet has become a vast, global storehouse of information. The only problem is: how do you find what you're looking for?

Unfortunately, there is no Dewey decimal system or central "card catalog" for the Internet. You must use a search service to find new information. Search services come in one of two main flavors. Each has its place, depending on your information needs.

'Directories' use trained professionals to classify useful Web sites into a hierarchical, subject-based structure. Yahoo is the best known and most used of these services. Directories are most useful when looking for information in clear categories, such as makers of yogurt or listings of educational institutions. Each directory uses its own categories and means to screen useful sites and assign them to a single category.

'Search engines' work differently. Excite, AltaVista and Infoseek are some of the best known engines. They "index" (record by word) each word within all or parts of documents. When you pose a query to a search engine, it matches your query words against the records in its databases and presents a listing of documents that may meet your request. Search engines are best for searches in more difficult topic areas or those that fall into the gray areas between the subject classifications used by directories. But search engines are stupid: they can only give you what you ask for, and a single query can sometimes return thousands (even millions!) of matching documents. Also, even the biggest search engines index at most one third of the Internet's public documents.
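The word-by-word indexing and matching described above can be illustrated with a minimal sketch. It is not how any particular engine is implemented; the three documents, their ids, and the function names build_index and search_all are made up for illustration, and the matching rule (every query word must appear) is an assumption.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Strip simple punctuation so "system." is recorded as "system".
            index[word.strip('.,;:!?"()')].add(doc_id)
    return index

def search_all(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical three-document collection.
documents = {
    "doc1": "Astronomers discover a new planet outside the solar system.",
    "doc2": "A new ski condo opens near the resort.",
    "doc3": "The planet Jupiter is the largest in our solar system.",
}
index = build_index(documents)

print(sorted(search_all(index, "new")))         # one word matches doc1 and doc2
print(sorted(search_all(index, "new planet")))  # a second word narrows to doc1
```

Even in this toy collection, the one-word query returns an irrelevant document, while the two-word query does not. The engine gives back exactly what the words ask for and nothing more.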

So, while three quarters of users cite finding information as their most important use of the Internet, that same percentage also cites the inability to find the information they want as their biggest frustration. The purpose of this tutorial is to help you end that frustration.


Your ability to find the information you seek on the Internet is a function of how precise your queries are and how effectively you use search services. Poor queries return poor results; good queries return great results. Contrary to the hype surrounding "intelligent agents" and "artificial intelligence," the fact remains that search results are only as good as the query you pose and how you search. There is no silver bullet.

Most Internet searchers, perhaps including you, tend to use only one or two words in a query. Big mistake! There are also very effective ways to "structure" a query and use special operators to target the results you seek. Absent these techniques, you will spend endless hours looking at useless documents that do not contain the information you want. Or you will give up in frustration after searching, clicking, downloading, and reviewing long lists of documents without finding what you want.

All of us need information. But few of us have studied information or library science, and not everyone has used search services or Internet search engines sufficiently to learn all of the nuances. This tutorial is for those who are learning the ropes about 'power searching.' But, even if you're quite experienced in these areas, you might find some benefit from glancing through these topics.

This tutorial is organized to proceed from the basics to more advanced topics. It is divided into two sections: "Searching with Internet Provided Resources" and "Using a Powerful Desktop Resource - Mata Hari®". The first section has 12 parts containing 51 topics and describes the search services, the available operators, and, most importantly, how to compose your queries. The second section contains 11 topics and describes using our tool, Mata Hari, which we believe is the most powerful search tool ever developed. As heavy-duty searchers ourselves, we created Mata Hari to automate and expedite the search process for our own needs. A description of its features and how it works is provided so you can assess whether you can benefit from this powerful tool.

Simple-to-follow examples are presented in each topic. We've written the tutorial to be a one-stop reference, so don't feel you need to work through all of the topics in one sitting. But if you do take the time to work through this material, we guarantee you'll reap big dividends in faster and more accurate results. And you will be on your way to earning the title of an Internet "Power Searcher."

You can also download this tutorial for printing:

Download this tutorial as a zipped MS Word 6.0 document (143 kb)
Download this tutorial in Adobe Acrobat (PDF) format (347 kb)

Documentation is appended at the end of the tutorial [1,2]; click on a numbered citation to go directly to it.

Executive Summary: The Two-Minute Bottom Line

To illustrate some of the basic concepts and recommendations covered in this tutorial, let's say we have an interest in recent findings about new planets being discovered outside our solar system. Using the information "contained" in this statement, you can see how an effective query can be built by following these guidelines.

We'll summarize the recommendation, show how the statement is phrased, describe why it's important, and provide a pointer to the specific topic number in the tutorial that covers this recommendation. At the conclusion of the table are the topics and their titles listed by number.


1. Use nouns and objects as query keywords
   Example: planet or planets
   Why important: Actions (verbs), modifiers (adjectives, adverbs, predicate subjects), and conjunctions are either "thrown away" by the search engines or too variable to be useful
   Topics: 6, 7, 8

2. Use 6 to 8 keywords in query
   Example: new, planet, planets, discovery, solar, system
   Why important: More keywords, chosen at the appropriate "level", can reduce the universe of possible documents returned by 99% or more
   Topics: 8, 10

3. Truncate words to pick up singular and plural versions
   Example: planet* or discover*
   Why important: Use asterisk wildcard. The wildcard tells the search engine to match all characters after it, preserving keyword slots and increasing coverage by 50% or more
   Topics: 9, Section 2

4. Use synonyms via the OR operator
   Example: discover* OR find
   Why important: Cover the likely different ways a concept can be described; generally avoid OR in other cases
   Topics: 11, Section 2

5. Combine keywords into phrases where possible
   Example: "solar system*"
   Why important: Use quotes to denote phrases. Phrases restrict results to EXACT matches; if combining terms is a natural marriage, narrows and targets results by many times
   Topics: 12

6. Combine 2 to 3 "concepts" in query
   Example:
      "solar system"
      "new planet*"
      discover* OR find
   Why important: Triangulating on multiple query concepts narrows and targets results, generally by more than 100-to-1
   Topics: 20

7. Distinguish "concepts" with parentheses
   Example:
      ("solar system")
      ("new planet*")
      (discover* OR find)
   Why important: Nest single query "concepts" with parentheses. (Overkill for now, but good practice when first learning.) Simple way to ensure the search engines evaluate your query in the way you want, from left to right
   Topics: 19

8. Order "concepts" with subject first
   Example:
      ("new planet*")
      (discover* OR find)
      ("solar system")
   Why important: Put main subject first. Engines tend to rank documents more highly that match first terms or phrases evaluated
   Topics: 7, 19, 20

9. Link "concepts" with the AND operator
   Example: ("new planet*") AND (discover* OR find) AND ("solar system")
   Why important: AND glues the query together. The resulting query is not overly complicated nor nested, and proper left-to-right evaluation order is ensured
   Topics: 14, 20, Section 2

10. Issue query to full "Boolean" search engine or metasearcher
   Example: ("new planet*") AND (discover* OR find) AND ("solar system")
   Why important: Full-Boolean engines give you this control; metasearchers increase Web coverage by 3- to 4-fold
   Topics: 3, 35, 36, 38, Section 2
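The recommendations above build one query step by step, and the same steps can be sketched in a few lines of code. This is only an illustration: the helper names phrase, concept, build_query, and matches are hypothetical, and the wildcard check uses Python's fnmatch module as a stand-in for an engine's truncation rule, which may differ from engine to engine.

```python
from fnmatch import fnmatchcase

def phrase(text):
    """Wrap a multi-word term in quotes so it is treated as an exact phrase."""
    return f'"{text}"'

def concept(*synonyms):
    """Join synonyms with OR and nest the concept in parentheses."""
    return "(" + " OR ".join(synonyms) + ")"

def build_query(*concepts):
    """Link the concepts with AND, in the order given (main subject first)."""
    return " AND ".join(concepts)

def matches(pattern, word):
    """Return True if a truncated keyword such as planet* covers the word."""
    return fnmatchcase(word.lower(), pattern.lower())

query = build_query(
    concept(phrase("new planet*")),   # main subject first
    concept("discover*", "find"),     # synonyms joined with OR
    concept(phrase("solar system")),
)
print(query)
# ("new planet*") AND (discover* OR find) AND ("solar system")

# Truncation picks up singular, plural, and related forms of a keyword.
for word in ["planet", "planets", "planetary", "plan"]:
    print(word, matches("planet*", word))
```

Keeping each concept in its own parenthesized group means a synonym can be added or removed later without disturbing the rest of the query.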

By issuing the query in #9 above to AltaVista, we restrict results from a baseline of 917,754 documents for the query new AND planet (actually 1,139,837 if we properly include planets as well) to 2,036 documents [1]. Though that number still seems like a lot, we have reduced our possible universe of results by roughly 450 to 560 times, and four of the first five documents listed give us exactly what we were looking for:

http://www.got.net/~seasons/new.html
http://www.ucar.edu/quarterly/summer97/planet.html
http://www.geocities.com/Area51/Nebula/1456/todaysnews.html
http://www.npr.org/news/healthsci/indexarchives/1998/May/980529.01.html

Go ahead; try these queries for yourself!
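The size of that reduction can be checked directly from the three document counts reported above. The short calculation below is a sketch that uses only those counts; the function name reduction_factor is ours.

```python
# Document counts reported for AltaVista in the text above.
baseline_singular = 917_754       # new AND planet
baseline_with_plural = 1_139_837  # same query, also counting "planets"
refined = 2_036                   # the full query from recommendation #9

def reduction_factor(baseline, refined_count):
    """Return how many times smaller the refined result set is."""
    return baseline / refined_count

print(round(reduction_factor(baseline_singular, refined)))     # 451
print(round(reduction_factor(baseline_with_plural, refined)))  # 560
```

The two factors, roughly 450 and 560, are in line with the range quoted above.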

The ultimate bottom line to getting the best results for your queries is to search multiple services simultaneously using a universal format. Our solution is to provide you full Internet searching power at your desktop via the Mata Hari® product [Section 2].
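To make the idea of searching multiple services at once more concrete, here is a minimal sketch of one step a metasearcher has to perform: merging the result lists that come back and dropping duplicates. It does not describe how Mata Hari works internally. The engine names, the example.com URLs, and the ranking rule (URLs returned by more engines first, then by best position) are all assumptions chosen for illustration.

```python
def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one deduplicated list.

    URLs returned by more engines come first; ties are broken by the best
    (lowest) position the URL reached in any engine, then alphabetically.
    """
    hits = {}  # url -> (number of engines that returned it, best position)
    for urls in results_by_engine.values():
        # dict.fromkeys drops repeats within one engine while keeping order.
        for position, url in enumerate(dict.fromkeys(urls)):
            count, best = hits.get(url, (0, position))
            hits[url] = (count + 1, min(best, position))
    return sorted(hits, key=lambda url: (-hits[url][0], hits[url][1], url))

# Hypothetical result lists from three hypothetical engines.
results_by_engine = {
    "engine_a": ["http://example.com/planets", "http://example.com/news"],
    "engine_b": ["http://example.com/news", "http://example.com/discovery"],
    "engine_c": ["http://example.com/news", "http://example.com/planets"],
}

merged = merge_results(results_by_engine)
for url in merged:
    print(url)
```

Six results from three engines collapse to three distinct documents here, and one of them was found by only a single engine. That is the coverage gain a metasearcher offers over any one service.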

Do you want to be able to get such impressive results for your own queries? Then, welcome. It's now time to start the tutorial. Or, if a given topic is of more interest to you, click on these topic headings to proceed to them directly:


 Index

Section 1: Searching with Internet Provided Resources

Executive Summary: The Two-Minute Bottom Line

Part 1: The Size of the Internet

Part 2: Internet Search Basics and Why There's a Problem
Topic 1: Status of the Internet and Searcher's Frustrations
Topic 2: Search Engine and Directory Basics
Topic 3: How Search Services Rank Documents
Topic 4: Characteristics of Searchers and What Takes Search Time

Part 3: Keywords — The Essence of the Search
Topic 5: Sample Information Problem for this Tutorial
Topic 6: Query Concepts: What, Where, When, How, Why
Topic 7: Breaking Down Your Query
Topic 8: Focus on Nouns and Objects
Topic 9: Word Stemming and Use of Wildcards
Topic 10: Finding the Right Level
Topic 11: Synonyms
Topic 12: Use of Phrases

Part 4: Boolean Basics
Topic 13: Boolean Overview
Topic 14: AND Operator
Topic 15: OR Operator

Part 5: Advanced Operators
Topic 16: NEAR Operator
Topic 17: BEFORE and AFTER Operators
Topic 18: AND NOT Operator

Part 6: Advanced Construction
Topic 19: Use of Parentheses
Topic 20: Combining Concepts for Power Searching
Topic 21: Punctuation and Capitalization
Topic 22: Multiple Queries and Query Refinements
Topic 23: Sample Information Problem Revisited

Part 7: Pitfalls to Avoid
Topic 24: Avoid Misspellings
Topic 25: Redundant Terms
Topic 26: Ignored Terms and Special Characters
Topic 27: Alternate Spellings
Topic 28: Too Many Terms, Synonyms
Topic 29: Improper Boolean or Complicated Construction

Part 8: Using Filters
Topic 30: Site Filters
Topic 31: Size Filters
Topic 32: Date Filters
Topic 33: Specialty Filters and Search Options

Part 9: Understand Your Engines
Topic 34: Some Caveats: The Dynamic Search Business
Topic 35: Duplication, Coverage and Responsiveness
Topic 36: Boolean or Not?
Topic 37: A Comparison of 100 Search Services
Topic 38: Features of the Top 10 Search Services
Topic 39: Specialty Engines
Topic 40: Some Other Services to Watch
Topic 41: Some Perplexing Behaviors

Part 10: Specialty Searches
Topic 42: Product Searches
Topic 43: Competitor Intelligence
Topic 44: Market Research
Topic 45: Finding People
Topic 46: Finding Places
Topic 47: Finding Documents
Topic 48: Finding Recent News

Part 11: Solutions and the Future of Searching
Topic 49: Ruminations on the Future of Internet Searching

Part 12: Summary and Further Information

Section 2: Using a Powerful Desktop Resource - Mata Hari
Topic 50: Mata Hari Product Features
Topic 51: What is Fast?
Topic 52: Universal Search Power
Topic 53: Search 140 Search Engines Simultaneously
Topic 54: Using Boolean Power with Non-Boolean Search Engines
Topic 55: Filtering, Phrases, and Plain Text
Topic 56: Efficiently Culling Results
Topic 57: Local Viewer
Topic 58: Engines and Queries Subtab
Topic 59: Scoring Subtab
Topic 60: Terms Subtab

Notes, Links and References


VisualMetrics Corporation
©Copyright 1999 VisualMetrics Corporation. All rights reserved.
Comments should be sent to webmeister@thewebtools.com
Last updated: July 1, 1999