The need for "Research Quality" search engines

Typical findings from the three experiments

Speed of search

All the major search engines are fast. If the response takes over a second to show on your screen, it is likely the connection that is slowing things down. Incidentally, any search engine within a program on your computer should provide instantaneous response if the content is properly indexed. If it is perceptibly slower, the program has a search technology from the early 1980s.

Results reported -- number of "hits" found

A search on a word or phrase often yields hundreds of thousands, or even millions, of hits on the Internet. This is despite the fact that even the best-known Internet search engines cover only a fraction of the total Internet. There are typically fewer, yet still many, hits reported for searches on combinations of words and/or phrases. A question for you: If you are told that there are 2,390,000 hits for a search, how many entries do you actually view? A hundred? Fifty? The first ten? Whatever your answer, it is a minuscule fraction of the total hits reported. You are offered far more results than you would ever dream of using. The point of this: The first few hits really matter. In other words, the method of relevance ranking has a critical bearing on the quality of your efforts.

Search terms present -- Internet search

Single words: When searching across the Internet, requested single words are virtually always present, typically more than once, in each of the first ten results. When the user specifies a single word, its frequency is one of the factors in calculating relevance. Among the first ten hits, the word is likely to show up quite often.
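As a rough illustration of frequency as one relevance factor, here is a minimal sketch that ranks a few documents by how often a single word occurs. The function names and sample documents are made up for this example, and real engines combine frequency with many other signals that are not modeled here.

```python
import re

def term_frequency(text, term):
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def rank_by_frequency(docs, term):
    """Return (doc_id, count) pairs for documents containing term, most frequent first."""
    counts = [(doc_id, term_frequency(text, term)) for doc_id, text in docs.items()]
    hits = [pair for pair in counts if pair[1] > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample documents, invented for illustration only.
docs = {
    "a": "Search engines index pages. A search is fast.",
    "b": "Indexing makes search possible; search, search, search.",
    "c": "Nothing relevant here.",
}

print(rank_by_frequency(docs, "search"))  # [('b', 4), ('a', 2)]
```

Document "c" is dropped entirely because the word never appears in it; frequency only orders the documents that do contain the word.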

Phrase search across the Internet generally follows the same pattern ... at least one appearance (and often more) of the phrase in each of the first ten records. The higher-quality search systems generally handle phrases without any problem. Warning: There are still search engines that insist on ignoring the quotation marks that are intended to designate a phrase. You attempt a search on "Harley Davidson" and wind up with results such as "Harley Granger" and, a few paragraphs further along, "Jean Davidson". Thanks, but no thanks.
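The difference between honoring and ignoring the quotation marks can be sketched in a few lines. This is only an illustration under simple assumptions (lowercased alphanumeric tokens, hypothetical function names), not how any particular engine works.

```python
import re

def tokens(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    """True only if the words of phrase appear consecutively and in order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))

def contains_any_word(text, phrase):
    """What an engine effectively does when it ignores the quotation marks."""
    words = set(tokens(text))
    return any(w in words for w in tokens(phrase))

page = "Harley Granger spoke first. A few paragraphs later, Jean Davidson replied."

print(contains_phrase(page, "Harley Davidson"))    # False
print(contains_any_word(page, "Harley Davidson"))  # True
```

The second function reports the page as a hit even though the phrase never occurs, which is exactly the unwanted result described above.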

Search for combinations of words and/or phrases ... Prepare for major disappointments. There are three very common problems, even in the best-known search systems.

  1. One or more terms are not there, even in the very first (and presumably best) hit. In their desire to impress you with how many records can be found, the designers let the search engine switch your request from all the words (the first word AND the second word AND the third word, etc.) you wanted into a totally different search for any of the words (the first word OR the second word OR the third word, etc.). For a more detailed explanation, check out the glossary entry for Boolean logic and the difference between AND and OR in a search. The point is that some search engines routinely give you what you did NOT ask for.

  2. All the words or phrases are present, but these terms are paragraphs apart. The result has little or nothing to do with what you are seeking.

  3. Catalog-style lists of people, activities, organizations, books, or whatever dominate the pages reported as "most relevant". These pages are large, with many different words. These pages are selected because they have a term or two in one entry, another term in a different entry, and so forth. That is worse than useless, since it only adds to the clutter that the user has to examine. A list page would be meaningful to the user only if all the terms refer to one entry in the list.
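The first and third problems can be made concrete with a small sketch. It assumes a page is either a plain string or, for a catalog-style page, a list of entry strings; the function names and sample pages are hypothetical.

```python
import re

def word_set(text):
    """Lowercase alphanumeric words of text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match_all(text, terms):
    """AND search: every term must be present."""
    words = word_set(text)
    return all(t.lower() in words for t in terms)

def match_any(text, terms):
    """OR search: one term is enough."""
    words = word_set(text)
    return any(t.lower() in words for t in terms)

def one_entry_has_all(entries, terms):
    """For a list page: all terms must fall within a single entry."""
    return any(match_all(entry, terms) for entry in entries)

terms = ["vintage", "motorcycle", "engine"]

# Problem 1: the user asked for AND, the engine quietly ran OR.
partial = "Vintage furniture restoration tips"
print(match_all(partial, terms), match_any(partial, terms))  # False True

# Problem 3: a list page scatters the terms across unrelated entries.
list_page = ["Vintage wine club", "Motorcycle rentals", "Engine repair shop"]
print(match_all(" ".join(list_page), terms))   # True
print(one_entry_has_all(list_page, terms))     # False
```

The list page passes a page-wide AND test, yet no single entry is about all three terms, so it only adds clutter.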

Search terms present -- Web sites, Intranets, ebooks, contents of personal files

Consider this test, carried out in December 2007 at Experts-Exchange. Selecting "E-E" rather than "Internet" meant that the web site itself was being searched. One term, "HXS" (which identifies Microsoft help files), was requested. 335 hits were announced. Among the first 25 of those hits, only two contained "HXS"; another two had "HEX", one had "HXX", and the remaining 20 had "HX". In other words, 92 percent of the first 25 hits did not even have the search term that was requested.

To put it politely, time-wasting results like this are "not untypical".
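The 92 percent figure follows directly from the tally reported for the first 25 hits. The sketch below redoes that arithmetic and adds a hypothetical exact-term check of the kind a searcher would expect; the helper name is an assumption for illustration.

```python
import re

# Tally of what the first 25 hits actually contained, as reported in the test.
found = {"HXS": 2, "HEX": 2, "HXX": 1, "HX": 20}

total_hits = sum(found.values())               # 25
hits_without_term = total_hits - found["HXS"]  # 23
share_without_term = 100 * hits_without_term / total_hits

print(total_hits, hits_without_term, share_without_term)  # 25 23 92.0

def has_exact_term(text, term):
    """True if term occurs as a whole word (case-insensitive), not as a look-alike."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE) is not None

print(has_exact_term("Compiling HXS help files", "HXS"))  # True
print(has_exact_term("Opening an HX file", "HXS"))        # False
```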

What's going on?

  1. The engine designers offer variations that contain some of the same letters. This certainly increases the number of hits. Unfortunately, the underlying assumption is that the searcher does not know what she wants. The effect is that the knowledgeable searcher is loaded down with unwanted results.

  2. The factors that accounted for the success of the major search engines on the Internet simply do not apply in other settings. An Intranet lacks the network of links across pages that expresses the relative importance of each page. There is no one to work up key words for searching ebooks or your personal files. The search engines cannot fall back on community interest groups to add value.

  3. The temptation to add new features over time gets in the way. "It would be neat to automatically give them words that are similar to what they ask for." Result: More hits that consist of stuff that you did not ask for. Control is being kept firmly in the hands of the engine designers and providers.

Types of information obtained by search

The results of search for a single word help you define the word, and get an idea of the variety of its usages and contexts. In any writing task, you need a firm grasp of meanings and usage. In other words, single word search provides preliminary, but often useful, information.

Phrase search is also useful for definitions, but goes beyond that. Phrases are useful for finding web sites, products, organizations, persons, and more. They are helpful for tracking concepts, but suffer the limitation that (typically) only exact phrases are found. Concepts or ideas do not lend themselves to consistent wording. Often a word is inserted or a word is changed, and the record with the alternative form is missed by phrase search. For example, search for the exact phrase "that they may be one" will capture all instances of the five words in a row, but will ignore wording that is almost the same and carries the same meaning: "that they may all be one".
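One way to picture the limitation is to compare an exact phrase match with a looser match that tolerates an inserted word. The sketch below is an assumption-laden illustration (the `max_gap` parameter and function names are invented here), not a description of any specific engine.

```python
import re

def tokens(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near_phrase(text, phrase, max_gap=0):
    """True if the phrase words occur in order, with at most max_gap
    extra words between any two consecutive phrase words.
    max_gap=0 is an exact phrase match."""
    words, target = tokens(text), tokens(phrase)

    def match_from(pos, k):
        # target[k - 1] was matched at words[pos]; try to place target[k].
        if k == len(target):
            return True
        limit = min(pos + 2 + max_gap, len(words))
        return any(
            words[nxt] == target[k] and match_from(nxt, k + 1)
            for nxt in range(pos + 1, limit)
        )

    return bool(target) and any(
        w == target[0] and match_from(i, 1) for i, w in enumerate(words)
    )

variant = "that they may all be one"
print(near_phrase(variant, "that they may be one"))             # False
print(near_phrase(variant, "that they may be one", max_gap=1))  # True
```

With `max_gap=0` the variant wording is missed, just as exact phrase search misses it; allowing a single inserted word recovers it.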

Search for combinations of words and/or phrases is a stronger method for detecting patterns among words. Most learning activities and research aim at this conceptual level. Words, word roots, and phrases, when they occur near one another, steer us nicely in "diligent and systematic inquiry or investigation into a subject in order to discover or revise facts, theories, applications, etc." (That's the Random House Second Edition Unabridged Dictionary definition of research.)
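To make "near one another" measurable, one simple option is the length of the shortest run of words that contains every requested term. The sketch below is a generic illustration with hypothetical names and sample text; it is not presented as the method used by any particular search engine.

```python
import re

def smallest_span(text, terms):
    """Length, in words, of the shortest window containing every term,
    or None if some term is absent."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    wanted = {t.lower() for t in terms}
    last_seen = {}  # most recent position of each wanted term
    best = None
    for i, w in enumerate(words):
        if w in wanted:
            last_seen[w] = i
            if len(last_seen) == len(wanted):
                # Shortest window ending at i starts at the oldest last-seen term.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best

close = "the engine ranks words near phrases highly"
far = "words " + "filler " * 50 + "phrases"

print(smallest_span(close, ["words", "phrases"]))    # 3
print(smallest_span(far, ["words", "phrases"]))      # 52
print(smallest_span(close, ["words", "concepts"]))   # None
```

A small span suggests the terms belong to the same thought; a span of dozens of words suggests they merely share a page.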

words close together.com The "Research Quality" Search Engine by Marpex, Inc.