The need for "Research Quality" search engines
Quality issues in contemporary search
Single word search
The standard online search engines are all excellent at single word search; they give you exactly what you are looking for, very quickly.
Phrase search
Phrase search has a hidden problem: false negatives. Good hits are left out of the list because of some minor difference between the phrase that you specified and the phrase as it occurs in the text. For example, if you search for the phrase "low carb chilled avocado soup", do you want to miss records with "low carb chilled cucumber avocado soup"? Phrase search looks good, but for the serious user it is not as good as it first appears.
One way around the false negatives inherent in exact-phrase search is to search for combinations of the words that make up the phrase. That strategy, unfortunately, leads to a much greater problem.
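To make the trade-off concrete, here is a minimal Python sketch of the two matching rules, assuming simple lowercase word tokenization with no stemming. The function names `exact_phrase_match` and `all_words_match` and the sample texts are hypothetical, chosen only for illustration; real engines use far more elaborate matching and ranking.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase and keep only runs of letters/digits, so punctuation is ignored.
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_phrase_match(phrase: str, text: str) -> bool:
    # True only if the phrase's words appear consecutively and in order.
    p, t = tokenize(phrase), tokenize(text)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))

def all_words_match(phrase: str, text: str) -> bool:
    # True if every word of the phrase appears anywhere in the text, in any order.
    return set(tokenize(phrase)) <= set(tokenize(text))

query = "low carb chilled avocado soup"
record = "Try our low carb chilled cucumber avocado soup this summer."
unrelated = "Avocado prices fell. Our soup kitchen is chilled by low funding. Carb counts below."

print(exact_phrase_match(query, record))   # False: "cucumber" breaks the phrase (false negative)
print(all_words_match(query, record))      # True: the good record is found
print(all_words_match(query, unrelated))   # True: an off-topic record also matches (false positive)
```

The word-combination rule recovers the cucumber record, but it accepts the unrelated text just as readily, because it ignores where the words sit relative to one another.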
Search for a combination of words and/or phrases
Search for a combination of words and/or phrases lands the user in a disaster area. The problem is false positives. Try any of the following combinations of words on any of the major search engines: **
- teen music summer camp ohio
- guitar teacher chicago
- pizza steubenville ohio fast delivery
One search engine reported 671,000 hits for one of these combinations, and one of the words was missing from the very first record. The lack of a key word gutted the meaning of the intended search. Recall that the first record is supposed to be the most relevant of all. One shudders at what might be in the next 670,999 records.
So forget the hundreds of thousands of hits. How many of the first one hundred records are on topic? How many of the first ten?
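The questions above amount to what information retrieval calls precision at k: the fraction of the top k results that are on topic. A minimal Python sketch, assuming the searcher has marked each of the top results as on topic or not. The function name `precision_at_k` and the sample judgments are hypothetical, not measurements from any search engine.

```python
def precision_at_k(judgments: list[bool], k: int) -> float:
    """Fraction of the top k results that the searcher judged on topic.

    judgments[i] is True if the result at rank i + 1 is on topic. If fewer
    than k results exist, the missing ones count as not on topic.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(judgments[:k]) / k

# Hypothetical judgments for a top-ten list (illustrative only).
judgments = [True, False, False, True, False, False, False, False, False, False]
print(precision_at_k(judgments, 10))  # 0.2
print(precision_at_k(judgments, 1))   # 1.0
```

The raw hit count (671,000 or otherwise) plays no part in this measure; only the judged quality of the top of the list does.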
Each false positive yields a hit that has little meaning or value for the searcher. This is a recipe for irrelevance. Does it bother you that such irrelevant hits appear near the top in a search engine's relevance ranking? A search engine that delivers irrelevant hits is in effect a garbage delivery system. "Here's all this stuff. You pick through it." No thanks. Our time is too valuable for that.
Search target size
The unit of search used by the major contemporary Internet engines is without exception a web page.
Web pages vary greatly in size. However, we can readily observe the number of kilobytes or paragraphs in each of the top ten hits of a search. With a bit more effort, we could determine word counts. One simple generalization arises from experimenting along these lines: the web pages selected as "most relevant" by the standard search engines typically contain many words that are totally unrelated to the other words on the same page. If the user's search terms are paragraphs apart in the selected records, those results are meaningless to the user.
It is a simple principle of language that words close together convey meaning; words far apart have little meaning.
The use of web pages as search units assures greater incidence of meaningless hits. Meaningless hits waste the user's time. People would be much better served if search engines were to use the natural unit in language, a paragraph (or a single screenful of short paragraphs), as the unit of search.
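The difference between a page and a paragraph as the unit of search can be sketched in a few lines of Python. This is a minimal illustration, assuming paragraphs are separated by blank lines and that a hit requires every search term to occur as a whole word; the names `page_hit` and `paragraph_hits` and the sample page are hypothetical, and this is not a description of how any particular engine (including the one advertised on this site) is built.

```python
import re

def paragraphs(page: str) -> list[str]:
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p for p in re.split(r"\n\s*\n", page) if p.strip()]

def words(text: str) -> set[str]:
    # Lowercased whole words; no stemming, so "teen" does not match "teens".
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def page_hit(terms: list[str], page: str) -> bool:
    # Page as search unit: all terms anywhere on the page count as a hit.
    return set(terms) <= words(page)

def paragraph_hits(terms: list[str], page: str) -> list[str]:
    # Paragraph as search unit: only paragraphs containing every term are hits.
    return [p for p in paragraphs(page) if set(terms) <= words(p)]

page = """Our summer camp in Ohio offers hiking and swimming for teens.

Music lessons are available year round at our downtown studio."""

terms = ["music", "summer", "camp", "ohio"]
print(page_hit(terms, page))        # True: the page "matches"
print(paragraph_hits(terms, page))  # []: no single paragraph is about a music camp
```

The page-level rule reports a hit because the terms are scattered across two unrelated paragraphs; the paragraph-level rule reports nothing, which is the honest answer for this page.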
Quality overview
The three experiments do not touch on issues in advanced search. Try out advanced search on your customary Internet search engine, and ask the same sort of questions.
What we do know is that any reasonable user can run the three experiments and establish fairly conclusively that one may expect excellence in single word search, a hidden problem in phrase search, and performance well short of mediocrity in anything more complex. Moreover, whatever the type of search, one is left digging around on the selected pages trying to locate the search terms, which are often too far apart to be meaningful.
To put it politely, the quality of contemporary search leaves room for improvement.
** Footnote: For those who have time to make performance comparisons, a short list of major Internet search engines might include AltaVista, AOL, Ask.com, DogPile, Google, Lycos, MSN, and Yahoo. "Many of the product names referred to herein are trademarks or registered trademarks of their respective owners." While striving to make lawyers happy, let us also note that the operative term is "list ... might include"; if your client's search engine is not mentioned, we gladly concede that it might still be a "major search engine".
words close together.com | The "Research Quality" Search Engine by Marpex, Inc.