Meaningful Search Results
The popular search engines are generally good, but they could be better. When people search large quantities of text with the Words Close Together engine, benefit #1 is more meaningful results.
Search full text, not just key words.
The expression full text means that every word in a body of text is indexed. Full text search came into its own in the early 1980s, yet some search engines still rely on an alternative called key word search, in which editors select words and phrases to describe the content of a body of text. The problem with key word search: the person searching has to think like the editor who chose the key words. That rarely happens; searchers do not necessarily share an editor's mindset. The problem will get worse if key word editing is moved offshore. If the editor is from another culture, the gap between the editor's choice of words and the searcher's choice of words widens even further.
Fortunately, the sheer quantity of new text data created each day makes key word search a decreasingly viable option. Index the full text. Give the searcher control, and the searcher is more likely to discover the meaning desired.
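To make "index the full text" concrete, here is a minimal Python sketch, assuming records are plain strings and that a word is any run of letters, digits, or apostrophes. Every word of every record goes into the index, so the searcher's own words are what get matched, not an editor's choice of key words. The record ids and sample text are made up for illustration.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_full_text_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Index every word of every record: word -> ids of the records containing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for record_id, text in records.items():
        for word in tokenize(text):
            index[word].add(record_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of records that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


records = {
    "r1": "Week-long fiddle camp for teens in Ohio.",
    "r2": "Research summary on music education.",
}
index = build_full_text_index(records)
print(search(index, "fiddle camp"))  # {'r1'}
```

An editor might have tagged r1 only as "music education"; with the full text indexed, a searcher who types fiddle camp still finds it.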
Good hits have the desired words close together.
We call the languages that we speak natural languages. In English, as in most natural languages, words take their meaning from how they are situated in relation to one another. Words close together convey meaning together; words far apart have little to do with one another. Therefore the meaningfulness of a search result relates directly to how close together the searcher's chosen words occur.
Try this experiment on any of the major Internet search engines. Enter these five words, without quotation marks: teen summer music camp Ohio. I tried this on Search Engine "G" (actual name withheld to protect the guilty). It reported 1,410,000 hits. I downloaded the first one hundred hits and examined them. In the very first hit, all five words were present on an extended web page, but they were paragraphs away from one another: the closest fit for the five words was 2,295 words apart. Guess what Search Engine G's best hit out of 1,410,000 was NOT about? You are right; it was not about teen summer music camps in Ohio. And so it went. Most of the first 100 hits were off topic; in most cases the selected words were much too far apart. (If it's any comfort, Search Engine G's 73rd hit was right on target.)
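The "closest fit" in this experiment can be measured mechanically. The Python sketch below slides a window across the words of a page and reports the shortest stretch that contains every search word. It assumes a simple lower-case tokenizer and counts the stretch inclusively, from the first search word to the last; the article does not say exactly how its figure was counted.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def closest_fit(text: str, terms: list[str]) -> int | None:
    """Length in words of the shortest stretch of `text` containing every
    search term, or None if some term never appears."""
    words = tokenize(text)
    needed = Counter(t.lower() for t in terms)
    have: Counter[str] = Counter()
    missing = len(needed)  # distinct terms the current window still lacks
    best = None
    left = 0
    for right, word in enumerate(words):
        if word in needed:
            have[word] += 1
            if have[word] == needed[word]:
                missing -= 1
        # While the window covers every term, record it and shrink from the left.
        while missing == 0:
            span = right - left + 1
            if best is None or span < best:
                best = span
            dropped = words[left]
            if dropped in needed:
                have[dropped] -= 1
                if have[dropped] < needed[dropped]:
                    missing += 1
            left += 1
    return best


terms = ["teen", "summer", "music", "camp", "Ohio"]
print(closest_fit("Teen camp. Later: music in Ohio this summer.", terms))  # 8
```

Subtracting the number of search words from the span gives the count of intervening words; a perfect fit for these five words is a span of 5.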
Headings have a role in meaning.
Recognition of headings contributes significantly to the quality of full text search.
Something of value is lost when headings are lumped in as an ordinary part of the text. A heading tells something about the content that follows. In a Words Close Together list of search results, you will notice that each hit is preceded by a series of headings (for example, book name, part, chapter, section). Words in the headings are considered close to words in the text that follows. This special treatment of headings enhances the search for meaning.
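One simple way to get this effect is sketched below in Python, under our own assumptions (the article does not spell out how Words Close Together does it): treat each paragraph as a search unit whose word stream starts with the words of its headings, so heading words sit right next to the text they introduce, and show the heading chain in front of each hit. The `Paragraph` type and the sample lines are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


@dataclass
class Paragraph:
    headings: list[str]  # outermost first: book, part, chapter, section ...
    text: str


def search_unit(p: Paragraph) -> list[str]:
    """Word stream used for searching: the heading words come directly in
    front of the paragraph's own words, so a proximity measure sees them
    as close to the text they introduce."""
    words: list[str] = []
    for heading in p.headings:
        words.extend(tokenize(heading))
    words.extend(tokenize(p.text))
    return words


def hits(paragraphs: list[Paragraph], query: str) -> list[str]:
    """Paragraphs whose search unit holds every query word, each shown
    with its chain of headings in front."""
    terms = set(tokenize(query))
    return [
        " > ".join(p.headings) + ": " + p.text
        for p in paragraphs
        if terms <= set(search_unit(p))
    ]


plays = [
    Paragraph(["Hamlet", "Act 2"], "Doubt thou the stars are fire."),
    Paragraph(["Macbeth", "Act 5"], "Life's but a walking shadow."),
]
for hit in hits(plays, "Macbeth shadow"):
    print(hit)  # Macbeth > Act 5: Life's but a walking shadow.
```

The word Macbeth appears only in a heading, yet the search for Macbeth shadow still finds the line, because the heading travels with its paragraph.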
Footnotes are an aid to finding meaning.
Words Close Together preparation tools scan for enumerators in an attempt to automate detection of headings, lists, and footnotes. Footnotes get special treatment because they add words that may clarify the meaning of the footnoted paragraph. If both the footnote references and the footnotes themselves are numbered clearly and consistently, with no gaps or duplicates, the preparation software moves each footnote into position below the paragraph that references it. Footnotes are identified, indented, and set in a smaller font. Placing each footnote with its paragraph strengthens search. The repositioning is also convenient for the person searching: to see the footnote, just look further down the page.
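The footnote rule can be sketched in Python as follows. The `[n]` reference marker, the list of `(number, text)` footnote pairs, and the indentation are all assumptions made for illustration; the point is the check: references and footnotes must both run 1, 2, 3, ... with no gaps or duplicates before anything is moved.

```python
from __future__ import annotations

import re

# Hypothetical reference marker: a footnote number in square brackets, e.g. "[3]".
REF = re.compile(r"\[(\d+)\]")


def numbering_is_clean(numbers: list[int]) -> bool:
    """True if the numbers run 1, 2, 3, ... in order, with no gaps or duplicates."""
    return numbers == list(range(1, len(numbers) + 1))


def place_footnotes(
    paragraphs: list[str], footnotes: list[tuple[int, str]]
) -> list[str] | None:
    """Move each footnote directly below the paragraph that references it.

    Returns None, leaving the text as it was, if the references or the
    footnotes are not numbered cleanly or do not match each other.
    """
    refs = [int(n) for p in paragraphs for n in REF.findall(p)]
    note_numbers = [n for n, _ in footnotes]
    if not numbering_is_clean(refs) or refs != note_numbers:
        return None
    text_of = dict(footnotes)
    out: list[str] = []
    for p in paragraphs:
        out.append(p)
        for n in REF.findall(p):
            # Indent to mark the footnote; a real layout would also shrink the font.
            out.append(f"    [{n}] {text_of[int(n)]}")
    return out


paragraphs = ["Intro.", "A claim.[1] Another claim.[2]", "More text."]
footnotes = [(1, "Source A."), (2, "Source B.")]
for line in place_footnotes(paragraphs, footnotes):
    print(line)
```

Refusing to move anything when the numbering is irregular mirrors the cautious rule in the text: a gap or duplicate means the software cannot be sure which note belongs where.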
Allow search using word roots.
The search for meaning is helped by giving the searcher control over whether full or partial words are expected. In a search of Shakespeare, for example, the wild card asterisk in the search term shad* takes the place of whatever letters follow shad. Sure enough, a range of variants of shad (shade, shadow, etc.) appears in the list of hits. Wild cards (an asterisk for any number of characters, a question mark for a single character) give the flexibility needed.
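Here is a minimal Python sketch of the two wild cards, assuming the search term is matched against a list of indexed words: the pattern is translated into a regular expression in which `*` becomes "any number of characters" and `?` becomes "exactly one character", with every other character taken literally. The vocabulary is a made-up sample.

```python
from __future__ import annotations

import re


def wildcard_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a search term: '*' matches any number of characters
    (including none), '?' matches exactly one, everything else is literal."""
    parts = []
    for ch in pattern.lower():
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def expand(pattern: str, vocabulary: set[str]) -> list[str]:
    """All indexed words the wild-card term matches, in alphabetical order."""
    rx = wildcard_to_regex(pattern)
    return sorted(w for w in vocabulary if rx.fullmatch(w))


vocabulary = {"shad", "shade", "shadow", "shadows", "shall", "shape"}
print(expand("shad*", vocabulary))  # ['shad', 'shade', 'shadow', 'shadows']
print(expand("sha?e", vocabulary))  # ['shade', 'shape']
```

Escaping every ordinary character keeps punctuation in a search term from being read as regular-expression syntax.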
Use natural relevance ranking for natural language.
When search software looks for the terms you want within a collection of text, it reports back a series of so-called hits. Relevance ranking has to do with the order in which the hits are presented. Most people would prefer that the most useful (or relevant) hits come at the top of the list. Well, what makes one hit more relevant than another? That's a subject of debate and intense competition:
- Word frequency: how many times the desired words appear in the hit. A useful tie breaker, but not a sure factor.
- Popularity: some say the popularity of a hit is the best measure. A lot of money has been made with this one. But it invites attempts at gaming, and (let's face it) popularity and meaning are not the same thing.
- Key words: others prefer to search on key words selected by editors.
- Payment: in 2002 there was a scandal over ranking based on how much web sites paid the search provider to get their content near the top.
Almost all relevance ranking methods are proprietary secrets; you can never be sure why one hit appears before another. Words Close Together takes a very different, entirely public approach: closeness of fit or, if you prefer, natural relevance ranking. In natural languages, the closer together the words are, the more likely the hit has meaning for the person searching. Simply count the words that intervene between the desired words in each hit. The fewer intervening words (the closer the desired words), the higher the hit appears in the list.
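Natural relevance ranking reduces to a sort. In this Python sketch, each hit is assumed to carry its closest-fit span (first to last desired word, counted inclusively) and the number of times the desired words occur in it; the `Hit` type and the sample numbers are hypothetical. Hits with fewer intervening words come first, and word frequency only breaks ties.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    span: int       # words from the first to the last desired word, inclusive
    n_terms: int    # how many desired words the searcher entered
    frequency: int  # total occurrences of the desired words in the hit

    @property
    def intervening(self) -> int:
        """Words that fall between the desired words in the closest fit."""
        return self.span - self.n_terms


def rank(hits: list[Hit]) -> list[Hit]:
    """Fewest intervening words first; higher word frequency breaks ties."""
    return sorted(hits, key=lambda h: (h.intervening, -h.frequency))


hits = [
    Hit("A", span=2300, n_terms=5, frequency=9),
    Hit("B", span=5, n_terms=5, frequency=2),
    Hit("C", span=8, n_terms=5, frequency=3),
    Hit("D", span=8, n_terms=5, frequency=7),
]
print([h.title for h in rank(hits)])  # ['B', 'D', 'C', 'A']
```

Hit A mentions the desired words most often, but they are scattered, so it lands last; that is the difference between counting words and measuring closeness.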
If natural relevance ranking is so good, you may ask why all the search engines are not using it. Of all the current search engines, AltaVista is virtually alone in offering a "NEAR" option, a very limited capability for determining proximity. Google appears to have walked away from its attempts at proximity in February 2006. There is a technical reason this very desirable feature is not commonly offered: most search engines would have to call up every record found in order to check how close together the desired words are. Computers are fast, but opening a million records and checking each one would take far too much time.
Words Close Together patent-pending technology has solved this problem. Download the program and some sample databases. Check it out.
Add all the above: Find the meaning you are after.
People search text to satisfy a need. That need is NOT searching. They need to FIND. The objective of a search is some form of learning. People want to recognize patterns; they want to understand. They want meaning.
words close together.com: The "Research Quality" Search Engine by Marpex, Inc.