Meaning scores, frequency scores
Suppose we carry out a search in a patent. We want to see hits containing all three terms "and", "the", and "computer*". The asterisk at the end of the last term is a wildcard: it matches every word that begins with "computer".
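The wildcard rule can be sketched in a few lines of Python. This is only an illustration, not the program's actual code: the function name term_matches is hypothetical, and ignoring upper and lower case is an assumption (it fits the sample hits further down, where "The" is treated as a match for "the").

```python
def term_matches(term: str, word: str) -> bool:
    """Return True if `word` satisfies the search `term`.

    Assumptions (not confirmed by the source): matching ignores case,
    and a trailing "*" is the only wildcard form supported.
    """
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        # Wildcard: any word that begins with the text before the asterisk.
        return word.startswith(term[:-1])
    # Plain term: the whole word must match.
    return word == term


for w in ["computer", "computers", "computer-readable", "compute"]:
    print(w, term_matches("computer*", w))
```

The last word, "compute", does not match because it does not begin with the full prefix "computer".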
Here is one hit that is presented by the Words Close Together program:
United States Patent 6,304,601 Davison October 16, 2001
Description -- Data compression apparatus
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The output of switch 4 is connected to a circuit 40 controlled by CPU 6 which discriminates between text data and pictorial data and which outputs text data to the data compression/decompression apparatus 5 and pictorial data to compression/decompression apparatus 41 for comparing the pictorial data by an appropriate compression algorithm, the output from apparatus 41 also being stored in hard disc storage area 7. CPU 6 controls the reconstitution of both sets of compressed data and also enables one or the other or both sets to be displayed, printed as hard copy or transmitted via either an output aerial 42 or the ISDN output terminal 10.
It will be appreciated that the apparatus disclosed in FIG. 1 can be a general purpose computer programmed to carry out the compression and decompression algorithms which have been described. The program for such a computer or processor can be stored in various types of transportable computer-readable media such as floppy discs, optical discs, tape streamers and the like. FIG. 1 shows a floppy disc 5' as one example of a computer readable medium.
This passage contains several hits. Each hit is shown in one paragraph, and the explanation of its score follows in the second paragraph of the pair:
[1] and also enables one or the other or both sets to be displayed, printed as hard copy or transmitted via either an output aerial 42 or the ISDN output terminal 10. It will be appreciated that the apparatus disclosed in FIG. 1 can be a general purpose computer
Hit [1] has 43 intervening words, including two extra copies of the word "the". Its meaning score is 100 - 43 = 57.
[2] computer programmed to carry out the compression and
Hit [2] has 5 intervening words. Meaning score = 100 - 5 = 95.
[3] and decompression algorithms which have been described. The program for such a computer
Hit [3] has 11 intervening words. Meaning score = 89.
[4] computer-readable media such as floppy discs, optical discs, tape streamers and the
Hit [4] has 9 intervening words, for a meaning score of 91.
[5] and the like. FIG. 1 shows a floppy disc 5' as one example of a computer
Hit [5] has 13 intervening words, for a meaning score of 87.
The meaning score for this passage is 95. We pick the best fit within the domain, "computer programmed to carry out the compression and", and use the score for that best fit.
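The arithmetic above can be written out as a short Python sketch. It takes the intervening-word counts exactly as stated for hits [1] to [5] rather than recounting them, since the precise counting rules (for example, how numbers and hyphenated words are treated) are not spelled out here. The names INTERVENING and meaning_score are hypothetical, and what happens when there are more than 100 intervening words is not described, so the sketch does not handle that case.

```python
# Intervening-word counts as stated in the text for hits [1] to [5].
INTERVENING = {1: 43, 2: 5, 3: 11, 4: 9, 5: 13}


def meaning_score(intervening_words: int) -> int:
    """Meaning score as described in the text: 100 minus the intervening words."""
    return 100 - intervening_words


# Score every hit, then keep the best fit as the score for the passage.
scores = {hit: meaning_score(n) for hit, n in INTERVENING.items()}
best_hit = max(scores, key=scores.get)
passage_score = scores[best_hit]

print(scores)         # {1: 57, 2: 95, 3: 89, 4: 91, 5: 87}
print(best_hit)       # 2
print(passage_score)  # 95
```

The tightest grouping of the three terms wins, so the passage inherits the score of hit [2].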
The summary presentation shows the score. Here is what it looks like:
1. United States Patent 6,304,601 Davison October 16 > Description > DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
... be appreciated that the apparatus disclosed in FIG. 1 can be a general purpose computer programmed to carry out the compression and decompression algorithms which have been described. The program for such a computer or processor can be ... [Meaning Score 95, Frequency 22]
If you review the full hit near the top of this page, you will see that 22 words match the search criterion and are highlighted.
It all comes down to counting words. There is no mystery. Understanding this, anyone may look at any search result, and see exactly why it is placed where it is in the list of relevance hits. Highest meaning scores go to the top of the list. Meaning scores descend progressively. Frequency score (number of highlighted terms) is used to break ties between hits with the same meaning score.
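The ordering rule can be sketched as a two-key sort in Python. This is an illustration under stated assumptions, not the program's own code: the Hit record and the rank function are hypothetical names, the entries labelled "Example" are invented for the demonstration, and ranking the higher frequency first when meaning scores tie is an assumption, since the text says only that frequency breaks ties.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    title: str
    meaning: int    # meaning score (100 minus intervening words, best fit)
    frequency: int  # number of highlighted terms in the hit


def rank(hits):
    """Sort by meaning score, highest first; break ties by frequency.

    Assumption: on a tie, the hit with the higher frequency is listed first.
    """
    return sorted(hits, key=lambda h: (-h.meaning, -h.frequency))


hits = [
    Hit("Example patent A", 91, 4),    # invented entry
    Hit("Example patent B", 95, 7),    # invented entry
    Hit("Patent 6,304,601", 95, 22),   # scores taken from the summary above
    Hit("Example patent C", 57, 30),   # invented entry
]

for h in rank(hits):
    print(h.title, h.meaning, h.frequency)
```

Patent 6,304,601 and example B tie on a meaning score of 95, so frequency decides between them. Example C has the most highlighted terms but still lands last, because frequency is consulted only when meaning scores are equal.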
words close together.com | The "Research Quality" Search Engine by Marpex, Inc.