Glossary

The following entries are either other people's jargon or our scintillating additions to the lexicon of English.

Boolean

George Boole was a nineteenth-century British mathematician. His name became attached to Boolean logic. You may not know it, but you use Boolean logic every time you use a computer to search for more than one term. Most frequently we want this word AND that word AND some other word.
  • A Boolean AND reduces the number of items found, since the hit must satisfy several terms and not just one.
  • A Boolean OR increases the count of hits. Example: Searching Shakespeare's plays for single words for weapons, "sword" yields 300 hits, "knife" 46 hits, "spear" 7 hits, and "lance" 13 hits. Searching for any of these words (sword OR knife OR spear OR lance) results in a count of 359, higher than any single count. (It is a little lower than the sum of the four counts, 366, presumably because some hits contain more than one of the words.)
  • A Boolean AND NOT excludes some possibilities: Show me paragraphs containing bat and run AND NOT flying AND NOT cave.
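The three operators above map directly onto set operations. The following is a minimal sketch, assuming a tiny made-up collection of paragraphs and a hypothetical helper named hits; it is not how any particular search engine is implemented.

```python
# Hypothetical mini-collection: paragraph id -> text (invented for illustration).
paragraphs = {
    1: "the bat made a run for the cave",
    2: "he swung the bat and scored a run",
    3: "a flying bat startled the runner",
    4: "she went for a run before breakfast",
}

def hits(word):
    """Return the ids of paragraphs that contain the word as a whole word."""
    return {pid for pid, text in paragraphs.items() if word in text.split()}

bat_and_run = hits("bat") & hits("run")                  # AND: intersection
bat_or_run = hits("bat") | hits("run")                   # OR: union
filtered = bat_and_run - hits("flying") - hits("cave")   # AND NOT: difference

print(sorted(bat_and_run))  # [1, 2]
print(sorted(bat_or_run))   # [1, 2, 3, 4]
print(sorted(filtered))     # [2]
```

AND can only shrink the set of hits, OR can only grow it, and AND NOT removes hits from whatever was found so far.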

Here is an example of the difference between AND and OR in searches. Consider three houses. House M is in Minneapolis, House S is in St. Paul. Both these houses are some distance from the border between the Twin Cities. A third house, B, is smack on the border, with its living room in St. Paul and the kitchen in Minneapolis.

Boolean OR: How many of these houses are in Minneapolis OR St. Paul? One way to decide is to consider one house at a time. Is House M in either of the cities (Minneapolis or St. Paul)? Sure it is. What about House S? Is House S in either of the cities? Again, yes. Under Boolean OR logic, only ONE condition has to be fulfilled in order to say Yes to the question. What about House B, the one on the border? It meets the condition with no problem. So how many of these houses are in Minneapolis OR St. Paul? The answer is three. Houses M, S, and B are all in Minneapolis OR St. Paul.

Boolean AND: How many of these houses are in Minneapolis AND St. Paul? Both the conditions have to be fulfilled. House S is only in St. Paul; it is not in Minneapolis. It fails the test. Similarly, House M is only in Minneapolis; it is not in St. Paul. It will not be counted either. Only House B on the border is in both Minneapolis AND St. Paul. How many of these houses are in Minneapolis AND St. Paul? One.
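The house example can be checked mechanically. This sketch records, for each house, the set of cities it stands in (the variable names are our own) and then asks the OR question and the AND question in turn.

```python
# Each house maps to the set of cities it stands in, as described above.
houses = {
    "M": {"Minneapolis"},
    "S": {"St. Paul"},
    "B": {"Minneapolis", "St. Paul"},
}
cities = {"Minneapolis", "St. Paul"}

# OR: at least one condition holds (the house shares a city with the query).
in_either = [name for name, where in houses.items() if where & cities]

# AND: every condition holds (the house is in all of the queried cities).
in_both = [name for name, where in houses.items() if cities <= where]

print(len(in_either), in_either)  # 3 ['M', 'S', 'B']
print(len(in_both), in_both)      # 1 ['B']
```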

In most searches for words, people intend AND, even though they have probably never heard of George Boole. If you were to search for "low carb" "cherry" "cheesecake" "recipe" (four terms, one of them a phrase), do you want to see a result that is for the sweetest, richest, diet-busting pineapple cheesecake you ever dreamed of? The terms "low carb" and "cherry" are important to you. Why should some search engine designer get away with blowing away your specifications and substituting something else?

But they do it every day.

Bytes to Yottabytes

A byte is eight bits of data, where a bit can have only two values: zero or one. Eight bits can form 256 different patterns. A byte is used, for example, for each letter in the alphabet and for each common punctuation character. Are you ready for this? ...
1024 bytes make up one kilobyte.
1024 kilobytes make up one megabyte.
1024 megabytes make up one gigabyte.
1024 gigabytes make up one terabyte.
1024 terabytes make up one petabyte.
1024 petabytes make up one exabyte.
1024 exabytes make up one zettabyte.
1024 zettabytes make up one yottabyte.
Now if someone asks you how many bytes there are in an exabyte, you can tell them right away that the answer is 1,152,921,504,606,846,976. There, wasn't that easy?
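The arithmetic behind that number is repeated multiplication by 1024, one step per unit in the list above. A short sketch follows; the function name bytes_in is our own.

```python
# Unit names in ascending order, each 1024 times the size of the one before.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

def bytes_in(unit):
    """Return the number of bytes in one of the named units."""
    steps = UNITS.index(unit) + 1   # kilobyte is one step up from a byte
    return 1024 ** steps

print(2 ** 8)                       # 256 patterns in one byte
print(f"{bytes_in('kilobyte'):,}")  # 1,024
print(f"{bytes_in('exabyte'):,}")   # 1,152,921,504,606,846,976
```

An exabyte is six steps up from a byte, so the answer is 1024 multiplied by itself six times.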

Data

A datum is a single fact or statistic or observation. Data is a collection of these single facts, statistics, or observations. Information arises through applying selected rules to data, for example: Isolate all data for DVD movies published in 2007 that are comedies with sale prices between fifteen and twenty-five dollars. Information informs, data does not. As information is culled from data, consistent patterns may emerge that lead to knowledge.
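Here is a small sketch of applying the selection rule above to data. The five movie records and their field names are invented purely for illustration.

```python
# Made-up records: each dictionary is a handful of data about one title.
movies = [
    {"title": "Movie A", "format": "DVD", "year": 2007, "genre": "comedy", "price": 19.99},
    {"title": "Movie B", "format": "DVD", "year": 2007, "genre": "drama", "price": 17.50},
    {"title": "Movie C", "format": "DVD", "year": 2006, "genre": "comedy", "price": 15.00},
    {"title": "Movie D", "format": "DVD", "year": 2007, "genre": "comedy", "price": 29.95},
    {"title": "Movie E", "format": "DVD", "year": 2007, "genre": "comedy", "price": 15.00},
]

# The rule from the text: DVD, published in 2007, comedy, priced 15 to 25 dollars.
selected = [
    m["title"]
    for m in movies
    if m["format"] == "DVD"
    and m["year"] == 2007
    and m["genre"] == "comedy"
    and 15 <= m["price"] <= 25
]

print(selected)  # ['Movie A', 'Movie E']
```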

Download

A download is a collection of electronic bits and bytes made available on the Internet, so that people may transfer ("download") the collection onto their own computers. The collection may be electronic files of any type ... text, movies, sound, software, whatever.

Enumerators

Enumerators number things. Enumerators are useful for counting. They have special use in analysis of text. They help to detect headings, lists, and footnote references. The Words Close Together preparation tools scan for sequences starting one, two, three, or 1, 2, 3, or I, II, III, or first, second, third, etc. These enumerators are often at the start of a line, but they may be embedded in headings such as "PART XV The Chesterton - Shaw Debates".
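As a rough illustration only, and not the actual Words Close Together preparation tools, enumerators of the kinds listed above could be spotted with a regular expression. The pattern and the function name find_enumerators are our own assumptions, and the word lists stop at ten to keep the sketch short.

```python
import re

# Assumed, deliberately short word lists; a real tool would need far more.
NUMBER_WORDS = "one|two|three|four|five|six|seven|eight|nine|ten"
ORDINAL_WORDS = "first|second|third|fourth|fifth|sixth|seventh|eighth|ninth|tenth"

# Digits, upper-case Roman numerals, or number words in any letter case.
ENUMERATOR = re.compile(
    rf"\b(?:\d+|[IVXLCDM]+|(?i:{NUMBER_WORDS}|{ORDINAL_WORDS}))\b"
)

def find_enumerators(line):
    """Return every enumerator-like token found in a line of text."""
    return ENUMERATOR.findall(line)

print(find_enumerators("PART XV The Chesterton - Shaw Debates"))  # ['XV']
print(find_enumerators("1. Introduction"))                        # ['1']
print(find_enumerators("Second, consider the index"))             # ['Second']
```

A pattern this crude also matches the pronoun "I" and upper-case words such as "MIX", so any practical detector needs context, for example checking that the values form a sequence.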

False negative

A false negative is a failure of a search engine to follow the wishes of the person who is searching. A record that contains what is desired is disregarded by the search engine, and omitted from the list of hits. False negatives are difficult to detect because we are normally not aware of what is NOT present in a list. The way to get around the false negatives associated with searching for a phrase is to search for all the words in the phrase instead. Unfortunately, in most search engines, the result is a flood of false positives.

False positive

A false positive is a failure of a search engine to follow the wishes of the person who is searching. A record which does NOT contain what is desired is represented by the search engine as a valid hit, and is included in the list of hits. False positives are encountered very frequently when using the standard Internet search engines, especially when one searches for combinations of words and/or phrases. False positives place a time burden on the searcher, who has to open and inspect hits in order to judge their meaningfulness and relevance.
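False negatives and false positives can both be expressed as set differences between what the searcher wanted and what the search engine returned. The record names below are invented for illustration.

```python
# Hypothetical example: what the searcher wanted versus what came back.
wanted = {"rec1", "rec2", "rec3"}
returned = {"rec2", "rec3", "rec4", "rec5"}

false_negatives = wanted - returned   # desired records that were omitted
false_positives = returned - wanted   # undesired records presented as hits
true_hits = wanted & returned         # records the engine got right

print(sorted(false_negatives))  # ['rec1']
print(sorted(false_positives))  # ['rec4', 'rec5']
print(sorted(true_hits))        # ['rec2', 'rec3']
```

In practice the set of wanted records is not known in advance, which is exactly why false negatives are so hard to notice.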

HTML

HTML stands for HyperText Markup Language. If in an Internet browser you click on View -- Source, you will see lots of HTML. There is text plus tags embedded in the text. A tag consists of a less-than symbol <, some standardized words or abbreviations (for example, P for paragraph), then a greater-than symbol >. If the first character within the tag is a forward slash, the tag marks the end of something; </P> signals the end of a paragraph. If it doesn't make sense to you, don't worry, because it does make sense to an Internet browser.
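To see how a program tells tags from text, here is a small sketch using the HTMLParser class from Python's standard library. The class name TagLister and the sample line of HTML are our own; note that the parser reports tag names in lower case.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Record opening tags, closing tags, and text in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():                       # ignore whitespace-only text
            self.events.append(("text", data.strip()))

lister = TagLister()
lister.feed("<P>Hello, <B>world</B>.</P>")
lister.close()

for event in lister.events:
    print(event)
```

The listing shows the paragraph opening, the text inside it, the bold tag opening and closing around "world", and finally the paragraph closing.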

Index (plural: indexes or indices)

An index consists of a list of words and, for each word, a count of its occurrences and a list of the places where it occurs. The backs of reference works and textbooks normally have subject indexes which are exactly parallel in purpose ... to help you find what you are looking for. The magic of a computerized index is the program's ability to compare lists of multiple words quickly, identify which records contain the desired combination of terms, and calculate which hits are most likely to be relevant to the person who is searching.
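A minimal sketch of such an index follows, assuming three made-up records and hypothetical function names. It stores, for each word, the places where the word occurs, and then compares the lists to find records containing every search term. Relevance calculation is left out.

```python
from collections import defaultdict

# Made-up records: record id -> text.
records = {
    1: "low carb cherry cheesecake recipe",
    2: "pineapple cheesecake recipe",
    3: "cherry pie recipe",
}

def build_index(records):
    """Map each word to a list of (record id, word position) places."""
    index = defaultdict(list)
    for rid, text in records.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((rid, position))
    return index

def records_with_all(index, words):
    """Return the ids of records that contain every one of the words."""
    id_sets = [{rid for rid, _ in index.get(word, [])} for word in words]
    return set.intersection(*id_sets) if id_sets else set()

index = build_index(records)
print(len(index["cheesecake"]))   # 2 occurrences
print(index["cheesecake"])        # [(1, 3), (2, 1)]
print(records_with_all(index, ["cherry", "cheesecake", "recipe"]))  # {1}
```

Because the lookup works from the word lists alone, the texts themselves never have to be reread at search time.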

Meaning score

See a discussion of meaning scores and their role in relevance ranking.

Relevance ranking

See a discussion of relevance ranking as it relates to meaningful search results.

String search

String search is a computerized method that reads through an entire body of text in order to find particular desired terms. This works for tiny sets of data. For most normal purposes, string search is totally obsolete because it is painfully slow and it does not handle combinations of words very well.
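For comparison, here is a bare-bones sketch of a string search, with sample texts and a function name of our own choosing. Every search rereads every text from start to finish, which is why the method slows down as the body of text grows.

```python
# Made-up texts to search through.
texts = [
    "he swung the bat and scored a run",
    "the soldiers were trained for combat",
    "she went for a run before breakfast",
]

def string_search(texts, term):
    """Scan every text in full and return the positions of those containing term."""
    term = term.lower()
    return [i for i, text in enumerate(texts) if term in text.lower()]

print(string_search(texts, "run"))  # [0, 2]
print(string_search(texts, "bat"))  # [0, 1]
```

The second result includes text 1 only because "bat" happens to sit inside "combat", a reminder that a plain string match knows nothing about word boundaries.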

 
 
words close together.com The "Research Quality" Search Engine by Marpex, Inc.