The need for "Research Quality" search engines

Opportunities for improvement -- Serving researchers

This page examines what researchers need from search tools and how useful today's search tools are to them. If you conduct research, you are invited to offer your opinions and comments at the bottom of this page.

Scientific method

The scientific method provides the foundation of credibility for research projects. It calls for objectivity, measurability, reliability, verifiability, and related criteria. These requirements apply across the board, from the physical sciences to the behavioral sciences and from pure, theoretical basic work to mundane, commercial applied research.

Certain patterns of activity emerge from the scientific method. These include, in no consistent order, repeated iterations of:

Our interest on this page is in what is required of tools used in research. Our "hypothesis", so to speak, is that researchers are better served by tools that have the qualities named in the headings below.

Accuracy

By carrying out Experiment 3, any reasonable person can quickly find examples of a problem that pervades search tools: the prevalence of false positives when the user requests combinations of multiple words and/or phrases. In a word, the results are too often inaccurate. Either the requested terms lie too far apart, within units of search too large for the match to be meaningful, or some of the requested terms are missing altogether.

Researchers need higher levels of accuracy. They would be well served by a search system that filters out most of the "garbage" hits.
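To make the unit-of-search problem concrete, here is a minimal Python sketch contrasting a page-level match (every requested term appears somewhere in the text) with a match restricted to a smaller unit, a single sentence. The naive sentence splitter and the function names `naive_match` and `unit_match` are hypothetical illustrations, not part of any existing search product.

```python
import re

# Hypothetical sketch: compare a page-level AND match with a sentence-level one.


def sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text):
    """Set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def naive_match(text, terms):
    """Page-level AND: every term appears somewhere in the text."""
    vocab = words(text)
    return all(t.lower() in vocab for t in terms)


def unit_match(text, terms):
    """Sentence-level AND: some single sentence contains every term."""
    wanted = {t.lower() for t in terms}
    return any(wanted <= words(s) for s in sentences(text))


terms = ["experiment", "hypothesis"]
far_apart = ("Our first experiment failed. "
             "Weeks later, a new hypothesis emerged from unrelated work.")
together = "The experiment confirmed the hypothesis."

print(naive_match(far_apart, terms), unit_match(far_apart, terms))  # True False
print(naive_match(together, terms), unit_match(together, terms))    # True True
```

The page-level match accepts the first text even though its two terms have nothing to do with each other; the sentence-level match rejects it.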

Transparency

The standard Internet search systems all have relevance ranking systems that are proprietary and secret. An industry has grown up to assist web site owners to get their content to show up earlier in search result lists.

Wouldn't it be better to have an open system, with relevance criteria that anyone can observe and verify? We recommend ranking by the closeness of the requested terms, which anyone can measure, on the principle that the closer together the terms are, the more likely a result is to be meaningful to the person conducting the search. No secrets. Just count the number of words that intervene between the words you want. Rank the records found by increasing counts of intervening words. Where necessary, use frequency as a tie-breaker when terms are equally far apart. Period.
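The ranking rule just described is simple enough to sketch in a few lines of Python. This is a minimal illustration under stated assumptions, not Marpex's implementation: the tokenizer is a plain regular expression; "intervening words" is taken as the length of the shortest window containing every requested term minus the number of distinct terms; records missing a term are dropped; and the frequency tie-breaker is assumed to mean total occurrences of the requested terms, with more occurrences ranking higher. The function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sketch of proximity ranking; names and tie-break rule are assumptions.


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def min_intervening(tokens, wanted):
    """Fewest words between the requested terms, or None if any term is missing.

    Slides a window over the positions of requested terms; the count is the
    shortest covering window's length minus the number of distinct terms.
    """
    hits = [(pos, tok) for pos, tok in enumerate(tokens) if tok in wanted]
    counts = Counter()
    best = None
    left = 0
    for pos_r, tok_r in hits:
        counts[tok_r] += 1
        # Shrink from the left while the window still covers every term.
        while len(counts) == len(wanted):
            pos_l, tok_l = hits[left]
            gap = (pos_r - pos_l + 1) - len(wanted)
            if best is None or gap < best:
                best = gap
            counts[tok_l] -= 1
            if counts[tok_l] == 0:
                del counts[tok_l]
            left += 1
    return best


def rank(records, terms):
    """Return (record_id, intervening_words) pairs, closest terms first."""
    wanted = {t.lower() for t in terms}
    scored = []
    for rec_id, text in records.items():
        tokens = tokenize(text)
        gap = min_intervening(tokens, wanted)
        if gap is None:
            continue  # a requested term is missing: not a hit
        freq = sum(1 for tok in tokens if tok in wanted)
        # Sort key: fewer intervening words, then higher frequency, then id.
        scored.append((gap, -freq, rec_id))
    scored.sort()
    return [(rec_id, gap) for gap, _, rec_id in scored]


records = {
    "r1": "The hypothesis was tested by a later experiment.",
    "r2": "Every experiment tests a hypothesis.",
    "r3": "An experiment without any stated goal.",
}
print(rank(records, ["experiment", "hypothesis"]))  # [('r2', 2), ('r1', 5)]
```

Because the score is just a word count, anyone can recompute it by hand for any record in the list.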

Consistency and reliability

The firms providing Internet search make a very public point of continuously tweaking their systems. This is a problem. If they keep changing the rules behind the scenes, how can one research team reliably replicate the results found by an earlier team? The tools have changed; the rules have changed. Bad scene for researchers!

Objectivity, freedom from bias

Good news! The search engines are more objective today than some of them were shown to be in 2001. At that time some providers (not all!) were accepting payments that were demonstrably linked to how far up a web site's records would appear in a list of search results.

Are the major search engines totally free of bias today? Are they utterly objective in their relevance rankings? That might be a lovely research project for someone. Unfortunately, the relevance ranking algorithms are secret. One observable, verifiable fact is that billions of dollars change hands each year in connection with Internet search. Is it fair to observe that, throughout human history, the combination of secrecy and large sums of money has tended to put a strain on human nature? Might there not be at least a risk of bias or loss of objectivity?

Control

Who has the tools? Who decides what the tools are to be used on? Each of the current crop of Internet search engines is a centralized system. These firms have the data gathering software, the indexing software, the choice of what to process and what to ignore, the say in what shall be deemed more relevant, and so forth. They have the power. This stands in stark contrast to the needs of researchers. Be they lifetime professionals, students, or simply people wanting to learn something about their world, those who carry on any sort of research prefer to maximize their control over both tools and data.

Automation

Data gathering is perhaps the most painstaking and laborious part of research. The fun in research is in discovery, in the recognition of facts or patterns previously unknown. Can the donkey work (data gathering) be automated so that the emphasis moves away from searching to finding?

Collaboration and sharing

It is only as data and results are shared that we get either the affirmation or the back-to-the-drawing-board message that we need. There is an essential social component to research. Can the tools for search and research be made more collaborative?

Pattern detection

Search can contribute greatly to pattern detection within large quantities of text. The holy grail of concept search is still beyond our grasp. (For concept search, you specify a paragraph or a body of text and tell the computer to "get me the stuff most closely related to this".) One stepping stone toward concept search is to make it possible to accumulate and refine indexes on the researchers' computers. Automated pattern recognition techniques are quite rightly unwelcome when used on the servers of Internet search engines; they use up computer cycles unmercifully. But a researcher's work would be advanced nicely with the same techniques applied to indexes resident on the researcher's personal computer.
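As a rough sketch of what a locally held, refinable index might look like, the Python class below keeps a positional inverted index in memory, lets a document be re-indexed when the researcher revises it, and tallies which words occur near a given term, one very simple form of pattern detection. The class and method names, the tokenizer, and the default window size are illustrative assumptions; this is not concept search, and it keeps everything in memory rather than on disk.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sketch of a local, refinable positional index; not concept search.


class LocalIndex:
    def __init__(self):
        self.docs = {}                     # doc_id -> list of tokens
        self.postings = defaultdict(dict)  # word -> {doc_id: [positions]}

    def add(self, doc_id, text):
        """Index a document; re-adding an id replaces the earlier version."""
        self.remove(doc_id)
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        self.docs[doc_id] = tokens
        for pos, tok in enumerate(tokens):
            self.postings[tok].setdefault(doc_id, []).append(pos)

    def remove(self, doc_id):
        """Drop a document and any words that no longer occur anywhere."""
        for tok in set(self.docs.pop(doc_id, [])):
            del self.postings[tok][doc_id]
            if not self.postings[tok]:
                del self.postings[tok]

    def docs_with_all(self, terms):
        """Ids of documents containing every requested term."""
        sets = [set(self.postings.get(t.lower(), {})) for t in terms]
        return set.intersection(*sets) if sets else set()

    def neighbors(self, term, window=3):
        """Tally words within `window` positions of `term` across all documents."""
        counts = Counter()
        for doc_id, positions in self.postings.get(term.lower(), {}).items():
            tokens = self.docs[doc_id]
            for p in positions:
                for q in range(max(0, p - window), min(len(tokens), p + window + 1)):
                    if q != p:
                        counts[tokens[q]] += 1
        return counts


idx = LocalIndex()
idx.add("n1", "Proximity search ranks close terms first.")
idx.add("n2", "Close reading rewards proximity.")
print(sorted(idx.docs_with_all(["proximity", "close"])))  # ['n1', 'n2']
idx.add("n2", "A revised note about indexing.")           # refine a note
print(sorted(idx.docs_with_all(["proximity", "close"])))  # ['n1']
print(idx.neighbors("proximity", window=2).most_common())
```

Since the index lives on the researcher's own machine, a tally like `neighbors` can be run as often as desired without consuming anyone else's computer cycles.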

Your feedback

If you conduct research, your comments on this page would be welcomed.

Please rate the importance to you of each of the following factors with respect to tools that you use in research (Very important, Somewhat important, Relatively unimportant, or Irrelevant):

Accuracy:
Transparency:
Consistency and reliability:
Objectivity, freedom from bias:
Control:
Automation:
Collaboration and sharing:
Pattern detection:

For the complete set of tools described in the tool kit, I would be willing to pay (US dollars per year) up to:

Describe in broadest terms the nature of the research that you conduct. Please share nothing that is proprietary. You need not identify either yourself or your organization.

Your comments:

Your name [OPTIONAL]:

Your email address [OPTIONAL - Include only if you want a response. Will not be shared with anyone]:

Meaningful precision search for text data is available now.
 
words close together.com The "Research Quality" Search Engine by Marpex, Inc.