OASIS Search Engine

We are still in the process of setting things up on SourceForge, so apologies for the ascetic content and look.

Why another Web search engine?

OASIS is a distributed search engine. Computational load is spread among participating nodes both at indexing and at query processing time. Distributed search has several advantages:

Can centralised search engines hack it?

Unlikely. The number of Web sites is growing, the number and size of stored documents grow even faster, and site contents are updated more and more often. Centralised systems cannot grow that fast, so they cover an ever-decreasing segment of the Net.

Outline of the OASIS architecture

There are two major roles: collection and broker. Collections store parts of the document index. Brokers receive queries from users, choose the most appropriate collections, forward the queries to them and merge the collections' responses into a final result set (see the diagram below).

The broker selects the collections for query propagation relying on the collection descriptions stored in the LDAP directory service. It is the responsibility of collections to create and update their descriptions in the specified format.
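
The broker's role described above can be illustrated with a minimal sketch. It is not the OASIS implementation: the Collection class, the term-count description format, the in-process search callables (standing in for remote collections) and the rule of merging by raw score are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a hit is a (document id, relevance score) pair.
Hit = Tuple[str, float]


@dataclass
class Collection:
    name: str
    description: Dict[str, int]          # term -> count (assumed description format)
    search: Callable[[str], List[Hit]]   # stands in for a remote query call


def select_collections(query: str, collections: List[Collection], limit: int) -> List[Collection]:
    """Pick the collections whose descriptions mention the query terms most often."""
    terms = query.lower().split()
    scored = [(sum(c.description.get(t, 0) for t in terms), c) for c in collections]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [c for _, c in ranked[:limit]]


def broker_search(query: str, collections: List[Collection], limit: int = 2, top_n: int = 10) -> List[Hit]:
    """Forward the query to the selected collections and merge their responses."""
    merged: Dict[str, float] = {}
    for collection in select_collections(query, collections, limit):
        for doc_id, score in collection.search(query):
            # Assumption: scores from different collections are directly comparable.
            merged[doc_id] = max(score, merged.get(doc_id, 0.0))
    return sorted(merged.items(), key=lambda hit: hit[1], reverse=True)[:top_n]


# Toy collections with canned responses.
collections = [
    Collection("astronomy", {"star": 40, "planet": 25}, lambda q: [("astro/1", 0.9), ("astro/2", 0.4)]),
    Collection("cooking", {"recipe": 50, "star": 1}, lambda q: [("cook/7", 0.6)]),
    Collection("law", {"court": 30}, lambda q: [("law/3", 0.8)]),
]
results = broker_search("star planet", collections, limit=2)
```

In this toy setup the "law" collection is never queried, because its description shares no terms with the query. A real broker would also have to normalise scores coming from different collections before merging them.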

Topical collections

Distributed search is efficient only when each query is propagated to a relatively small number of nodes. When the total number of collections is large, this is possible only if the collections differ clearly from one another. When each collection consists of documents belonging to a reasonably well-defined topic area, the broker can make a sound query propagation decision. In fact, such decisions are based on term frequency statistics in collections and queries. Collections covering distinct, well-defined topics have distinctly different term usage statistics, thus allowing efficient query routing.
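
To make the role of term statistics concrete, here is a small sketch of one possible routing score. The formula (a smoothed log-likelihood of the query terms) and the name routing_score are assumptions chosen for illustration; the text above does not specify which measure OASIS actually uses.

```python
import math
from typing import Dict, List


def routing_score(query_terms: List[str], term_counts: Dict[str, int]) -> float:
    """Log-likelihood of the query under a collection's smoothed term distribution."""
    total = sum(term_counts.values())
    vocabulary = len(term_counts)
    score = 0.0
    for term in query_terms:
        # Add-one smoothing keeps unseen terms from producing log(0).
        probability = (term_counts.get(term, 0) + 1) / (total + vocabulary + 1)
        score += math.log(probability)
    return score


# Two topical collections with clearly different term usage (made-up counts).
medicine = {"patient": 120, "dose": 80, "trial": 60}
astronomy = {"galaxy": 150, "orbit": 70, "trial": 2}

query = ["patient", "dose"]
scores = {
    "medicine": routing_score(query, medicine),
    "astronomy": routing_score(query, astronomy),
}
best = max(scores, key=scores.get)
```

With topically distinct collections the scores are far apart, so the broker can confidently route the query to a single node. If both collections had similar term counts, the scores would be nearly equal and the query would have to go to both.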

Indexing tool

Topical document indexes can come from multiple sources. If your site already has one or several focused topic areas, it may be sufficient to index each area and advertise the corresponding number of collections in the directory service. The alternative is to use a Web robot to discover relevant pages on the Net.

A topic-oriented Web Crawler is part of our project. It takes 50-200 documents relevant to the collection topic as examples and searches the Net for documents similar to that sample. Periodic manual inspection of the documents returned by the Crawler is necessary, but it still takes far less time than a manual search.
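
The idea of matching candidate pages against a sample can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity over raw term counts, the helper names and the 0.3 threshold are hypothetical choices, not a description of how the OASIS Crawler measures similarity.

```python
import math
import re
from collections import Counter
from typing import Iterable


def term_vector(text: str) -> Counter:
    """Count lowercase alphabetic terms in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def topic_profile(samples: Iterable[str]) -> Counter:
    """Sum the term vectors of the sample documents into one topic profile."""
    profile: Counter = Counter()
    for text in samples:
        profile.update(term_vector(text))
    return profile


def is_relevant(page_text: str, profile: Counter, threshold: float = 0.3) -> bool:
    """Keep a page if it is similar enough to the profile (threshold is an assumption)."""
    return cosine(term_vector(page_text), profile) >= threshold


# Two tiny sample documents stand in for the 50-200 examples mentioned above.
profile = topic_profile([
    "telescope observes distant galaxy",
    "galaxy cluster and star formation",
])
on_topic = is_relevant("a new galaxy seen by telescope", profile)
off_topic = is_relevant("bread recipe with yeast and flour", profile)
```

A crawler built this way would follow links only from pages that pass the filter, and the periodic manual inspection would serve to correct the profile when off-topic pages slip through.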

Who might be interested in installing the OASIS software?

Contact:

E-mail us at oasis-team@oasis-europe.org