XML Feed Access
Do you have a project where our web search index might
be useful, but lack a large budget?
We might be able to help. Tell us about it!
Web Search Project

Generally, we spend our energies developing custom search applications that are
very topic-specific. However, from time to time we've also experimented with some
very broad crawls of the internet.
Although this has certainly presented some unique challenges, our small-footprint web search
index has shown some promise.
About the Web Index Project
Research index size: 25 million pages and growing.
High-speed multi-threaded crawlers: 250+ pages/second per server.
Indexing of 10 million pages per day on a single server.
Real-time tunable ranking.
Ability to distribute load across multiple servers.
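The throughput figures above can be sanity-checked with a little arithmetic: at 250 pages/second, one crawler server fetches at most 250 × 86,400 = 21.6 million pages in a day, comfortably above the 10 million pages/day indexing figure. The Python sketch below performs that calculation and also illustrates one common way to spread crawl load across servers: hashing each URL's host name so that a given site is always handled by the same machine. The host-hashing scheme and the function names are hypothetical illustrations, not a description of how the actual crawlers partition their work.

```python
import hashlib
from urllib.parse import urlsplit

PAGES_PER_SECOND_PER_SERVER = 250  # figure from the feature list above
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400


def daily_crawl_capacity(servers: int, pages_per_second: int = PAGES_PER_SECOND_PER_SERVER) -> int:
    """Upper bound on pages fetched per day (ignores politeness delays and downtime)."""
    return servers * pages_per_second * SECONDS_PER_DAY


def assign_server(url: str, servers: int) -> int:
    """Hypothetical partitioning: hash the host so all URLs from one site go to one crawler."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % servers


if __name__ == "__main__":
    print(daily_crawl_capacity(1))  # 21600000
    # Two pages on the same host are assigned to the same server.
    print(assign_server("http://example.com/a", 4) == assign_server("http://example.com/b", 4))  # True
```

Hashing on the host rather than the full URL keeps per-site politeness limits easy to enforce, because only one server ever talks to a given site.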
A Work In Progress
In many respects, our little research project has come a long way. However, a global
web index poses many challenges, so please bear with us if some features seem less
than perfect; we're constantly making small tweaks. The text below discusses a few
highlights of the project. If you have any comments, don't hesitate to let us know.
Default Results Page
Our online public web index is somewhat smaller than our internal version, but
we're hoping to update it each week.
Users can search the directory or the web listings separately; by default, however,
the results page shows both directory and web results.
The web directory is an extremely useful tool containing some very concise
and well-written website descriptions. We think it makes a great companion to
the regular web results.
News Headlines
Along with the core search results, we also query our news headline index and try to display up to three relevant headlines.
The news headline project runs 24/7, crawling headlines from the top news sites on the internet.
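A minimal sketch of attaching headlines to a results page, assuming relevance is judged by simple term overlap between the query and each headline title. The `Headline` record, the overlap scoring, and the function names are hypothetical; the only detail taken from the text above is the cap of three headlines.

```python
import re
from dataclasses import dataclass

MAX_HEADLINES = 3  # "up to three relevant headlines"


@dataclass
class Headline:
    title: str
    url: str


def terms(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevant_headlines(query: str, headlines: list[Headline], limit: int = MAX_HEADLINES) -> list[Headline]:
    """Return at most `limit` headlines, ranked by how many query terms their titles share."""
    q = terms(query)
    scored = []
    for h in headlines:
        overlap = len(q & terms(h.title))
        if overlap > 0:  # drop headlines that share no terms with the query
            scored.append((overlap, h))
    # Python's sort is stable (also with reverse=True), so ties keep the input order,
    # e.g. newest-first if the headline index is stored that way.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored[:limit]]


if __name__ == "__main__":
    news = [
        Headline("Mars rover finds water ice", "http://example.com/1"),
        Headline("Stock markets rally", "http://example.com/2"),
        Headline("New Mars mission announced", "http://example.com/3"),
    ]
    print([h.url for h in relevant_headlines("mars water", news)])
    # ['http://example.com/1', 'http://example.com/3']
```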
Related Keywords
We've tried to integrate most of our internal tools into the web search project.
Through the Related Keyword Server (RKS), the results page also displays related
keywords.
DMOZ Integration
As with many other search engines, we've implemented a local copy of the DMOZ web directory.
We maintain both a local directory server that powers browsing and a local searchable index of the category listings.
Both were built from the data dumps available on the DMOZ website.
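The searchable category index can be illustrated with a small inverted index. This sketch assumes the DMOZ listings have already been parsed out of the dump into simple records (the parsing step is omitted), and it answers queries with a boolean AND over title and description terms. The `Listing` and `DirectoryIndex` names and the matching rule are assumptions for illustration only.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    url: str
    title: str
    description: str
    category: str  # hypothetical example: "Top/Computers/Internet/Searching"


def terms(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class DirectoryIndex:
    """Inverted index: each term maps to the ids of listings that contain it."""

    def __init__(self) -> None:
        self.listings: list[Listing] = []
        self.postings: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, listing: Listing) -> None:
        doc_id = len(self.listings)
        self.listings.append(listing)
        for t in terms(f"{listing.title} {listing.description}"):
            self.postings[t].add(doc_id)

    def search(self, query: str) -> list[Listing]:
        """Return listings containing every query term, in insertion order."""
        q = terms(query)
        if not q:
            return []
        # .get() avoids creating empty postings for unknown terms.
        ids = set.intersection(*(self.postings.get(t, set()) for t in q))
        return [self.listings[i] for i in sorted(ids)]


if __name__ == "__main__":
    idx = DirectoryIndex()
    idx.add(Listing("http://a.example", "Open Directory Project", "Human-edited web directory",
                    "Top/Computers/Internet/Searching/Directories"))
    idx.add(Listing("http://b.example", "Example Search", "A small web search engine",
                    "Top/Computers/Internet/Searching/Search_Engines"))
    print([l.url for l in idx.search("web directory")])  # ['http://a.example']
    print([l.url for l in idx.search("web")])            # ['http://a.example', 'http://b.example']
```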
Real-Time Ranking Adjustments
Ranking and relevance are an important part of any search project. The query engine kernel
in our web index considers a number of ranking factors when assigning a final ranking value
to a particular link.
Many of the ranking values are fixed, but about 15 settings can be tuned dynamically, in real
time, via a simple web interface. Tweaks to the ranking algorithm are immediately
reflected in the search results.
Clients with custom search solutions have access to a similar web tool
for adjusting their results.
For example, a client who wants listings with matches in
the title text ranked a little higher would simply slide the title weight
bar to the right, then click update.
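The tunable ranking described above can be sketched as a weighted sum of per-result signals whose weights are read on every query, so a change made through the web tool affects the very next search. The three signals (title match, body match, link score), their 0 to 1 scaling, and the linear formula are hypothetical stand-ins; the real query engine kernel uses its own set of factors.

```python
import threading
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical ranking signals for one result, each scaled to 0..1."""
    title_match: float
    body_match: float
    link_score: float


class TunableRanker:
    """Linear scorer whose weights can be changed while queries are being served."""

    def __init__(self, weights: dict[str, float]):
        self._lock = threading.Lock()
        self._weights = dict(weights)

    def set_weight(self, name: str, value: float) -> None:
        # What a slider in the web tool might call; applies from the next query on.
        if name not in self._weights:
            raise KeyError(f"unknown ranking factor: {name}")
        with self._lock:
            self._weights[name] = value

    def score(self, f: Features) -> float:
        with self._lock:
            w = dict(self._weights)  # consistent snapshot of the current weights
        return (w["title"] * f.title_match
                + w["body"] * f.body_match
                + w["links"] * f.link_score)

    def rank(self, results: list[tuple[str, Features]]) -> list[str]:
        """Return URLs ordered best-first under the current weights."""
        ordered = sorted(results, key=lambda r: self.score(r[1]), reverse=True)
        return [url for url, _ in ordered]


if __name__ == "__main__":
    ranker = TunableRanker({"title": 1.0, "body": 1.0, "links": 1.0})
    results = [
        ("http://a.example", Features(title_match=1.0, body_match=0.2, link_score=0.3)),
        ("http://b.example", Features(title_match=0.0, body_match=0.9, link_score=0.8)),
    ]
    print(ranker.rank(results))      # ['http://b.example', 'http://a.example']
    ranker.set_weight("title", 2.0)  # like sliding the title weight bar to the right
    print(ranker.rank(results))      # ['http://a.example', 'http://b.example']
```

Raising the title weight from 1.0 to 2.0 lifts the page with the strong title match (score 2.5) above the page that relies on body text and links (score 1.7), without reindexing anything.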
In addition to real-time ranking adjustments, we also have screens that display result pages in
statistical mode. Viewing the various ranking metrics in real time is very important when fine-tuning the ranking algorithms.
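A statistical view can be approximated by reporting each factor's weighted contribution alongside the final score, which makes it visible why one result outranks another. This sketch assumes a simple linear weighted score as the model; the factor names, the `explain` and `stats_table` helpers, and the table layout are all hypothetical.

```python
def explain(weights: dict[str, float], features: dict[str, float]) -> dict[str, float]:
    """Per-factor weighted contributions for one result, plus their total."""
    contributions = {name: weights[name] * features.get(name, 0.0) for name in weights}
    contributions["total"] = sum(contributions.values())  # summed before "total" is added
    return contributions


def stats_table(weights: dict[str, float], results: list[tuple[str, dict[str, float]]]) -> str:
    """Plain-text 'statistical mode': one row per result, one column per factor."""
    names = list(weights)
    rows = [f"{'url':<22}" + "".join(f"{n:>8}" for n in names) + f"{'total':>8}"]
    for url, features in results:
        c = explain(weights, features)
        rows.append(f"{url:<22}" + "".join(f"{c[n]:>8.2f}" for n in names) + f"{c['total']:>8.2f}")
    return "\n".join(rows)


if __name__ == "__main__":
    weights = {"title": 2.0, "body": 1.0, "links": 0.5}
    results = [
        ("http://a.example", {"title": 1.0, "body": 0.2, "links": 0.3}),
        ("http://b.example", {"title": 0.0, "body": 0.9, "links": 0.8}),
    ]
    print(stats_table(weights, results))
```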
Commercial Applications
Although we focus more on developing topic-specific vertical search applications, we're
certainly open to discussing commercial applications of our general web search index.
Please feel free to contact us if you have an application
that could benefit from our web index.