
XML Feed Access
Do you have a project that could use our web search index but lack a large budget?

We might be able to help. Tell us about it!

 
Web Search Project

Generally, we spend our energies developing custom search applications that are very topic-specific. However, from time to time we have also experimented with some very broad crawls of the internet.

Although this has certainly presented some unique challenges, our small-footprint web search index has shown some promise.

About the Web Index Project

  • Research index size: 25 million pages and growing.
  • High-speed multi-threaded crawlers: 250+ pages/second per server.
  • Indexing: 10 million pages per day on a single server.
  • Real-time tunable ranking.
  • Load can be distributed across multiple servers.
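The crawl and indexing figures above can be sanity-checked with a little arithmetic. This minimal sketch assumes the quoted rates are sustained around the clock on one server each (real crawls rarely hold a peak rate all day), and treats a full re-index of the 25-million-page research index as a single pass on one indexing server.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume(rate_per_second: float, servers: int = 1) -> float:
    """Pages handled per day at a sustained per-server rate."""
    return rate_per_second * SECONDS_PER_DAY * servers


def required_rate(pages_per_day: float, servers: int = 1) -> float:
    """Average per-server rate needed to reach a daily page count."""
    return pages_per_day / (SECONDS_PER_DAY * servers)


crawl_per_day = daily_volume(250)        # one crawler server at 250 pages/second
index_rate = required_rate(10_000_000)   # one indexing server at 10 million pages/day
reindex_days = 25_000_000 / 10_000_000   # one full pass over the 25-million-page index

print(f"{crawl_per_day:,.0f} pages/day")  # 21,600,000 pages/day
print(f"{index_rate:.1f} pages/second")   # 115.7 pages/second
print(f"{reindex_days:.1f} days")         # 2.5 days
```

Under these assumptions a single crawler can fetch roughly twice as many pages per day as a single indexing server can process, which is one reason spreading load across servers matters.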
A Work In Progress
    In many respects, we've come a long way with our small research project. However, a global web index presents many challenges, so please bear with us if some features seem less than perfect; we are constantly making small tweaks. The sections below describe a few highlights of the project. If you have any comments, don't hesitate to let us know.

    Default Results Page
    Our online public web index is somewhat smaller than our internal version, but we hope to update it weekly.

    It lets users search the directory listings or the web listings separately. By default, however, the results page shows both directory and web results.

    The web directory is an extremely useful tool, with concise, well-written website descriptions.

    We think it makes a great companion to the regular web results.
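    The search modes described above (directory only, web only, or both by default) amount to a simple switch when assembling the results page. The minimal sketch below illustrates that behaviour; it is not the engine's actual code, and the `results_page` function, the `mode` values, and the toy backends are hypothetical stand-ins for the real directory and web indexes.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[str]]  # hypothetical backend: query -> result titles


def results_page(query: str, directory: SearchFn, web: SearchFn,
                 mode: str = "both") -> Dict[str, List[str]]:
    """Assemble a results page; mode is 'directory', 'web', or 'both' (the default)."""
    if mode not in ("directory", "web", "both"):
        raise ValueError(f"unknown mode: {mode!r}")
    page: Dict[str, List[str]] = {}
    if mode in ("directory", "both"):
        page["directory"] = directory(query)
    if mode in ("web", "both"):
        page["web"] = web(query)
    return page


# Toy backends standing in for the directory and web indexes.
toy_directory = {"learn french": ["French Basics (directory listing)"]}
toy_web = {"learn french": ["French lesson page", "French vocabulary page"]}


def search_directory(q: str) -> List[str]:
    return toy_directory.get(q, [])


def search_web(q: str) -> List[str]:
    return toy_web.get(q, [])


default_page = results_page("learn french", search_directory, search_web)
web_only = results_page("learn french", search_directory, search_web, mode="web")
print(sorted(default_page))  # ['directory', 'web']
print(sorted(web_only))      # ['web']
```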

    News Headlines
    Along with the core search results, we also query our news headline index and display up to three relevant headlines. The news headline project runs around the clock, crawling headlines from the top news sites on the internet.
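    How headline relevance is judged is not described here, so the following minimal sketch substitutes a simple stand-in: score each headline by how many query terms it contains, drop headlines with no overlap, and keep at most three. The `top_headlines` function and the sample headlines are hypothetical.

```python
import re
from typing import List


def top_headlines(query: str, headlines: List[str], limit: int = 3) -> List[str]:
    """Return up to `limit` headlines ranked by how many query terms they contain."""
    terms = set(query.lower().split())
    scored = []
    for position, headline in enumerate(headlines):
        words = set(re.findall(r"[a-z0-9]+", headline.lower()))
        overlap = len(terms & words)
        if overlap:  # skip headlines that share no terms with the query
            scored.append((-overlap, position, headline))
    # Most overlap first; ties keep the order in which the index returned them.
    scored.sort()
    return [headline for _, _, headline in scored[:limit]]


headlines = [
    "New cancer research funding announced",
    "Stock markets close higher",
    "Cancer drug trial shows promise",
    "Research team maps ancient Egypt tombs",
    "Local weather update",
]
print(top_headlines("cancer research", headlines))
```

Keeping the original position as a tie-breaker means that, if the headline index already returns items newest first, equally relevant headlines stay in date order.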

    Related Keywords
    We've tried to integrate most of our internal tools into the web search project. For example, the results page also displays related keywords supplied by our Related Keyword Server (RKS).

    DMOZ Integration
    As with many other search engines, we've implemented a local copy of the DMOZ web directory. We maintain both a local directory server that powers browsing and a local searchable index of the category listings. Both were built from the data dumps available on the DMOZ website.
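    The two local services described above, one for browsing categories and one for searching listings, can be illustrated with a toy in-memory structure. This minimal sketch is not the actual directory server: loading the real DMOZ dump files is omitted, the `LocalDirectory` class is hypothetical, and the listings are hand-made examples using example.com URLs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Listing = Tuple[str, str, str]  # (title, url, description)


class LocalDirectory:
    """Toy in-memory directory: browse categories by path and search listing text."""

    def __init__(self) -> None:
        self.listings: Dict[str, List[Listing]] = defaultdict(list)  # category path -> listings
        self.children: Dict[str, Set[str]] = defaultdict(set)        # category path -> subcategory names

    def add(self, path: str, title: str, url: str, description: str) -> None:
        """Register a listing under a slash-separated category path such as 'Top/Science'."""
        self.listings[path].append((title, url, description))
        parts = path.split("/")
        for i in range(1, len(parts)):
            # Record each level so parent categories can list their subcategories.
            self.children["/".join(parts[:i])].add(parts[i])

    def browse(self, path: str) -> Tuple[List[str], List[Listing]]:
        """Return (sorted subcategory names, listings) for one category."""
        return sorted(self.children.get(path, set())), list(self.listings.get(path, []))

    def search(self, query: str) -> List[Tuple[str, str]]:
        """Return (category path, url) for listings whose text contains every query term."""
        terms = query.lower().split()
        hits = []
        for path, entries in self.listings.items():
            for title, url, description in entries:
                text = f"{title} {description}".lower()
                if all(term in text for term in terms):
                    hits.append((path, url))
        return hits


directory = LocalDirectory()
directory.add("Top/Science/History", "Egypt Notes", "http://example.com/egypt",
              "Notes on the history of ancient Egypt.")
directory.add("Top/Arts/Languages", "French Basics", "http://example.com/french",
              "Lessons for people who want to learn French.")

subcategories, _ = directory.browse("Top")
print(subcategories)                      # ['Arts', 'Science']
print(directory.search("ancient egypt"))  # [('Top/Science/History', 'http://example.com/egypt')]
```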

    Real-Time Ranking Adjustments
    Ranking and relevance are an important part of any search project. The query engine kernel in our web index weighs a number of ranking factors when assigning a final ranking value to each link.

    Many of the ranking values are fixed, but about 15 settings can be tuned in real time through a simple web interface. Changes to the ranking algorithm are reflected immediately in the search results.

    Clients with custom search solutions have access to a similar web tool to adjust their results.

    For example, a client who prefers to rank listings with title-text matches a little higher would simply slide the title weight bar to the right and click Update.
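    The actual ranking formula and its 15 tunable settings are not published, so the following minimal sketch assumes a common approach: a weighted sum of per-document factor scores, with the weights held in a shared, lock-protected store that the next query reads. The `RankingWeights` class, the `rank` function, and the `title`/`body` factors are hypothetical; calling `weights.set` plays the role of moving the title slider and clicking Update.

```python
import threading
from typing import Dict, List, Tuple


class RankingWeights:
    """Thread-safe store of tunable ranking weights; updates apply to the next query."""

    def __init__(self, **weights: float) -> None:
        self._lock = threading.Lock()
        self._weights = dict(weights)

    def set(self, name: str, value: float) -> None:
        with self._lock:
            self._weights[name] = value

    def snapshot(self) -> Dict[str, float]:
        with self._lock:
            return dict(self._weights)


def rank(docs: Dict[str, Dict[str, float]], weights: RankingWeights) -> List[Tuple[str, float]]:
    """Score each document as a weighted sum of its factors, highest score first."""
    w = weights.snapshot()  # one consistent set of weights for the whole query
    scored = [(url, sum(w.get(factor, 0.0) * value for factor, value in factors.items()))
              for url, factors in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-document factor scores in [0, 1].
docs = {
    "a.example": {"title": 1.0, "body": 0.2},
    "b.example": {"title": 0.0, "body": 0.9},
}
weights = RankingWeights(title=0.5, body=1.0)
before = [url for url, _ in rank(docs, weights)]  # b first: 0.9 vs 0.7

weights.set("title", 2.0)  # slide the title weight to the right
after = [url for url, _ in rank(docs, weights)]   # a first: 2.2 vs 0.9
print(before, after)
```

Taking a snapshot per query means a weight change made mid-query cannot mix old and new values within one result page.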

    In addition to real-time ranking adjustments, we also have screens that let us view result pages in statistical mode. Viewing the various ranking metrics in real time is very important for fine-tuning the ranking algorithms.
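    What the statistical-mode screens display is not specified. If the final score is assumed to be a weighted sum of ranking factors, one useful view is a per-factor breakdown of each result's score, as in this minimal sketch. The `explain` function and the factor names are hypothetical.

```python
from typing import Dict, List


def explain(features: Dict[str, float], weights: Dict[str, float]) -> List[str]:
    """Per-factor breakdown of one result's score, largest contribution first."""
    contributions = {name: weights.get(name, 0.0) * value for name, value in features.items()}
    total = sum(contributions.values())
    lines = [
        f"{name:<8} {weights.get(name, 0.0):>5.2f} x {features[name]:.2f} = {amount:.2f}"
        for name, amount in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    ]
    lines.append(f"{'total':<8} {total:.2f}")
    return lines


breakdown = explain({"title": 1.0, "body": 0.2, "links": 0.25},
                    {"title": 2.0, "body": 1.0, "links": 0.4})
print("\n".join(breakdown))
```

Seeing which factor dominates a result's score makes it easier to judge which weight to adjust next.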

    Commercial Applications
    Although we focus more on developing topic-specific vertical search applications, we're certainly open to discussing commercial applications of our general web search index.

    Please feel free to contact us if you have an application that could benefit from our web index.

Copyright 2006 Solara.com. All Rights Reserved.