System: Searchable Directory of User Agents: Indexing Tools


The following is a directory of user agents, including their source and general purpose as far as we can determine. Most entries link to an "official" site containing more detailed information. You can also paste a UA from your logs into the form below, hit [Go!] and see a list of the relevant agents.

We currently have 946 distinct user agents in our database representing everything from search engines to software components and spambots. These have been collected from our log files over a number of years and researched manually.
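We don't describe our collection process in detail, but here is a minimal Python sketch of one way to pull User-Agent strings out of a server log. It assumes the Apache/Nginx "combined" log format, where the User-Agent is the last double-quoted field on each line. The function names and the sample log line are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumption: "combined" log format, where the User-Agent is the last
# double-quoted field. Backslash-escaped quotes inside the field are allowed.
UA_FIELD = re.compile(r'"((?:[^"\\]|\\.)*)"\s*$')


def extract_user_agent(line: str) -> Optional[str]:
    """Return the raw User-Agent field of one log line, or None if absent."""
    match = UA_FIELD.search(line)
    return match.group(1) if match else None


def count_user_agents(lines: Iterable[str]) -> Counter:
    """Count each distinct User-Agent, skipping the "-" placeholder."""
    counts: Counter = Counter()
    for line in lines:
        ua = extract_user_agent(line)
        if ua and ua != "-":
            counts[ua] += 1
    return counts


# Hypothetical log line, for illustration only.
sample = ('203.0.113.5 - - [10/Oct/2006:13:55:36 -0700] "GET / HTTP/1.0" '
          '200 2326 "-" "ExampleBot/1.0 (+http://www.example.com/bot)"')
print(count_user_agents([sample, sample]).most_common())
```

The regular expression is anchored to the end of the line. That way, it skips the earlier quoted request and referrer fields.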

Search for User Agent

To use this form, copy and paste an entire User-Agent string from your server log file into the input box and then submit the form. The search is case-sensitive, so "nokia" will not match "Nokia".

Most user agent strings now contain a number of separate components, so the search will return every agent that has a match in the database.
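Here is a minimal sketch of this kind of lookup. It assumes a plain case-sensitive substring test of each database entry against the pasted string. We don't claim the form is implemented exactly this way. `KNOWN_AGENTS` is a hypothetical sample of database keys, not the real database, and the "Nokia" entry is assumed for the case-sensitivity demonstration.

```python
# Hypothetical sample of database entries (key -> category). The real
# database holds far more entries; these are for illustration only.
KNOWN_AGENTS = {
    "Nutch": "Indexing Tools",
    "htdig": "Indexing Tools",
    "MS Search": "Indexing Tools",
    "Nokia": "Devices",  # assumed entry, used to show case-sensitivity
}


def find_agents(user_agent: str, known: dict = KNOWN_AGENTS) -> list:
    """Return (key, category) for every key found in the pasted string.

    The test is a plain case-sensitive substring match, so several
    components of one User-Agent can each produce a hit.
    """
    return [(key, category) for key, category in known.items()
            if key in user_agent]


print(find_agents("Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"))
print(find_agents("nokia6230"))  # no match: "nokia" is not "Nokia"
```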

Indexing Tools

This is software that enables local or remote indexing of web pages and other content for the purposes of setting up a search engine.

1) AlkalineBOT - alkaline.vestris.com/docs/alkaline-faq/

2) ASPseek - www.aspseek.org/

ASPseek is an Internet search engine software developed by SWsoft and licensed as free software under GNU GPL.

3) Beholder - www.mesadynamics.com/beholder.htm

Beholder allows you to quickly search for images on the web, also scanning local image folders and iPhoto libraries.

4) COMBINE - www.lub.lu.se/combine/

Combine is an open system for harvesting and threshing (indexing) Internet resources.

5) CrawlConvera - www.convera.com/

Convera is a provider of search and categorization solutions.

6) DataparkSearch - www.dataparksearch.org/

DataparkSearch Engine is a web-based search engine released under the GPL and designed to organize search within a website, group of websites, intranet or local system.

7) DepSpid - about.depspid.net/

The DepSpid spider visits domains, analyses links and finally calculates scores for the link dependencies between individual domains.

8) dtSearchSpider - www.dtsearch.com/spider.html

9) egothor - www.egothor.org/

EGOTHOR is an Open Source, high-performance, full-featured text search engine written entirely in Java.

10) Enterprise_Search - www.innerprise.net/es-spider.asp

11) facebookexternalhit - www.facebook.com/externalhit_uatext.php

The Facebook system retrieves certain images or details only after a user provides us with a link. You may have found this page because a Facebook user sent a link from your website to other Facebook users.

12) FDSE - www.xav.com/scripts/search/

FDSE is an easy-to-install search engine for local and remote sites. It returns fast, accurate results from a template-driven architecture.

13) findlinks - wortschatz.uni-leipzig.de/nextlinks/findlinks.html

The objective of FindLinks is to provide NextLinks with data.

14) Ful/Text - www.hummingbird.com/products/searchserver/

Hummingbird SearchServer

15) GammaSpider - www.gammasite.com/

GammaSite develops and markets automatic categorization and tagging software.

16) grub-client - grub.org/

Leveraging the power of distributed computing, Grub allows everyone with an Internet connection to participate in the last frontier of discovery. By downloading the unique screensaver, you can donate your computer's unused bandwidth to probing the hidden depths of the Web.

17) gsa-crawler - www.google.com/enterprise/gsa/

Google Search Appliance

18) holmes - www.ucw.cz/holmes/

Sherlock Holmes is a universal search engine - a system for gathering and indexing of textual data (text files, web pages, ...), both locally and over the network.

19) htdig - www.htdig.org/

A complete world wide web indexing and searching system for a small domain or intranet. Source code (GPL).

20) InsumaScout - www.insuma.de/insuma/en/SEscout.html

InsumaScout searches data situated in open data sources.

21) IXE Crawler

22) JavaCrawler

The JavaCrawler, a prototype next-generation MetaCrawler written in Java, supports most of the features already present in the MetaCrawler.

23) k2spider - www.verity.com/products/ultraseek/fab.html

24) larbin - larbin.sourceforge.net/

Larbin is a web crawler (also called (web) robot, spider, scooter...). It is intended to fetch a large number of web pages to fill the database of a search engine.

25) Mnogosearch - mnogosearch.org/

mnoGoSearch is a full-featured web search engine software for intranet and internet servers. mnoGoSearch for UNIX is a free software covered by the GNU General Public License.

26) mobileGate-Spider - www.mobilegate.at/steiermarksuche.php

Our search engine technology guarantees precise and topically relevant results.

27) Mozilla/4.7 (Windows; I; Win95) - www.panopticsearch.com/

The Panoptic system is based on research by the Enterprise Search Group at CSIRO and the ANU in Canberra, Australia.

28) MS Search - www.microsoft.com/sharepoint/

By default, the string for SharePoint Portal Server is:

Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft
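A small tokenizer can pull apart the separate components of a string like this. The sketch below follows the product/comment structure of the HTTP User-Agent header: product tokens such as `Mozilla/4.0`, plus parenthesised comments that may nest. It also splits comments on semicolons, which is a common convention rather than part of the specification. `split_user_agent` is a hypothetical helper.

```python
def split_user_agent(ua: str) -> tuple:
    """Split a User-Agent string into (product tokens, comment parts).

    Products are whitespace-separated tokens such as "Mozilla/4.0".
    Parenthesised comments may nest and may contain backslash escapes;
    their contents are split on ";" by convention, not by the spec.
    """
    products, comments = [], []
    i, n = 0, len(ua)
    while i < n:
        if ua[i].isspace():
            i += 1
        elif ua[i] == "(":
            depth, j = 0, i
            while j < n:
                if ua[j] == "\\":      # quoted-pair: skip the escaped char
                    j += 2
                    continue
                if ua[j] == "(":
                    depth += 1
                elif ua[j] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            body = ua[i + 1:j]         # j is the closing ")" or past the end
            comments.extend(p.strip() for p in body.split(";") if p.strip())
            i = j + 1
        else:
            j = i
            while j < n and not ua[j].isspace() and ua[j] != "(":
                j += 1
            products.append(ua[i:j])
            i = j
    return products, comments


products, comments = split_user_agent(
    "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft")
print(products)
print(comments)
```

For the SharePoint string, the products are `Mozilla/4.0` and `Microsoft`. The identifying `MS Search 4.0 Robot` part sits inside the comment, which is why a simple substring search still finds it.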

29) NextopiaBOT - www.nextopia.com/

30) NLCrawler - www.nlsearch.com/

Northern Light provides search and content integration technology and solutions for enterprises and individuals.

31) nuSearch Spider - www.nurelm.com/

32) Nutch - www.nutch.org/docs/en/bot.html

When we crawl to populate our index, we advertise the "User-agent" string "NutchOrg". If you see the agent "Nutch" or "NutchCVS", that's probably a developer testing a new version of our robot, or someone running their own instance.
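Based only on the strings quoted above, a log filter could separate the official crawl from other Nutch traffic roughly as follows. This is a hypothetical helper, and the labels it returns are our own. "NutchOrg" must be tested first, because it also contains "Nutch".

```python
def classify_nutch(user_agent: str) -> str:
    """Rough label for Nutch-family agents, based on the advertised strings.

    "NutchOrg" is checked first because it also contains "Nutch".
    The bare "Nutch" check also covers "NutchCVS".
    """
    if "NutchOrg" in user_agent:
        return "official crawl"
    if "Nutch" in user_agent:
        return "developer test or independent instance"
    return "not Nutch"


for agent in ("NutchOrg", "NutchCVS", "Nutch", "htdig"):
    print(agent, "->", classify_nutch(agent))
```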

33) Oracle Ultra Search - otn.oracle.com/products/ultrasearch/

Ultra Search can be used to search across Collaboration Suite Components, corporate Web servers, databases, mail servers, fileservers and Oracle10g Portal instances.

34) PageFetcher-Google-CoOp; - www.google.com/coop/

Google Co-op is a platform that enables you to customize the web search experience for users of both Google and your own website.

35) perform_crawl - ivia.ucr.edu/useragents.shtml

The Nalanda iVia Focused Crawler (NIFC) is a focused Web crawler.

36) Project XP5 - marty.anstey.ca/projects/robots/

37) RDSIndexer - www.dytech.com.au/projects/RDS.asp

Information Resource Management Tool/Web Portal

38) Reaper - marty.anstey.ca/projects/robots/reaper.html

39) SemioTagger - www.entrieva.com/entrieva/products/semiotagger.asp?Hdr=semiotagger

Entrieva's SemioTagger is a categorization and indexing engine.

40) SiteScanGa - sitescanga.com/

SiteScan is designed to help you configure Google Analytics.

41) SwishSpider - swish-e.org/

SWISH-E is a fast, powerful, flexible, free and easy-to-use system for indexing collections of Web pages or other files.

42) TeraText AGLS Harvester - www.teratext.com/

A text database system and search engine built for handling large text collections

43) TeraXML - www.doclinx.com/products/ftxml.html

44) T-H-U-N-D-E-R-S-T-O-N-E - www.thunderstone.com/texis/site/pages/webinator.html

45) Ultraseek - www.verity.com/products/ultraseek/

46) URL_Spider_Pro - www.innerprise.net/usp-bi.asp

With URL Spider Pro, you can create a search engine for any topic, no matter how specific.

47) vspider - www.macromedia.com/cfusion/search/?term=vspider

ColdFusion MX includes several Verity utilities to diagnose and manage your collections. These tools include the mkvdk, rcvdk, rck2, and vspider utilities...

48) WebRACE - www.cs.ucy.ac.cy/Projects/eRACE/webrace.html

WebRACE is a prototype HTTP Retrieval, Annotation and Caching Engine developed in Java.

49) WSB - websearchbench.cs.uni-dortmund.de/websearch/features.html

WebSearchBench consists of two software components: a Web Crawler and a Search Engine (Repository, Indexer and search software).

50) Xbot - cdrnet.ch/projects/xbot/

The xbot software is a modular bot environment based on the .net framework for autonomous neuronal network, script or map driven mobile omniwheel robots using the SV203 controller.

For more information on the user agents listed, click the associated link. If you think any of the information here is incorrect or misleading, please let us know using the Feedback link below.

Please be aware that we do not add user agents to the database on request, but rather wait to see them in our log files.

Browse User Agents by Category

Browser Extensions (42): Browser extensions are programs that change or enhance your web browser. Some of them also collect data by sending information on your browsing habits back to a central server.
Content Management (13)
Data Collection - Commercial (47): These are sites that collect information for commercial benefit. As far as we are aware no useful information or reports are provided to the public.
Data Collection - Research (29): These agents are conducting research on the WWW. They may also offer commercial services.
Devices (23): Mobile phones and other gadgets with browser technology.
Download Managers (39): Programs that enable users to download or extract information from a website or web server.
Indexing Tools (50): This is software that enables local or remote indexing of web pages and other content for the purposes of setting up a search engine.
Link Checking Utilities (41): This is software that conducts remote or local link checking.
Media Players (5): Applications for playing music, video and other media over the Internet.
Other Resources (12): Links to online resources relating to robots and spiders.
Proxies (7): If several clients request the same content, the proxy can deliver that content from its cache, rather than requesting it from the origin server each time.
RSS/Atom Aggregators (43): These are browser extensions or search spiders that focus on indexing or aggregating RSS and Atom feeds.
Search Engine Spiders (220): These agents conduct Internet-wide indexing for various search engines.
Server Platforms (6)
Server Software (31)
Site Monitoring Services (15)
Software Components (58): These are code libraries or application development packages that can be used to build Internet-related applications. How they are used depends on the developer.
Spambots? (45): These are programs that are used predominantly to harvest email addresses, find open guestbooks to post to, etc. They may also have legitimate uses.
Unclassified (174): The following user agents have either not been identified or do not fit neatly into other categories. New agents appear every day that have limited lifespans. Most (but not all) legitimate user agents identify themselves with a URI or email address.
Validation Tools (10): These are programs and sites that can be used to validate various aspects of your site: HTML, CSS, META tags, etc.
Web Browsers (36)
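The Unclassified note above observes that most legitimate agents identify themselves with a URI or email address. A crude check for that signal might look like the sketch below. The regular expressions are our own rough approximations and the example strings are hypothetical. Contact details prove nothing on their own, since any client can put a URL in its User-Agent.

```python
import re

# Rough approximations; real URIs and addresses can take many more forms.
URI_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def has_contact_info(user_agent: str) -> bool:
    """True if the string appears to contain a URI or an email address."""
    return bool(URI_PATTERN.search(user_agent)
                or EMAIL_PATTERN.search(user_agent))


print(has_contact_info("ExampleBot/1.0 (+http://www.example.com/bot)"))
print(has_contact_info("Mozilla/4.7 (Windows; I; Win95)"))
```

The second string is the Panoptic entry from the Indexing Tools list. It is an example of an agent that fails this check, which is why the signal is "most (but not all)".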

