System: Searchable Directory of User Agents: Indexing Tools
The following is a directory of user agents, including their source and general
purpose as far as we can determine. Most entries link to an
"official" site containing more detailed information. You can
also paste a UA from your logs into the form below, hit [Go!] and see a
list of the relevant agents.
We currently have 946 distinct user agents in our database representing everything from search engines to software components and spambots. These have been collected from our log files over a number of years and researched manually.
Search for User Agent
To use this form just copy and paste an entire User-Agent
string from your server log file into the input box and then submit the
form. The search is case-sensitive, so "nokia" will not match "Nokia".
Most user agent strings now contain a number of separate components,
so the search will return a list of everything that has a match in the
database.
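As a rough illustration of the lookup described above, the following Python sketch matches a pasted User-Agent string against a list of known agent names using a case-sensitive substring test. The `KNOWN_AGENTS` list and the `find_agents` function are hypothetical; the site's actual search code and database are not shown here, so this is only one way such a lookup might work.

```python
# Hypothetical sample of agent names taken from this directory;
# the real database and its matching logic are assumptions.
KNOWN_AGENTS = ["MS Search", "Nutch", "htdig", "larbin", "gsa-crawler"]


def find_agents(ua_string, known_agents=KNOWN_AGENTS):
    """Return every known agent name that occurs in ua_string.

    The `in` operator on Python strings is case-sensitive, so a name
    only matches when its capitalisation is identical.
    """
    return [name for name in known_agents if name in ua_string]


ua = "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"

print(find_agents(ua))          # ['MS Search']
print(find_agents(ua.lower()))  # [] because the lowercased string no longer matches
```

Because a single string can contain several components, the function returns a list rather than a single result.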
This is software that enables local or remote indexing of web pages and other content for the purposes of setting up a search engine.
- 1) AlkalineBOT - alkaline.vestris.com/docs/alkaline-faq/
- 2) ASPseek - www.aspseek.org/
- ASPseek is an Internet search engine software developed by SWsoft and licensed as free software under GNU GPL.
- 3) Beholder - www.mesadynamics.com/beholder.htm
- Beholder allows you to quickly search for images on the web, also scanning local image folders and iPhoto libraries.
- 4) COMBINE - www.lub.lu.se/combine/
- Combine is an open system for harvesting and threshing (indexing) Internet resources.
- 5) CrawlConvera - www.convera.com/
- Convera is a provider of search and categorization solutions.
- 6) DataparkSearch - www.dataparksearch.org/
- DataparkSearch Engine is a web-based search engine released under the GPL and designed to organize search within a website, group of websites, intranet or local system.
- 7) DepSpid - about.depspid.net/
- The DepSpid spider visits domains, analyses links and finally calculates scores about the link dependencies between individual domains.
- 8) dtSearchSpider - www.dtsearch.com/spider.html
- 9) egothor - www.egothor.org/
- EGOTHOR is an Open Source, high-performance, full-featured text search engine written entirely in Java.
- 10) Enterprise_Search - www.innerprise.net/es-spider.asp
- 11) facebookexternalhit - www.facebook.com/externalhit_uatext.php
- The Facebook system retrieves certain images or details only after a user provides us with a link. You may have found this page because a Facebook user sent a link from your website to other Facebook users.
- 12) FDSE - www.xav.com/scripts/search/
- FDSE is an easy-to-install search engine for local and remote sites. It returns fast, accurate results from a template-driven architecture.
- 13) findlinks - wortschatz.uni-leipzig.de/nextlinks/findlinks.html
- The objective of FindLinks is to provide NextLinks with data.
- 14) Ful/Text - www.hummingbird.com/products/searchserver/
- Hummingbird SearchServer
- 15) GammaSpider - www.gammasite.com/
- GammaSite develops and markets automatic categorization and tagging software
- 16) grub-client - grub.org/
- Leveraging the power of distributed computing, Grub allows everyone with an Internet connection to participate in the last frontier of discovery. By downloading the unique screensaver, you can donate your computer's unused bandwidth to probing the hidden depths of the Web.
- 17) gsa-crawler - www.google.com/enterprise/gsa/
- Google Search Appliance
- 18) holmes - www.ucw.cz/holmes/
- Sherlock Holmes is a universal search engine - a system for gathering and indexing of textual data (text files, web pages, ...), both locally and over the network.
- 19) htdig - www.htdig.org/
- A complete world wide web indexing and searching system for a small domain or intranet. Source code (GPL).
- 20) InsumaScout - www.insuma.de/insuma/en/SEscout.html
- InsumaScout searches data situated in open data sources.
- 21) IXE Crawler
- 22) JavaCrawler
- The JavaCrawler, a prototype next generation MetaCrawler written in Java, supports most of the features already present in the MetaCrawler
- 23) k2spider - www.verity.com/products/ultraseek/fab.html
- 24) larbin - larbin.sourceforge.net/
- Larbin is a web crawler (also called (web) robot, spider, scooter...). It is intended to fetch a large number of web pages to fill the database of a search engine.
- 25) Mnogosearch - mnogosearch.org/
- mnoGoSearch is a full-featured web search engine software for intranet and internet servers. mnoGoSearch for UNIX is a free software covered by the GNU General Public License.
- 26) mobileGate-Spider - www.mobilegate.at/steiermarksuche.php
- Our search engine technology guarantees precise and topically relevant results.
- 27) Mozilla/4.7 (Windows; I; Win95) - www.panopticsearch.com/
- The Panoptic system is based on research by the Enterprise Search Group in CSIRO and the ANU in Canberra, Australia
- 28) MS Search - www.microsoft.com/sharepoint/
- By default, the string for SharePoint Portal Server is:
Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft
- 29) NextopiaBOT - www.nextopia.com/
- 30) NLCrawler - www.nlsearch.com/
- Northern Light provides search and content integration technology and solutions for enterprises and individuals.
- 31) nuSearch Spider - www.nurelm.com/
- 32) Nutch - www.nutch.org/docs/en/bot.html
- When we crawl to populate our index, we advertise the "User-agent" string "NutchOrg". If you see the agent "Nutch" or "NutchCVS", that's probably a developer testing a new version of our robot, or someone running their own instance
- 33) Oracle Ultra Search - otn.oracle.com/products/ultrasearch/
- Ultra Search can be used to search across Collaboration Suite Components, corporate Web servers, databases, mail servers, fileservers and Oracle10g Portal instances.
- 34) PageFetcher-Google-CoOp; - www.google.com/coop/
- Google Co-op is a platform that enables you to customize the web search experience for users of both Google and your own website.
- 35) perform_crawl - ivia.ucr.edu/useragents.shtml
- The Nalanda iVia Focused Crawler (NIFC) is a focused Web crawler.
- 36) Project XP5 - marty.anstey.ca/projects/robots/
- 37) RDSIndexer - www.dytech.com.au/projects/RDS.asp
- Information Resource Management Tool/Web Portal
- 38) Reaper - marty.anstey.ca/projects/robots/reaper.html
- 39) SemioTagger - www.entrieva.com/entrieva/products/semiotagger.asp?Hdr=semiotagger
- Entrieva's SemioTagger is a categorization and indexing engine
- 40) SiteScanGa - sitescanga.com/
- SiteScan is designed to help you configure Google Analytics.
- 41) SwishSpider - swish-e.org/
- SWISH-E is a fast, powerful, flexible, free, and easy to use system for indexing collections of Web pages or other files
- 42) TeraText AGLS Harvester - www.teratext.com/
- A text database system and search engine built for handling large text collections
- 43) TeraXML - www.doclinx.com/products/ftxml.html
- 44) T-H-U-N-D-E-R-S-T-O-N-E - www.thunderstone.com/texis/site/pages/webinator.html
- 45) Ultraseek - www.verity.com/products/ultraseek/
- 46) URL_Spider_Pro - www.innerprise.net/usp-bi.asp
- With URL Spider Pro, you can create a search engine for any topic, no matter how specific.
- 47) vspider - www.macromedia.com/cfusion/search/?term=vspider
- ColdFusion MX includes several Verity utilities to diagnose and manage your collections. These tools include the mkvdk, rcvdk, rck2, and vspider utilities...
- 48) WebRACE - www.cs.ucy.ac.cy/Projects/eRACE/webrace.html
- WebRACE is a prototype HTTP Retrieval, Annotation and Caching Engine developed in Java
- 49) WSB - websearchbench.cs.uni-dortmund.de/websearch/features.html
- WebSearchBench consists of the two software components Web Crawler and Search Engine (Repository, Indexer and search software)
- 50) Xbot - cdrnet.ch/projects/xbot/
- The xbot software is a modular bot environment based on the .net framework for autonomous neuronal network, script or map driven mobile omniwheel robots using the SV203 controller.
For more information on the user agents listed you can click on the
associated link. If you think any of the information here is incorrect
or misleading please let us know using the Feedback link below.
Please be aware that we do not add user agents to the database on
request, but rather wait to see them in our log files.
Browse User Agents by Category
- Browser Extensions (42)
- Browser extensions are programs that change or enhance your web browser. Some of them also collect data by sending information on your browsing habits back to a central server.
- Content Management (13)
- Data Collection - Commercial (47)
- These are sites that collect information for commercial benefit. As far as we are aware no useful information or reports are provided to the public.
- Data Collection - Research (29)
- These agents are conducting research on the WWW. They may also offer commercial services.
- Devices (23)
- Mobile phones and other gadgets with browser technology.
- Download Managers (39)
- Programs that enable users to download or extract information from a website or web server.
- Indexing Tools (50)
- This is software that enables local or remote indexing of web pages and other content for the purposes of setting up a search engine.
- Link Checking Utilities (41)
- This is software that conducts remote or local link checking.
- Media Players (5)
- Applications for playing music, video and other media over the Internet.
- Other Resources (12)
- Links to online resources relating to robots and spiders.
- Proxies (7)
- If several clients request the same content, the proxy can deliver that content from its cache, rather than requesting it from the origin server each time.
- RSS/Atom Aggregators (43)
- These are browser extensions or search spiders that focus on indexing or aggregating RSS and Atom feeds.
- Search Engine Spiders (220)
- These agents conduct Internet-wide indexing for various search engines.
- Server Platforms (6)
- Server Software (31)
- Site Monitoring Services (15)
- Software Components (58)
- These are code libraries or application development packages that can be used to build Internet-related applications. How they are used depends on the developer.
- Spambots? (45)
- These are programs that are used predominantly to harvest email addresses, find open guestbooks to post to, etc. They may also have legitimate uses.
- Unclassified (174)
- The following user agents have either not been identified or do not fit neatly into other categories. New agents appear every day that have limited lifespans. Most (but not all) legitimate user agents identify themselves with a URI or email address.
- Validation Tools (10)
- These are programs and sites that can be used to validate various aspects of your site: HTML, CSS, META tags, etc.
- Web Browsers (36)
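The note under Unclassified observes that most legitimate user agents identify themselves with a URI or email address. A minimal Python sketch of that check is shown below. The regular expressions are deliberately loose heuristics, and the `ExampleBot` string is invented for illustration; neither comes from the directory itself.

```python
import re

# Loose heuristics, not full URI or email validators (assumption).
URI_RE = re.compile(r"(?:https?://|www\.)[^\s;()]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def contact_details(ua_string):
    """Return any URIs and email addresses found in a User-Agent string."""
    return {
        "uris": URI_RE.findall(ua_string),
        "emails": EMAIL_RE.findall(ua_string),
    }


# Invented example of a self-identifying crawler.
example = "ExampleBot/1.0 (+http://www.example.com/bot.html; bot@example.com)"
print(contact_details(example))

# A string with no contact details yields two empty lists.
anonymous = "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"
print(contact_details(anonymous))
```

An empty result does not prove an agent is illegitimate; as the note says, most but not all legitimate agents include such details.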