
System: Searchable Directory of User Agents: Data Collection - Research


The following is a directory of user agents, including their source and general purpose as far as we can determine. Most entries link to an "official" site containing more detailed information. You can also paste a UA from your logs into the form below, hit [Go!] and see a list of the relevant agents.

We currently have 946 distinct user agents in our database representing everything from search engines to software components and spambots. These have been collected from our log files over a number of years and researched manually.

Search for User Agent

To use this form, copy and paste an entire User-Agent string from your server log file into the input box and then submit the form. The search is case-sensitive, so "nokia" will not match "Nokia".

Most user agent strings now contain a number of separate components so the search will return a list of everything that has a match in the database.
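The matching described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the site's actual code, and the small agent database shown is an invented excerpt: each known agent name is tested as a case-sensitive substring of the submitted string, so every component with a match is returned.

```python
# Hypothetical excerpt of the agent database: name -> category.
KNOWN_AGENTS = {
    "Nokia": "Devices",
    "MSIE": "Web Browsers",
    "Twiceler": "Data Collection - Research",
}

def match_agents(ua_string):
    """Return (name, category) pairs for every known agent found in ua_string."""
    # Case-sensitive substring test: "nokia" will NOT match the entry "Nokia".
    return [(name, cat) for name, cat in KNOWN_AGENTS.items()
            if name in ua_string]

# A full UA string can match several components at once; here it matches one.
matches = match_agents(
    "Mozilla/5.0 (compatible; Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)")
```

Because the test is a plain substring check, a string containing several known components (browser token, OS token, bot name) would return one entry per match.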


Data Collection - Research


These agents are conducting research on the WWW. They may also offer commercial services.

1) ADSARobot - www.cnds.ucd.ie/adsa
2) Archive-It - www.rlg.org/
The goal of the Archive-It pilot is to determine if RLG can partner with Internet Archive to make a new, remotely hosted service for Web archiving available to our members.
3) BruinBot - webarchive.cs.ucla.edu/bruinbot.html
BruinBot is a crawler developed at UCLA, and used to download parts of the Web which are important for their research.
4) Computer_and_Automation_Research_Institute_Crawler - www.ilab.sztaki.hu/websearch/
5) deepak-USC - www.isi.edu/~ravichan/deepak-usc-isi.html
Downloading webpages for research in AI/Natural Language Processing as part of a PhD thesis project.
6) discobot - discoveryengine.com/discobot.html
Discobot is the experimental web crawler for Discovery Engine.
7) Dolphin - tele-house.ru/crawler.html
Dolphin Crawler is a web-crawler prototype.
8) DotBot - www.dotnetdotcom.org/
A few Seattle-based guys trying to figure out how to make the best possible crawler.
9) e-SocietyRobot - www.yama.info.waseda.ac.jp/~yamana/es/index_eng.htm
The research project "Technologies for Knowledge Discovery from the Internet" is one of the 2003 leading projects of the Ministry of Education, Culture, Sports, Science and Technology, Japan.
10) gazz - gazz.nttrd.com/
11) Generic
12) Gigamega.bot - gigamega.net/bot.html
gigamega.net serves as a testing ground for information search technologies and programs, developed by a group of young scientists in Russia. Full support of robots.txt will be launched soon.
13) Google Keyword Tool - adwords.google.com/select/KeywordToolExternal
Use the Keyword Tool to get new keyword ideas.
14) Haste - haste.kytoon.com/
We are checking the connectivity of the Web. 'Haste' checks selected websites only once, compiling averages on link numbers. If a website returns an error, 'Haste' logs this information.
15) IRLbot - irl.cs.tamu.edu/crawler/
IRL-crawler is a Texas A&M research project that investigates algorithms for mapping the topology of the Internet.
16) KnowItAll - www.cs.washington.edu/research/knowitall/
17) LiteFinder - www.litefinder.net/about.html
LiteFinder Network Crawler is a research project started by a group of Indian candidates from the cities of Bangalore, Patna and Jaipur.
18) MLBot - www.metadatalabs.com/mlbot/
We are building an index of media on the web.
19) Netcraft - netcraft.com/
Providing network security services, including application testing, code reviews, and automated penetration testing.
20) PDFBot - pdfind.com/
We are building an index of PDF media on the web.
21) PolyBot - cis.poly.edu/polybot/
Polybot is a part of an academic research project that aims to improve search and analysis techniques for the World Wide Web. We crawl data for academic/research purposes ONLY.
22) SapphireWebCrawler - boston.lti.cs.cmu.edu/crawler/
The Sapphire web crawler is operated by a research project at the Language Technologies Institute, a computer science department within Carnegie Mellon University's School of Computer Science.
23) Shim Crawler
The University of Tokyo
24) TAMU - www.cs.tamu.edu/
25) Twiceler - www.cuill.com/robots.html
Twiceler is an experimental web crawler. It should obey robots.txt.
26) UIowaCrawler - www.cs.uiowa.edu/
27) WebVac - www.webvac.org/
Monthly crawls of government, news and general sites, plus special event-based crawls. Pages are available for outside research.
28) wume_crawler - wume.cse.lehigh.edu/~xiq204/crawler/
WUME Crawler is WUME Lab's web crawler. It automatically downloads web pages and stores them for academic research use.
29) Zao - www.kototoi.org/zao/
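Several of the crawlers listed above advertise robots.txt compliance. As a rough illustration (the rule set below is invented for the example), a well-behaved crawler consults the site's robots.txt before fetching each URL; Python's standard library provides a parser for this:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for the example; a real crawler would
# fetch them from http://example.com/robots.txt instead.
rp = RobotFileParser()
rp.parse([
    "User-agent: Twiceler",
    "Disallow: /private/",
    "User-agent: *",
    "Disallow: /cgi-bin/",
])

# A compliant crawler checks can_fetch() before every request.
rp.can_fetch("Twiceler", "http://example.com/private/page.html")  # disallowed
rp.can_fetch("Twiceler", "http://example.com/index.html")         # allowed
```

Note that when a specific `User-agent` record matches, the wildcard `*` record is ignored for that agent, so rules intended for everyone must be repeated under each named agent.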

For more information on the user agents listed you can click on the associated link. If you think any of the information here is incorrect or misleading please let us know using the Feedback link below.

Please be aware that we do not add user agents to the database on request, but rather wait to see them in our log files.

Browse User Agents by Category

Browser Extensions (42)
Browser extensions are programs that change or enhance your web browser. Some of them also collect data by sending information on your browsing habits back to a central server.
Content Management (13)
Data Collection - Commercial (47)
These are sites that collect information for commercial benefit. As far as we are aware no useful information or reports are provided to the public.
Data Collection - Research (29)
These agents are conducting research on the WWW. They may also offer commercial services.
Devices (23)
Mobile phones and other gadgets with browser technology.
Download Managers (39)
Programs that enable users to download or extract information from a website or web server.
Indexing Tools (50)
This is software that enables local or remote indexing of web pages and other content for the purposes of setting up a search engine.
Link Checking Utilities (41)
This is software that conducts remote or local link checking.
Media Players (5)
Applications for playing music, video and other media over the Internet.
Other Resources (12)
Links to online resources relating to robots and spiders.
Proxies (7)
If several clients request the same content, the proxy can deliver that content from its cache, rather than requesting it from the origin server each time.
RSS/Atom Aggregators (43)
These are browser extensions or search spiders that focus on indexing or aggregating RSS and Atom feeds.
Search Engine Spiders (220)
These agents conduct Internet-wide indexing for various search engines.
Server Platforms (6)
Server Software (31)
Site Monitoring Services (15)
Software Components (58)
These are code libraries or application development packages that can be used to build Internet-related applications. How they are used depends on the developer.
Spambots? (45)
These are programs that are used predominantly to harvest email addresses, find open guestbooks to post to, etc. They may also have legitimate uses.
Unclassified (174)
The following user agents have either not been identified or do not fit neatly into other categories. New agents appear every day that have limited lifespans. Most (but not all) legitimate user agents identify themselves with a URI or email address.
Validation Tools (10)
These are programs and sites that can be used to validate various aspects of your site: HTML, CSS, META tags, etc.
Web Browsers (36)
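The Unclassified note above observes that most legitimate user agents identify themselves with a URI or email address. A rough sketch (the regular expressions are simplified approximations, not a complete URL/email grammar) of pulling that contact information out of a raw UA string:

```python
import re

# Simplified patterns: good enough for the URLs and addresses that
# typically appear inside User-Agent strings, not fully RFC-compliant.
URL_RE = re.compile(r"https?://[^\s;,)]+|www\.[^\s;,)]+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def contact_info(ua):
    """Return (url, email) found in a User-Agent string, or None for each."""
    url = URL_RE.search(ua)
    email = EMAIL_RE.search(ua)
    return (url.group() if url else None,
            email.group() if email else None)
```

An agent with neither a URL nor an email address is more likely to need manual research before it can be categorised.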
