System: Searchable Directory of User Agents: Data Collection - Commercial
The following is a directory of user agents, including their source and general
purpose as far as we can determine. Most entries link to an
"official" site containing more detailed information. You can
also paste a UA from your logs into the form below, hit [Go!] and see a
list of the relevant agents.
We currently have 946 distinct user agents in our database representing everything from search engines to software components and spambots. These have been collected from our log files over a number of years and researched manually.
Search for User Agent
To use this form just copy and paste an entire User-Agent
string from your server log file into the input box and then submit the
form. The search is case-sensitive so "nokia" will not
Most user agent strings now contain a number of separate components
so the search will return a list of everything that has a match in the
Data Collection - Commercial
These are sites that collect information for commercial benefit. As far as we are aware no useful information or reports are provided to the public.
- 1) accelobot - www.accelobot.com/
- I search the Internet for online market trends and emerging technologies
- 2) aipbot - www.aipbot.com/
- see "Name Protect"
- 3) almaden - www.almaden.ibm.com/webfountain/
- Harnessing WebFountain's power will help enterprises gain insightful, highly synthesized, timely, and customized information that is not readily perceptible or available today. This includes information such as emerging trends and patterns, competitive activities, "buzz" about products, relationships affecting customers' businesses, and pathways to discovery.
- 4) attributor - www.attributor.com/
- Attributor gives you confidence in knowing all instances where your original content appears across the Internet.
- 5) australia-au.org - australia-au.org/directory/
- 6) Balihoo - www.balihoo.com/
- Balihoo is a suite of media planning software and service tools designed to make the lives of professional media buyers and media sellers easier.
- 7) BDFetch - branddimensions.com/
- 8) Biz360 - www.biz360.com/
- Biz360 gives you insight you’ve never had before so that you can be more empowered than you’ve ever been before.
- 9) Bloodhound - balihoo.com/
- 10) BPImageWalker - www.bdbrandprotect.com/water-marked-images-protection.html
- Users can embed copyright, owner identification and other digital information into images, so both authorized and unauthorized use of digital assets can be tracked as they travel across the Internet.
- 11) Butterfly - topsy.com/butterfly.html
- Find out how many Retweets each of your tweets gets.
- 12) CazoodleBot - www.cazoodle.com/cazoodlebot/
- As our current objective, Cazoodle focuses on developing a suite of solutions for enabling data-aware search to both the surface and the deep Web, aiming at supporting the pressing demand of building vertical search services in specialized domains on the Web.
- 13) Cerfinfo - www.cerfinfo.com/
- CERFinfo.com is a dynamic directory of tens of thousands of carefully selected, information-rich, safe K-12 websites.
- 14) Chilkat - www.chilkatsoft.com/
- 15) CJNetworkQuality - www.cj.com/networkquality/
- The network quality utility tool searches each publisher Web site that is registered in the Commission Junction network that generates traffic to monitor compliance to the Publisher Service Agreement, specifically, Sections 1 and/or 2.2.
- 16) Comodo HTTP(S) Crawler - www.instantssl.com/
- 17) ConveraMultiMediaCrawler - www.convera.com/
- 18) Crawl_Application
- IBM Crawl Application (?)
- 19) DomainsDB.net - domainsdb.net/
- Reverse IP & NS Lookup Tool
- 20) Exabot - www.exalead.com/
- "Exabot" is Exalead's robot engine. Its task is to collect and index data from around the world to provide search engine facilities to large and medium companies.
- 21) heritrix - crawler.archive.org/
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
- 22) IssueCrawler - www.govcom.org/about_us.html
- 23) IUSA Browser - infousa.com/
- 24) LinexLegal - www.linexlegal.com/
- 25) lmspider - www.scansoft.com/
- The lmspider collects text as part of a research project to improve the linguistic models used in Scansoft's speech recognition engine
- 26) MarkWatch - www.markwatch.com/
- MarkWatch scans and filters a wide range of Internet content to find your trademark, brand name or other information
- 27) maxamine.com - www.maxamine.com/
- 28) Mediapartners-Google - www.google.com/adsense/
- 29) Me.dium - www.oneriot.com/
- 30) miniRank - minirank.com/
- 31) MSRBOT - research.microsoft.com/research/sv/msrbot
- Microsoft is using the MSRBot web crawler to collect data from the web for further study.
- 32) MyFamilyBot - www.myfamilyinc.com/
- 33) NPBot - www.nameprotect.com/botinfo.html
- NPBot is the NameProtect Inc. web crawler. As a Digital Brand Asset Management company, NameProtect engages in crawling activity in search of a wide range of brand and other intellectual property violation.
- 34) nrsbot - loopip.com/robot.html
- Net Research Server (NRS) crawls pages all over the Internet in order to build a full-text search engine.
- 35) panscient - www.panscient.com/
- Panscient crawls the web and collects information on people and companies for vertical search applications.
- 36) Plugger - www.plugger.com.au/
- Plugger sources Australian business news across a vast set and variety of sources because what you read in the traditional media is only one perspective.
- 37) SBIder - www.sitesell.com/sbider.html
- SiteSell is gathering a statistical representation of topics presented on the Web as a whole.
- 38) semanticdiscovery - www.semanticdiscovery.com/sd/robot.html
- 39) Semiocast - semiocast.com/
- 40) SlySearch - www.slysearch.com/
- This is a previous name for TurnitinBot.
- 41) SurveyBot - www.whois.sc/info/webmasters/surveybot.html
- Each week SurveyBot will query websites for statistics and other useful information. This information goes into the creation of the Whois Source domain search engine.
- 42) Syntryx - www.syntryx.com/
- 43) Teemer - www.netseer.com/crawler.html
- NetSeer is a Los Angeles based Internet startup.
- 44) TurnitinBot - www.turnitin.com/robot/crawlerinfo.html
- This robot collects content from the Internet for the sole purpose of helping educational institutions prevent plagiarism. In particular, we compare student papers against the content we find on the Internet to see if we can find similarities.
- 45) TweetmemeBot - tweetmeme.com/
- Tweetmeme is a service which aggregates all the popular links on twitter to determine which links are popular.
- 46) Twitturly - twitturly.com/
- We track and rank what URLs people are talking about on Twitter.
- 47) Uptimebot - www.uptimebot.com/
- UptimeBot is a web crawler that checks return codes of web servers and calculates average number of current servers status.
For more information on the user agents listed you can click on the
associated link. If you think any of the information here is incorrect
or misleading please let us know using the Feedback link below.
Please be aware that we do not add user agents to the database on
request, but rather wait to see them in our log files.
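To check a User-Agent string from your own log files against the names listed above, a case-sensitive substring lookup can be sketched as follows. This is only an illustration and not the code behind the search form on this page: the function name `match_agents`, the shortened `KNOWN_AGENTS` list and the sample User-Agent string are all assumptions made for the example.

```python
# A few names taken from the "Data Collection - Commercial" listing above.
# The real database is much larger; this short list is for illustration.
KNOWN_AGENTS = [
    "Exabot",
    "heritrix",
    "Mediapartners-Google",
    "MSRBOT",
    "NPBot",
    "TurnitinBot",
    "TweetmemeBot",
]


def match_agents(user_agent, known=KNOWN_AGENTS):
    """Return every known agent name that occurs in the User-Agent string.

    The comparison is a plain case-sensitive substring test, so a
    lower-cased string will not match an entry stored as "Exabot".
    """
    return [name for name in known if name in user_agent]


# Hypothetical User-Agent string, made up for this example.
sample = "Mozilla/5.0 (compatible; Exabot/3.0; +http://www.example.com/bot)"

print(match_agents(sample))          # ['Exabot']
print(match_agents(sample.lower()))  # [] because the match is case-sensitive
```

Because a single User-Agent string often contains several components, the function returns a list rather than a single name, so every entry that matches is reported.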
Browse User Agents by Category
- Browser Extensions (42)
- Browser extensions are programs that change or enhance your web browser. Some of them also collect data by sending information on your browsing habits back to a central server.
- Content Management (13)
- Data Collection - Commercial (47)
- These are sites that collect information for commercial benefit. As far as we are aware no useful information or reports are provided to the public.
- Data Collection - Research (29)
- These agents are conducting research on the WWW. They may also offer commercial services.
- Devices (23)
- Mobile phones and other gadgets with browser technology.
- Download Managers (39)
- Programs that enable users to download or extract information from a website or web server.
- Indexing Tools (50)
- This is software that enables local or remote indexing of web pages and other content for the purposes of setting up a search engine.
- Link Checking Utilities (41)
- This is software that conducts remote or local link checking.
- Media Players (5)
- Applications for playing music, video and other media over the Internet.
- Other Resources (12)
- Links to online resources relating to robots and spiders.
- Proxies (7)
- If several clients request the same content, the proxy can deliver that content from its cache, rather than requesting it from the origin server each time.
- RSS/Atom Aggregators (43)
- These are browser extensions or search spiders that focus on indexing or aggregating RSS and Atom feeds.
- Search Engine Spiders (220)
- These agents conduct Internet-wide indexing for various search engines.
- Server Platforms (6)
- Server Software (31)
- Site Monitoring Services (15)
- Software Components (58)
- These are code libraries or application development packages that can be used to build Internet-related applications. How they are used depends on the developer.
- Spambots? (45)
- These are programs that are used predominantly to harvest email addresses, find open guestbooks to post to, etc. They may also have legitimate uses.
- Unclassified (174)
- The following user agents have either not been identified or do not fit neatly into other categories. New agents appear every day that have limited lifespans. Most (but not all) legitimate user agents identify themselves with a URI or email address.
- Validation Tools (10)
- These are programs and sites that can be used to validate various aspects of your site: HTML, CSS, META tags, etc.
- Web Browsers (36)