Sunday, March 14, 2010

Creating a Robots File

The robots file is one of the most important files on a site. A robots.txt is a file placed on your server to instruct the various search engine spiders not to crawl or index certain sections or pages of your website or blog. The robots.txt file itself is a very simple text file, which can be created in Notepad or any other plain text editor. It is placed in the root directory of your site, so that it is reachable at /robots.txt directly under your domain. This file tells the various search engines and other robots which areas of your website or blog they are allowed to visit and index. You can have only one robots.txt on your website or blog.
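Because the file has to sit in the root directory, its address can be worked out from any page on the site. The following is a minimal sketch in Python; the helper name robots_url and the example.com address are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site serving page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address, used only to show the result.
print(robots_url("http://www.example.com/blog/2010/03/post.html"))
```

Whatever page the function is given, the result points at the single robots.txt in the root of that host.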


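BAD (won't work): a robots.txt file placed anywhere other than the root directory, for example inside a subdirectory.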

If you are using WordPress, a sample robots.txt file would be:


User-agent: *
Disallow: /wp-

User-agent: * means that all the search bots, such as Google, Yahoo, Bing, Ask, MSN, Alexa, GigaBlast, DMOZ Checker, Baidu and so on, should use the instructions that follow when they crawl your website.

Disallow: /wp- tells the search engines not to crawl any path that begins with /wp-, which covers the WordPress files.
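To see how a crawler might read these two lines, here is a small sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the example.com host and the three paths are assumptions chosen for illustration; real search engines use their own parsers, which may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# The WordPress sample: one group that applies to every robot.
SAMPLE = """\
User-agent: *
Disallow: /wp-
"""


def is_allowed(robots_txt: str, bot: str, url: str) -> bool:
    """Parse robots_txt and report whether bot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


# Hypothetical paths on a hypothetical site.
for path in ("/wp-admin/", "/wp-content/themes/", "/about/"):
    url = "http://www.example.com" + path
    print(path, is_allowed(SAMPLE, "Googlebot", url))
```

With this parser, the two paths that start with /wp- should be reported as blocked, while /about/ stays open to crawling.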

Web robots are sometimes referred to as web crawlers or spiders, so the process of a robot visiting your website is called "spidering" or "crawling". When we say that the search engines have spidered a website or blog, it means that the search engine robots have visited that site. Each robot is known by a name and has its own IP address. The IP address is not important to us, but knowing the robot names will help in creating a robots.txt file. This is why the file is called "robots.txt". The following is a list of popular search robots by their bot names:

Specific Search Engine Robots

Bots
Googlebot
Ia_Archiver
Msnbot
Scooter
ArchitextSpider
Arachnoidea
GenCrawler
UltraSeek
Slurp
Naverbot, yeti
MantraAgent
Lycos_Spider_(T-Rex)
Baiduspider
Twiceler
Gigabot
Gulper
Teoma_agent1
Fluffy the spider
FAST-WebCrawler

 Specific Special Bots

Google Image ************* Googlebot-Image

Google Mobile ************* Googlebot-Mobile

Yahoo MM ************* Yahoo-Mmcrawler

MSN PicSearch ************* Psbot
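Bot names like the ones listed above can be used in User-agent lines to give a particular robot its own rules. The sketch below builds such a file in Python. The helper build_robots_txt and the /images/ path are assumptions made for illustration, not part of any standard tool.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render one User-agent group per bot, followed by its Disallow lines."""
    groups = []
    for bot, paths in rules.items():
        lines = [f"User-agent: {bot}"]
        lines += [f"Disallow: {path}" for path in paths]
        groups.append("\n".join(lines))
    # Groups are separated by a blank line, as is usual in robots.txt files.
    return "\n\n".join(groups) + "\n"


rules = {
    "Googlebot-Image": ["/images/"],  # hypothetical folder kept away from the image bot
    "*": ["/wp-"],                    # every other robot: skip the WordPress files
}
print(build_robots_txt(rules))
```

A robot that finds a group with its own name normally follows that group instead of the * group, so any path that should stay blocked for a named bot needs to be repeated in its own group.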
