The robots.txt file sits at the root level of your web site and asks spiders and bots to behave themselves while they're on your site. You can take a look at it by pointing your browser to http://www.yourDrupalsite.com/robots.txt. Think of it as an electronic No Trespassing sign that tells the search engines not to crawl a certain directory or page of your site. Using wildcards, you can even tell the engines not to crawl certain file types, like .jpg or .pdf, which means none of your JPEG images or PDF files will show up in the search engines. (I'm not recommending that you do that…but you could.)
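As a rough illustration, rules like the following would ask all well-behaved crawlers to skip one directory and, via a wildcard, one file type. The paths here are hypothetical examples, not Drupal defaults, and the `*`/`$` wildcard syntax is an extension honored by major engines such as Google rather than part of the original robots.txt standard:

```
# Apply these rules to all crawlers
User-agent: *

# Keep crawlers out of a directory (hypothetical path)
Disallow: /private/

# Wildcard rule: block every URL ending in .pdf site-wide
Disallow: /*.pdf$
```

Note that robots.txt is purely advisory: reputable crawlers obey it, but it is not an access control mechanism.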
Note
The robots.txt file is required by Google
On December 1, 2008, John Mueller, a Google analyst, said that if the Googlebot can't access the robots.txt file (say, the server is unreachable or returns a 5xx error result code), then it won't crawl the web site at all. In other words, the robots.txt file must be there if you want the web site...