How to remove your content from Google’s various web properties.
Some people are more than thrilled to have Google’s properties index their sites. Other folks don’t want the Google bot anywhere near them. If you fall into the latter category and the bot’s already done its worst, there are several things you can do to remove your materials from Google’s index. Each of Google’s properties—Web Search, Google Images, and Google Groups—has its own set of methodologies.
Here are several tips to avoid being listed.
While you can take steps to remove your content from the Google index after the fact, it’s always much easier to make sure the content is never found and indexed in the first place.
Google’s crawler obeys the “robots exclusion protocol,” a set of instructions you put on your web site that tells the crawler how to behave when it comes to your content. You can implement these instructions in two ways: via a META tag that you put on each page (handy when you want to restrict access to only certain pages or certain types of content) or via a robots.txt file that you place in your root directory (handy when you want to block some spiders completely or restrict access to certain kinds or directories of content). You can get more information about the robots exclusion protocol and how to implement it at http://www.robotstxt.org/.
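As a rough illustration of how a robots.txt file is read, the sketch below uses Python’s standard urllib.robotparser module to test a sample rule set. The sample rules, the /private/ directory, the is_allowed helper, and the www.example.com URLs are all assumptions made up for this example, and Python’s parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: keep Googlebot out of /private/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/notes.html"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", url))
```

Running a check like this before you upload the file is a cheap way to catch a rule that blocks more, or less, than you intended.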
There are several things you can have removed from Google’s results.
These instructions are for keeping your site out of Google’s index only. For information on keeping your site out of all major search engines, you’ll have to work with the robots exclusion protocol.
To keep your whole site out of the index, use the robots exclusion protocol, most likely via a robots.txt file.
Use the following META tag in the HEAD section of each page you want to remove:
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
A “snippet” is the little excerpt of a page that Google displays in its search results. To remove snippets, use the following META tag in the HEAD section of each page for which you want to prevent snippets:
<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET">
To stop Google from keeping cached versions of your pages in its index, use the following META tag in the HEAD section of each page for which you want to prevent caching:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
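If you maintain many pages, it can help to confirm which of these directives each page actually carries. The following is a minimal sketch using Python’s standard html.parser module; the class name, the googlebot_directives helper, and the sample page are hypothetical, and the code only reports what is in the markup, not what Google will do with it.

```python
from html.parser import HTMLParser


class GooglebotMetaParser(HTMLParser):
    """Collects the directives found in <META NAME="GOOGLEBOT" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "googlebot":
            return
        # CONTENT may hold several comma-separated directives.
        for part in (attr_map.get("content") or "").split(","):
            part = part.strip().upper()
            if part:
                self.directives.add(part)


def googlebot_directives(html_text):
    """Return the set of GOOGLEBOT directives present in html_text."""
    parser = GooglebotMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives


# Hypothetical page used only to exercise the helper.
SAMPLE_PAGE = """<HTML><HEAD>
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
</HEAD><BODY>Hello</BODY></HTML>"""

if __name__ == "__main__":
    print(sorted(googlebot_directives(SAMPLE_PAGE)))
```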
Once you implement these changes, Google will remove or limit your content according to your META tags and robots.txt file the next time your web site is crawled, usually within a few weeks. But if you want your materials removed right away, you can use the automatic remover at http://services.google.com:8882/urlconsole/controller. You’ll have to sign in with an account (all an account requires is an email address and a password). Using the remover, you can either request that Google crawl your newly created robots.txt file or enter the URL of a page that contains exclusionary META tags.
You may like your content fine, but you might find that even if you have filtering activated you’re getting search results with explicit content. Or you might find a site with a misleading title tag and content completely unrelated to your search.
You have two options for reporting these sites to Google. Bear in mind that there’s no guarantee that Google will remove the sites from the index, but they will investigate them. At the bottom of each page of search results, you’ll see a “Help Us Improve” link; follow it to a form for reporting inappropriate sites. You can also send the URL of explicit sites that show up in a SafeSearch search but probably shouldn’t to [email protected]. If you have more general complaints about a search result, you can send an email to [email protected].
Google Images’ database of materials is separate from that of the main search index. To remove items from Google Images, you should use robots.txt to specify that the Googlebot-Image crawler should stay away from your site. Add these lines to your robots.txt file:
User-agent: Googlebot-Image
Disallow: /
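To see that this rule targets only the image crawler, here is a small sketch using Python’s standard urllib.robotparser module, with one directive per line. The blocked_agents helper and the image URL are hypothetical, and Python’s user-agent matching is only an approximation of what the real crawlers do.

```python
from urllib.robotparser import RobotFileParser

# The two directives from the text, one per line.
IMAGE_BLOCK = ["User-agent: Googlebot-Image", "Disallow: /"]


def blocked_agents(robots_lines, agents, url):
    """Return the agents that robots_lines forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical image URL used only for this check.
    print(blocked_agents(IMAGE_BLOCK,
                         ["Googlebot", "Googlebot-Image"],
                         "http://www.example.com/photos/cat.jpg"))
```

Under these assumptions only Googlebot-Image is reported as blocked, so the main web crawler is unaffected by the rule.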
You can use the automatic remover mentioned in the web search section to have Google remove the images from its index database quickly.
There may be cases where someone has put images for which you own the copyright on their own server. You don’t have access to that server to add a robots.txt file, but you still need to stop Google from indexing your content there. In this case, you need to contact Google directly. Google has instructions for situations just like this at http://www.google.com/remove.html; look at Option 2, “If you do not have any access to the server that hosts your image.”
As with the Google web index, Google Groups gives you the option both to prevent material from being archived and to remove it after the fact.
To prevent your material from being archived on Google, add the following line to the headers of your Usenet posts:
X-No-Archive: yes
If you do not have the option to edit the headers of your post, make that line the first line in your post itself.
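For posts generated by a script, both variants can be sketched with Python’s standard email.message module. The build_post function, its header_editable flag, and the addresses and newsgroup in the usage lines are hypothetical; the sketch only builds the article text and does not post it anywhere.

```python
from email.message import EmailMessage

NO_ARCHIVE_LINE = "X-No-Archive: yes"


def build_post(sender, newsgroup, subject, body, header_editable=True):
    """Build a Usenet article that asks not to be archived."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = subject
    if header_editable:
        # Preferred: a real header on the article.
        msg["X-No-Archive"] = "yes"
    else:
        # Fallback: make the directive the first line of the post itself.
        body = NO_ARCHIVE_LINE + "\n" + body
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    # Hypothetical sender and newsgroup, for illustration only.
    post = build_post("poster@example.com", "misc.test",
                      "Test post", "Please do not archive this.")
    print(post.as_string())
```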
If you want materials removed after the fact, you have a couple of options:
If the materials you want removed were posted under an address to which you still have access, you may use the automatic removal tool mentioned earlier in this hack.
If the materials you want removed were posted under an address to which you no longer have access, you’ll need to send an email to [email protected] with the following information:
Your full name and contact information, including a verifiable email address.
The complete Google Groups URL or message ID for each message you want removed.
A statement that says “I swear under penalty of civil or criminal laws that I am the person who posted each of the foregoing messages or am authorized to request removal by the person who posted those messages.”
Your electronic signature.
You may not wish to have your contact information made available via the phonebook searches on Google. You’ll have to follow one of two procedures, depending on whether the listing you want removed is for a business or for a residential number.
If you want to remove a business phone number, you’ll need to send a request on your business letterhead to:
Google PhoneBook Removal
2400 Bayshore Parkway
Mountain View, CA 94043
You’ll also have to include a phone number where Google can reach you to verify your request.
If you want to remove a residential phone number, it’s much simpler. You’ll need to fill out a form at http://www.google.com/help/pbremoval.html. The form asks for your name, city and state, phone number, email address, and reason for removal (a multiple choice of incorrect number, privacy issue, or “other”).