Prevent Indexing
There are several ways to prevent some or all of your web pages from
being indexed.
If you need to get a page out of the index urgently, contact the CUIT
Helpdesk online or call 212-854-1919.
Use a robots meta tag
If you don't want a page to be indexed, you can insert this <meta> tag
within your page's HEAD section:
<meta name="robots" content="noindex, nofollow">
This tells all robots that honor the tag (not just Columbia's search
engine) not to index the page and not to follow any links from it. If
the page has already been indexed, it will be removed from the index the
next time Google crawls the page.
You should put this tag on every page you don't want indexed. If you
have an entire directory of files you don't want indexed, consider
putting them in a no_crawl directory instead.
If you want a page indexed but do not want any of the links on the
page to be followed, you can use the following instead:
<meta name="robots" content="index, nofollow">
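To confirm that a page carries the intended tag, the check can be scripted. The following is a minimal sketch using only the Python standard library; the class and function names (RobotsMetaParser, robots_directives) are hypothetical and chosen for illustration, and the sketch only reads the tag. It does not tell you how any particular crawler will act on it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content value is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(sorted(robots_directives(page)))
```

A page with no robots meta tag yields an empty set, which most robots treat the same as "index, follow".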
Use googleoff/googleon tags
By embedding googleoff/googleon tags with their flags in your HTML
page, you can disable:
- The indexing of a word or portion of a web page
- The indexing of anchor text
- The use of text to create a snippet in search results
For details about the use of each googleoff/googleon flag, refer to
the following table:
| Flag | Description | Example | Results |
| --- | --- | --- | --- |
| index | Words between the tags are not indexed as occurring on the current page. | fish <!--googleoff: index-->shark <!--googleon: index-->mackerel | The words fish and mackerel are indexed for this page, but the occurrence of shark is not indexed. This page could appear in search results for the term shark only if the word appears elsewhere on the page or in anchor text for links to the page. Hyperlinks that appear within these tags are followed. |
| anchor | Anchor text that appears between the tags and in links to other pages is not indexed. This prevents the index from using the hyperlink to associate the link text with the target page in search results. | <!--googleoff: anchor--><A href=sharks_rugby.html> shark </A> <!--googleon: anchor--> | The word shark is not associated with the page sharks_rugby.html. Otherwise this hyperlink would cause the page sharks_rugby.html to appear in the search results for the term shark. |
| snippet | Text between the tags is not used to create snippets for search results. | <!--googleoff: snippet-->Come to the fair! <!--googleon: snippet--> | The text Come to the fair! does not appear in snippets with the search results. |
| all | Turns on all the attributes. Text between the tags is not indexed, followed to another linked-to page, or used for a snippet. | <!--googleoff: all-->Come to the fair! <!--googleon: all--> | The text Come to the fair! is not indexed, is not associated with anchor text, and does not appear in snippets with the search results. |
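To preview which text stays eligible once the tags are in place, the flagged regions can be removed with a regular expression. This is only an approximation under stated assumptions: the function name strip_googleoff is hypothetical, each googleoff comment is assumed to be closed by a googleon comment with the same flag, and the search appliance's own parsing may differ.

```python
import re


def strip_googleoff(html, flag="index"):
    """Remove text wrapped in googleoff/googleon comments for `flag` or `all`.

    Approximation only: assumes every googleoff is closed by a googleon
    with the same flag. Unclosed regions are left untouched.
    """
    pattern = re.compile(
        rf"<!--\s*googleoff:\s*({re.escape(flag)}|all)\s*-->"
        rf".*?"
        rf"<!--\s*googleon:\s*\1\s*-->",
        re.DOTALL | re.IGNORECASE,
    )
    remaining = pattern.sub("", html)
    # Collapse runs of whitespace so the result is easy to read.
    return " ".join(remaining.split())


if __name__ == "__main__":
    sample = "fish <!--googleoff: index-->shark\n<!--googleon: index-->mackerel"
    print(strip_googleoff(sample, "index"))
```

Run against the index example from the table, the sketch keeps fish and mackerel and drops shark.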
Use a no_crawl directory
The Google Search Appliance will not crawl any directory named
"no_crawl." You can keep files and directories out of the
index by creating a directory called "no_crawl" and putting
all the files you want to hide from Google inside.
Using a "no_crawl" directory does not provide directory
security or block people from accessing the directory.
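When auditing a list of URLs, it can help to flag which ones sit under a "no_crawl" directory. The sketch below assumes the rule is an exact match on a path segment named "no_crawl"; the source does not spell out the matching rule, and the function name in_no_crawl and the example.edu URLs are hypothetical.

```python
from urllib.parse import urlparse


def in_no_crawl(url):
    """Return True if any path segment of `url` is exactly "no_crawl".

    Assumption: the crawler skips a URL when a directory in its path is
    named exactly "no_crawl". Similar names do not count.
    """
    segments = urlparse(url).path.split("/")
    return "no_crawl" in segments


if __name__ == "__main__":
    print(in_no_crawl("http://www.example.edu/dept/no_crawl/draft.html"))
    print(in_no_crawl("http://www.example.edu/dept/public/index.html"))
```

The files remain reachable by anyone who knows the URL, so this check says nothing about access control.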
Use a robots.txt file
If you run your own webserver and don't want any pages to be visited
by one or more robots, you can use a robots.txt file. For more
information about how to do this, refer to
the Robots Exclusion Site.
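Before deploying a robots.txt file, its rules can be tested locally with Python's standard urllib.robotparser module. The rules, the user agent name ExampleBot, and the example.edu URLs below are hypothetical and serve only to illustrate the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every robot out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.edu/private/notes.html",
        "http://www.example.edu/index.html",
    ):
        print(url, allowed(ROBOTS_TXT, "ExampleBot", url))
```

This only shows how a well-behaved robot would interpret the file; robots that ignore robots.txt are not stopped by it.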