Columbia University Information Technology
Prevent Indexing


Keep pages out of the index

There are several ways to prevent some or all of your web pages from being indexed. If you need to get a page out of the index urgently, contact the CUIT Helpdesk online or call 212-854-1919.


Use a robots meta tag

If you don't want a page to be indexed, you can insert this <meta> tag within your page's HEAD section:

<meta name="robots" content="noindex, nofollow">
This tells all robots that honor the robots meta tag (not just Columbia's search engine) not to index the page and not to follow any links from it. If the page has already been indexed, it will be removed from the index the next time Google crawls the page.

Put this tag on every page you don't want indexed. If you have an entire directory of files you don't want indexed, consider putting them in a no_crawl directory instead.

If you want a page indexed but do not want any of the links on the page to be followed, you can use the following instead:

<meta name="robots" content="index, nofollow">
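To confirm which directives a page actually carries, a short script can read its robots meta tags. The sketch below is a minimal illustration using Python's standard html.parser module; the names RobotsMetaParser and robots_policy are hypothetical, and it assumes that a page with no robots meta tag defaults to index, follow. It looks only at meta tags named "robots" and does not consider robots.txt or no_crawl rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        # Directives are comma-separated; compare them case-insensitively.
        for part in (attrs.get("content") or "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_policy(html):
    """Return (indexable, links_followed) based on the page's robots meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # Assumption: with no noindex/nofollow directive, the page is indexed
    # and its links are followed.
    indexable = "noindex" not in parser.directives
    links_followed = "nofollow" not in parser.directives
    return indexable, links_followed
```

For example, robots_policy('<meta name="robots" content="index, nofollow">') returns (True, False): the page may be indexed, but its links are not followed.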

Use googleoff/googleon tags

By embedding googleoff/googleon tags with their flags in your HTML page, you can disable:
  • The indexing of a word or portion of a web page
  • The indexing of anchor text
  • The use of text to create a snippet in search results
Each googleoff/googleon flag works as follows:

Flag: index
Description: Words between the tags are not indexed as occurring on the current page.
Example: fish <!--googleoff: index-->shark <!--googleon: index-->mackerel
Result: The words fish and mackerel are indexed for this page, but the occurrence of shark is not. This page could appear in search results for the term shark only if the word appears elsewhere on the page or in the anchor text of links pointing to the page. Hyperlinks that appear between the tags are still followed.

Flag: anchor
Description: Anchor text that appears between the tags, in links to other pages, is not indexed. This prevents the index from using the hyperlink to associate the link text with the target page in search results.
Example: <!--googleoff: anchor--><a href="sharks_rugby.html">shark</a><!--googleon: anchor-->
Result: The word shark is not associated with the page sharks_rugby.html. Without the tags, this hyperlink would cause sharks_rugby.html to appear in search results for the term shark.

Flag: snippet
Description: Text between the tags is not used to create snippets for search results.
Example: <!--googleoff: snippet-->Come to the fair!<!--googleon: snippet-->
Result: The text "Come to the fair!" does not appear in snippets in the search results.

Flag: all
Description: Turns off all of the above at once. Text between the tags is not indexed, is not associated with linked-to pages as anchor text, and is not used for a snippet.
Example: <!--googleoff: all-->Come to the fair!<!--googleon: all-->
Result: The text "Come to the fair!" is not indexed, is not associated with anchor text, and does not appear in snippets in the search results.
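The effect of the index and all flags can be approximated by removing every googleoff/googleon region before reading the page's text. This is a rough sketch, not how the Google Search Appliance actually processes pages: the function name indexed_text is hypothetical, and the regular expression assumes each region is closed by a googleon comment with the same flag and that regions are not nested. The anchor and snippet flags are not modeled; their text is kept, since those flags do not remove text from the current page's index.

```python
import re

# One googleoff region for the "index" or "all" flag, through the matching
# googleon comment. Assumes regions are closed with the same flag and are
# not nested; DOTALL lets a region span line breaks.
_OFF_REGION = re.compile(
    r"<!--\s*googleoff:\s*(index|all)\s*-->.*?<!--\s*googleon:\s*\1\s*-->",
    re.DOTALL,
)


def indexed_text(html):
    """Roughly approximate the words indexed as occurring on this page."""
    # Drop text hidden by the index/all flags; a space keeps words apart.
    text = _OFF_REGION.sub(" ", html)
    # Strip remaining tags and comments (including anchor/snippet markers).
    text = re.sub(r"<[^>]*>", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())
```

For the index example above, indexed_text("fish <!--googleoff: index-->shark <!--googleon: index-->mackerel") returns "fish mackerel".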



Use a no_crawl directory

The Google Search Appliance will not crawl any directory named "no_crawl." You can keep files and directories out of the index by creating a directory called "no_crawl" and putting all the files you want to hide from Google inside.

Using a "no_crawl" directory does not provide directory security or block people from accessing the directory.
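The no_crawl rule can be expressed as a check on a URL's path. This sketch assumes, following the description on this page, that the appliance skips any URL with a directory segment named exactly no_crawl, including files in subdirectories below it; the appliance's actual exclusion pattern is not shown here, and the function under_no_crawl and the example.edu URLs are placeholders.

```python
from urllib.parse import unquote, urlsplit


def under_no_crawl(url):
    """Return True if any directory in the URL's path is named exactly no_crawl."""
    path = unquote(urlsplit(url).path)
    # Split on "/" and drop the final segment, which is the file name
    # (or "" for a URL ending in "/"); what remains are directory names.
    directories = path.split("/")[:-1]
    return "no_crawl" in directories


for url in [
    "https://www.example.edu/dept/no_crawl/draft.html",
    "https://www.example.edu/dept/no_crawl/old/notes.html",
    "https://www.example.edu/dept/public/index.html",
]:
    print(url, under_no_crawl(url))
```

Because the match is on the whole directory name, a directory called no_crawl_old, or a file called no_crawl.html, would not be excluded under this assumption.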


Use a robots.txt file

If you run your own web server and don't want one or more robots to visit some or all of your pages, you can place a robots.txt file at the root of the server. For more information about how to do this, refer to the Robots Exclusion site.
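Python's standard urllib.robotparser module can show how a robot that follows the Robots Exclusion Protocol would interpret a robots.txt file. In this sketch the file contents, the robot name ExampleBot, and the example.edu URLs are placeholders: ExampleBot is barred from the whole site, and every other robot is barred only from /private/.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt contents; a real file is served as /robots.txt.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether each (robot, URL) pair may be fetched under these rules.
for agent, url in [
    ("ExampleBot", "https://www.example.edu/index.html"),
    ("SomeOtherBot", "https://www.example.edu/private/notes.html"),
    ("SomeOtherBot", "https://www.example.edu/index.html"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

To test a live site instead of a string, RobotFileParser.set_url() and read() fetch the file from the server. Keep in mind that robots.txt is advisory: it asks robots to stay away but, like a no_crawl directory, does not block people from reading the pages.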