Sebastian's Pamphlets

If you've read my articles somewhere on the Internet, expect something different here.

MOVED TO SEBASTIANS-PAMPHLETS.COM

Please click the link above to read actual posts, this archive will disappear soon!

Stay tuned...

Tuesday, July 31, 2007

Handling Google's neat X-Robots-Tag - Sending REP header tags with PHP

It's a bad habit to tell the bad news first, and I'm guilty of that. Yesterday I linked to Dan Crow telling Google that the unavailable_after tag is useless IMHO. So todays post is about a great thing: REP header tags aka X-Robots-Tags, unfortunately mentioned as second news somewhat concealed in Google's announcement.

The REP is not only a theatre, it stands for Robots Exclusion Protocol (robots.txt and robots meta tag). Everything you can shove into a robots meta tag on a HTML page can now be delivered in the HTTP header for any file type:
  • INDEX|NOINDEX - Tells whether the page may be indexed or not
  • FOLLOW|NOFOLLOW - Tells whether crawlers may follow links provided on the page or not
  • ALL|NONE - ALL = INDEX, FOLLOW (default), NONE = NOINDEX, NOFOLLOW
  • NOODP - tells search engines not to use page titles and descriptions from the ODP on their SERPs.
  • NOYDIR - tells Yahoo! search not to use page titles and descriptions from the Yahoo! directory on the SERPs.
  • NOARCHIVE - Google specific, used to prevent archiving (cached page copy)
  • NOSNIPPET - Prevents Google from displaying text snippets for your page on the SERPs
  • UNAVAILABLE_AFTER: RFC 850 formatted timestamp - Removes an URL from Google's search index a day after the given date/time

So how can you serve X-Robots-Tags in the HTTP header of PDF files for example? Here is one possible procedure to explain the basics, just adapt it for your needs:

Rewrite all requests of PDF documents to a PHP script knowing wich files must be served with REP header tags. You could do an external redirect too, but this may confuse things. Put this code in your root's .htaccess:

RewriteEngine On
RewriteBase /pdf
RewriteRule ^(.*)\.pdf$ serve_pdf.php

In /pdf you store some PDF documents and serve_pdf.php:

...
$requestUri = $_SERVER['REQUEST_URI'];
...
if (stristr($requestUri, "my.pdf")) {
header('X-Robots-Tag: index, noarchive, nosnippet', TRUE);
header('Content-type: application/pdf', TRUE);
readfile('my.pdf');
exit;
}
...

This setup routes all requests of *.pdf files to /pdf/serve_pdf.php which outputs something like this header when a user agent asks for /pdf/my.pdf:

Date: Tue, 31 Jul 2007 21:41:38 GMT
Server: Apache/1.3.37 (Unix) PHP/4.4.4
X-Powered-By: PHP/4.4.4
X-Robots-Tag: index, noarchive, nosnippet
Connection: close
Transfer-Encoding: chunked
Content-Type: application/pdf

You can do that with all kind of file types. Have fun and say thanks to Google :)

Labels: , , , , ,

Share this post at StumbleUpon
Stumble It!
    Share this post at del.icio.us
Post it to
del.icio.us
 


-->

3 Comments:

Post a Comment

<< Home