Handling Google's neat X-Robots-Tag - Sending REP header tags with PHP
It's a bad habit to tell the bad news first, and I'm guilty of that. Yesterday I linked to Dan Crow of Google, arguing that the unavailable_after tag is useless IMHO. So today's post is about a great thing: REP header tags, aka X-Robots-Tags, unfortunately mentioned only as second news, somewhat concealed in Google's announcement.
The REP is not only a theatre; it also stands for Robots Exclusion Protocol (robots.txt and the robots meta tag). Everything you can shove into a robots meta tag on an HTML page can now be delivered in the HTTP header of any file type:
- INDEX|NOINDEX - tells whether the page may be indexed or not
- FOLLOW|NOFOLLOW - tells whether crawlers may follow links provided on the page or not
- ALL|NONE - ALL = INDEX, FOLLOW (the default); NONE = NOINDEX, NOFOLLOW
- NOODP - tells search engines not to use page titles and descriptions from the ODP (Open Directory Project) on their SERPs
- NOYDIR - tells Yahoo! Search not to use page titles and descriptions from the Yahoo! directory on the SERPs
- NOARCHIVE - Google specific, prevents archiving (the cached page copy)
- NOSNIPPET - prevents Google from displaying text snippets for your page on the SERPs
- UNAVAILABLE_AFTER: RFC 850 formatted timestamp - removes a URL from Google's search index a day after the given date/time
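Assembling such a header value in PHP is straightforward; here's a minimal sketch (the buildRobotsTag() helper is made up for illustration, and PHP's DATE_RFC850 constant, which produces the required timestamp format, needs PHP 5 - on PHP 4 you'd spell out the format string yourself):

```php
<?php
// Build an X-Robots-Tag header value from a list of directives.
// buildRobotsTag() is a hypothetical helper, not part of any API.
function buildRobotsTag(array $directives, $unavailableAfter = null) {
    if ($unavailableAfter !== null) {
        // RFC 850 format, e.g. "Tuesday, 31-Jul-07 21:41:38 GMT"
        $directives[] = 'unavailable_after: ' . gmdate(DATE_RFC850, $unavailableAfter);
    }
    return 'X-Robots-Tag: ' . implode(', ', $directives);
}

// Usage: send the header before any output, e.g.
// header(buildRobotsTag(array('index', 'noarchive')));
```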
So how can you serve X-Robots-Tags in the HTTP header of PDF files, for example? Here is one possible procedure to explain the basics; just adapt it to your needs:
Rewrite all requests for PDF documents to a PHP script which knows which files must be served with REP header tags. You could do an external redirect too, but that may confuse things. Put this code in your root's .htaccess:
# route every *.pdf request to the serving script in /pdf
RewriteEngine On
RewriteBase /pdf
RewriteRule ^(.*)\.pdf$ serve_pdf.php [L]
In /pdf you store your PDF documents along with serve_pdf.php:
...
$requestUri = $_SERVER['REQUEST_URI'];
...
// serve my.pdf with REP header tags
if (stristr($requestUri, 'my.pdf')) {
    header('X-Robots-Tag: index, noarchive, nosnippet', TRUE);
    header('Content-Type: application/pdf', TRUE);
    readfile('my.pdf');
    exit;
}
...
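If you serve more than a handful of files, a lookup table keeps the script tidy, and basename() guards against requests trying to escape the /pdf directory. This is a sketch under those assumptions; the $repTags map and the robots_tag_for() helper are my own inventions, not part of the setup above:

```php
<?php
// Map file names to their REP directives; files not listed here
// get served without an X-Robots-Tag header.
$repTags = array(
    'my.pdf'     => 'index, noarchive, nosnippet',
    'secret.pdf' => 'noindex, nofollow',
);

// Look up the directives for a request URI. basename() strips any
// path components, so "../../etc/passwd" style tricks cannot work;
// parse_url() drops a trailing query string first.
function robots_tag_for($requestUri, array $map) {
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    $file = basename($path);
    return isset($map[$file]) ? array($file, $map[$file]) : null;
}

$requestUri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
$hit = robots_tag_for($requestUri, $repTags);
if ($hit !== null && is_readable($hit[0])) {
    header('X-Robots-Tag: ' . $hit[1], TRUE);
    header('Content-Type: application/pdf', TRUE);
    readfile($hit[0]);
    exit;
}
```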
This setup routes all requests for *.pdf files to /pdf/serve_pdf.php, which outputs something like this header when a user agent asks for /pdf/my.pdf:
Date: Tue, 31 Jul 2007 21:41:38 GMT
Server: Apache/1.3.37 (Unix) PHP/4.4.4
X-Powered-By: PHP/4.4.4
X-Robots-Tag: index, noarchive, nosnippet
Connection: close
Transfer-Encoding: chunked
Content-Type: application/pdf
You can do that with all kinds of file types. Have fun and say thanks to Google :)
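Extending the idea to other file types is mostly a matter of sending the right Content-Type alongside the X-Robots-Tag. A small extension-to-MIME-type map does the trick; the $mimeTypes list and the content_type_for() helper below are my own, incomplete illustration:

```php
<?php
// Minimal extension => MIME type map; extend as needed.
$mimeTypes = array(
    'pdf' => 'application/pdf',
    'doc' => 'application/msword',
    'xls' => 'application/vnd.ms-excel',
    'zip' => 'application/zip',
    'gif' => 'image/gif',
    'jpg' => 'image/jpeg',
);

// content_type_for() is a hypothetical helper: it looks up the file
// extension and falls back to application/octet-stream when unknown.
function content_type_for($file, array $map) {
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return isset($map[$ext]) ? $map[$ext] : 'application/octet-stream';
}

// Usage:
// header('Content-Type: ' . content_type_for('brochure.pdf', $mimeTypes), TRUE);
```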
Labels: .htaccess, crawler directives, Google, robots meta tags, SEO, X-Robots-Tag
3 Comments:
At Wednesday, August 01, 2007, dockarl said…
Sebastian - Great post.
I didn't have any idea about that - plan to write a post referencing you.
Cheers,
Matt
At Thursday, August 02, 2007, Sebastian said…
Thanks Matt :)
Here is another good link from Hamlet Batista:
serving X-Robots-Tags with SetEnvIf and Header add in .htaccess - neat :)
At Thursday, August 02, 2007, Anonymous said…
Sebastian - That is very clever. Good job!
I thought about doing something similar, but decided the .htaccess solution might be easier for non-programmers.
I thought about using ScriptAliasMatch to map files to a cgi script that would add the header and using PATH_INFO to find the file on disk.