Sebastian's Pamphlets

If you've read my articles somewhere on the Internet, expect something different here.

MOVED TO SEBASTIANS-PAMPHLETS.COM

Please click the link above to read the actual posts; this archive will disappear soon!

Stay tuned...

Monday, August 20, 2007

This is my last blog post ...

... on blogspot. Farewell Blogger. Sebastian's Pamphlets moved to sebastians-pamphlets.com. Please visit me over there, change the feed URL in your reader to
http://feeds.sebastians-pamphlets.com/SebastiansPamphlets
and update your blogrolls. Thank you!

This site will be kept as an archive for a while, but I'll soon make it uncrawlable and even uglier with blinking links pointing to the new location. You dear readers don't deserve that, so please move with me to sebastians-pamphlets.com! I'm looking forward to seeing you there :)

Sebastian


Wednesday, August 15, 2007

Google's 5 sure-fire steps to safer indexing

Are you wondering why Gray Hat Search Engine News (GHN) has been so quiet recently? One reason may be that I've borrowed their Google-savvy spy. I've sent him to Mountain View again to learn more about Google's nofollow strategy.

He returned with a copy of Google's recently revised mission statement, discovered in the wastebasket of a conference room near office 211 in building 43. Read the shocking and unbelievable head note printed in bold letters:

Google's mission is to condomize the world's information and make it universally uncrawlable and useless.

Read and reread it, then some weird facts begin to make sense. Now you'll understand why:

  1. The rel-nofollow plague was designed to maximize collateral damage by devaluing all hyperlinked votes by honest users of nearly all platforms you use every day, for example Twitter, Wikipedia, corporate blogs, GoogleGroups ... ostensibly to nullify the efforts of a few spammers.
  2. Nobody bothers to comment on your nofollow'ed blog.
  3. Google invented the supplemental index (to store scraped resources suffering from too many condomized links) and why it grows faster than the main index.
  4. Google installed the Bigdaddy infrastructure (to prevent Ms. Googlebot from following nofollow'ed links).
  5. Google switched to BlitzCrawling (to list timely contents for a moment whilst fat resources from large archives get buried in the supplemental index). RIP deep crawler and freshbot.

Seriously, the deep crawler isn't defunct, it's called supplemental crawler nowadays, and the freshbot is still alive as Feedfetcher.




Disclaimer: All these hard facts were gathered by torturing sources close to Google, robbery, and other unfair methods. If anyone bothers to debunk all that as a bad joke, one question still remains: Why does Google do next to nothing to stop the nofollow plague? I mean, ongoing mass abuse of rel-nofollow is obviously counterproductive with regard to their real mission.


Monday, August 13, 2007

Ego food from John's barbecue

JohnMu grilled me ;)

Check out his folks bin frequently for readable Webmaster interviews.

Thanks John, it was fun :)


Just another victim of the nofollow plague

It's evil, it sucks even more than the crappy TinyURL nonsense obfuscating link destinations, nobody outside some SEO cliques really cares about or even noticed it, and I'm not sure it's newsworthy because it's perfectly in line with rel-nofollow semantics. But it annoys me and others, so here is the news of late last week: Twitter drank the nofollow kool-aid.

Folks, remove Twitter from your list of PageRank sources and drop links for fun and traffic only. I wonder whether particular people change their linking behavior on Twitter or not. I won't.

Following nofollow's questionable tradition of maximizing collateral damage, Twitter nofollows even links leading to Matt's mom's charity site. More PageRank power to you, Betty Cutts! Your son deserves a bold nofollow for inventing the beast ;)


Saturday, August 11, 2007

ɹǝɟɟıp oʇ bǝq ı

:sdıʇ ɹǝpısuı sʞ1oɟ ɹǝɥʇo ʇdʎɹɔuǝ oʇ unɟ s,ʇı ʇnq .sdɐɥɹǝd ¿uoɹoɯʎxo uɐ ʇɐɥʇ sı .ʎ1ɟʎɐp ɐ ǝʞı1 ʇsnظ 'ǝɹnʇnɟ ɐ sɐɥ ɔıɟɟɐɹʇ 1ıɐʇ buo1 ǝsɹǝʌǝɹ buı11nd oǝs uʍopǝpısdn

Lyndon's insider tip

Ralph's insider tip

If you're bored, give it a try. Mark did it.


Friday, August 10, 2007

Lijit SERP

I'm testing the lijit search results page. In theory, when you submit a lijit search you should land here to view the results. Unfortunately it doesn't work as expected when you're surfing with HTTP_REFERER turned off. Sigh.



Thursday, August 09, 2007

How to bait link baiters and attention whores properly

What a brilliant marketing stunt. Click here! Err... click: Brilliant. Marketing. Stunt.

Best of luck John :)


Wednesday, August 08, 2007

Google manifested the axe on reciprocal link exchanges

Yesterday Fantomaster, via Threadwatcher, pointed me to this page of Google's Webmaster help system. The cache was a few days old and didn't show a difference, and since I don't archive each and every change of the guidelines, I asked; a friendly and helpful Googler told me that this item has been around for a while now. Today this page made it onto Sphinn and probably a few other Webmaster hangouts too.

So what the heck is the scandal all about? When you ask Google for help on "link exchange", the help machine rattles for a second, sighs, coughs, clears its throat and then yells out the answer in bold letters: "Link schemes", bah!

Ok, we already knew what Google thinks about artificial linkage: "Don't participate in link schemes designed to increase your site's ranking or PageRank". Honestly, what is the intent when I suggest that you link to me and concurrently I link to you? Yup, it means I boost your PageRank and you boost mine; plus we choose some nice anchor text, and that makes the link deal perfect. In the eyes of Google even such a tiny deal is a link scheme, because both links weren't put up for users but for search engines.

Pre-Google this kind of link deal was business as usual and considered natural, but frankly back then the links were exchanged for traffic and not for search engine love. We can rant and argue as much as we want, that will not revert the changed character of link swaps nor Google's take on manipulative links.

Consequently Google has devalued artificial reciprocal links for ages. Pretty much simplified, these links nullify each other in Google's search index. That goes for tiny sins. Folks scaling the concept up to larger link networks got caught too, but were penalized or even banned for link farming.

Obviously all kinds of link swaps are easy to detect algorithmically, even triangular link deals, three-way link exchanges and whatnot. I called that plain vanilla link 'swindles', but only just recently has Google caught up with a scalable solution; it seems to detect and penalize most if not all variants across the whole search index, thanks to the search quality folks in Dublin and Zurich, and even overseas, in whatever languages.

The knowledge that the days of free link trading were numbered was out for years before the exodus. Artificial reciprocal links, as well as other linkage Google considers link spam, were and are a pet peeve of Matt's team. Google sent lots of warnings, and many sane SEOs and Webmasters heard their traffic master's voice and acted accordingly. Successful link trading just went underground, leaving the great unwashed alone with their obsession about exchanging reciprocal links in public.

Also old news is that Google does not penalize reciprocal links in general. Google almost never penalizes a pattern or a technique. Instead they try to figure out the Webmaster's intent and judge case by case based on their findings. And yes, that's doable with algos, perhaps sometimes with a little help from humans to compile the seed, but we don't know how perfect the algo is when it comes to evaluations of intent. Natural reciprocal links are perfectly fine with Google. That applies to well maintained blogrolls too, despite the often reciprocal character of these links. Reading the link schemes page completely should make that clear.

Google defines link scheme as "[...] Link exchange and reciprocal links schemes ('Link to me and I'll link to you.') [...]". The "I link to you and vice versa" part literally addresses link trading of any kind, not a situation where I link to your compelling contents because I like a particular page, and you return the favour later on because you find my stuff somewhat useful. As Perkiset puts it "linking is now supposed to be like that well known sex act, '68? - or, you do me and I'll owe you one'" and there is truth in this analogy. Sometimes a favor will not be returned. That's the way the cookie crumbles when you're keen on Google traffic.


The fact that Google openly said that link exchange schemes designed "exclusively for the sake of cross-linking" of any kind violate their guidelines indicates, first, that they were sure they had invented the catchall algo, and second, that they felt safe to launch it without too much collateral damage. Not everybody agrees; I quote Fantomaster's critique not only because I like his inimitable parlance:
This is essentially a theological debate: Attempting to determine any given action's (and by inference: actor's) "intention" (as in "sinning") is always bound to open a can of worms or two.

It will always have to work by conjecture, however plausible, which makes it a fundamentally tacky, unreliable and arbitrary process.

The delusion that such a task, error prone as it is even when you set the most intelligent and well informed human experts to it (vide e.g. criminal law where "intention" can make all the difference between an indictment for second or first degree murder...) can be handled definitively by mechanistic computer algorithms is arguably the most scary aspect of this inane orgy of technological hubris and naivety the likes of Google are pressing onto us.
I've seen some collateral damage already, but pragmatic Webmasters will find --respectively have found long ago-- their way to build inbound links under Google's regime.

And here is the context of Google's "link exchanges = link schemes" definition, which makes clear that not each and every reciprocal link is evil:
[…] However, some webmasters engage in link exchange schemes and build partner pages exclusively for the sake of cross-linking, disregarding the quality of the links, the sources, and the long-term impact it will have on their sites. This is in violation of Google’s webmaster guidelines and can negatively impact your site’s ranking in search results. Examples of link schemes can include:

• Links intended to manipulate PageRank
• Links to web spammers or bad neighborhoods on the web
• Link exchange and reciprocal links schemes ('Link to me and I'll link to you.')
• Buying or selling links [...]
Again, please read the whole page.


Bear in mind that all this is Internet history; it just boiled up yesterday when the help page was discovered.


Related article: Eric Ward on reciprocal links, why they do good, and where they do bad.


Tuesday, August 07, 2007

NOPREVIEW - The missing X-Robots-Tag

Google provides previews of non-HTML resources listed on their SERPs (the "View PDF as HTML document" link next to PDF results, for example).
These "view as text" and "view as HTML" links are pretty useful when you, for example, want to scan a PDF document before you clutter your machine's RAM with 30 megs of useless digital rights management (aka Adobe Reader). You can view contents even when the corresponding application is not installed, Google's transformed previews should not stuff your maiden box with unwanted malware, etcetera. However, under some circumstances it would make sound sense to have a NOPREVIEW X-Robots-Tag, but unfortunately Google hasn't introduced it yet.

Google is rightfully proud of their capability to transform various file formats to readable HTML or plain text: Adobe Portable Document Format (pdf), Adobe PostScript (ps), Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku), Lotus WordPro (lwp), MacWrite (mw), Microsoft Excel (xls), Microsoft PowerPoint (ppt), Microsoft Word (doc), Microsoft Works (wks, wps, wdb), Microsoft Write (wri), Rich Text Format (rtf), Shockwave Flash (swf), of course Text (ans, txt) plus a couple of "unrecognized" file types like XML. New formats are added from time to time.

According to Adam Lasnik there is currently no way for Webmasters to tell Google not to include the "View as HTML" option. You can try to fool Google's converters by messing up the non-HTML resource in a way that a sane parser can't interpret it. Actually, when you search a few minutes you'll find e.g. PDF files without the preview links on Google's SERPs. I wouldn't consider this attempt a bullet-proof or future-proof tactic though, because Google is pretty intent on improving their conversion/interpretation process.

I like the previews not only because sometimes they allow me to read documents behind a login screen. That's a loophole Google should close as soon as possible. When, for example, PDF documents or Excel sheets are crawlable but not viewable for searchers (at least not on the second click), that's plain annoying for the site as well as for the search engine user.

With HTML documents the Webmaster can apply a NOARCHIVE crawler directive to prevent non-paying visitors from lurking via Google's cached page copies. Thanks to the newish REP header tags one can do that with non-HTML resources too, but neither NOARCHIVE nor NOSNIPPET etches away the "View as HTML" link.

<speculation>Is the lack of a NOPREVIEW crawler directive just an oversight, or is it stuck in the pipeline because Google is working on supplemental components and concepts? Google's still inconsistent handling of subscription content comes to mind as an ideal playground for such a robots directive in combination with a policy change.</speculation>

Anyways, there is a need for a NOPREVIEW robots tag, so why not implement it now? Thanks in advance.
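Just to illustrate what I'm asking for, here's a hypothetical sketch along the lines of the X-Robots-Tag PHP approach covered elsewhere on this blog. The nopreview token is made up (that's the whole point of this post); noarchive and nosnippet are real directives, and the file name is just an example:

// Hypothetical: "nopreview" does not exist (yet); noarchive and nosnippet do.
header('X-Robots-Tag: noarchive, nosnippet, nopreview', TRUE);
header('Content-type: application/pdf', TRUE);
readfile('subscribers-only.pdf');
exit;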


Wednesday, August 01, 2007

SEOs home alone - Google's nightmare

Being a single parent of three monsters at the moment brings me newish insights. I now deeply understand the pain of father Google dealing with us, and what doing the chores all day long means to Matt's gang in building 43, Dublin, and wherever. What a nightmare of a household.

If you don't suffer from an offspring plague you won't believe what sneaky and highly intelligent monsters having too much time on their tiny greedy hands will do to gain control over their environment. Outsmarting daddy is not a hobby, it's their mission, and everything in perfect order is attackable. Each of them tries to get as much attention as possible, and if nothing helps, negative attention is fine too. There's no such thing as bad traffic, err ... mindfulness.

Every rule is breakable, and there's no way to argue seriously with a cute 5 yo gal burying her 3 yo brother in the mud whilst honestly telling me that she has nothing to do with the dirty laundry because she would never touch anything hanging on the clothesline. Then my little son speaks up, telling me that it's all her fault, so she promises to do it never, never, never again in her whole life and even afterwards. In such a situation I don't have that many options: I archive my son's paid links report, accept her reconsideration request but throttle her rankings for a while, recrawl and remove the unpurified stuff from the ... Oops ... I clear the scene with a pat on her muddy fingers, forgive all blackhatted kids involved in the scandal and just do the laundry again, writing a note to myself to improve the laundry algo in a way that muddy monsters can't touch laundered bed sheets again.

Anything not on the explicit don'ts list goes, so while I'm still stuffing the washer with muddy bed sheets I hear a weird row in the living room. Running upstairs I spot my 10 yo son and his friend playing soccer with a ball I had to fish out of a heap of broken crockery and uprooted indoor plants to confiscate it just two hours ago. Yelling that's against our well known rules and why the heck is that [...] ball in the game again, I get stopped immediately by the boys. First, they just played soccer, and the recent catastrophe was the result of a strictly forbidden basketball joust. I have to admit that I said they must not play basketball in the house. Second, it's my fault when I don't hide the key to the closet where I locked the confiscated ball away. Ok, enough is enough. I banned my son's friend and grounded my son for a week, took away the ball, and ran to the backyard to rescue two bitterly crying muddy dwarfs from the shed's roof. Later on, while two little monsters play games in the bath tub which I really don't want to watch too closely currently, I read a thread titled "Daddy is soooo unfair" in the house arrest forum where my son and his buddy tell the world that they didn't do anything wrong, just sheer whitehatted stuff, but I stole their toy and banned them from the playground. Sigh.

I'm exhausted. I'm supposed to deliver a script to merge a few feeds giving fresh contents, a crawlability review, and whatnot tonight, but I just wonder what else will happen when I leave the monsters alone in their beds after supper and story hour, provided I get them into their beds without a medium-size flame war. Now I understand why another daddy supplemented the family with a mom.


Tuesday, July 31, 2007

Handling Google's neat X-Robots-Tag - Sending REP header tags with PHP

It's a bad habit to tell the bad news first, and I'm guilty of that. Yesterday I linked to Dan Crow, telling Google that the unavailable_after tag is useless IMHO. So today's post is about a great thing: REP header tags aka X-Robots-Tags, unfortunately mentioned only as second news, somewhat concealed in Google's announcement.

The REP is not only a theatre, it stands for Robots Exclusion Protocol (robots.txt and robots meta tag). Everything you can shove into a robots meta tag on an HTML page can now be delivered in the HTTP header for any file type:
  • INDEX|NOINDEX - Tells whether the page may be indexed or not
  • FOLLOW|NOFOLLOW - Tells whether crawlers may follow links provided on the page or not
  • ALL|NONE - ALL = INDEX, FOLLOW (default), NONE = NOINDEX, NOFOLLOW
  • NOODP - Tells search engines not to use page titles and descriptions from the ODP on their SERPs
  • NOYDIR - Tells Yahoo! search not to use page titles and descriptions from the Yahoo! directory on the SERPs
  • NOARCHIVE - Google specific, used to prevent archiving (cached page copy)
  • NOSNIPPET - Prevents Google from displaying text snippets for your page on the SERPs
  • UNAVAILABLE_AFTER: RFC 850 formatted timestamp - Removes a URL from Google's search index a day after the given date/time
So how can you serve X-Robots-Tags in the HTTP header of PDF files for example? Here is one possible procedure to explain the basics, just adapt it for your needs:

Rewrite all requests for PDF documents to a PHP script that knows which files must be served with REP header tags. You could do an external redirect too, but this may confuse things. Put this code in your root's .htaccess:

RewriteEngine On
RewriteBase /pdf
RewriteRule ^(.*)\.pdf$ serve_pdf.php

In /pdf you store some PDF documents and serve_pdf.php:

...
$requestUri = $_SERVER['REQUEST_URI'];
...
// Serve my.pdf with REP header tags instead of letting Apache deliver it directly
if (stristr($requestUri, "my.pdf")) {
    // Headers must be sent before any output
    header('X-Robots-Tag: index, noarchive, nosnippet', TRUE);
    header('Content-type: application/pdf', TRUE);
    readfile('my.pdf');
    exit;
}
...

This setup routes all requests for *.pdf files to /pdf/serve_pdf.php, which outputs something like this header when a user agent asks for /pdf/my.pdf:

Date: Tue, 31 Jul 2007 21:41:38 GMT
Server: Apache/1.3.37 (Unix) PHP/4.4.4
X-Powered-By: PHP/4.4.4
X-Robots-Tag: index, noarchive, nosnippet
Connection: close
Transfer-Encoding: chunked
Content-Type: application/pdf

You can do that with all kinds of file types. Have fun and say thanks to Google :)
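As a side note, if you'd rather not route those requests through PHP at all, the same header can be set directly in .htaccess via Apache's mod_headers. A minimal sketch, assuming the module is enabled on your host (adjust the file pattern and the directives to your needs):

<FilesMatch "\.(pdf|doc|xls)$">
Header set X-Robots-Tag "index, noarchive, nosnippet"
</FilesMatch>

The PHP route remains handy when different files need different directives, though.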


Monday, July 30, 2007

Unavailable_After is totally and utterly useless

I've a lot of respect for Dan Crow, but I'm struggling with my understanding, or possible support, of the unavailable_after tag. I don't want to put my reputation for bashing such initiatives from search engines at risk, so sit back and grab your popcorn, here comes the roasting:

As a Webmaster, I did not find a single scenario where I could or even would use it. That's because I'm a greedy traffic whore. A bazillion other Webmasters are greedy too. So how the heck is Google going to sell the newish tag to the greedy masses?

Ok, from a search engine's perspective unavailable_after makes sound sense. Outdated pages bind resources, annoy searchers, and in a row of useless crap the next bad thing after an outdated page is intentional Webspam.

So convincing the great unwashed to put that thingy on their pages inviting friends and family to granny's birthday party on 25-Aug-2007 15:00:00 EST would improve search quality. Not that family blog owners care about new meta tags, RFC 850-ish date formats, or search engine algos that rarely understand that the announced party is history on Aug/26/2007. Besides, there may be painful aftermaths worth a desperate call for aspirin in the comments the day after, which would be news posted after expiration. Kinda dilemma, isn't it?
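For the record, the syntax from Google's announcement looks roughly like this; the date is the party date from the example above, in that RFC 850-ish format:

<meta name="googlebot" content="unavailable_after: 25-Aug-2007 15:00:00 EST">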

Seriously, unless CMS vendors support the new tag, tiny sites and clique blogs aren't Google's target audience. This initiative addresses large sites which are responsible for a huge amount of outdated contents in Google's search index.

So what is the large-site Webmaster's advantage of using the unavailable_after tag? A loss of search engine traffic. A loss of the link juice gained by the expired page. And so on. Losses of any kind are not that helpful when it comes to an overdue raise, nor in salary negotiations. Hence the Webmaster is asking for the sack when s/he implements Google's traffic terminator.

Who cares about Google's search quality problems when it leads to traffic losses? Nobody. Caring Webmasters do the right thing anyway. And they don't need no more useless meta tags like unavailable_after. "We don't need no stinking metas" from "Another Brick in the Wall Part Web 2.0" expresses my thoughts perfectly.

So what separates the caring Webmaster from the 'ruthless traffic junkie' Google wants to convince to implement the unavailable_after tag? The traffic junkie lets his stuff expire without telling Google about its state, is happy that frustrated searchers click the URL from the SERPs even years after the event, and enjoys the earnings from tons of ads placed above the content minutes after the party was over. Dear Google, you can't convince this guy.

[It seems this is a post about repetitive "so whats". And I came to the point before the 4th paragraph ... wow, that's new ... and I've put a message in the title which is not even meant as link bait. Keep on reading.]


So what does the caring Webmaster do without the newish unavailable_after tag? Business as usual. Examples:


Say I run a news site where the free contents go to the subscription area after a while. I'd closely watch which search terms generate traffic, write a search engine optimized summary containing those keywords, put that on the sales pitch, and move the original article to the archives accessible to subscribers only. It's not my fault that the engines think they point to the original article after the move. When they recrawl and reindex the page my traffic will increase because my summary fits searchers' needs even better.

Say I run an auction site. Unfortunately particular auctions expire, but I'm sure that the offered products will return to my site. Hence I don't close the page, but I search my database for similar offerings and promote them under an H3 heading like "<h3>[product] (stuffed keywords) is hot</h3> <p>Buy [product] here:</p>", followed by a list of identical products for sale or similar auctions.

Say I run a poll expiring in two weeks. With Google's newish near real time indexing that's enough time to collect keywords from my stats, so the textual summary under the poll's results will attract the engines as well as visitors when the poll is closed. Also, many visitors will follow the links to related respectively new polls.


From Google's POV there's nothing wrong with my examples, because the visitor gets what s/he was searching for, and I didn't cheat. Now tell me, why should I give up these valuable sources of nicely targeted search engine traffic just to make Google happy? Rather I'd make my employer happy. Dear Google, you didn't convince me.



Update: Tanner Christensen posted a remarkable comment at Sphinn:
I'm sure there is some really great potential for the tag. It's just none of us have a need for it right now.

Take, for example, when you buy your car without a cup holder. You didn't think you would use it. But then, one day, you find yourself driving home with three cups of fruit punch and no cup holders. Doh!

I say we wait it out for a while before we really jump on any conclusions about the tag.
John Andrews was the first to report an evil use of unavailable_after.

Also, Dan Crow from Google announced a pretty neat thing in the same post: With the X-Robots-Tag you can now apply crawler directives valid in robots meta tags to non-HTML documents like PDF files or images.


Saturday, July 28, 2007

Analyzing search engine rankings by human traffic

Recently I've discussed ranking checkers in several places, and I'm quite astonished that folks still see some value in ranking reports. Frankly, ranking reports are --in most cases-- a useless waste of paper and/or disk space. That does not mean that SERP positions per keyword phrase aren't interesting. They're just useless without context, that is, traffic data. Converting traffic pays the bills, not rankings alone. The truth is in your traffic data.

That said, I'd like to outline a method to get one particularly useful piece of information out of raw traffic data: underestimated search terms. That's not a new approach, and perhaps you have the reports already, but maybe you don't look at the information which is somewhat hidden in stats ordered by success, not failure. And you should be --respectively employ-- a programmer to implement it.


The first step is gathering data. Create a database table to record all hits, then, in a footer include or similar, once the complete page has been output, write all the data you have into that table. All data means URL, timestamp, and variables like referrer, user agent, IP, language and so on. Be a data rat, log everything you can get hold of. With dynamic sites it's easy to add page title, (product) IDs etcetera; with static sites write a tool to capture these attributes separately.

For performance reasons it makes sense to work with a raw data table, which has just a primary key, to log the requests, and normalized working tables which have lots of indexes to allow aggregations, ad hoc queries, and fast reports from different perspectives. Also think of regularly purging the raw log table and of historization. While transferring raw log data to the working tables in low traffic hours, or on another machine, you can calculate interesting attributes and add data from other sources which were not available to the logging process.

You'll need that traffic data collector anyway for a gazillion purposes where your analytics software fails, is not precise enough, or just can't deliver a particular evaluation perspective. It's a prerequisite for the method discussed here, but don't build a monster-sized cannon to chase a fly. You can gather search engine referrer data from logfiles too.
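To illustrate the logging step, here's a minimal sketch of such a footer include. The table and column names (raw_hits and friends) are made up, and the old mysql_* API is used only because it matches the PHP 4 environment shown elsewhere on this blog; adapt it to whatever database layer you use:

<?php
// Footer include: log every hit into a raw table that has just a primary key.
$db = mysql_connect('localhost', 'dbuser', 'dbpass');
mysql_select_db('traffic', $db);
$sql = sprintf(
    "INSERT INTO raw_hits (url, hit_time, referrer, user_agent, ip, lang)
     VALUES ('%s', NOW(), '%s', '%s', '%s', '%s')",
    mysql_real_escape_string($_SERVER['REQUEST_URI'], $db),
    mysql_real_escape_string(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '', $db),
    mysql_real_escape_string(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '', $db),
    mysql_real_escape_string($_SERVER['REMOTE_ADDR'], $db),
    mysql_real_escape_string(isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '', $db)
);
mysql_query($sql, $db);
?>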


For example, an interesting piece of information is on which SERP a user clicked a link pointing to your site. Simplified, you need three attributes in your working tables to store this info: search engine, search term, and SERP number. You can extract these values from the HTTP_REFERER.

http://www.google.com/search?q=keyword1+keyword2~
&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a

1. "google" in the server name tells you the search engine.
2. The "q" variable's value tells you the search term "keyword1 keyword2".
3. The lack of a "start" variable tells you that the result was placed on the first SERP. The lack of a "num" variable lets you assume that the user got 10 results per SERP, so it's quite safe to say that you rank in the top 10 for this term. Actually, the number of results per page is not always extractable from the URL because it's usually pulled from a cookie, but not that many surfers change their preferences (e.g. less than 0.5% surf with 100 results according to JohnMu, and my data as well). If you've got a "num" value, normalize it to 10-results pages to make the data comparable (see the parsing sketch after these examples). If that's not precise enough you'll spot it afterwards, and you can always recalculate SERP numbers from the canned referrer.

http://www.google.co.uk/search?q=keyword1+keyword2~
&hl=en&start=10&sa=N

1. and 2. as above.
3. The "start" variable's value 10 tells you that you got a hit from the second SERP. When start=10 and there is no "num" variable, most probably the searcher got 10 results per page.

http://www.google.es/search?q=keyword1+keyword2~
&rls=com.microsoft:*&ie=UTF-8&oe=UTF-8&startIndex=~
&startPage=1

1. and 2. as above.
3. The empty "startIndex" variable and startPage=1 are useless, but the lack of "start" and "num" tells you that you've got a hit from the first Spanish SERP.

http://www.google.ca/search?q=keyword1+keyword2~
&hl=en&rls=GGGL,GGGL:2006-30,GGGL:en&start=20~
&num=20&sa=N

1. and 2. as above.
3. num=20 tells you that the searcher views 20 results per page, and start=20 indicates the second SERP, so you rank between #21 and #40, thus the (averaged) SERP# is 3.5 (provided SERP# is not an integer in your database).

You get the idea; here are a cheat sheet and official documentation on Google's URL parameters. Analyze the URLs in your referrer logs and call them with cookies off, which disables your personal search preferences, then play with the values. Do that with other search engines too.
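Here's a minimal parsing sketch of the above for Google referrers; the function name and the normalization of "num"/"start" to an averaged SERP number (in 10-results pages) are mine, so treat it as a starting point and extend it for other engines:

<?php
// Extract search engine, search term and averaged SERP number from a referrer URL.
// Returns FALSE for referrers that aren't Google SERPs.
function parseGoogleReferrer($referrer) {
    $parts = parse_url($referrer);
    if (!isset($parts['host']) || stristr($parts['host'], 'google') === FALSE) {
        return FALSE;
    }
    $query = array();
    parse_str(isset($parts['query']) ? $parts['query'] : '', $query);
    if (empty($query['q'])) {
        return FALSE;
    }
    $num   = empty($query['num'])   ? 10 : (int) $query['num'];   // results per SERP, default 10
    $start = empty($query['start']) ?  0 : (int) $query['start']; // offset of the first result
    // Normalize to 10-results pages: average of the first and last page covered.
    // Example from above: start=20&num=20 covers #21-#40, i.e. pages 3 and 4 -> SERP# 3.5
    $firstPage = floor($start / 10) + 1;
    $lastPage  = floor(($start + $num - 1) / 10) + 1;
    return array(
        'search_engine' => $parts['host'],
        'search_term'   => $query['q'],
        'serp_number'   => ($firstPage + $lastPage) / 2,
    );
}

print_r(parseGoogleReferrer('http://www.google.ca/search?q=keyword1+keyword2&hl=en&start=20&num=20&sa=N'));
?>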


Now a subset of your traffic data has a value in "search engine". Aggregate tuples where search engine is not NULL, then select the results, for example where SERP number is lower than or equal to 3.99 (respectively 4), ordered by SERP number ascending, hits descending, and keyword phrase, with a break by search engine. (Why not sort primarily by traffic descending? Because you already have a report of your best performing keywords.)
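In SQL terms, a minimal sketch of that report query; the table and column names (search_hits, search_engine, search_term, serp_number) are hypothetical working-table attributes in line with the ones discussed above:

<?php
// Underestimated search terms: rankings on the first 4 SERPs, grouped and ordered
// as described above; run against the normalized working table, not the raw log.
$sql = "SELECT search_engine, search_term, serp_number, COUNT(*) AS hits
        FROM   search_hits
        WHERE  search_engine IS NOT NULL
          AND  serp_number <= 4
        GROUP  BY search_engine, search_term, serp_number
        ORDER  BY search_engine, serp_number ASC, hits DESC, search_term";
$result = mysql_query($sql);
?>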

The result is a list of search terms you rank for on the first 4 SERPs, beginning with keywords you've probably not optimized for. At the very least you didn't optimize the snippet to improve CTR, so your ranking doesn't generate a reasonable amount of traffic. Before you study the report, throw away your site owner hat and try to think like a consumer. Sometimes consumers use a vocabulary you didn't think of before.

Research promising keywords, and decide whether you want to push, bury or ignore them. Why bury? Well, in some cases you just don't want to rank for a particular search term, [your product sucks] being just one example. If the ranking is fine, the search term smells somewhat lucrative, and just the snippet sucks in a particular search query's context, enhance your SERP listing.

Every once in a while you'll discover a search term making a killing for your competitors whilst you never spotted it because your stats package reports only the best 500 monthly referrers or so. Also, you'll get the most out of your rankings by optimizing their SERP CTRs.


Be creative; over time your traffic database becomes more and more valuable, allowing other unconventional and/or site-specific reports which off-the-shelf analytics software usually does not deliver. Most probably your competitors use standard analytics software, so individually developed algos and reports can make a difference. That does not mean you should throw away your analytics software to reinvent the wheel. However, once you're used to self-developed analytic tools, you'll think of more interesting ways to analyze and monitor rankings by human traffic than you can implement in this century ;)


Bear in mind that the method outlined above does not and cannot replace serious keyword research.


Another --very popular-- approach to get this info would be automated ranking checks mashed up with hits by keyword phrase. Unfortunately, Google and other engines do not permit automated queries for the purpose of ranking checks, and this method works with preselected keywords, which means you don't find (all) search terms created by users. Even when you compile your ranking checker's keyword lists via various keyword research tools, your seed list will still miss out on some interesting keywords.


Related thoughts: Why regular and automated ranking checks are necessary when you operate seasonal sites by Donna


Friday, July 27, 2007

Rediscover Google's free ranking checker!

Nowadays we search via toolbar, personalized homepage, or the browser address bar: typing in "google" to get the search box, typing in a search query with the "I'm Feeling Lucky" functionality, or -my favorite- typing in google.com/search?q=free+pizza+service+nearby.

Old-fashioned, uncluttered and nevertheless sexy user interfaces are forgotten, and pretty much disliked due to the lack of nifty rounded corners. Luckily Google still maintains them. Look at this beautiful SERP:
Google's free ranking checker
It's free of personalized search, wonderfully uncluttered because the snippets appear as tooltips only, results are nicely numbered from 1 to 1,000 on just 10 awesomely fast loading pages, and when I've visited my URLs before I spot my purple rankings quickly.

http://google.com/ie?num=100&q=keyword1+keyword2 is an ideal free ranking checker. It supports &filter=0 and other URL parameters, so it's a perfect tool when I need to lookup particular search terms.

Mass ranking checks are totally and utterly useless, at least for the average site, and penalized by Google. Well, I can think of ways to semi-automate a couple of queries, but honestly, I almost never need that. Providing fully automated ranking reports to clients gave SEO services a more or less well deserved snake oil reputation, because nice rankings for preselected keywords may be great ego food, but they don't pay the bills. I admit that with some setups automated mass ranking checks make sense, but those are off-topic here.

By the way, Google's query stats are a pretty useful resource too.


Wednesday, July 25, 2007

Blogger to rule search engine visibility?

Via Google's Webmaster Forum I found this curiosity:
http://www.stockweb.blogspot.com/robots.txt
User-agent: *
Disallow: /search
Disallow: /
A standard robots.txt at *.blogspot.com looks different:
User-agent: *
Disallow: /search
Sitemap: http://*.blogspot.com/feeds/posts/default?orderby=updated

According to the blogger the blog is not private, which otherwise would explain the crawler blocking:
It is a public blog. In the past it had a standard robots.txt, but 10 days ago it changed to "Disallow: /"

Copyscape thinks that the blog in question shares a fair amount of content with other Web pages. So does blog search:
http://stockweb.blogspot.com/2007/07/ukraine-stock-index-pfts-gained-97-ytd.html
has a duplicate, posted by the same author, at
http://business-house.net/nokia-nok-gains-from-n-series-smart-phones/,
http://stockweb.blogspot.com/2007/07/prague-energy-exchange-starts-trading.html
is reprinted at
http://business-house.net/prague-energy-exchange-starts-trading-tomorrow/
and so on. Probably a further investigation would reveal more duplicated contents.

It's understandable that Blogger is not interested in wasting Google's resources by letting Ms. Googlebot crawl the same contents from different sources. But why do they block other search engines too? And why do they block the source (the posts reprinted at business-house.net state "Originally posted at [blogspot URL]")?

Is this really censorship, or just a software glitch, or is it all the blogger's fault?


Update 07/26/2007: The robots.txt reverted to standard contents for unknown reasons. However, with a shabby link neighborhood as expressed in the blog's footer, I doubt the crawlers will enjoy their visits. At least the indexers will consider this sort of spider fodder nauseous.
