published: 2008/09/17. tags: seo, search's long tail, architecture, product catalog

Taming The Search's Long Tail

Survival of the fittest; It can shrink over time!

Cuil announced that it had indexed more than 120 billion pages using a more efficient architecture, right around the time (just before, or maybe just after) Google said it had hit the trillion-page mark (1,000 billion). Google had also announced earlier that it was indexing the deep web. Perhaps that is the reason for the trillion pages, but even otherwise, with so many people blogging, twittering and doing who knows how many new verbs yet to be invented, especially when web 2.0 catches up with work 2.0 (what the heck am I talking about? Huh, it's talk 2.0), it wouldn't be a surprise if we eventually ended up with a Googol pages.

So, as the content increases, it gets harder and harder to show up in the SERPs. With online search inching towards a Google monopoly, and its ever-changing algorithms keeping PageRank manipulation in check, SEO has become a full-time job at large companies. If you follow Amazon's online activity, like launching separate websites for subcategories (endless.com for handbags and shoes, amazonfresh.com for, well, fresh produce), you know what the stakes are and how the big players are retaining and expanding their online turf. Obviously, if you are a small company with a limited budget for SEO, you should spend time not only on getting people to visit your website but also on ensuring that those who do visit stay there and get the information they need. The latter is actually completely in your hands.

For websites that are not so popular (PageRank of 3 or lower), most visits through search engines are going to come from people searching for several keywords at once. Take a search for "14k yellow gold charm with diamonds", which contains 6 words, or 5 if you remove the common word 'with'. Sometimes it could be as high as 10 or more. That is mainly because some people like to Ask the search engines as if they were asking a friend, or maybe simply because they are cutting and pasting a phrase from a compilation error or a product name for comparison shopping. Whatever the reason, when people search for such long phrases, the probability of most of those keywords appearing on a given website decreases. Did you ever want to appear on the front page of a major newspaper? Well, it may be possible for the wrong reasons, but it is difficult. But if you ever dreamt of showing up in first place on Google's search results page, all you have to do is create a stupid combination of words that would never exist otherwise and wait until your site is indexed. Obviously, only you will be searching for it, but it is possible to show up first, isn't it? :). Anyway, you get the point: when there are billions of searches conducted each day, you can expect some kind of statistical distribution in the number of searches containing only 1 word, 2 words, 3 words and so on. I don't know what such a distribution would look like, but I would imagine that it tapers off rapidly and is asymptotic, so perhaps it's something like a Poisson distribution.

As it is, the number of people searching for such long phrases and ending up at your website is very small; if you can't offer them what they are looking for, it is going to be pretty hard to make any further progress with the website. In addition, the temporality of the content and the noise due to aggregation are going to make it even harder for end users to find what they are looking for. These two are explained below.

So, while search's long tail offers a relatively easy way to obtain search engine traffic, that traffic is pretty much useless unless you know how to tame it. Note that the goal is not to optimize a page for receiving long tail traffic. The ultimate goal is to be placed on the SERP for the main keywords of your business (or whatever purpose the website is being optimized for). However, the intent of this article is to talk about how long tail traffic can be engaged on your website so that, hopefully, the lead converts to a customer.

So how do you engage these users? Context is the key. What if you knew the exact set of keywords a user searched for before visiting your website? With this information, instead of showing the page as is, you could analyze the keywords and show the most relevant information. I know that it is not good practice to present a search engine with one piece of content and end users with another. This is frowned upon. But trust me, if you have the temporality and aggregate noise issues, you have no choice but to do this. Doing this only makes the search engine look smart, as if it had brought the user to more appropriate content.

So how do we obtain the context? Well, that's where the HTTP_REFERER environment variable of a CGI script comes in handy. This variable holds the URL of the page from which the current page was reached. When a user does a search, the search results URL typically contains the keywords the user typed, so when the user clicks through to your site, the referer is that search results URL and you can extract the keywords from it. For example, below is a snippet of Perl code showing how to extract the search keywords from the referer.


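  my $search;
  # Capture the q= parameter when the visitor arrived from a Google results page.
  if (($ENV{HTTP_REFERER} // '') =~ /google[.]com.*?[?&]q=([^&]+)/) {
    ($search = $1) =~ tr/+/ /;                       # '+' encodes a space
    $search =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # decode %XX escapes
  }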

While there are plenty of search engines, you can just pick the top 2 or 3 and code for them, so it is not really a daunting task. Note that some browsers, like Firefox, can be configured so that the referer information is not sent. If a visitor with such a setting arrives at your website from a search engine, you don't have the context, and there is not much that can be done in that case other than to show the page as is.
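Handling a few engines at once could look something like the sketch below. It is only an illustration: the function names and the host-to-parameter table are assumptions (Google and Bing are assumed to carry the phrase in q, Yahoo in p), so check the URLs of the engines you care about before relying on it. A missing or unrecognized referer simply yields no context.

```perl
use strict;
use warnings;

# Assumed mapping: referer host pattern => name of the query-string
# parameter that carries the search phrase. Verify before relying on it.
my @ENGINES = (
    [ qr/(?:^|\.)google\.[a-z.]+$/i, 'q' ],
    [ qr/(?:^|\.)yahoo\.com$/i,      'p' ],
    [ qr/(?:^|\.)bing\.com$/i,       'q' ],
);

# Decode one URL-encoded query-string value.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Return the search phrase found in a referer URL, or undef when the
# referer is missing, is not a known engine, or carries no phrase.
sub search_phrase_from_referer {
    my ($referer) = @_;
    return unless defined $referer && length $referer;

    # Split the URL into host and query string (port and path are skipped).
    my ($host, $query) =
        $referer =~ m{^https?://([^/?#:]+)(?::\d+)?[^?#]*\?([^#]*)}i
        or return;

    for my $engine (@ENGINES) {
        my ($host_re, $param) = @$engine;
        next unless $host =~ $host_re;
        for my $pair (split /[&;]/, $query) {
            my ($name, $value) = split /=/, $pair, 2;
            next unless defined $value && length $value && $name eq $param;
            return url_decode($value);
        }
    }
    return;
}

my $phrase = search_phrase_from_referer($ENV{HTTP_REFERER});
print defined $phrase ? "Visitor searched for: $phrase\n"
                      : "No search context, show the page as is\n";
```

Keeping the engines in a table means supporting one more engine is a one-line change rather than another hand-written regular expression.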

Once the context is obtained, these keywords can be used to query your database for the blog post, the product, or whatever information is dynamic on your website. You need to make sure that your information retrieval algorithm is robust, with decent full-text-search-like behavior that returns the most relevant results first.
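As a rough illustration of "most relevant first", the sketch below ranks items by how many distinct search keywords appear in their text. Everything in it is an assumption made for the example: the stop-word list, the function names, and the in-memory catalog that stands in for your database. A real site would more likely lean on its database's full text search than on a scoring loop like this.

```perl
use strict;
use warnings;

# Common words that carry no search meaning (a short, assumed list).
my %STOP_WORD = map { $_ => 1 } qw(a an and for in of on the to with);

# Split a phrase into lowercase keywords, dropping stop words.
sub keywords {
    my ($phrase) = @_;
    return grep { !$STOP_WORD{$_} } lc($phrase) =~ /\w+/g;
}

# Rank items by how many distinct search keywords appear in their text.
# Returns only matching items, best first; ties keep catalog order.
sub rank_items {
    my ($phrase, @items) = @_;
    my %wanted = map { $_ => 1 } keywords($phrase);
    my @scored;
    for my $i (0 .. $#items) {
        my %seen;
        my $score = grep { $wanted{$_} && !$seen{$_}++ }
                    keywords($items[$i]{text});
        push @scored, [ $score, $i ] if $score;
    }
    return map  { $items[ $_->[1] ] }
           sort { $b->[0] <=> $a->[0] or $a->[1] <=> $b->[1] } @scored;
}

# Hypothetical catalog, standing in for rows from the site's database.
my @catalog = (
    { id => 1, text => 'Sterling silver charm bracelet' },
    { id => 2, text => '14k yellow gold charm with diamonds' },
    { id => 3, text => '14k white gold ring' },
    { id => 4, text => 'Leather wallet' },
);

my @best = rank_items('14k yellow gold charm with diamonds', @catalog);
print join(', ', map { $_->{id} } @best), "\n";   # prints "2, 3, 1"
```

Items that match none of the keywords are left out entirely, so an empty result is the signal to fall back to showing the page as is.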

GiftPickr makes use of the above concepts and tries to display the most relevant gifts to the user visiting through a search engine based on the search keywords.

© 2008 Dirisala.Net/articles/