Your search keyword data has sample bias
Those of you who have been following this blog since its inception a couple of weeks ago will have noted that one of my first posts was on (not provided). Fortunately, Adobe has bowed to the wishes of the masses and has now put its own version into the highly popular SiteCatalyst tool. From Tuesday 8th November you will start seeing ‘Keyword Unavailable’ in your reports (although, personally, I think it would have made more sense to stick with the same label as Google, but whatever).
But the real story is that SEOers are going to be livid when they start seeing graphs that look like this:
This is a graph from one of our clients that shows the volume of (not provided) visits that they are getting. It has ramped up significantly in the last couple of days and is now about 1% of all visits; however, Search Engine Land is seeing figures of between 7% and 14%. This may even continue to rise.
So what do you do about it? Well, there isn’t much you can do about it, but you should take it into consideration when you do your analysis.
Web analytics data has always been a sample, rather than a full data set. Not every visit and page view is counted, and many are even filtered out. It may be a very good sample, but a sample it is nonetheless. Given that it is a sample, you can look at the sort of people you are not counting: people who block JavaScript from running, people on filtered IP addresses, people who navigate off the page before the code has finished loading, etc. The assumption we always make (rightly or wrongly) is that these people have typical browsing behaviour, so we don’t have any data bias because of them (or if we do, they aren’t a large percentage, so we can ignore it).
People who are logged into Google and using secure search now appear in (not provided), which means that you have an additional filter on one of your reports. You are now looking at a sample of the sample. Having 90% of the data from that sample is still a high percentage, but you should note that this sample isn’t necessarily unbiased.
People who use Google Plus, for example, haven’t been profiled all that extensively, but what profiling has been done has found that they have a high propensity to be technologically minded, and are probably younger and more affluent, whereas those who use Gmail (another way you could be logged in) are more likely to be female, young and wealthier.
These things need to be taken into consideration now: your sample bias has increased. However, going back to the original question, there isn’t really anything you can do about the data other than accept that it may be biased. What you can do is let that possible bias inform the decisions and actions you take as a result of analysing the data (and I don’t mean ignoring the data because it is wrong).
As an example, say you have seen a recent trend towards less technical, more mainstream search terms arriving at the site. That may be due to biased results, so you don’t necessarily decide to optimise solely for the mainstream terms; you keep optimising for both sets.
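To put some rough numbers on that, here is a minimal Python sketch of how hidden keywords could skew the mix you see. Every figure in it is invented for illustration (the visit count, the 40% technical share and the two hide rates are assumptions, not client data), and `keyword_mix` is a hypothetical helper, not part of any analytics tool.

```python
def keyword_mix(total_visits, technical_share, hidden_technical, hidden_mainstream):
    """Return (hidden_fraction, observed_technical_share) for search visits.

    technical_share   -- true fraction of visits arriving on technical terms
    hidden_technical  -- fraction of those visits reported as (not provided)
    hidden_mainstream -- fraction of mainstream visits reported as (not provided)
    All inputs are illustrative assumptions.
    """
    technical = total_visits * technical_share
    mainstream = total_visits - technical

    # Only visits whose keyword is still reported show up in the keyword report.
    visible_technical = technical * (1 - hidden_technical)
    visible_mainstream = mainstream * (1 - hidden_mainstream)
    visible_total = visible_technical + visible_mainstream

    hidden_fraction = 1 - visible_total / total_visits
    observed_technical_share = visible_technical / visible_total
    return hidden_fraction, observed_technical_share


# Assumed scenario: technical searchers are more likely to be logged into Google.
hidden_fraction, observed = keyword_mix(10_000, 0.40, 0.20, 0.05)
print(f"hidden: {hidden_fraction:.1%}, true technical share: 40.0%, observed: {observed:.1%}")
# prints: hidden: 11.0%, true technical share: 40.0%, observed: 36.0%
```

In this made-up scenario only 11% of visits are hidden, yet technical terms appear to have dropped from 40% to about 36% of search visits without any real change in behaviour. That is the sort of apparent trend you would not want to act on by itself.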



