Ferreting out spam in Google Analytics
Okay, confession time. I only went with this blog title so I could include a picture of a ferret. So, there it is…
Now that the important business is out of the way, on to the Spam.
Occasionally, you might find that some of the visits to your site are spam, and you will want to take those out before doing any analysis. It can be tricky to find the spam though, so this post will take you through a couple of the main things to look at when determining what to filter out.
Time on site
You want to watch out for time on site that is one second or less. Generally, when a robot hits a site, it stays only long enough to be counted and then leaves. This means that in your analytics, these visits will show up with a time on site of 00:00:00 or 00:00:01 for the most part.
Because robots usually just hit one page and then leave, you want to look for bounce rates of 100%.
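As a rough illustration of these two checks, here is a minimal Python sketch that flags rows of an exported report where time on site is one second or less and the bounce rate is 100%. The row layout and the field names (segment, avg_time_on_site, bounce_rate) are hypothetical and not an actual Google Analytics export format, and the numbers are made up.

```python
def parse_duration(hms: str) -> int:
    """Convert an HH:MM:SS string such as '00:00:01' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_like_robot(row: dict) -> bool:
    """Flag rows with time on site of one second or less and a 100% bounce rate."""
    short_visit = parse_duration(row["avg_time_on_site"]) <= 1
    always_bounces = row["bounce_rate"] >= 100.0
    return short_visit and always_bounces


# Hypothetical rows; field names and values are assumptions for illustration.
rows = [
    {"segment": "A", "avg_time_on_site": "00:02:14", "bounce_rate": 41.5},
    {"segment": "B", "avg_time_on_site": "00:00:00", "bounce_rate": 100.0},
    {"segment": "C", "avg_time_on_site": "00:00:01", "bounce_rate": 100.0},
]

suspects = [row["segment"] for row in rows if looks_like_robot(row)]
print(suspects)
```

A row that passes both checks is only a candidate for spam, so it still needs the manual review described in the rest of this post.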
Finding the source
Finding the source of your spam can be a very difficult and frustrating process. There are some places that you can always look, which are outlined below, but you might need to do some searching of your own as well. Here are some common places where we have found spam.
One of the first things we tend to look at when hunting for spam in Google Analytics is region. Select Audience -> Demographics -> Location and then add a secondary dimension of Region. At a glance, you should be able to see if something looks off.
If you know that you average about 3,000 visits a week from the whole United States, and a glance at your regions shows 3,000 visits from Ohio alone, you are probably looking at spam from that area.
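The same comparison can be sketched in a few lines of Python. This is only an illustration: the function name, the 50% cutoff, and the visit counts are all assumptions, and you would want to pick a cutoff that fits your own traffic.

```python
def suspicious_regions(visits_by_region: dict, usual_country_total: int,
                       share: float = 0.5) -> list:
    """Return regions whose visits alone reach `share` of the usual country-wide total."""
    cutoff = usual_country_total * share
    return sorted(
        region
        for region, visits in visits_by_region.items()
        if visits >= cutoff
    )


# Hypothetical weekly visits by region; the numbers are made up.
weekly_visits = {"Ohio": 3000, "California": 420, "Texas": 310, "New York": 380}

# Assumes we normally see about 3,000 visits a week from the whole country.
flagged = suspicious_regions(weekly_visits, usual_country_total=3000)
print(flagged)
```

With these made-up numbers, only Ohio is flagged, because it matches the usual total for the entire country on its own.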
Another place to look is Browser, under the Audience -> Technology section, where you can add Browser Version as a secondary dimension. As with region, this usually calls out fairly quickly any browser version with an unusually high number of visits.
Of course, these are just two areas where we have noticed that spam hides. Your own spam could be appearing in more or different dimensions, so you will want to make sure you do a thorough search of your own analytics.
After you have determined which region or browser version the spam seems to be coming from, you should set up a filter to include only that value for further analysis. Once you have done that, you want to look into a few more dimensions to determine which one is the culprit, so that you do not inadvertently filter out any good data.
A good place to start is with the service provider. This can be found under Audience -> Technology -> Network. We have found that this is frequently the source of bad data. You might also try browser version and region again, or move on to something else, like keyword.
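To make the drill-down concrete, here is a small Python sketch of the idea: keep only the visits from the suspicious slice, then count them by another dimension such as service provider. The record layout, the field names, and the provider names are hypothetical, and this works on made-up records rather than on anything Google Analytics exports directly.

```python
from collections import Counter


def breakdown(visits: list, include: dict, dimension: str) -> list:
    """Keep only visits matching every key/value in `include`, then count by `dimension`."""
    kept = [
        visit for visit in visits
        if all(visit[key] == value for key, value in include.items())
    ]
    # most_common() sorts from the highest count to the lowest.
    return Counter(visit[dimension] for visit in kept).most_common()


# Hypothetical visit records; names and values are assumptions for illustration.
visits = [
    {"region": "Ohio", "provider": "provider-x"},
    {"region": "Ohio", "provider": "provider-x"},
    {"region": "Ohio", "provider": "provider-x"},
    {"region": "Ohio", "provider": "provider-y"},
    {"region": "Texas", "provider": "provider-y"},
]

by_provider = breakdown(visits, include={"region": "Ohio"}, dimension="provider")
print(by_provider)
```

If one provider accounts for nearly all of the visits in the suspicious slice, that provider is a better thing to filter on than the whole region, since filtering the region would also throw away the good visits.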
No matter what you find, you are going to have to do a lot of manual work to make sure that you are only filtering spam. This can be a long, difficult process, but clean data is worth it in the end, and hopefully these tips will get you started!