Jan 19 2007

Statistiko

You know, statistics are a really interesting thing. They can be completely useless when used incorrectly, or really powerful if you know how to interpret them properly. Half the battle is collecting the data in the first place, but of course computers offer a streamlined way of obtaining statistics - especially on the web.

So “Why is James ranting about statistics?”, you might be thinking. Well, it’s simple: I use them all the time on this site. It’s how I know who’s looking at the page, what parts of the site are most visited and who finds which articles more interesting than others. Now, I’ve always had statistics, but in the latest revision of the site I decided to bump it up a notch, and I have more information than ever before.

So how does it work?

Well, the problem is multi-faceted, so it requires a robust solution. But basically I collect data from a few different sources and combine it - all in aid of reducing errors, while allowing for meaningful analysis. The main sources of statistical data are:

  • Web server logs
  • User cookies
  • Search query logging
  • Real-time Ajax and Java logging

It all sounds quite complex, but it’s not as bad as you might think. The scary thing is, I hardly changed anything in WordPress to get up and running. In fact, the user cookies are built right in, and I simply read them into PHP on each page to determine which user is looking at which article or image in the gallery. When you combine each of these elements, the data you can obtain is quite remarkable.
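
To give a feel for the cookie step, here is a minimal sketch of reading a visitor's name out of the Cookie header and pairing it with the page being viewed. It is written in Python rather than the PHP the site uses, and the cookie name `visitor_name`, the function names and the example path are all made up for illustration - they are not WordPress's real names.

```python
from http.cookies import SimpleCookie

def visitor_from_cookie(cookie_header, cookie_name="visitor_name"):
    """Return the visitor named in the Cookie header, or "anonymous".

    cookie_name is a hypothetical name chosen for this sketch.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(cookie_name)
    return morsel.value if morsel is not None else "anonymous"

def record_view(cookie_header, path):
    """Pair the visitor with the article or image they requested."""
    return {"visitor": visitor_from_cookie(cookie_header), "path": path}

# A request carrying the cookie, and one without any cookies at all.
known = record_view("visitor_name=James; theme=dark", "/gallery/42")
unknown = record_view("", "/gallery/42")
```

Anyone arriving without the cookie simply falls into the "anonymous" bucket, so the per-user figures only ever cover people the site already recognises.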

Hits vs Visits, Total vs Unique

The real trick with web tracking is deciding how you count the people who make it to your site, and there are a few different ways of doing this. A “hit” is registered whenever someone loads any page on your site. So if I go to a page, and then two more pages after that, I’d register three hits even though I’m only one person. A “visit” ignores the number of hits and counts each separate occasion that someone arrives at the site. You can then take it one step further and count unique visitors. A unique visitor count ignores multiple visits by the same person over a given period. For instance, I could visit a site five times in a week, but over that week I am only counted as one unique visitor, since I’m the same person returning multiple times.
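
The three counts can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the code behind this site: the `summarise` function, the visitor IDs and the 30-minute gap that separates one visit from the next are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumption: a new visit starts after 30 minutes without a page load.
SESSION_TIMEOUT = timedelta(minutes=30)

def summarise(page_loads):
    """Count hits, visits and unique visitors.

    page_loads is an iterable of (visitor_id, datetime) pairs,
    one pair per page load.
    """
    hits = 0
    visits = 0
    last_seen = {}  # visitor_id -> time of that visitor's latest page load
    for visitor_id, when in sorted(page_loads, key=lambda r: r[1]):
        hits += 1
        previous = last_seen.get(visitor_id)
        if previous is None or when - previous > SESSION_TIMEOUT:
            visits += 1  # first appearance, or back after a long gap
        last_seen[visitor_id] = when
    return {"hits": hits, "visits": visits, "unique_visitors": len(last_seen)}

example = [
    ("alice", datetime(2007, 1, 15, 9, 0)),
    ("alice", datetime(2007, 1, 15, 9, 5)),
    ("alice", datetime(2007, 1, 15, 9, 10)),
    ("alice", datetime(2007, 1, 17, 20, 0)),
    ("bob", datetime(2007, 1, 16, 12, 0)),
]
totals = summarise(example)
# totals == {"hits": 5, "visits": 3, "unique_visitors": 2}
```

Alice loads three pages in one sitting and comes back two days later, so she accounts for four hits and two visits but only one unique visitor.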

Websites often misrepresent themselves by counting hits as visits, which skews the results considerably. For this reason, I only track total and unique visits over various time periods (past hour, day, week, month and year). The differences are only really visible when you have a lot of data to refer to. For instance, in the past two weeks I’ve clocked up 585 visits and 74 unique visits, which represents some 60,000+ hits.

But wait, there’s more!

If you thought that was useful, I even count really obscure things that help me make design decisions about the site. Things like…

  • How wide your browser window is when you load the page
  • How you get to the site
  • Where you go when you leave
  • What browser you are using

The disconcerting part about this whole arrangement is that all this information is readily taken from you, without you even realising, as you innocently browse the web. While you might take offence at your session being tracked on a website, the information it provides helps build websites that better cater to the needs of their users. It just goes to show, big brother is always watching…