In a manner of speaking semantics won’t do (has been: Speculation in Web Analytics)
Thursday, September 30th, 2010
– Summary –
“Speculation in Web Analytics” is our original topic. This article deals with the growing limitations in clickstream data capturing and analysis, which seem to be rooted primarily in the semantic dimension of the whole Analytics discipline: Certain technical patterns are to be interpreted as equivalent for particular real-world events and entities.
The first part of the article questions some of the essential assumptions on what could make “a page view” or “a visitor”, and explicates the fundamental challenge of semantic data richness (or lack of).
Originally featuring only a side track on the importance of bijective “real-person” attributions, the article next explicates the particular challenges in the anonymous nature of clickstream data in relation to password-protected “walled gardens” (the example used is Facebook and its “Facebook Insights” service), which slightly twists the view on Analytics as we’ve practiced it for a decade now.
As the much loved band Tuxedomoon has put it earlier: “In a manner of speaking semantics won’t do” (hence the title of this post).
Thus the last part of this post is focusing on the syntactical side of things and the levels of analysis derived from that: how to attribute the presence of particular events within a sequence and what can go wrong here as our tools are increasingly unaware of their unawareness of disturbing influences happening online and offline.
– Here’s the full-length 12″ version of the article –
Although the idea sounds counter-intuitive at first: Web Analytics involves as much speculation as any other serious attempt at pattern recognition (such as, for example, horoscopes, or the SETI program). “Data accuracy” has been a much discussed topic amongst Web Analysts for more than a decade now. And “distrusting the data” has become a commodity amongst business stakeholders around the world.
The reason for this is easy to imagine and is rooted in the deeply semantic (for some: semiotic) foundation and interwoven structure of the whole Analytics discipline: “A page view” in your favourite Web Analytics tool is never directly observed, but is the result of a technical attribution of the form: if a request is sent in a particular format to a specific server and the request is recognized in a particular manner, associated with a specific account and written to a dedicated database – that makes a page view. Pretty much everything in that attribution chain could possibly go wrong, aye?
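To make that attribution chain tangible, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `/collect` path, the `account` and `page` parameters, the account ID and the host name do not describe any real analytics tool. The point is only that "a page view" is whatever survives every check, and that a request failing any single check silently never becomes one.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout: no real analytics tool is being described.
TRACKING_PATH = "/collect"
KNOWN_ACCOUNTS = {"acct-1"}


@dataclass
class Request:
    host: str
    path: str
    params: dict


def attribute_page_view(req: Request, collector_host: str, database: list) -> Optional[dict]:
    """Return the stored page-view record, or None if any link of the chain breaks."""
    if req.host != collector_host:        # sent to a specific server
        return None
    if req.path != TRACKING_PATH:         # sent in a particular format
        return None
    account = req.params.get("account")
    if account not in KNOWN_ACCOUNTS:     # associated with a specific account
        return None
    page = req.params.get("page")
    if not page:                          # recognized in a particular manner
        return None
    record = {"account": account, "page": page}
    database.append(record)               # written to a dedicated database
    return record


db = []
host = "stats.example.com"
ok = attribute_page_view(
    Request(host, "/collect", {"account": "acct-1", "page": "/home"}), host, db
)
lost = attribute_page_view(
    Request(host, "/collect", {"account": "acct-9", "page": "/home"}), host, db
)
```

In this sketch the second request was a perfectly real visit to a perfectly real page, yet it leaves no trace in `db` because the account was not recognized.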
To make matters worse: we all know that “a visitor” is just the result from yet another attribution: an http request is coming from a specific browser on a particular computer – and even if this event is recognized as a recurring pattern from that same browser, you still can’t be sure that it is in fact the same person operating the computer as during the earlier operations.
As everybody has had to learn – in particular those who lightheartedly have created the meme of “multi-device ownership” to increase their sales volumes:
These days, the “personal computer” (that goes for all other computing devices like abacuses and pocket calculators as well) is becoming less and less personal. In fact: at work you may use one machine, at home you may use yet another (if your spouse or pet is not occupying this particular machine for their very own purposes). In between you may use a mobile device for checking one or two things from the web while being on the bus – and in all cases a different visitor is recorded to any technical system involved in tracking clickstream data.
Only if you (as a user) are inclined to recurring patterns on repeating machines can you make analysts and marketing managers happy, as they get recurring events from the same machines to analyze. This, and only this, can make a profile and a pattern. Yet it has nothing to do with the person behind the request.
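The mismatch between machines and people can be shown with a toy log. The sketch below is illustrative only: the cookie IDs and the "person" column are made up, and the person column is exactly the piece of information a tracking tool never gets to see.

```python
from collections import defaultdict

# Hypothetical hit log: (person actually at the keyboard, browser cookie ID).
hits = [
    ("you", "cookie-work"),
    ("you", "cookie-home"),
    ("spouse", "cookie-home"),
    ("you", "cookie-mobile"),
    ("you", "cookie-work"),
]

# What the tool reports: one "visitor" per distinct cookie.
recorded_visitors = {cookie for _, cookie in hits}

# What actually happened: distinct human beings.
real_people = {person for person, _ in hits}

# The mapping is not bijective in either direction.
people_per_cookie = defaultdict(set)
cookies_per_person = defaultdict(set)
for person, cookie in hits:
    people_per_cookie[cookie].add(person)
    cookies_per_person[person].add(cookie)
```

Here two people show up as three visitors, one person is spread over three cookies, and one cookie hides two people. Only the returning `cookie-work` pattern would look like a tidy "repeat visitor".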
For this precise reason smart people have thought of alternative service building mechanisms: walled gardens (or: communities), secured by a login, have become one way of taming the beast; focusing on the smaller and less complex “visit” entities instead of “visitors”, has been another. Let’s take a closer look at both.
These walled gardens and communities, secured by logins, are justified by their service core proposition as “personal services”. They are utilizing a specific value proposition which presupposes personalization. This, of course, excludes all those who don’t want to go through any signup process but just want to see what is so cool about the “social media thing”. You can’t really peek over the fences of social networks these days.
In return, social network sites make certain pieces of anonymized data exploitable for analysts, as you can see from services like “Facebook Insights”: As an analyst you get some superficial performance metrics (such as: page views. Yay!), combined with some superficial demographic data (that is: gender and age group. Yieppie!). They (Facebook) have just set up a new dashboard format for Facebook Insights which lets you see aggregated counts for wall posts, uploads, and certain other engagement metrics on a Facebook fan page (FBFP). Facebook can track and display anonymized user activities across different tabs, yes, and therefore their service could potentially qualify for becoming an alternative to your regular average boring company home page by now.
The interesting thing with this is that on an operational level you still are dealing with people inside a community, while the anonymity of the web visit partly disappears (the “social” thing you can’t have without authentication. Not even, if you are using Snoobi, as they are limited by the very same borders of the technical attribution logic as explicated above).
People seeking to contact you within such a walled garden have to grant you certain access to their own contact data (that is pretty similar to a contact form) – but the new thing is that a second-order layer of contacts is created (friends of friends – as in LinkedIn) and that posts from you as a FBFP owner will appear in profiles of your primary contacts from now on.
The Facebook Insights dashboard is allowing you to follow genuinely anonymous amplification metrics on your contents (“Likes”, “Comments”) which you can simply use for monitoring the contents you are putting to your FBFP – but you can as well gain data from externally embedded “Like” buttons on your corporate web site (even copy-paste ready code is provided for that).
With a bit of tweaking you can make your complete Facebook presence trackable with tools like Google Analytics these days. The tracking goes amazingly deep (and of course I had to become a Facebook fan of the company (Web Digi) drilling into this topic and providing the corresponding code snippets).
However: any insight needs context, and that is yet lacking from the fenced community data in Facebook Insights. Although I much appreciate a timeline view on the increase of “fans” from a certain age group of a certain gender, I would surely appreciate not having it as a stacked line graph from which I need to handpick and note down the developments over time per group.
Instead, I would actually love it to see a service retention clustering with it: are all of my fans actually freshmen (and: ~women!) to the service, or are they hangarounds for years by now? This way I could figure out whether the activity patterns are attributable to giggling fresh signups, keen on posting all sorts of unrelated crap on my FB page – or whether I am dealing with an audience which has an inventory sticker on their forehead already and which are gently, but quietly, appreciating posts and topics.
I would consider the latter behaviour much more appropriate for mature social network users. This participation mode was called “lurking” with regard to prevailing newbie behaviour in the discussion boards and IRCs in the Nineties. But contrary to the historic interpretation of this habit as a preceding step to participation, I guess these days this behaviour goes more for people who are beyond participation. Not exactly “retards”, but people who have given up on the idea that their active participation could make any difference.
The way analytics and insights are utilized on the Facebook Insights service still resembles a generic focus on the “Reach” dimension: “Do I reach the right audience?” Well: How could I know? If my target group is the infamous flock of over-eighty-year-old-homosexual-males-living-in-tents-in-Iceland I can only get two attributes out of five from these Insights statistics at this point.
I just may have a lot of young female fans who dig and appreciate eighty-year-old-homosexual-males-living-in-tents-in-Ireland. And they may ring up their grandpas having moved to Iceland after their coming-out. How could I know?
For the rest of the analysis currently possible I am thrown back to my own imagination and a friendly consultant would probably advise me to “generate more engaging content that will lift up the users’ participation. Y’know: makes them click”, before charging me EUR 5.000 for “social media guruship” with a cold, cold smile.
Bollocks!
Let’s rather look at the other last Analytics resort, the “visits”, then.
We can concede that there is one ineradicable assumption about visits: visits are said to have a purpose and a goal (telos). They are performed by a complex entity called “a visitor” for which we have seen how damn hard it is to track them. Visits appear to be simpler, and one visitor can perform a lot of visits. With regard to being a sense-making entity, a visit might currently be the better bet.
Well – indeed this could be true. But more and more often I come to witness: all of my colleagues (in the – duh – agency) have at least eight visits on different web sites going on at the same time. The phenomenon is called “tabbed browsing”, and the effect on Web Analytics data interpretation is disastrous.
As a result we see more and more visits, lasting shorter and shorter across all the click stream data we are collecting. For blogs like this one it is quite clear (it has no RSS feed): people come to the main page, check for new posts, find none, and leave. High bounce rates, short time on site (particularly amongst repeated visitors), all clear, case closed.
For plenty of other visits on other sites a load of more complex things happens: Opening two links in new tabs from the main page upon arrival on the site, browsing to the “Products” pages in one tab, to the “news” section in another, playing with the products here, closing this tab then after ten minutes and looking at the news articles for which a view has been opened in another tab ten minutes ago.
We can leave out all the countless visits expiring from unattended browser tabs altogether, but the poor analyst who has to analyze the user path for such an improperly sessionized visit! My oh my!
Minutes and minutes of “time on page” while the user was busy updating their pet’s Facebook status in yet another browser tab (we traditionally have had to interpret that as “careful reading” and genuine interest in the website contents for this particular page that we have so carefully crafted four years ago), occasional erratic “open in new tab”/“close tab” actions from users across a session (a reminder: these actions are taking place within the browser itself. Contemporary web analytics tools are blind to actions like that!), and any occasional call from mum for eighteen minutes (“You haven’t been visiting me for ages! And you never call back” – “Yes, mum… No mum… sure, mum”. Things like that…), which distracts the user completely from doing any browsing. No matter how clear the visit goal was originally, we can’t truly expect to reconstruct the visit’s goal from such a fragmented session log.
Most people today are still said to use the computer not in the aforesaid manner.
I answer: what would prevent them from behaving erratically? A flatrate has become a commodity these days. Mobile phones are all around us. Tabbed browsing has become a habit. Earlier I had to pay for my online activities by the minute (as I was using the phone line for being online in the pre-broadband and pre-mobile phone age, mum could not even call me as the line was busy, occupied by my Rockwell 2400 bps modem. So: no distraction from anything except the telly!). Today (particularly since I am using a Mac), I simply close the computer’s lid after having checked something small from a site for just two minutes. And it may happen that I open the lid once more an hour later to look at some other tiny detail on the same site. “Well – that’s just you. You are a geek!”, you say.
But a matter of fact is, actually: access ubiquity makes geeks!
A rather ideal situation (compared to the analyst’s nightmare I’ve just coloured before) would be that somebody with a goal sits in front of a computer and focuses entirely on reaching that goal, no matter what.
This data is collected together with all the other data within our favourite tools. And we could indeed assume that visits which have led to a conversion (that could be: a purchase, a download, or a sign-up) are motivated by purposes and aim at reaching that goal. Fair enough!
Look at that data for an hour. It tells you a lot about people’s determination, about how deep you have hidden contents that you thought nobody would ever look at, it tells you about how many page views were needed before the deed was done and the deal was closed. It tells you about how fast your site’s checkout process is to grasp and to go through.
But how could you utilize that data for identifying the underlying reasons for people NOT buying from your web shop? Which, at average conversion rates between two and three per cent, still is a major concern for most site operators.
Or: How could you figure out whether the goal-reaching experience was a pleasant one? Or: How could you determine whether the reason for an interrupted visit was mere user frustration or a call from mum?
From looking at clickstream data: you can’t. For the sake of sanity: don’t even try to!
Today’s filtering criteria for and richness of our clickstream data are not sufficiently mapping what we would need to know: I want to get all sessions which contained a user’s inactivity period (that is: no clicks recorded!) of at least five minutes. I want to compare that to visits where the inactivity period was at least ten minutes. What I wanna know is: Is there a correlation between the length of inactivity and the patterns in visit conclusion? Is there an accumulation of the same non-action pages across different visits? And: Would this pattern probably depend on the nature of the disturbance (i.e. mum calling, Scarlett Johansson or the Chippendales walking past the computer)?
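If raw hit timestamps were available per session, the inactivity filter wished for here would not be hard to express. The following Python sketch assumes a hypothetical export format (a dict of session IDs mapping to lists of `(timestamp, page)` tuples) and made-up sample data; it is not a feature of any particular tool. It returns the sessions whose longest pause reached a given minimum, together with the page that was on screen during that pause.

```python
from datetime import datetime, timedelta


def longest_pause(hits):
    """Return (gap, page shown during the gap) for the longest pause in one session."""
    ordered = sorted(hits)
    gaps = [
        (later[0] - earlier[0], earlier[1])
        for earlier, later in zip(ordered, ordered[1:])
    ]
    # Sessions with a single hit have no measurable pause at all.
    return max(gaps, key=lambda gap: gap[0], default=(timedelta(0), None))


def sessions_paused_for(sessions, minimum):
    """Map session ID -> 'non-action page' for sessions pausing at least `minimum`."""
    result = {}
    for session_id, hits in sessions.items():
        gap, page = longest_pause(hits)
        if gap >= minimum:
            result[session_id] = page
    return result


# Hypothetical sample data.
t0 = datetime(2010, 9, 30, 12, 0)
sessions = {
    "a": [(t0, "/home"), (t0 + timedelta(minutes=1), "/products"),
          (t0 + timedelta(minutes=8), "/checkout")],
    "b": [(t0, "/home"), (t0 + timedelta(minutes=12), "/news")],
    "c": [(t0, "/home"), (t0 + timedelta(minutes=2), "/news")],
}

five = sessions_paused_for(sessions, timedelta(minutes=5))
ten = sessions_paused_for(sessions, timedelta(minutes=10))
```

With this data, `five` contains sessions "a" and "b" and `ten` contains only "b". What the sketch cannot deliver is the cause of the pause, and a pause after the last recorded hit is invisible altogether.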
Well – supposedly all of that. As the disturbances are not recorded – we can’t know.
And even if the Chippendales, mum, and Scarlett Johansson could leave a cookie behind: we couldn’t see the users’ point of frustration, or defection from the service altogether – they may just as well have continued their browsing session after having been on the phone with mum for 30 minutes and 40 seconds – which would make this particular visit technically a new visit due to the industry standard of the 30-minute visit timeout. And be warned: In your favourite web analytics tool you may see this particular visit start from a very strange, deep entry page, too.
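The timeout effect is easy to reproduce. This is a minimal sketch of timeout-based sessionization in Python, assuming hits arrive as `(timestamp, page)` tuples; the page paths are invented, and whether a gap of exactly 30 minutes already starts a new visit is a choice made here, since tools may differ on that boundary.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)


def split_into_visits(hits, timeout=VISIT_TIMEOUT):
    """Group one visitor's hits into visits; a gap longer than `timeout` starts a new one."""
    visits = []
    previous = None
    for timestamp, page in sorted(hits):
        if previous is None or timestamp - previous > timeout:
            visits.append([])
        visits[-1].append(page)
        previous = timestamp
    return visits


# Hypothetical evening: three clicks, then mum calls for 30 minutes and 40 seconds.
start = datetime(2010, 9, 30, 20, 0, 0)
last_click_before_call = start + timedelta(minutes=4)
hits = [
    (start, "/home"),
    (start + timedelta(minutes=2), "/products"),
    (last_click_before_call, "/products/widget/specs"),
    (last_click_before_call + timedelta(minutes=30, seconds=40), "/products/widget/order"),
]

visits = split_into_visits(hits)
entry_pages = [visit[0] for visit in visits]
```

One continuous purchase intention is reported as two visits, and the second one appears to "enter" the site on the order page, which is the strange deep entry page mentioned above. Had the call ended a minute earlier, the very same clicks would count as a single visit.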
We are not yet there. But the growing fragmentation of our data sources gives me a hard time, occasionally. Visits are getting shorter – that seems to be a genuine trend – and the process of making a purchase decision seems to be less and less observable these days.
The true consequence of these thoughts remains somewhat unclear to me. It seems as if the principle of telos is no longer applicable to once well anticipated sense-making entities in Web Analytics (like “visitors”, or even “visits”).
The “moving target” paradigm which has threatened marketing for over three decades now has undoubtedly led to fragmented usage patterns where an inherent pattern of sense-making is no longer clearly attributable to agents outside the medium itself.
The clickstream data still has ’nuff of clear and distinct goal seeking and goal reaching patterns – but these structures are in retreat, and (as so often) the eigenvalues will take over sooner or later. “We return to the icon.” (Marshall McLuhan)
When a customer next asks me: “Yes – but is the data accurate?” I may soon have no other answer than: “OK – let’s pretend that matters.”
