I see many users of graphical dashboard tools using built-in Google Analytics connectors (like Power BI, QlikView, and Tableau) having trouble understanding what the numbers really mean – and why they differ from the Google Analytics dashboard.
Maybe you’ve had the same frustrating experience as many others:
- A user of Tableau reports a “mismatch” between GA and Tableau
- A user of Power BI complains that his GA data “don’t make sense”
- A user of QlikView wonders why his GA data “is not matching”
- A user of Power BI reports that his imported GA data “is wrong”
If you have observed something similar, or just if you’re wondering what the heck is going on here, then please read on.
I’m going to settle once and for all why GA connectors are so confusing, and why you should probably avoid them altogether.
The core of the problem
The reason why Power BI, QlikView, and Tableau have become so popular in recent years is not only because they are superior at visual reporting, but also because they use so-called “in-memory databases” which work blazingly fast, giving the user an exploration tool with sub-second response time.
For these in-memory databases to work as intended, data-sets should be as “original” or “raw” as possible. No grouping, aggregating, or filtering. Just the events, line-by-line, imported from your sales tables, hit tables, etc. This gives you maximum freedom of analysis while the in-memory database will happily crunch the growing volume of data.
The problem? GA connectors do not deliver raw data!
Instead, they import GA metrics that are already aggregated (summed, counted, calculated, etc.). And while this works fine in a few cases, users are potentially misled to think these metrics are always reliable to use in their dashboard tool.
They are NOT!
You need the non-aggregated data behind these numbers to avoid faulty double aggregations.
Let me explain.
Beware of double aggregation!
Consider this overview of how two people, let’s call them John and Carla, visit a website during a week. While Carla visits only once, John makes four visits on four different days, as seen in the table below:
|Day||Number of Users||The Names of Users|
Now ask yourself, “What is the total number of users during this week?” Looking at the names of users in the rightmost column, the total is obviously only two: John and Carla.
However, this is not the picture we get from GA connectors. The GA connector only shows the other columns: Day and Number of Users (or “Users per Day”). If we try to roll them up to the week, the result is five (2 + 1 + 0 + 0 + 0 + 1 + 1 = 5).
Do you see what happened here? Summing up users across days leads to the wrong result – in this case, five instead of two!
Why is that?
Semi-additive metrics and distinct count
Here’s the thing, Users is an example of what database geeks (like me) call semi-additive metrics.
It’s a metric you can sum meaningfully across some dimensions, but not all, and the time dimension is not one of them.
The technical explanation for this is that Users per Day constitutes a data stream with repeated elements (i.e., a user may repeatedly appear during the week, like John). Calculating users across days will, therefore, most likely result in multiple counts of the same user, leading to a meaningless total such as the final number 5 in the table above.
The way to handle data streams with repeated elements is to use distinct count instead of sum.
This is what Google Analytics does behind the scenes when it returns a query involving users, and that’s why you will always see the correct totals in Google Analytics.
The problem is, that distinct count is only possible if you have access to the raw data (e.g., cookie IDs of users or, in my example, the names in the table above). You can’t derive distinct count from a table which is already aggregated. Yet this is exactly what users of Power BI, Tableau, QlikView, etc., are led to believe when they use the tools’ built-in GA connectors.
And that’s the reason they get so confused and frustrated!
Is there a workaround?
“Okay,” you say, “But can’t I just avoid importing Users per Day? Can’t I just import the totals I need directly from Google Analytics?”
Well, yes, you can – but you’ll be severely limited in what you can then do in your dashboard.
Obviously, you wouldn’t be able to add a time-period selector in your dashboard, which can be annoying to end-users. But you would also have to be careful about other types of filters, since Users is a non-additive metric not just across time, but across several other dimensions as well.
For example, you can’t sum Users meaningfully across traffic sources such as your ad campaigns, emails, organic traffic, etc. You also can’t sum Users across landing pages or other types of URLs.
And mind you, Users is not the only semi- or non-additive metric in Google Analytics. Sessions is another example, as well as all calculated rates and ratios such as your Goal Conversion Rate, Bounce Rate, Average Session Duration, Average Order Value, and so on.
So, yes, you can try to circumvent the problem of semi- and non-additive metrics in your dashboard, but it would prevent you from taking full advantage of its features to the point where it probably becomes useless.
Un-aggregate your GA data with Scitylana
The frustrating experience many people have using GA connectors in dashboards is one reason we created Scitylana.
This tool lets you extract data from your free Google Analytics account, transforming them into a raw, hit-level form. This is the perfect format to have when you load data into Power BI, Tableau, QlikView, Targit, and others.
It gives you maximum freedom to analyze, visualize, and integrate GA data with other data sources. It enables you to slice and dice the data and to use whichever dimension you like as a filter.
With Scitylana you never have to worry about semi-additive metrics or distinct count.
The best thing? Signing up is free, and you can do it right now in just a couple of seconds: www.scitylana.com