Tap into the power of segmentation with hit-level Google Analytics data!

Success in marketing is all about matching the right content with the right audience. You need segmentation for this.

Segmentation is what allows you to zoom in on your target group. It allows you to identify, then target, high-performing users whilst at the same time improving or removing low performers.

Measuring the total conversion rate is a start, but you should also break down your data to see if some users convert more than others.

For example, if you find that desktop users convert more than mobile users, you could adjust your campaign target-audience accordingly or — even better — make your website more mobile-friendly.

If you optimize systematically across many user groups (geographical location, content viewed, visit frequency, previous purchases, basket abandonment, etc.), imagine the effect on your overall campaign effectiveness and conversion!

Segmentation with hit-level data

While it is possible to segment users in Google Analytics, the data will often be sampled and, therefore, not necessarily accurate. In addition, there are a number of limitations to custom segments in GA such as a maximum date range of 90 days.

Fortunately, it is possible to avoid all of this!

Simply signup for and implement SCITYLANA (it’s free), pull out data from your GA account and transform it into a raw, hit-level form. Next, load the data into a self-service BI tool such as Power BI, Excel, Qlik, Tableau or Data Studio / BigQuery.

As it turns out, segmentation becomes easier, more powerful and more actionable once data is loaded into one of these tools.

In the rest of this post, I will walk you through how segmentation works in Power BI. You can use any of the tools, of course, but Power BI is free, easy to use and has great segmentation features.

Custom segments in Power BI

If you haven’t done so already, go ahead and download our free Power BI template for GA data extracted through SCITYLANA.

If you are not a SCITYLANA user, you can download a Power BI Desktop file with demo data here (requires Power BI Desktop).

Here is an overview of the main segmentation options once you have loaded your data into the template:

  • Date range selectors
  • Slicers
  • Multi-level filters
  • Calculated columns

SegmentsFilters

Date range selectors

You can add as many slicers as you like directly to your dashboard (i.e. none of these limitations), and you can choose between different types of slicer, including date range selectors.

Simply select the Slicer icon, then drag & drop the relevant field onto the Slicer visual.

SegmentsDate

If you drag the Date field onto the visual, Power BI will automatically turn the slicer into a date range selector. You can switch between different types such as relative period, fixed period or lists with weeks, months, quaters and years.

SegmentsDateVersions

Dimension slicers

Suppose you want to zoom in on visitors who entered your site through Paid Search, used a mobile or tablet device to browse your content, was situated in Copenhagen and had never visited your site before.

Would that be possible…? Absolutely!

Simply pull in the dimensions you need, turn them into slicers and select the relevant values.

SegmentsDimensions

Notice the logic behind these selections. Values selected across slicers are joined with AND logic (e.g. “Display” AND “Copenhagen”). Values selected within the same slicer are joined with OR logic (e.g. “mobile” OR “tablet”).

Multi-level filters

Multi-level filters work basically the same way as slicers. The difference is you can specify which report level they should affect. To do so, pull in the filter dimension to either the visual-level, page-level or report-level area under FILTERS:

SegmentMultiLevel

To some extent, you can also define filtering levels for date range selectors and slicers. By default a slicer will affect the entire page – however,  you can overrule this setting by clicking Edit interactions of the Home ribbon (learn more here).

Calculated columns

Suppose you want to build a new variable, which doesn’t already exist in GA. Suppose you want to recode the deviceCategory dimension so that it only has two values instead of three:

SegmentsRecode

You can do this by adding a conditional column in Power BI using DAX in much the same way as you would do in Excel.

Simply right-click on the Scitylana table under FIELDS and select New column:

SegmentsNewCol

Now the DAX formula bar will appear:

SegmentsFormulaBar

Enter the following text:

deviceCategory NEW =
IF ( Scitylana[deviceCategory] = "desktop", "desktop", "mobile/tablet" )

… which reads: If the deviceCategory variable equals “desktop”, then “desktop”, otherwise “mobile/tablet”.

And that’s it!

You have successfully created a new column which you can use as a slicer or as input in a chart:

SegmentChart

[Download a demo Power BI file with all DAX examples in this post.]

More advanced segments

The IF() function is great if you only have a few items in your conditional expression. If you have many, however, you can take advantage of the SWITCH() function. With this function you avoid writing complex nested IF() expressions.

Suppose you want to divide all the US states into US regions: Northeast, Midwest, South and West. You can do so with the following SWITCH() expression:

US Region =
SWITCH ( 
    Scitylana[region], 
    "Connecticut", "Northeast",
    "Maine", "Northeast",
    "Massachusetts", "Northeast",
    "New Hampshire", "Northeast",
    "Rhode Island", "Northeast",
    "Vermont", "Northeast",
    "New Jersey", "Northeast",
    "New York", "Northeast",
    "Pennsylvania", "Northeast",
    "Illinois", "Midwest",
    "Indiana", "Midwest",
    "Michigan", "Midwest",
    "Ohio", "Midwest",
    "Iowa", "Midwest",
    "Kansas", "Midwest",
    "Minnesota", "Midwest",
    "Missouri", "Midwest",
    "Nebraska", "Midwest",
    "North Dakota", "Midwest",
    "South Dakota", "Midwest",
    "Wisconsin", "Midwest",
    "Delaware", "South",
    "Florida", "South",
    "Georgia", "South",
    "Maryland", "South",
    "North Carolina", "South",
    "South Carolina", "South",
    "Virginia", "South",
    "Washington D.C.", "South",
    "West Virginia", "South",
    "Alabama", "South",
    "Kentucky", "South",
    "Mississippi", "South",
    "Tennessee", "South",
    "Arkansas", "South",
    "Louisiana", "South",
    "Oklahoma", "South",
    "Texas", "South",
    "District of Columbia", "South",
    "Arizona", "West",
    "Colorado", "West",
    "Idaho", "West",
    "Montana", "West",
    "Nevada", "West",
    "New Mexico", "West",
    "Utah", "West",
    "Wyoming", "West",
    "Alaska", "West",
    "California", "West",
    "Hawaii", "West",
    "Oregon", "West",
    "Washington", "West",
    BLANK ()
)

The above expression reads: If the variable region equals Connecticut, then Northeast, else if region equals Maine, then Northeast [….], otherwise Blank.

[Download a demo Power BI file with all DAX examples in this post.]

This new calculated column can now be used as a slicer in Power BI, enabling the end-user to zoom in on the region he or she is interested in. For example, in the setup below, selecting Northeast will make the map dynamically zoom in on this region:

SegmentUsRegion

Now, suppose you want to build an even more complex segment. Suppose you want to create a variable which shows a specific geographical market division.

For example, you might want to create an overview of the US as your main market and Europe, Asia, Oceania and the rest of Americas as secondary markets.

The challenge here, however, is that whereas Europe, Asia and Oceania are continents, the US is a country belonging to the continent Americas.

To get an overview of your markets, you’ll have to create a new calculated column which draws on both Continent and Country.

You can do so by using SWITCH() in combination with TRUE():

Market =
SWITCH (
    TRUE (),
    AND ( Scitylana[Continent] = "Americas", Scitylana[Country] = "United States" )"The US",
    AND ( Scitylana[Continent] = "Americas", Scitylana[Country] <> "United States" )"Rest of Americas",
Scitylana[Continent] = "Africa""Africa",
Scitylana[Continent] = "Asia""Asia",
Scitylana[Continent] = "Europe""Europe",
Scitylana[Continent] = "Oceania""Oceania",
    BLANK ()
)

Notice how the TRUE() function allows for logical expressions in your conditions, including not only the equal operator, but also the AND operator.

[Download a demo Power BI file with all DAX examples in this post.]

Again, the new variable can be used as a slicer or as input in a chart:

SegmentTrueFunctionl

Beware of filtering scope

Now that you know how to use segments in both basic and advanced ways, you should know that variables in Google Analytics (dimensions and metrics) have different scope:

  • User scope
  • Session scope
  • Hit scope
  • Product scope

When building segments, user-scope variables filter users, session-scope variables filter sessions, hit-scope variables filter hits and so on.

For example, since the dimension Continent is session scope, selecting Europe will give you all sessions from Europe. Even if a user has previously visited your site from the US, you will still only get the sessions he or she made from Europe.

You can get an overview of GA variables and their scopes available in Scitylana here.

Scope is important because sometimes you want to filter a unit which is different from the variable’s scope.

Suppose you want to  build a segment with converted users, i.e. users who have completed one or more goals. The challenge here is that goals are hit scope, so a simple filter expression such as goal1Completion > 0, will not give you converted users, but only the specific hits that count as goal 1.

In my next post I will explain how to overcome this problem, how to give your goals and other hit-scope variables a broader scope.

Why Google Analytics connectors are useless

I see many users of graphical dashboard tools using built-in Google Analytics connectors (like Power BI, QlikView, and Tableau) having trouble understanding what the numbers really mean – and why they differ from the Google Analytics dashboard.

Maybe you’ve had the same frustrating experience as many others:

  • A user of Tableau reports a “mismatch” between GA and Tableau
  • A user of Power BI complains that his GA data “don’t make sense”
  • A user of QlikView wonders why his GA data “is not matching”
  • A user of Power BI reports that his imported GA data “is wrong

If you have observed something similar, or just if you’re wondering what the heck is going on here, then please read on.

I’m going to settle once and for all why GA connectors are so confusing, and why you should probably avoid them altogether.

The core of the problem

The reason why Power BI, QlikView, and Tableau have become so popular in recent years is not only because they are superior at visual reporting, but also because they use so-called “in-memory databases” which work blazingly fast, giving the user an exploration tool with sub-second response time.

For these in-memory databases to work as intended, data-sets should be as “original” or “raw” as possible. No grouping, aggregating, or filtering. Just the events, line-by-line, imported from your sales tables, hit tables, etc. This gives you maximum freedom of analysis while the in-memory database will happily crunch the growing volume of data.

The problem? GA connectors do not deliver raw data!

Instead, they import GA metrics that are already aggregated (summed, counted, calculated, etc.). And while this works fine in a few cases, users are potentially misled to think these metrics are always reliable to use in their dashboard tool.

They are NOT!

You need the non-aggregated data behind these numbers to avoid faulty double aggregations.

Let me explain.

Beware of double aggregation!

Consider this overview of how two people, let’s call them John and Carla, visit a website during a week. While Carla visits only once, John makes four visits on four different days, as seen in the table below:

 Day Number of Users The Names of Users
Monday 2 John, Carla
Tuesday 1 John
Wednesday 0
Thursday 0
Friday 0
Saturday 1 John
Sunday 1 John
Total 5 2

Now ask yourself, “What is the total number of users during this week?” Looking at the names of users in the rightmost column, the total is obviously only two: John and Carla.

However, this is not the picture we get from GA connectors. The GA connector only shows the other columns: Day and Number of Users (or “Users per Day”). If we try to roll them up to the week, the result is five (2 + 1 + 0 + 0 + 0 + 1 + 1 = 5).

Do you see what happened here? Summing up users across days leads to the wrong result – in this case, five instead of two!

Why is that?

Semi-additive metrics and distinct count

Here’s the thing, Users is an example of what database geeks (like me) call semi-additive metrics.

It’s a metric you can sum meaningfully across some dimensions, but not all, and the time dimension is not one of them.

The technical explanation for this is that Users per Day constitutes a data stream with repeated elements (i.e., a user may repeatedly appear during the week, like John). Calculating users across days will, therefore, most likely result in multiple counts of the same user, leading to a meaningless total such as the final number 5 in the table above.

The way to handle data streams with repeated elements is to use  distinct count instead of sum.

This is what Google Analytics does behind the scenes when it returns a query involving users, and that’s why you will always see the correct totals in Google Analytics.

The problem is, that distinct count is only possible if you have access to the raw data (e.g., cookie IDs of users or, in my example, the names in the table above). You can’t derive distinct count from a table which is already aggregated. Yet this is exactly what users of Power BI, Tableau, QlikView, etc., are led to believe when they use the tools’ built-in GA connectors.

And that’s the reason they get so confused and frustrated!

Is there a workaround?

“Okay,” you say, “But can’t I just avoid importing Users per Day? Can’t I just import the totals I need directly from Google Analytics?”

Well, yes, you can – but you’ll be severely limited in what you can then do in your dashboard.

Obviously, you wouldn’t be able to add a time-period selector in your dashboard, which can be annoying to end-users. But you would also have to be careful about other types of filters, since Users is a non-additive metric not just across time, but across several other dimensions as well.

For example, you can’t sum Users meaningfully across traffic sources such as your ad campaigns, emails, organic traffic, etc. You also can’t sum Users across landing pages or other types of URLs.

And mind you, Users is not the only semi- or non-additive metric in Google Analytics. Sessions is another example, as well as all calculated rates and ratios such as your Goal Conversion Rate, Bounce Rate, Average Session Duration, Average Order Value, and so on.

So, yes, you can try to circumvent the problem of semi- and non-additive metrics in your dashboard, but it would prevent you from taking full advantage of its features to the point where it probably becomes useless.

Un-aggregate your GA data with Scitylana

The frustrating experience many people have using GA connectors in dashboards is one reason we created Scitylana.

This tool lets you extract data from your free Google Analytics account, transforming them into a raw, hit-level form. This is the perfect format to have when you load data into Power BI, Tableau, QlikView, Targit, and others.

It gives you maximum freedom to analyze, visualize, and integrate GA data with other data sources. It enables you to slice and dice the data and to use whichever dimension you like as a filter.

With Scitylana you never have to worry about semi-additive metrics or distinct count.

The best thing? Signing up is free, and you can do it right now in just a couple of seconds: www.scitylana.com