The role of leadership and strategy in digital transformation

We live in a rapidly changing world. Every part of our society is undergoing fundamental transformation driven by a fast-paced digital revolution. The revolution started more than a decade ago but is now accelerating and becoming more and more overwhelming.

We feel it especially in the business world. New business models are wreaking havoc across all industries. Google is transforming the advertising world, Amazon is transforming the publishing sector, Netflix is transforming the television industry, Airbnb is transforming the hotel business and Uber is transforming taxi services.

The list goes on and yet it is only the beginning. Emerging technologies such as the Internet of Things (IoT), Artificial Intelligence (AI), machine learning, cloud computing, blockchain and 3D printing are giving rise to amazing new products: self-driving cars, health-monitoring wearables, robotic surgery, precision agriculture, smart factories, etc.

With this tech wave coming at them, business leaders around the world find themselves at an inflection point. They are grappling with the implications of the digital revolution and feel an increasing pressure to act now – either to prevent disruption or to seize a competitive advantage.

Opportunities and threats

According to research, it is indeed important to act now. Companies that embrace new digital technologies combined with strong leadership tend to outperform their peers. They are 26% more profitable and generate 9% more revenue than their average industry competitor (Westerman et al. 2014).

If tackled right, digital transformation presents a tremendous opportunity for almost any type of company. A passive approach, on the other hand, poses the risk of disruption, leading to declining market shares or even bankruptcy.

A good example of the risks involved is the recent transformation of the video business. Netflix entered as a challenger to Blockbuster in the late 1990s, offering DVD subscriptions based on physical delivery. In the early 2000s, however, the company switched to video on-demand, taking advantage of improved internet bandwidth.

The growing popularity of streaming video coupled with lower costs turned Netflix into a deadly competitor. After years of declining income and many desperate attempts to catch up with Netflix, Blockbuster closed its remaining 300 stores in 2014 (Rogers 2016).

How to tackle the digital revolution

To take advantage of digital technologies, it is obviously necessary to invest in them. Investment, however, is not enough. Strong leadership is also needed. Without leadership, companies risk implementing trendy, but incompatible, platforms or services that make it harder to achieve true digital mastery.

Strong leadership means, first and foremost, commitment from top management. Chief executives must agree to make the digital journey a core part of the business. They must follow up by setting direction, building momentum and ensuring backing from the entire organization.

According to Ross et al. (2016), leaders should take their organizations through at least the following three steps:

  1. They should craft a compelling digital vision, which motivates employees, informs strategic planning and guides day-to-day decisions across the organization
  2. They should implement an operational backbone capable of serving as the single source of truth (i.e. integrating customer data, product data, transaction data, etc.)
  3. They should develop innovative digital services (websites, mobile apps, etc.) on the basis of the operational backbone

In doing the above, companies may use in-house developed platforms, purchased platforms or cloud-based platforms. The key to success, however, is integration of these platforms.

Crafting a digital vision

A digital vision is perhaps the most important part of the digital journey. It requires not only an understanding of what is technically possible, but also the ability to discover new business opportunities on the basis of this insight. It is not the technology itself that matters, but how it is used to create a strong value proposition in the context of a business plan.

Digital visions are best understood through examples. However, they come in many different forms and are difficult to categorize, let alone conceptualize. Some focus on improved customer relations, some on streamlining operations, some on transforming products and others on creating completely new business models.

The proliferation of digital visions is driven by the development of technology, which continues to create new opportunities. As an example, Artificial Intelligence (AI) is now giving rise to new digital visions in the retail sector. This is illustrated by the competition between Nordstrom and Stitch Fix.

Nordstrom: Bridging the on- and offline world

Nordstrom is a large fashion retailer in the US that uses technology in a clever way to improve customer service. At the core of its strategy is a reward program linked to a payment card that tracks spending and builds detailed customer profiles. Nordstrom can see how often each customer shops, what he or she buys, his or her favorite brands, etc. The data is collected, integrated, analyzed and used in different types of applications.

For example, using a personal-book app integrated with Nordstrom’s Point of Sale (POS) system, shop assistants in physical stores can look up a customer’s unique profile when he or she asks for help. After the store visit, the assistant can even use the personal-book app to send emails to the customer about new arrivals or upcoming sales events.

Nordstrom’s Personal Book (Sreekanth 2018)

Nordstrom keeps customers engaged beyond the physical store. Customers can log in to Nordstrom’s website or use its mobile app to access styling tips as well as tools that facilitate physical shopping. For example, using the mobile app they can look up availability of a product in a specific store, order the product for pick up or have a shop assistant reserve it for trial in a fitting room.

Nordstrom’s digital strategy has required significant investments, but is not based on the most advanced technology available. Rather, it is guided by a clear vision of excellent customer service. The purpose is not to replace human work, but to help shop assistants become better at what they do in the physical stores. Every technological investment the company makes seems to be based on how well it fits into this vision.

Over the years, this approach has helped Nordstrom build strong loyalty and become a multi-billion dollar corporation.

Stitch Fix: Machine learning with a human touch

While Nordstrom continues to invest in technology and uses it in a sophisticated way, it faces growing competition from digitally savvy new entrants. A case in point is Stitch Fix, a fast-growing fashion retailer in the US, which uses Artificial Intelligence (AI) to scale personalized customer service.

The vision of Stitch Fix is similar to Nordstrom’s: it wants to provide excellent customer service on the basis of a deep understanding of individual customer profiles. But there are differences in the company’s business model and the way it leverages technology.

Stitch Fix operates purely online to reduce operating costs. Like Nordstrom, it uses humans (called “stylists”) to assist customers in choosing the right clothes. Unlike Nordstrom, however, Stitch Fix uses AI to help stylists keep track of customer profiles and to deliver personalized recommendations.

Customers start by filling out an online style profile, which provides basic information on preferences, size, shape, budget and lifestyle. Next, humans and machines together handpick a selection of five clothing items that match each customer’s profile. Finally, customers receive the clothes by mail, try them on at home, buy what they like and return the rest.

StitchFix

The choice of customers – what they buy and don’t buy – is registered by Stitch Fix and fed back into the learning algorithms of the recommendation engine. This enables the engine to learn more and more about the taste of the individual customer.

However, the engine also learns from Stitch Fix’s entire customer base. It uses advanced algorithms to learn which types of clothing combinations customers tend to like. As the customer base grows, more and more successful combinations are added to the machine’s memory. By cross-referencing this memory with the customer’s unique profile, the machine is often able to recommend new matching items that surprise, but also inspire, the stylist.

Stitch Fix is much smaller than Nordstrom, but its heavy reliance on data science has given it a strong foothold in a specific market segment: career-oriented millennials who have little time for shopping, but who still care about clothes and appearance.

The threat of disruptive innovation

For now, Stitch Fix doesn’t seem to be much of a threat to Nordstrom, or to any other large fashion retailer for that matter. After all, less than 3% of US consumers have ever tried Stitch Fix, whereas more than 33% have shopped at Nordstrom. Moreover, Stitch Fix operates at the fringe of the market and doesn’t seem to attract customers from traditional fashion retailers (Liverence 2018).

In a longer term perspective, however, Stitch Fix may well turn out to be what Christensen (2015) calls a “disruptive innovator”. Disruptive innovators start by targeting segments overlooked by incumbents. Their offerings typically provide unique benefits to their target groups, but do not appeal to mainstream customers. For this reason, incumbents tend to leave them alone. As their products or services mature, however, disruptive innovators begin to move upmarket, targeting mainstream customers while preserving the advantage that drove their early success. When they start winning over mainstream customers in volume, it is too late for incumbents to retaliate and disruption has occurred.

Although Stitch Fix isn’t yet targeting mainstream customers, it is quietly building a strong competitive advantage in an area of the market that large fashion retailers don’t care much about. Using an advanced emerging technology mastered by few people in the world, it is not only accumulating data about fashion preferences, but also developing a method for turning this data into successful recommendations with the help of human stylists.

It would be unwise for Nordstrom and others to ignore this. If Stitch Fix’s success continues, it may well decide to move upmarket and target mainstream customers by, for example, opening physical stores in addition to its online presence. The preference data it has accumulated, the learning algorithms it has developed and the network of stylists it has built could all easily be leveraged in the context of physical stores.

This would unlock demand from mainstream customers and pave the way for a deadly attack on incumbent fashion retailers. Regardless of the response of these retailers, however, Stitch Fix is unlikely to enjoy an unobstructed path to success. A major threat to its business model is the continuous growth and diversification of Amazon – itself a recent entrant well positioned to disrupt the retail sector by means of innovative technology.

References

Christensen, C. M., Raynor, M. E. and McDonald, R. (2015) What Is Disruptive Innovation?

Liverence, B. (2018) Fashion retailers have nothing to fear (yet) from the rise of Stitch Fix.

Rogers, D. (2016) The Digital Transformation Playbook: Rethink Your Business for the Digital Age. Kindle. Harvard Business Review Press.

Ross, J. et al. (2016) Designing Digital Organizations, MIT Center for Information Systems Research.

Sreekanth, S. (2018) The Nordstrom Way to Successful Enterprise Digital Transformation.

Westerman, G., Bonnet, D. and McAfee, A. (2014) Leading Digital: Turning Technology into Business Transformation. Kindle. Harvard Business Review Press.

How long does it take Google Analytics to delete a user’s data from a property?

We at Scitylana are curious about how long it takes for Google Analytics to clean up after a user requests deletion of data. We set out to test it.

Expecting it could take days, we will be testing once every day.

Since we have the Scitylana plugin installed, we have a Google Analytics clientId attached to all hits in the view we tested.

Baseline setup 2018-05-29

As a baseline for the test, we query our Google Analytics property for hits on my personal clientId 373080416.1509547250.

This is the query I’ll run every day.

The data extract shows each individual data point tracked on the site in the period from 2018-01-01 through 2018-05-29.

 

Blogpost - Baseline 2018-05-29

Requesting deletion

Now we have to send a deletion request to the Google Analytics Deletion API.

We are doing this with our free and public tool. First, we authorize access to our Google Analytics account. Then we select the property we want to delete from. Finally, we select Client ID, paste our specific clientId 373080416.1509547250 and hit Delete.

Capture

Now that the request has been sent successfully to Google Analytics, we can start checking whether we have been deleted.

First check 2018-05-29 (post deletion)

Running my query gives me the exact same data as my baseline query.

#Update 1: 2018-05-30

Still no changes. The data remains exactly the same, except for one additional hit I made yesterday when visiting my site’s frontpage.

Blogpost - Baseline 2018-05-30

 

#Update 2: 2018-05-31

Screenshot of the day – still no deleted user. Since I haven’t visited the site since the 29th, the dataset is exactly the same as yesterday.

Capture

@Analytical_Tom offered some interesting feedback on Twitter.

#Update 3: 2018-06-01

@Analytical_Tom found that Google has released a delete user button in the User Report interface. When using this button, we get some more info on how long the deletion process takes and what is deleted.

Once a deletion is requested, Google states the following about when the deletion will be complete.

Capture

This means I can expect the User Report table to be cleaned within 72 hours. But this is not really of interest to me, since I’m only interested in what the API returns. And since the User Report is not available through the API, it makes no difference here.

It seems I need to wait a total of up to 63 days before I see changes in the response from the API.

I will update this article again in two weeks’ time to see if I get lucky.

 

#Update 4: 2018-07-06

OK, I waited a bit longer than two weeks. I was expecting it to take quite a while before I see my visitor deleted, and I can report that the data is still available in Google Analytics. I’m only 38 days into the experiment and have an estimated 25 days to go – sigh… 🙂

My long wait is relieved by watching the football/soccer World Cup, even though my home team Denmark is out – but I can cheer for my Danish mates in the Tour de France instead. 🙂

How we handle GDPR article 17 requests using Google Tag Manager and email

FYI: Also try our GDPR Delete tool

We needed a way to delete users from our free Google Analytics account to comply with GDPR article 17, and we wanted it to be as easy as possible.

We succeeded and decided to make it easy for you as well – and yes it is free, but we would appreciate it if you helped us spread the word.

Here is what we did and what you should do!

So! We wanted to spend as little time as possible on the implementation, as we have very little time and many things to do.

The solution has three steps:

Step 1 – provide the user with easy access to their clientId

Step 2 – make it easy for the user to send us their clientId

Step 3 – delete the user’s data with this tool

To do this, we needed two things:

  1. To read the clientId while the user is on our site.
  2. To write out the clientId to the user, so he/she can copy/paste it to us.

We decided that Google Tag Manager (GTM) was a good tool for this.


Part 1: Reserve a spot on the website for the clientId

We decided that the user should click on some text to reveal the clientId. So we added this HTML to our page:

<p>Your Client Id on this page is: <i id="clientId">[CLICK TO REVEAL]</i></p>

Part 2: Monitor clicks on “clientId” element then write the clientId

In GTM, create a trigger and call it e.g. “Get-Set clientID”

Capture

Configure to Click ID, equals, clientId

Click Save

In GTM, create a new Tag of the type Custom HTML

Paste in the following code:

<script>
(function() {
  // Read a cookie value by name
  function getClientIdCookie(name) {
    var value = "; " + document.cookie;
    var parts = value.split("; " + name + "=");
    if (parts.length == 2) return parts.pop().split(";").shift();
  }
  var clickedElement = document.getElementById("{{Click ID}}");
  var _ga = getClientIdCookie("_ga");
  if (!!_ga) {
    // The _ga cookie looks like GA1.2.XXXXXXXXX.YYYYYYYYY – the clientId is the last two parts
    var parts = _ga.split(".").reverse();
    if (parts.length > 2) {
      clickedElement.innerHTML = parts[1] + "." + parts[0];
    }
  }
})();
</script>

Add the trigger we created above

Capture

Click Save. Click Submit. Click Publish

Load the page where you added the HTML text. This is how the result looks on our site.

Capture

When the user clicks the [CLICK TO REVEAL] text, it looks like this:

Capture

See for yourself here.

We would love ideas and feedback. Let us know on Twitter, or via the chat on our main site.

Tap into the power of segmentation with hit-level Google Analytics data!

Success in marketing is all about matching the right content with the right audience. You need segmentation for this.

Segmentation is what allows you to zoom in on your target group. It allows you to identify, then target, high-performing users whilst at the same time improving or removing low performers.

Measuring the total conversion rate is a start, but you should also break down your data to see if some users convert more than others.

For example, if you find that desktop users convert more than mobile users, you could adjust your campaign target-audience accordingly or — even better — make your website more mobile-friendly.

If you optimize systematically across many user groups (geographical location, content viewed, visit frequency, previous purchases, basket abandonment, etc.), imagine the effect on your overall campaign effectiveness and conversion!

Segmentation with hit-level data

While it is possible to segment users in Google Analytics, the data will often be sampled and, therefore, not necessarily accurate. In addition, there are a number of limitations to custom segments in GA such as a maximum date range of 90 days.

Fortunately, it is possible to avoid all of this!

Simply sign up for and implement SCITYLANA (it’s free), pull the data out of your GA account and transform it into a raw, hit-level form. Next, load the data into a self-service BI tool such as Power BI, Excel, Qlik, Tableau or Data Studio / BigQuery.

As it turns out, segmentation becomes easier, more powerful and more actionable once data is loaded into one of these tools.

In the rest of this post, I will walk you through how segmentation works in Power BI. You can use any of the tools, of course, but Power BI is free, easy to use and has great segmentation features.

Custom segments in Power BI

If you haven’t done so already, go ahead and download our free Power BI template for GA data extracted through SCITYLANA.

If you are not a SCITYLANA user, you can download a Power BI Desktop file with demo data here (requires Power BI Desktop).

Here is an overview of the main segmentation options once you have loaded your data into the template:

  • Date range selectors
  • Slicers
  • Multi-level filters
  • Calculated columns

SegmentsFilters

Date range selectors

You can add as many slicers as you like directly to your dashboard (i.e. none of these limitations), and you can choose between different types of slicers, including date range selectors.

Simply select the Slicer icon, then drag & drop the relevant field onto the Slicer visual.

SegmentsDate

If you drag the Date field onto the visual, Power BI will automatically turn the slicer into a date range selector. You can switch between different types such as relative period, fixed period or lists with weeks, months, quarters and years.

SegmentsDateVersions

Dimension slicers

Suppose you want to zoom in on visitors who entered your site through Paid Search, used a mobile or tablet device to browse your content, were situated in Copenhagen and had never visited your site before.

Would that be possible…? Absolutely!

Simply pull in the dimensions you need, turn them into slicers and select the relevant values.

SegmentsDimensions

Notice the logic behind these selections. Values selected across slicers are joined with AND logic (e.g. “Display” AND “Copenhagen”). Values selected within the same slicer are joined with OR logic (e.g. “mobile” OR “tablet”).
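The same logic can also be baked into a measure instead of relying on slicers. Here is a minimal DAX sketch, assuming the Scitylana table uses column names like deviceCategory and City and that the template defines a [Sessions] measure:

Sessions (segment) =
CALCULATE (
    [Sessions],
    Scitylana[deviceCategory] = "mobile" || Scitylana[deviceCategory] = "tablet",  -- OR within one slicer
    Scitylana[City] = "Copenhagen"  -- AND across slicers
)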

Multi-level filters

Multi-level filters work basically the same way as slicers. The difference is you can specify which report level they should affect. To do so, pull in the filter dimension to either the visual-level, page-level or report-level area under FILTERS:

SegmentMultiLevel

To some extent, you can also define filtering levels for date range selectors and slicers. By default, a slicer will affect the entire page – however, you can override this setting by clicking Edit interactions on the Home ribbon (learn more here).

Calculated columns

Suppose you want to build a new variable that doesn’t already exist in GA. Suppose you want to recode the deviceCategory dimension so that it only has two values instead of three:

SegmentsRecode

You can do this by adding a calculated column in Power BI using DAX, in much the same way as you would in Excel.

Simply right-click on the Scitylana table under FIELDS and select New column:

SegmentsNewCol

Now the DAX formula bar will appear:

SegmentsFormulaBar

Enter the following text:

deviceCategory NEW =
IF ( Scitylana[deviceCategory] = "desktop", "desktop", "mobile/tablet" )

… which reads: If the deviceCategory variable equals “desktop”, then “desktop”, otherwise “mobile/tablet”.

And that’s it!

You have successfully created a new column which you can use as a slicer or as input in a chart:

SegmentChart

[Download a demo Power BI file with all DAX examples in this post.]

More advanced segments

The IF() function is great if you only have a few items in your conditional expression. If you have many, however, you can take advantage of the SWITCH() function. With this function you avoid writing complex nested IF() expressions.

Suppose you want to divide all the US states into US regions: Northeast, Midwest, South and West. You can do so with the following SWITCH() expression:

US Region =
SWITCH ( 
    Scitylana[region], 
    "Connecticut", "Northeast",
    "Maine", "Northeast",
    "Massachusetts", "Northeast",
    "New Hampshire", "Northeast",
    "Rhode Island", "Northeast",
    "Vermont", "Northeast",
    "New Jersey", "Northeast",
    "New York", "Northeast",
    "Pennsylvania", "Northeast",
    "Illinois", "Midwest",
    "Indiana", "Midwest",
    "Michigan", "Midwest",
    "Ohio", "Midwest",
    "Iowa", "Midwest",
    "Kansas", "Midwest",
    "Minnesota", "Midwest",
    "Missouri", "Midwest",
    "Nebraska", "Midwest",
    "North Dakota", "Midwest",
    "South Dakota", "Midwest",
    "Wisconsin", "Midwest",
    "Delaware", "South",
    "Florida", "South",
    "Georgia", "South",
    "Maryland", "South",
    "North Carolina", "South",
    "South Carolina", "South",
    "Virginia", "South",
    "Washington D.C.", "South",
    "West Virginia", "South",
    "Alabama", "South",
    "Kentucky", "South",
    "Mississippi", "South",
    "Tennessee", "South",
    "Arkansas", "South",
    "Louisiana", "South",
    "Oklahoma", "South",
    "Texas", "South",
    "District of Columbia", "South",
    "Arizona", "West",
    "Colorado", "West",
    "Idaho", "West",
    "Montana", "West",
    "Nevada", "West",
    "New Mexico", "West",
    "Utah", "West",
    "Wyoming", "West",
    "Alaska", "West",
    "California", "West",
    "Hawaii", "West",
    "Oregon", "West",
    "Washington", "West",
    BLANK ()
)

The above expression reads: If the variable region equals Connecticut, then Northeast, else if region equals Maine, then Northeast [….], otherwise Blank.

[Download a demo Power BI file with all DAX examples in this post.]

This new calculated column can now be used as a slicer in Power BI, enabling the end-user to zoom in on the region he or she is interested in. For example, in the setup below, selecting Northeast will make the map dynamically zoom in on this region:

SegmentUsRegion

Now, suppose you want to build an even more complex segment. Suppose you want to create a variable which shows a specific geographical market division.

For example, you might want to create an overview of the US as your main market and Europe, Asia, Oceania and the rest of Americas as secondary markets.

The challenge here, however, is that whereas Europe, Asia and Oceania are continents, the US is a country belonging to the continent Americas.

To get an overview of your markets, you’ll have to create a new calculated column which draws on both Continent and Country.

You can do so by using SWITCH() in combination with TRUE():

Market =
SWITCH (
    TRUE (),
    AND ( Scitylana[Continent] = "Americas", Scitylana[Country] = "United States" ), "The US",
    AND ( Scitylana[Continent] = "Americas", Scitylana[Country] <> "United States" ), "Rest of Americas",
    Scitylana[Continent] = "Africa", "Africa",
    Scitylana[Continent] = "Asia", "Asia",
    Scitylana[Continent] = "Europe", "Europe",
    Scitylana[Continent] = "Oceania", "Oceania",
    BLANK ()
)

Notice how the TRUE() function allows for logical expressions in your conditions, including not only the equal operator, but also the AND operator.

[Download a demo Power BI file with all DAX examples in this post.]

Again, the new variable can be used as a slicer or as input in a chart:

SegmentTrueFunctionl

Beware of filtering scope

Now that you know how to use segments in both basic and advanced ways, you should know that variables in Google Analytics (dimensions and metrics) have different scope:

  • User scope
  • Session scope
  • Hit scope
  • Product scope

When building segments, user-scope variables filter users, session-scope variables filter sessions, hit-scope variables filter hits and so on.

For example, since the dimension Continent is session scope, selecting Europe will give you all sessions from Europe. Even if a user has previously visited your site from the US, you will still only get the sessions he or she made from Europe.

You can get an overview of GA variables and their scopes available in Scitylana here.

Scope is important because sometimes you want to filter a unit which is different from the variable’s scope.

Suppose you want to build a segment of converted users, i.e. users who have completed one or more goals. The challenge here is that goals are hit scope, so a simple filter expression such as goal1Completion > 0 will not give you converted users, but only the specific hits that count as goal 1.
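To make the pitfall concrete, here is a minimal sketch of such a naive filter as a measure (the goal1Completion column name is taken from the example above; the exact name depends on your extract):

Goal 1 Hits =
CALCULATE ( COUNTROWS ( Scitylana ), Scitylana[goal1Completion] > 0 )

This counts rows, i.e. individual hits where goal 1 was completed – not the number of users behind those hits.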

In my next post, I will explain how to overcome this problem and how to give your goals and other hit-scope variables a broader scope.

SCITYLANA: “Hello BigQuery”

Sign-up
and get raw hit-level Google Analytics data on scitylana.com

The clouds are coming in…

Today we announce the BETA version of our first Cloud destination – BigQuery!

But what does this mean?

Well, it means that from today we extract hit-level, unsampled data from your favorite FREE GA view and upload it automatically to your BigQuery project. Just like the big boys do with GA 360 – only less expensive.

Swoosh… !

It updates the data automatically behind the scenes every morning. And with NO HANDS – YAY!

Now take that fine reporting tool of yours with a BigQuery connector and put it to use: connect it to your SCITYLANA-transformed, hit-level, unsampled BigQuery database and do your reporting on it.

Be it Microsoft Power BI, Tableau, Google Data Studio or some other tool.

Let’s start the engine…

To get started, log in to your www.scitylana.com account and, in your data extraction settings, authorize with a BigQuery account that has modify permissions (Owner, Editor or Admin). Click the Authorize BigQuery Access button:

BigQueryAuthorize

Now enter your Google Project ID – in my case it is scitylana-1048.

BigQueryProjectID

Save it – and off you go. If you have already extracted data to disk, these files will be uploaded to BigQuery immediately.

Do note that the app will continue to download the files from GA onto your hard drive; from there, the SCITYLANA app uploads them to BigQuery.

While we wait for the files to be uploaded, let’s set up the first report.

Connect to BigQuery from Power BI

The easiest way to do this is through our Power BI template for BigQuery. Simply download the template, open it (you must have Microsoft Power BI) and sign in to BigQuery when prompted.

You can also build your report from scratch in the following way.

  1. Open PBI Desktop and click Get Data

PowerBIGetData

2. Select Google BigQuery (Beta) connector

PBIBigQuery

3. Click Connect

4. Expand from your project id to your tables and views.

PBIBQDataset

5. Select the second view from the top and click Load

6. Select DirectQuery

PBIDirectQuery

7. Click OK

8. Power BI now creates a model and should end up listing all the dimensions and metrics in the Fields list.

PBIBQFields

9. Right-click the table called “VIEW ya-da ya-da” and select New Measure

10. Write M_users = DISTINCTCOUNT(VIEW ya-da ya-da…[sl_userId]), where “VIEW ya-da ya-da” is the name of your own view, e.g. M_users = DISTINCTCOUNT(VIEW136604982DAYS007[sl_userId])

11. Check your new measure M_users in the Fields list. (Find it using the search.)

12. And VOILÀ – you get your first chart

PBIBQChart

13. Now search for the date dimension and check it. Now we get…

PBIBQChartTime

14. I hope you can take the rest from here. 🙂

Nerdy details for the interested

OK, what have we done with your BigQuery account?

Well, in the BigQuery Cloud Console you can see that we have added the following:

BigQueryOnline

(The screenshot displays the output for Google Analytics View 136604982 –  scitylana.com)

For each view you’ll get a date-partitioned table (the one with the blue icon) named after your Google Analytics view id. We partition the data by date because Google recommends it for better query performance. Currently, this is the only way BigQuery can partition a table.

Table Schema

Here is a snippet of the schema definition.

BigQuerySchema

The sl_ dimensions (e.g. sl_userId) are some of the extra columns SCITYLANA adds to the dataset. _PARTITIONTIME is a BigQuery-internal column used for date partitioning. Read more about it here.

All Google Analytics metrics are renamed from the ga: prefix to M_, e.g. M_bounces is the renamed version of ga:bounces. All dimensions simply have the ga: prefix dropped, e.g. ga:userType is called userType.

BigQuery Views to help

Additionally, we create 7 BigQuery views. A BigQuery view is a stored SQL query that exposes a subset of the data. We made these helper views for your convenience.

In the data view you get integration between the SCITYLANA hit-level table and BigQuery’s public date table called bigquery-public-data.common_us.date_greg.

This view comes in 6 variations: the latest 7 days and the latest 14, 30, 90, 180 and 365 days. This is to get better query performance when using them with a BigQuery DirectQuery connector in e.g. Microsoft Power BI. The views are named in the format VIEWXXXXXXXXXDAYSYYY, where XXXXXXXXX is the view id and YYY is the number of days it queries. E.g. VIEW120558169DAYS030 returns the last 30 days of hit-level data for the GA view with id 120558169.

Another view lets you see the data partitions behind the partitioned table, e.g. LIST120558169PARTITIONS.

This view is not super important, but it’s practical when you need to get an idea of how many days have been uploaded to the partitioned table.

 

Please feel free to comment here or write us, support@scitylana.com

Sign-up
and get raw hit-level Google Analytics data on scitylana.com

Retroactive goals with your raw Google Analytics data

Goals in Google Analytics are not retroactive. They only start showing data once you have configured them.

So, if you forget to set them up, your goal reports will simply be empty for the earlier period.

But don’t despair!

If you have downloaded your raw GA data with Scitylana, you can use tools like Power BI, Excel, Tableau, Qlik or similar to show goals retroactively.

Once the data are outside of GA, you can define or re-define your goals on the fly and see the effect immediately on the entire time range.

How to set up retroactive goals in Power BI

You can use Scitylana with any BI tool. However, here we will show you how to set up goals in Power BI simply because we know this tool better. It’s easy, even if you don’t know DAX.

  1. Open the Power BI file with your raw GA data
  2. Right-click on the Scitylana table under Fields and select New measure
  3. In the formula bar above the reporting canvas, enter this DAX expression:
Conversions = CALCULATE ( [Users], Scitylana[Page] = "/thank-you" )

But remember to replace “/thank-you” with the URL of your own conversion page!

And that’s it!

You have a new measure called Conversions which you can drag and drop into a chart, break down by time, traffic sources, geography and more – and it works instantly on all of your historical data.

Switch your base metric instantly

In the example above, we calculated the goal in terms of users, not sessions. We wanted to know how many users have seen the thank-you page.

If you want to calculate by sessions instead, simply replace [Users] with [Sessions]:

Conversions = CALCULATE ( [Sessions], Scitylana[Page] = "/thank-you" )

Turn your goal into a percentage

You can also express your new measure as a percentage and turn it into a conversion rate. To do so, simply create a new measure called % Conversions and define it as the number of converted users (i.e. your original measure) divided by the total number of users:

% Conversion = [Conversions] / [Users]

Notice how we use the original measure within the new measure, which is a nice, powerful feature of DAX.

Conversions divided by users gives you a decimal number, but you can easily change the format to percentage from the Modeling ribbon.

Select the new % Conversion measure, then click the % icon.
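A small side note: if you want the measure to handle division by zero gracefully (for example, a filter context with no users at all), DAX’s DIVIDE function is a safer variant of the same calculation:

% Conversion = DIVIDE ( [Conversions], [Users] )

DIVIDE returns a blank instead of an error when the denominator is zero or blank.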

Add as many filters as you like

Suppose you want to define your goal as the number of users who have seen the thank-you page with a mobile device while staying in London.

No problem!

You simply add more filters to your CALCULATE () function:

Conversions =
CALCULATE (
    [Users],
    Scitylana[Page] = "/thank-you",
    Scitylana[DeviceCategory] = "mobile",
    Scitylana[City] = "London"
 )

… and you can add as many as you like!

Troubleshooting: Beware of your regional settings!

Please note that your computer’s regional settings affect the required format of the DAX expression (your goal expression).

In the above, we have used the US / UK regional settings, which require you to use a comma ( , ) as the list separator and a period ( . ) as the decimal point.

If your computer has a different regional setting, you must use semicolon ( ; ) as the list separator and comma ( , ) as the decimal point. In this case, your goal expression would look like this:

Conversions = CALCULATE ( [Users]; Scitylana[Page] = "/thank-you" )

Next step

Now that you have created your own conversion measure and conversion rate, it’s time to add it to the Power BI template.

You can create a blank page dedicated to goals. Here you can insert a KPI chart that shows the status of your goal relative to a target.

You can also insert a combined line and bar chart that shows how your total number of users and conversion rate develop over time. The visualization possibilities are almost endless.

We hope you have enjoyed reading about how GA goals can be defined retroactively within Power BI. As always, we welcome any comments or questions you might have.

Why Google Analytics connectors are useless

I see many users of graphical dashboard tools like Power BI, QlikView, and Tableau using the built-in Google Analytics connectors and having trouble understanding what the numbers really mean – and why they differ from the Google Analytics dashboard.

Maybe you’ve had the same frustrating experience as many others:

  • A user of Tableau reports a “mismatch” between GA and Tableau
  • A user of Power BI complains that his GA data “don’t make sense”
  • A user of QlikView wonders why his GA data “is not matching”
  • A user of Power BI reports that his imported GA data “is wrong”

If you have observed something similar, or if you’re just wondering what the heck is going on here, then please read on.

I’m going to settle once and for all why GA connectors are so confusing, and why you should probably avoid them altogether.

The core of the problem

The reason why Power BI, QlikView, and Tableau have become so popular in recent years is not only because they are superior at visual reporting, but also because they use so-called “in-memory databases” which work blazingly fast, giving the user an exploration tool with sub-second response time.

For these in-memory databases to work as intended, datasets should be as “original” or “raw” as possible. No grouping, aggregating, or filtering. Just the events, line-by-line, imported from your sales tables, hit tables, etc. This gives you maximum freedom of analysis while the in-memory database happily crunches the growing volume of data.

The problem? GA connectors do not deliver raw data!

Instead, they import GA metrics that are already aggregated (summed, counted, calculated, etc.). And while this works fine in a few cases, users are potentially misled to think these metrics are always reliable to use in their dashboard tool.

They are NOT!

You need the non-aggregated data behind these numbers to avoid faulty double aggregations.

Let me explain.

Beware of double aggregation!

Consider this overview of how two people, let’s call them John and Carla, visit a website during a week. While Carla visits only once, John makes four visits on four different days, as seen in the table below:

Day | Number of Users | The Names of Users
Monday | 2 | John, Carla
Tuesday | 1 | John
Wednesday | 0 |
Thursday | 0 |
Friday | 0 |
Saturday | 1 | John
Sunday | 1 | John
Total | 5 | 2

Now ask yourself, “What is the total number of users during this week?” Looking at the names of users in the rightmost column, the total is obviously only two: John and Carla.

However, this is not the picture we get from GA connectors. The GA connector only shows the other columns: Day and Number of Users (or “Users per Day”). If we try to roll them up to the week, the result is five (2 + 1 + 0 + 0 + 0 + 1 + 1 = 5).

Do you see what happened here? Summing up users across days leads to the wrong result – in this case, five instead of two!

Why is that?

Semi-additive metrics and distinct count

Here’s the thing: Users is an example of what database geeks (like me) call a semi-additive metric.

It’s a metric you can sum meaningfully across some dimensions, but not all, and the time dimension is not one of them.

The technical explanation for this is that Users per Day constitutes a data stream with repeated elements (i.e., a user may appear repeatedly during the week, like John). Summing users across days will, therefore, most likely count the same user multiple times, leading to a meaningless total such as the final number 5 in the table above.

The way to handle data streams with repeated elements is to use distinct count instead of sum.
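Taking Power BI’s DAX as an example, the difference looks roughly like this – a sketch with made-up table and column names: a raw hit table called Hits with a UserId column, versus a pre-aggregated DailyReport table of the kind a GA connector delivers:

-- Correct: a distinct count over raw, hit-level rows
Users = DISTINCTCOUNT ( Hits[UserId] )

-- Wrong: summing a pre-aggregated "users per day" column double-counts repeat visitors like John
Users (summed) = SUM ( DailyReport[UsersPerDay] )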

This is what Google Analytics does behind the scenes when it returns a query involving users, and that’s why you will always see the correct totals in Google Analytics.

The problem is that distinct count is only possible if you have access to the raw data (e.g., the cookie IDs of users or, in my example, the names in the table above). You can’t derive a distinct count from a table which is already aggregated. Yet this is exactly what users of Power BI, Tableau, QlikView, etc., are led to believe they can do when they use the tools’ built-in GA connectors.

And that’s the reason they get so confused and frustrated!

Is there a workaround?

“Okay,” you say, “But can’t I just avoid importing Users per Day? Can’t I just import the totals I need directly from Google Analytics?”

Well, yes, you can – but you’ll be severely limited in what you can then do in your dashboard.

Obviously, you wouldn’t be able to add a time-period selector in your dashboard, which can be annoying to end-users. But you would also have to be careful about other types of filters, since Users cannot be summed meaningfully across several other dimensions either, not just time.

For example, you can’t sum Users meaningfully across traffic sources such as your ad campaigns, emails, organic traffic, etc. You also can’t sum Users across landing pages or other types of URLs.

And mind you, Users is not the only semi- or non-additive metric in Google Analytics. Sessions is another example, as well as all calculated rates and ratios such as your Goal Conversion Rate, Bounce Rate, Average Session Duration, Average Order Value, and so on.

So, yes, you can try to circumvent the problem of semi- and non-additive metrics in your dashboard, but it would prevent you from taking full advantage of its features to the point where it probably becomes useless.

Un-aggregate your GA data with Scitylana

The frustrating experience many people have using GA connectors in dashboards is one reason we created Scitylana.

This tool lets you extract data from your free Google Analytics account, transforming them into a raw, hit-level form. This is the perfect format to have when you load data into Power BI, Tableau, QlikView, Targit, and others.

It gives you maximum freedom to analyze, visualize, and integrate GA data with other data sources. It enables you to slice and dice the data and to use whichever dimension you like as a filter.

With Scitylana you never have to worry about semi-additive metrics or distinct count.

The best thing? Signing up is free, and you can do it right now in just a couple of seconds: www.scitylana.com

Visualize your raw GA data instantly with this Power BI template

Okay, so you’ve implemented SCITYLANA and pulled out loads of nice hit-level data from your Google Analytics account.

Now you want to put all that data to impressive use! But what to focus on and how to get started?

Use this free Power BI template

Although SCITYLANA works with all sorts of tools (BigQuery, Data Studio, Tableau, Targit, Qlik, etc.), we suggest you start with this template for Power BI.

We’ve chosen Power BI as our starter tool, because it’s free, easy to use and extremely versatile when it comes to analyzing, reporting and integrating data.

In this post we’ll give you a quick rundown on the contents of the template. At the end, we’ll also show you how to populate it with your own data.

Overview

The first report tab in the Power BI template gives a high-level view of your traffic.

You can use each visual as a filter. For example, clicking on Organic search in the pie chart will immediately filter all other visuals.

You can also define a custom date range using the slicer in the upper right corner.

PbiTemplateOverview

Traffic sources

The next tab shows your traffic sources. The bar chart shows all traffic sources, but you can easily filter these by channel grouping using the slicer to the right.

PbiTemplateTrafficSources

We’ve included a bubble chart showing visit engagement by channel grouping. The y-axis shows the average number of pages viewed per session, while the x-axis shows the average time spent on each page.

Notice the Organic Search bubble in the upper right corner? These sessions view many pages and spend a long time on the site.

Content usage

What’s special about this report? Well, it shows Hostname (domains) and Page combined with Sessions.

In GA you actually can’t do this. You can’t combine Hostname or Page with Sessions. This is because of the way GA has organized its data storage.

What is great about the report tab below is that you can easily filter your top pages by hostname. Simply click on one of the domains in the bar chart to the left, and you will see the other one changes immediately.

PbiTemplateContent
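The reason the template can combine Hostname and Page with Sessions is that the underlying data is hit-level: every row carries both its page (and hostname) and the session it belongs to, so sessions per page is just a distinct count. A sketch of the idea in DAX, assuming the extract exposes a session identifier column (called sl_sessionId here purely for illustration):

Sessions = DISTINCTCOUNT ( Scitylana[sl_sessionId] )

Dropped into a table visual together with Hostname and Page, this measure gives you exactly the combination GA won’t.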

Visiting time

Here you can see traffic during working hours, weekends and even seasons such as summer or winter. Again we’ve gone a bit beyond what you can do in GA. We’ve added a calendar and a time of day dimension with more interesting attributes.

PbiTemplateVisitingTime

Clickstreams

Now comes our favorite part: Clickstreams! This is kind of the “proof” that we deliver hit-level GA data. Each row is a hit (a pageview or an event).

You can see exactly what individual visitors do!

PbiTemplateClickstreams

Geography

Geography: Isn’t that more or less the same as in the GA interface? Not quite! The treemap in the upper left corner divides US visitors into the Northeast, West, South, and Midwest regions. Yet this dimension doesn’t exist in GA!

PbiTemplateGeography

Technology

Waiting for another surprise? Well, this one is not that special, but nonetheless useful. It shows statistics on devices, browsers and operating systems.

PbiTemplateTechnology

How to use the template with your own data

  1. Sign up for and implement SCITYLANA (it’s free!)
  2. Start a new data extraction so that your GA data is downloaded to your hard drive
  3. Download and install Power BI Desktop (it’s free!)
  4. Download and open our Power BI template
  5. Enter the path of the data folder (the destination you chose during step 2)

 

PbiTemplateDataFolder

Now it’s your turn 🙂 Do you have ideas on how to visualize GA data in new interesting ways? Please let us know and we’ll add them to the template.

The Cost of Sampling in Google Analytics

Recently, Jonathan Weber at LunaMetrics wrote a great post regarding the accuracy of Google Analytics sampling, revealing how users shouldn’t just take it at face value but also check for potential inaccuracies.

Weber shows that while sampling works well for overall trends on your website, it can become very inaccurate. One good example of this is when you’re looking at smaller subsets of data, such as the conversion rate of a specific campaign.

To help calculate the accuracy of your data, Weber even included an online tool, which you can use to determine the “margin of error” for your campaign conversion rate.

In this post, I want to take his argument a bit further, ask a slightly different question and see what sampling inaccuracy could actually COST you, particularly if it leads you to make the wrong decision in a campaign.

First, calculate margin of error

Imagine that you have a website with some 10M sessions in a particular time period, and that sampling kicks in. Now, suppose you’re comparing five different online campaigns, each of which is reported to send 4K sessions to your website with a conversion rate of 2%.

Using Jonathan’s tool, linked above, we can now calculate the margin of error for these reported conversion rates. Here is the result:

SamplingCalculator

As you can see, even though Google Analytics reports a conversion rate of 2%, the real rate for each of the five campaigns could be anywhere between 0.55% and 5.04%. A significant margin, whichever way you look at it.

Next, calculate sampling cost

Let’s assume you have invested $6K in each of the five campaigns (for a total cost of $30K). Let’s then say that each converted session is worth around $100, making your total revenue from the campaigns $40K (5 campaigns × 4,000 sessions × 2% × $100). This gives you a total profit of $10K:

CampaignCost1

Now assume that Campaign A is doing much better than the rest. Assume that its conversion rate is 5.04%, while the other campaigns only convert at 1.24%:

CampaignCost3

In this case, you would still see a total profit of $10K across all your online campaigns, but only Campaign A would actually be profitable: it earns about $14.2K, while each of the other four loses roughly $1K.

Now imagine that you have an additional $30K to invest in the five campaigns. If you allocated this money evenly, based on GA’s sampled report, you would again end up with just $10K in additional profit.

But, if you had had access to the unsampled data, you would have known that Campaign A is the only profitable one, and that investing the entirety of your $30K into this campaign was the smart move.

Indeed, this simple decision would net you nothing less than $70.8K in profit (an extra $30K buys roughly 20K sessions at Campaign A’s 5.04% conversion rate, i.e. about 1,008 conversions worth $100.8K, minus the $30K spend):

CampaignCost2

As you can see, sampling in this hypothetical case would have cost you $70.8K – $10K = $60.8K.

That’s not small change by any means. Think about that next time you run into this seemingly innocuous message:

SamplingMessage

Unsample your data with Scitylana

The potential cost of sampling in Google Analytics is one reason we created Scitylana. This tool lets you extract the data from your free Google Analytics account, transforming it into a raw, hit-level form.

The data is free from sampling, giving you more freedom to analyze, visualize, and integrate GA and CRM data.

The best thing? Signing up is absolutely free, and you can do it right now in just a couple of seconds: www.scitylana.com

 

Marketing-to-sales funnel insights finally made easy!

Base your campaigns on facts instead of feelings. Combine your data sources in MS Power BI.

Get to know the whole funnel – from source to conversion.

  • Load raw Google Analytics data in Power BI through Scitylana
  • Load online campaign data from Facebook etc.

Once loaded, find the fields that your various data sources have in common.

  • For Facebook campaigns, look for the utm-code from your campaigns. This is the key you use to combine your Facebook data with your Google Analytics data in Power BI (see the sketch below).

Now you have impressions, reach, CTR, bounce rate, conversions, etc. at your fingertips:

FacebookFunnel
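As a sketch of what the glue can look like in Power BI (all table and column names here are hypothetical – adapt them to your own Facebook export and Scitylana extract): once you relate the Facebook campaign table to the Scitylana table on the utm/campaign field, you can mix metrics from both sides in a single measure:

-- Assumes a relationship between Facebook[utm_campaign] and Scitylana[campaign]
Cost per Session = DIVIDE ( SUM ( Facebook[Spend] ), [Sessions] )
Sessions per 1K Impressions = DIVIDE ( [Sessions], SUM ( Facebook[Impressions] ) ) * 1000

Sliced by campaign, this puts the paid side (spend, impressions) and the on-site side (sessions, bounce rate, conversions) next to each other in one view.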

It does not take a huge investment in expensive tools to see the whole funnel – and you are no longer limited to the website purchase conversion value in Facebook Business Manager, which is based on Facebook’s attribution model, not your own.

You already have the data. Don’t be afraid to use them.