Funnel Based Attribution in a single SQL statement in BigQuery

Previously I wrote about applying Markov Model Attribution calculations on a Google Analytics click-stream data-set in BigQuery. Shapley Value is another similar Machine Learning algorithm that is very popular for calculating the worth of a campaign. Attribution becomes important if you spend a lot of resources on said activity.

Shapley Value is used by Google Analytics (data-driven in GA 360), Google Adwords and Google Attribution 360.

I will address applying Shapley Value on your Scitylana data in another time.

This article is about about airing another attribution model called Funnel Based Attribution (FBA). Currently this is my favorite algorithm. I’ll get back to that in a bit.

The goal of this blog post is to use FBA and not describe it in detail. It has already been beautifully described by the inventor Vladyslav Flaks. CEO at OWOX.

The Background and Prerequisites

What I like about the algorithm is that is very easy to understand. It seems reasonable to me that steps in on a customers journey is as much about the campaign as about is the site communication itself. This is what makes the FBA way more actionable than any other attribution model. The model entangles campaigns and funnel steps into one system.

To calculate Funnel Based Attribution we must have access to at least the individual pages/events etc. that makes up a session. These informations will be used for defining the micro funnel steps on the site – from front page to order confirmation page and maybe even beyond. I recommend you try using the Scitylana click-stream data that we build from your Google Analytics Free data and our Google Analytics tracking script plugin.

A funnel step can potentially be made up of more or less complex patterns. Maybe even so complex that we need a cluster analysis to find out what pages/events, click-fragments a funnel is made up of. Typically there is a simpel solution when you get started. Complexity may appear at a later state.

The Setup

In this article I will present a solution for calculating FBA with the Scitylana data-set directly in BigQuery with a single SQL statement. Your data-set needs the following columns populated to work. (contact us if you are missing some of the columns in your data-set, we will help you get them)

  • sl_userId
  • sl_sessionId
  • sl_timestamp
  • M_transactionRevenue
  • channelGrouping
  • pagePath

You  can change M_transactionRevenue to goals or any other metric you have that represents value. You can also, and should, experiement with changing channelGrouping. E.g. if you have different kinds of email categories then drill down into and build join them with the channelGrouping building a custom channel grouping dimension.

The SQL statement  consists of 3 main steps

  1. For each funnel step, calculate the “attribution” value on historic data.
  2. Filter sessions so we have the sessions that contributes with a unique funnel progression – from one funnel step/state to another funnel step/state.
  3. Group by channelGrouping and sum the total attribution value multiplied with the revenue/value of a Purchase (or our most wanted response).

Funnel

I chose to model the funnel in this article to fit the AIDA model. I define the funnel steps in the following way.

  • Attention All sessions
  • Interest Non-bounced sessions (> 1 page view)
  • Desire Sessions with Add-to-basket event
  • Action Sessions with a transaction

Funnel Based Attribution Chart

The chart above shows an example of how the funnel stats on a website could look like. The green attribution values can be calculated on the historic data. You can calculate these numbers on full history or just a certain time range using the formular in Flaks’ article. All visitors starts out in the Attention state (they have entered our site with or without a campaign as source). We are interested in the sessions where a visitors state changes. We only look at visitors that have completed the full funnel.

The SQL statement provided below has state definitions for the 3 steps above, Interest, Desire and Action.

The Value calculation for each step is done with the following SQL

      SELECT
      scoreInterest/total AS valueInterest,
      scoreDesire/total AS valueDesire,
      scoreAction/total AS valueAction
    FROM (
      SELECT
        *,
        scoreInterest+scoreDesire+scoreAction AS total
      FROM (
        SELECT
          Attention,
          1 - Interest/Attention AS scoreInterest,
          1 - Desire/Attention AS scoreDesire,
          1 - Action/Attention AS scoreAction
        FROM (
          SELECT
            (
            SELECT
              COUNT(DISTINCT sl_userId) AS users
            FROM
              `scitylana.XXXXXXXX`) AS Attention,
            (
            SELECT
              COUNT(*)
            FROM (
              SELECT
                sl_userId,
                SUM(sl_pageviewOrder)
              FROM
                `scitylana.XXXXXXXX`
              GROUP BY
                sl_userId
              HAVING
                SUM(sl_pageviewOrder) > 1)) AS Interest,
            (
            SELECT
              COUNT(DISTINCT sl_userId)
            FROM
              `scitylana.XXXXXXXX`
            WHERE
              pagePath LIKE '%DESIRE%') AS Desire,
            (
            SELECT
              COUNT(DISTINCT sl_userId)
            FROM
              `scitylana.XXXXXXXX`
            WHERE
              pagePath LIKE '%ACTION%') AS Action ) ) )

To adopt the query to your own data-set you just need to search and replace XXXXXXXX with the your Google Analytics View Id and you need to define each of the Desire and Action steps with you own SQL filter.

Change the part pagePath LIKE ‘%DESIRE%’ with your own filter and the also the part pagePath LIKE ‘%ACTION%’ 

This is the complete statement where you can change the funnel constraints.

SELECT
  channelGrouping AS Channel,
  CAST( SUM(order_revenue_weighted) AS Int64) AS AttributedValue
FROM (
  SELECT
    sl_userId,
    timestamp,
    channelGrouping,
    valueInterest * ifnull(interest,
      0) + valueDesire * ifnull(desire,
      0) + valueAction * ifnull(action,
      0) AS value_weight,
    order_revenue,
    (valueInterest * ifnull(interest,
        0) + valueDesire * ifnull(desire,
        0) + valueAction * ifnull(action,
        0)) * order_revenue AS order_revenue_weighted
  FROM (
    SELECT
      sl_userId,
      timestamp,
      channelGrouping,
      CASE
        WHEN __interest=1 AND interest IS NULL THEN 1
        ELSE NULL
      END AS interest,
      CASE
        WHEN __desire=1 AND desire IS NULL THEN 1
        ELSE NULL
      END AS desire,
      CASE
        WHEN __action=1 AND action IS NULL THEN 1
        ELSE NULL
      END AS action,
      order_revenue
    FROM (
      SELECT
        sl_userId,
        timestamp,
        channelGrouping,
        interest AS _interest,
        desire AS _desire,
        action AS _action,
        interest AS __interest,
        desire AS __desire,
        action AS __action,
        LAG(interest) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp) AS interest,
        LAG(desire) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp) AS desire,
        LAG(action) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp) AS action,
        order_revenue
      FROM (
        SELECT
          sl_userId,
          timestamp,
          channelGrouping,
          interest AS _interest,
          desire AS _desire,
          action AS _action,
          LAST_VALUE(interest IGNORE NULLS) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS interest,
          LAST_VALUE(desire IGNORE NULLS) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS desire,
          LAST_VALUE(action IGNORE NULLS) OVER (PARTITION BY sl_userId, order_time ORDER BY timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS action,
          order_revenue,
          order_time
        FROM (
          SELECT
            sl_userId,
            timestamp,
            channelGrouping,
            interest,
            desire,
            action,
            FIRST_VALUE(CASE
                WHEN revenue IS NOT NULL THEN timestamp
                ELSE NULL
              END IGNORE NULLS) OVER (PARTITION BY sl_userId ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS order_time,
            FIRST_VALUE(revenue IGNORE NULLS) OVER (PARTITION BY sl_userId ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS order_revenue
          FROM (
            SELECT
              sl_userId,
              timestamp,
              channelGrouping,
              SUM(interest) interest,
              SUM(desire) desire,
              SUM(action) action,
              SUM(revenue) revenue
            FROM (
                -- Select Interest sessions
              SELECT
                sl_userId,
                MIN(sl_timeStamp) timestamp,
                channelGrouping,
                1 interest,
                NULL desire,
                NULL action,
                NULL revenue
              FROM
                `scitylana.XXXXXXXX`
              WHERE
                sl_userId IN (
                SELECT
                  sl_userId
                FROM
                  `scitylana.XXXXXXXX`
                GROUP BY
                  1
                HAVING
                  SUM(M_transactionRevenue) > 0)
                AND sl_sessionId != '(not set)'
              GROUP BY
                sl_userId,
                channelGrouping
              HAVING
                SUM(sl_pageviewOrder) > 1
              UNION DISTINCT
                -- Select Desire sessions
              SELECT
                sl_userId,
                MIN(sl_timeStamp) timestamp,
                channelGrouping,
                NULL interest,
                1 desire,
                NULL action,
                NULL revenue
              FROM
                `scitylana.XXXXXXXX`
              WHERE
                sl_userId IN (
                SELECT
                  sl_userId
                FROM
                  `scitylana.XXXXXXXX`
                GROUP BY
                  1
                HAVING
                  SUM(M_transactionRevenue) > 0)
                AND sl_sessionId != '(not set)'
                AND (pagePath LIKE '%DESIRE%'
                  OR pagePath LIKE '%ACTION%')
              GROUP BY
                sl_userId,
                channelGrouping
              UNION DISTINCT
                -- Select Action sessions
              SELECT
                sl_userId,
                MIN(sl_timeStamp) timestamp,
                channelGrouping,
                NULL interest,
                NULL desire,
                1 action,
                SUM(M_transactionRevenue) revenue
              FROM
                `scitylana.XXXXXXXX`
              WHERE
                sl_userId IN (
                SELECT
                  sl_userId
                FROM
                  `scitylana.XXXXXXXX`
                GROUP BY
                  1
                HAVING
                  SUM(M_transactionRevenue) > 0)
                AND sl_sessionId != '(not set)'
                AND pagePath LIKE '%ACTION%'
              GROUP BY
                sl_userId,
                channelGrouping )
            GROUP BY
              1,
              2,
              3 ) )
        WHERE
          order_revenue IS NOT NULL) ) ),
    -- Calculate friction/value score for each step in the funnel
    (
    SELECT
      scoreInterest/total AS valueInterest,
      scoreDesire/total AS valueDesire,
      scoreAction/total AS valueAction
    FROM (
      SELECT
        *,
        scoreInterest+scoreDesire+scoreAction AS total
      FROM (
        SELECT
          Attention,
          1 - Interest/Attention AS scoreInterest,
          1 - Desire/Attention AS scoreDesire,
          1 - Action/Attention AS scoreAction
        FROM (
          SELECT
            (
            SELECT
              COUNT(DISTINCT sl_userId) AS users
            FROM
              `scitylana.XXXXXXXX`) AS Attention,
            (
            SELECT
              COUNT(*)
            FROM (
              SELECT
                sl_userId,
                SUM(sl_pageviewOrder)
              FROM
                `scitylana.XXXXXXXX`
              GROUP BY
                sl_userId
              HAVING
                SUM(sl_pageviewOrder) > 1)) AS Interest,
            (
            SELECT
              COUNT(DISTINCT sl_userId)
            FROM
              `scitylana.XXXXXXXX`
            WHERE
              pagePath LIKE '%DESIRE%') AS Desire,
            (
            SELECT
              COUNT(DISTINCT sl_userId)
            FROM
              `scitylana.XXXXXXXX`
            WHERE
              pagePath LIKE '%ACTION%') AS Action ) ) ) )
  VALUES
  WHERE
    interest IS NOT NULL
    OR desire IS NOT NULL
    OR action IS NOT NULL )
GROUP BY
  1
ORDER BY
  2 DESC

 

When running the query against BigQuery you get a result-set containing the total attributed value for each channel group.

 

Example FBA Real world2

If you want to get started with Funnel Based Attribution, I can suggest starting with a Scitylana 30 day Trial license. This license automatically downgrades to out Personal (Free) license.

If you don’t track revenue with Google Analytics on your site, maybe you don’t sell your product online, then you can change the metric from transaction revenue (M_transactionRevenue) to a Goal Completion metric or your own metric. To use your GA goals you need to upgrade to our Startup Plan. If you are from a Startup, NGO or NON profit, we offer a full licence at a very low price. Apply to see if you are eligible here.

We are very helpful – connect with us on Scitylana support. Or connect with me on LinkedIn and I’ll help you get started. If you are tech savvy you can absolutely get started now and start using the model tomorrow when your first batch of data arrives in BigQuery.

If you want to know more about Google Analytics data in BigQuery, read more here

Future work

If we focus too much on the campaign value itself we might overlook the effects of our website. On our web site we could interpret a product page as a campaign that is leading the visitor to the checkout flow. The checkout can be evaluated as a micro funnel on the site. We can calculate a kind of double funnel analysis. (1) The full customer journey – evaluating the campaign – and (2) the micro funnel of the site – evaluating product related pages.

I will address this approach in future article.

 

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.