Product List Performance - Matching BigQuery Data with Google Analytics
This will be the final in my series of matching Bigquery data with Google Analytics. This is actually where the idea all started. I had to try and recreate the product list report and couldn’t find anything to help me. So, if you can’t find it, do it yourself.
In the past, I have covered:
To get the product list performance you need 3 parts:
The views, click throughs and add to carts
The checkout and transactions
Link them together and account for products with no lists
The views, click throughs and add to carts
Depending on how you set up your site there is usually 3 times that you list the product list that is being used:
For product impressions when they are viewing a collection of products
For product clicks so you know which products are most popular products being clicked in a collection of products
On some add to carts, these days a lot of ecommerce sites allow you to add to cart from a collection, so you would add the product list there
So, let’s get all the details when we have a product list stored. We will need hit number for later use. The rest is just putting a 1 next to each action that happened to a product in a session for the three events that usually have product lists stored with them.
SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
hits.hitNumber,
product.productListName,
product.productSKU,
product.v2ProductName as name,
MAX(if(product.isImpression, 1,0)) AS product_impressions,
MAX(if(product.isClick,1,0)) AS product_click,
MAX(if(hits.eCommerceAction.action_type = "3", 1,0)) AS product_add_to_cart,
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hits,
UNNEST (hits.product) product
WHERE
_table_suffix BETWEEN "20170701"
AND "20170731"
AND (product.isImpression or product.isClick or hits.eCommerceAction.action_type = "3")
GROUP BY
sessionID,
hitNumber,
productListName,
productSKU,
name
But what happens if a user clicks on a product from a list, views a product detail page and adds to cart there. Most implementations of enhanced ecommerce don’t remember the list they came from but Google Analytics still stitches it together some how.
Though I couldn’t find anything offical what I figure is this, if there is list before they add to cart, attribute it to that list - think of it like it’s last non-direct attribution, it will go back through the events to see if there was a product click on any product list for that product and if there was that gets attribution. So to do that we need to look back in our table, using lead
and lag
. Lead
looks forward 1 row (or you can tell it more) to see the values. So if the following row (when ordered by hit number) has an add to cart event with a not set product list name then you want to take that value and add it to your own. Like this:
SELECT
* EXCEPT(product_add_to_cart),
CASE
WHEN LEAD(productListName) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber) = "(not set)" THEN LEAD(product_add_to_cart) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber)+product_add_to_cart
ELSE
product_add_to_cart
END
AS product_add_to_cart_updated
FROM (
SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
hits.hitNumber,
product.productListName,
product.productSKU,
product.v2ProductName as name,
MAX(if(product.isImpression, 1,0)) AS product_impressions,
MAX(if(product.isClick,1,0)) AS product_click,
MAX(if(hits.eCommerceAction.action_type = "3", 1,0)) AS product_add_to_cart,
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hits,
UNNEST (hits.product) product
WHERE
_table_suffix BETWEEN "20170701"
AND "20170731"
AND (product.isImpression or product.isClick or hits.eCommerceAction.action_type = "3")
GROUP BY
sessionID,
hitNumber,
productListName,
productSKU,
name)
GROUP BY
sessionID,
hitNumber,
productListName,
productSKU,
name,
product_impressions,
product_click,
product_add_to_cart
So now we just clean it up by removing all the not sets:
SELECT
sessionID,
productListName,
productSKU,
name,
SUM(product_impressions) AS product_impressions,
SUM(product_click) AS product_click,
SUM(product_add_to_cart_updated) AS product_add_to_cart
FROM (
SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
hits.hitNumber,
product.productListName,
product.productSKU,
product.v2ProductName as name,
MAX(if(product.isImpression, 1,0)) AS product_impressions,
MAX(if(product.isClick,1,0)) AS product_click,
MAX(if(hits.eCommerceAction.action_type = "3", 1,0)) AS product_add_to_cart,
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hits,
UNNEST (hits.product) product
WHERE
_table_suffix BETWEEN "20170701"
AND "20170731"
AND (product.isImpression or product.isClick or hits.eCommerceAction.action_type = "3")
GROUP BY
sessionID,
hitNumber,
productListName,
productSKU,
name)
GROUP BY
sessionID,
hitNumber,
productListName,
productSKU,
name,
product_impressions,
product_click,
product_add_to_cart)
WHERE
productListName != "(not set)"
GROUP BY
sessionID,
productListName,
productSKU,
name
The checkout and transactions
Now this is the easy part, similar to how we did it in product performance we just want the checkout, transactions and revenue for a product, we just need the session ID to match it up with. I’ve split you checkout and transactions as two seperate queries:
SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
product.productSKU,
1 AS product_add_to_cart,
1 AS product_purchase,
ROUND(SUM(product.productRevenue)/1000000,2) AS revenue
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hits,
UNNEST (hits.product) product
WHERE
_table_suffix BETWEEN "20170701"
AND "20170731"
AND hits.eCommerceAction.action_type = "6"
GROUP BY
sessionID,
productSKU
and the checkout:
SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
product.productSKU,
1 AS product_add_to_cart,
1 AS product_checkout
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hits,
UNNEST (hits.product) product
WHERE
_table_suffix BETWEEN "20170701"
AND "20170731"
AND hits.eCommerceAction.action_type = "5"
GROUP BY
sessionID,
productSKU
Now you might be thinking, why is there a 1 for product_add_to_cart
, we have already looked at add to cart, and we aren’t doing any test that they did add to cart. Well, you are right, smarty pants. But take this scenario, when I get to the home page there is a product impression for the dress I like but I am too much in mission mode, I go straight to the dresses collection, the dress pops up again but I know it’s a short sleeve dress so I go to that collection. So far I had 3 product impressions for that 1 dress, I think click on the dress in short sleeve dress collection and add to cart to buy. Now if we only join up our tables on session id and product sku, all 3 of these collections would get attributed for my purchase and would count the purchase 3 times. But if we only count the one time I added to cart, then it will match correctly.
Joining it all together
Like I said above, we then join them on not just session id and product sku but also the product list that had them add to cart. After that we can summarize it.
SELECT
productListName,
SUM(product_impressions) product_list_views,
SUM(product_click) product_list_click,
SUM(product_click)/ SUM(product_impressions) as product_list_ctr,
SUM(product_add_to_cart) product_add_to_cart,
SUM(product_checkout) product_checkout,
SUM(product_purchase) product_purchase,
SUM(revenue) revenue
FROM (
SELECT
sessionID,
productListName,
productSKU,
name,
SUM(product_impressions) AS product_impressions,
SUM(product_click) AS product_click,
SUM(product_add_to_cart_updated) AS product_add_to_cart
FROM (
SELECT
* EXCEPT(product_add_to_cart),
CASE
WHEN LEAD(productListName) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber) = "(not set)" THEN LEAD(product_add_to_cart) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber)+product_add_to_cart
ELSE
product_add_to_cart
END
AS product_add_to_cart_updated
FROM (
SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
hits.hitNumber,
product.productListName,
product.productSKU,
product.v2ProductName AS name,
MAX(if(product.isImpression, 1,0)) AS product_impressions,
MAX(if(product.isClick,1,0)) AS product_click,
MAX(if(hits.eCommerceAction.action_type = "3", 1,0)) AS product_add_to_cart,
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hits,
UNNEST (hits.product) product
WHERE
_table_suffix BETWEEN "20170701"
AND "20170731"
AND (product.isImpression or product.isClick or hits.eCommerceAction.action_type = "3")
GROUP BY
sessionID,
hitNumber,
productListName,
productSKU,
name)
GROUP BY
sessionID,
hitNumber,
productListName,
productSKU,
name,
product_impressions,
product_click,
product_add_to_cart)
WHERE
productListName != "(not set)"
GROUP BY
sessionID,
productListName,
productSKU,
name)
LEFT JOIN (
SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
product.productSKU,
1 AS product_add_to_cart,
1 AS product_purchase,
ROUND(SUM(product.productRevenue)/1000000,2) AS revenue
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hits,
UNNEST (hits.product) product
WHERE
_table_suffix BETWEEN "20170701"
AND "20170731"
AND hits.eCommerceAction.action_type = "6"
GROUP BY
sessionID,
productSKU)
USING
(sessionID,
productSKU,
product_add_to_cart)
LEFT JOIN (
SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
product.productSKU,
1 AS product_add_to_cart,
1 AS product_checkout
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hits,
UNNEST (hits.product) product
WHERE
_table_suffix BETWEEN "20170701"
AND "20170731"
AND hits.eCommerceAction.action_type = "5"
GROUP BY
sessionID,
productSKU)
USING
(sessionID,
productSKU,
product_add_to_cart)
GROUP BY
productListName
If you stuck with me through all that, congratulations and feel free to copy and paste what ever you need to get the job done. Sharing is caring.