The GA4 update: matching big query data with Google analytics - Acquisition Channel report
With the widespread usage of GA4 these days, it seems only fitting to continue my series of matching BigQuery data with Google Analytics.
See my previous post for basic audience metrics.
In this post, I will look at :
transactions
eCommerce conversion rate
revenue
Goal completions
Goal conversion rate
Before we get into each individual metric, a little about channels in BQ. Firstly, BigQuery doesn’t store the channel groupings, only source, medium and campaign. Also, if you use trafficSource, this is the user traffic source, not the session. so to get the session channel groupings we need to work a bit of magic. I got mine from this site: https://www.ga4bigquery.com/traffic-source-dimensions-metrics-ga4/.
Get the Source/Mediums
It turns out that the source, medium and campaign listed under traffic_source is the first users’ traffic source. For you to get the session traffic source, you need to look at the parameters stored with each hit. For this, we do the following:
with acq_table as ( select distinct date, user_pseudo_id, session_id, FIRST_VALUE(session_traffic_source.source ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.source is null then 2 else 1 end, event_timestamp) AS source, FIRST_VALUE(session_traffic_source.medium ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.medium is null then 2 else 1 end, event_timestamp) AS medium, FIRST_VALUE(session_traffic_source.campaign ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.campaign is null then 2 else 1 end, event_timestamp) AS campaign, FIRST_VALUE(session_traffic_source.gclid ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.gclid is null then 2 else 1 end, event_timestamp) AS gclid, from( Select * except(traffic_source), array_agg( if(coalesce(traffic_source.source, traffic_source.medium, traffic_source.campaign, traffic_source.gclid) is not null,traffic_source, null) ignore nulls order by event_timestamp asc limit 1) [safe_offset(0)] as session_traffic_source, from( SELECT PARSE_DATE("%Y%m%d",event_date) AS date, user_pseudo_id, event_timestamp, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, (select as struct (select value.string_value from unnest(event_params) where key = 'source') as source, (select value.string_value from unnest(event_params) where key = 'medium') as medium, (select value.string_value from unnest(event_params) where key = 'campaign') as campaign, collected_traffic_source.gclid as gclid ) as traffic_source, FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` where _table_suffix BETWEEN "20231001" and "20231031" and event_name NOT IN ('first_visit','session_start')) group by 1,2,3,4) ) Select case when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct' when regexp_contains(campaign, 'cross-network') then 'Cross-network' when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart') or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$')) and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping' when (regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$')) or gclid is not null then 'Paid Search' when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social' when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video' when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display' when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart') or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping' when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp') or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social' when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube') or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video' when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex') or medium = 'organic' then 'Organic Search' when regexp_contains(source,'email|e-mail|e_mail|e mail') or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email' when medium = 'affiliate' then 'Affiliates' when medium = 'referral' then 'Referral' when medium = 'audio' then 'Audio' when medium = 'sms' then 'SMS' when medium like '%push' or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications' else 'Unassigned' end as channel_grouping_session, count(distinct concat(user_pseudo_id, session_id)) as sessions, from acq_table group by 1
Note: this won’t work in the sample data for GA4 because the GCLID field was only added later. This is required otherwise your paid search is grossly underreported.
Transactions and Revenue
This is done by counting transaction IDs for purchase events. We can also sum up the purchase revenue in this table. This gives us the number of purchases, conversion rate and total revenue.
with acq_table as ( select distinct date, user_pseudo_id, session_id, FIRST_VALUE(session_traffic_source.source ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.source is null then 2 else 1 end, event_timestamp) AS source, FIRST_VALUE(session_traffic_source.medium ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.medium is null then 2 else 1 end, event_timestamp) AS medium, FIRST_VALUE(session_traffic_source.campaign ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.campaign is null then 2 else 1 end, event_timestamp) AS campaign, FIRST_VALUE(session_traffic_source.gclid ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.gclid is null then 2 else 1 end, event_timestamp) AS gclid, from( Select * except(traffic_source), array_agg( if(coalesce(traffic_source.source, traffic_source.medium, traffic_source.campaign, traffic_source.gclid) is not null,traffic_source, null) ignore nulls order by event_timestamp asc limit 1) [safe_offset(0)] as session_traffic_source, from( SELECT PARSE_DATE("%Y%m%d",event_date) AS date, user_pseudo_id, event_timestamp, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, (select as struct (select value.string_value from unnest(event_params) where key = 'source') as source, (select value.string_value from unnest(event_params) where key = 'medium') as medium, (select value.string_value from unnest(event_params) where key = 'campaign') as campaign, collected_traffic_source.gclid as gclid ) as traffic_source, FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` where _table_suffix BETWEEN "20231001" and "20231031" and event_name NOT IN ('first_visit','session_start')) group by 1,2,3,4) ), purchases_tbl as ( Select user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, count(distinct ecommerce.transaction_id) as purchases, sum(ecommerce.purchase_revenue) as revenue, FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` where _table_suffix BETWEEN "20231001" and "20231031" and event_name = "purchase" ), Select case when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct' when regexp_contains(campaign, 'cross-network') then 'Cross-network' when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart') or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$')) and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping' when (regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$')) or gclid is not null then 'Paid Search' when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social' when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video' when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display' when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart') or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping' when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp') or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social' when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube') or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video' when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex') or medium = 'organic' then 'Organic Search' when regexp_contains(source,'email|e-mail|e_mail|e mail') or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email' when medium = 'affiliate' then 'Affiliates' when medium = 'referral' then 'Referral' when medium = 'audio' then 'Audio' when medium = 'sms' then 'SMS' when medium like '%push' or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications' else 'Unassigned' end as channel_grouping_session, count(distinct concat(user_pseudo_id, session_id)) as sessions, sum(purchases) as purchases, sum(purchases)/count(distinct concat(user_pseudo_id, session_id)) as ecomm_conv_rate, sum(revenue) as revenue, from acq_table left join purchases_tbl using (user_pseudo_id, session_id) group by 1
Goal Completion
You could use the parameter “is a conversion” to do this, but I find it easier just to list out the event you want to track as a conversion. This way, if it wasn’t set up as a conversion in the interface it won’t impact your reporting.
with acq_table as ( select distinct date, user_pseudo_id, session_id, FIRST_VALUE(session_traffic_source.source ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.source is null then 2 else 1 end, event_timestamp) AS source, FIRST_VALUE(session_traffic_source.medium ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.medium is null then 2 else 1 end, event_timestamp) AS medium, FIRST_VALUE(session_traffic_source.campaign ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.campaign is null then 2 else 1 end, event_timestamp) AS campaign, FIRST_VALUE(session_traffic_source.gclid ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.gclid is null then 2 else 1 end, event_timestamp) AS gclid, from( Select * except(traffic_source), array_agg( if(coalesce(traffic_source.source, traffic_source.medium, traffic_source.campaign, traffic_source.gclid) is not null,traffic_source, null) ignore nulls order by event_timestamp asc limit 1) [safe_offset(0)] as session_traffic_source, from( SELECT PARSE_DATE("%Y%m%d",event_date) AS date, user_pseudo_id, event_timestamp, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, (select as struct (select value.string_value from unnest(event_params) where key = 'source') as source, (select value.string_value from unnest(event_params) where key = 'medium') as medium, (select value.string_value from unnest(event_params) where key = 'campaign') as campaign, collected_traffic_source.gclid as gclid ) as traffic_source, FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` where _table_suffix BETWEEN "20231001" and "20231031" and event_name NOT IN ('first_visit','session_start')) group by 1,2,3,4) ), goal_tbl as ( Select user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, 1 as goal_completion, FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` where _table_suffix BETWEEN "20231001" and "20231031" and event_name = "select_promotion" ) Select case when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct' when regexp_contains(campaign, 'cross-network') then 'Cross-network' when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart') or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$')) and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping' when (regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$')) or gclid is not null then 'Paid Search' when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social' when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video' when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display' when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart') or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping' when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp') or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social' when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube') or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video' when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex') or medium = 'organic' then 'Organic Search' when regexp_contains(source,'email|e-mail|e_mail|e mail') or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email' when medium = 'affiliate' then 'Affiliates' when medium = 'referral' then 'Referral' when medium = 'audio' then 'Audio' when medium = 'sms' then 'SMS' when medium like '%push' or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications' else 'Unassigned' end as channel_grouping_session, count(distinct concat(user_pseudo_id, session_id)) as sessions, sum(goal_completion) as goal_completion, sum(goal_completion)/count(distinct concat(user_pseudo_id, session_id)) as goal_con_rate, from acq_table left join goal_tbl using (user_pseudo_id, session_id) group by 1
Bringing it all together
Using these metrics and audience metrics we have from the previous post we have:
with acq_table as ( select distinct date, user_pseudo_id, session_id, FIRST_VALUE(session_traffic_source.source ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.source is null then 2 else 1 end, event_timestamp) AS source, FIRST_VALUE(session_traffic_source.medium ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.medium is null then 2 else 1 end, event_timestamp) AS medium, FIRST_VALUE(session_traffic_source.campaign ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.campaign is null then 2 else 1 end, event_timestamp) AS campaign, FIRST_VALUE(session_traffic_source.gclid ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.gclid is null then 2 else 1 end, event_timestamp) AS gclid, from( Select * except(traffic_source), array_agg( if(coalesce(traffic_source.source, traffic_source.medium, traffic_source.campaign, traffic_source.gclid) is not null,traffic_source, null) ignore nulls order by event_timestamp asc limit 1) [safe_offset(0)] as session_traffic_source, from( SELECT PARSE_DATE("%Y%m%d",event_date) AS date, user_pseudo_id, event_timestamp, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, (select as struct (select value.string_value from unnest(event_params) where key = 'source') as source, (select value.string_value from unnest(event_params) where key = 'medium') as medium, (select value.string_value from unnest(event_params) where key = 'campaign') as campaign, collected_traffic_source.gclid as gclid ) as traffic_source, FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` where _table_suffix BETWEEN "20231001" and "20231031" and event_name NOT IN ('first_visit','session_start')) group by 1,2,3,4) ), purchases_tbl as ( Select user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, count(distinct ecommerce.transaction_id) as purchases, sum(ecommerce.purchase_revenue) as revenue, FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` where _table_suffix BETWEEN "20231001" and "20231031" and event_name = "purchase" ), goal_tbl as ( Select user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, 1 as goal_completion, FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` where _table_suffix BETWEEN "20231001" and "20231031" and event_name = "select_promotion" ) Select case when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct' when regexp_contains(campaign, 'cross-network') then 'Cross-network' when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart') or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$')) and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping' when (regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$')) or gclid is not null then 'Paid Search' when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social' when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube') and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video' when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display' when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart') or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping' when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp') or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social' when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube') or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video' when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex') or medium = 'organic' then 'Organic Search' when regexp_contains(source,'email|e-mail|e_mail|e mail') or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email' when medium = 'affiliate' then 'Affiliates' when medium = 'referral' then 'Referral' when medium = 'audio' then 'Audio' when medium = 'sms' then 'SMS' when medium like '%push' or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications' else 'Unassigned' end as channel_grouping_session, count(distinct concat(user_pseudo_id, session_id)) as sessions, sum(purchases) as purchases, sum(purchases)/count(distinct concat(user_pseudo_id, session_id)) as ecomm_conv_rate, sum(revenue) as revenue, sum(goal_completion) as goal_completion, sum(goal_completion)/count(distinct concat(user_pseudo_id, session_id)) as goal_con_rate, from acq_table left join purchases_tbl using (user_pseudo_id, session_id) left join goal_tbl using (user_pseudo_id, session_id) group by 1