Matching Big Query Data with Google Analytics - Landing pages
Following on of my series of recreating Google Analytics reports in Big Query, this time we are looking at the Landing page report.
Previously we have looked at:
Landing pages follows a lot of the same rules of channel acquisition but with one major difference. There is no landing page dimension in BigQuery. So helpful, right! To get the landing page you need to get the first page in the session, that would be the one with the smallest hit number. Now, you can’t assume that it will be the first hit, as sometimes there are non-interactive events firing before the page view. This is why you need to rank all the pages in a session and get the first one, as the query below shows.
SELECT
PagePath
FROM (
SELECT
hits.page.PagePath,
RANK() OVER (PARTITION BY fullvisitorid, visitid, date ORDER BY hits.hitNumber) AS ranking
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hits
WHERE
_table_suffix BETWEEN "20170701"
AND "20170731"
AND totals.visits = 1
AND hits.type = "PAGE"
GROUP BY
PagePath,
hitNumber,
fullvisitorid,
visitid,
date)
WHERE
ranking = 1
GROUP BY
pagePath
order by users desc
This line here:
RANK() OVER (PARTITION BY fullvisitorid, visitid, date ORDER BY hits.hitNumber) AS ranking`
will rank each hit from 1 to x based on the hit number, and by adding:
and hits.type = “PAGE”
we limit the ranking to pages only.
Now we have the first page in the session, we can get the rest of the metrics, just like channel groupings.
SELECT PagePath, sum(sessions) as sessions, sum(new_users) as new_users, round(sum(new_users) /sum(sessions)*100,2) as percentage_new_user, ROUND(sum(pageviews)/sum(sessions),2) AS pageviews_per_session, ROUND(sum(total_session_duration)/sum(sessions),2) AS avg_session_duration_secs, ROUND(sum(bounces)/sum(sessions)*100,2) AS bounceRate, sum(transactions) as transactions, ROUND(sum(transactions)/sum(sessions)*100,2) AS eCom_conversion_rate, ROUND(sum(total_revenue),2) AS total_revenue, FROM ( SELECT hits.page.PagePath, COUNT(DISTINCT fullvisitorId) as users, COUNT(DISTINCT CASE WHEN visitNumber = 1 THEN fullvisitorId END ) AS new_users, COUNT(DISTINCT CONCAT(fullvisitorId, CAST(visitId AS string), date)) AS sessions, COUNT(DISTINCT CASE WHEN hits.type = "PAGE" THEN CONCAT(fullvisitorID, CAST(visitId AS STRING), date, hits.hitNumber) END ) AS pageviews, SUM(totals.timeOnSite/ totals.hits ) AS total_session_duration, COUNT(DISTINCT CASE WHEN totals.bounces = 1 THEN CONCAT(fullvisitorId, CAST(visitId AS string), date) END ) AS bounces, sum(totals.transactions) AS transactions, SUM( totals.transactionRevenue/1000000) AS total_revenue, RANK() OVER (PARTITION BY fullvisitorid, visitid, date ORDER BY hits.hitNumber) AS ranking FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`, UNNEST (hits) hits WHERE _table_suffix BETWEEN "20170701" AND "20170731" AND totals.visits = 1 AND hits.type = "PAGE" GROUP BY PagePath, hitNumber, fullvisitorid, visitid, date) WHERE ranking = 1 GROUP BY pagePath order by sessions desc
Any requests on what to do next or questions, feel free to comment!