Matching Big Query Data with Google Analytics - Audience Overview
Whether your Google analytics is sampling too much or you are trying to blend GA data with offline data sets there are a few reasons why you would want to recreate reports from Google Analytics with BigQuery. Below is a list on how to get each metric separately and skip to the bottom if you want on query with them all together. I won’t go through SQL syntax, there are plenty of places that do that.
Below we will be replicating:
I’m using the free Google Analytics data set for the example, that is why it is set to 2017…ahh 2017 you seem so young and innocent now.
Users
Google BigQuery uses fullvisitorID
to identify users. These are linked to cookies, so clear your cookies and become a new user. So to get the number of users we need to count the distinct fullvisitorid
s. BE CAREFUL, if you are using dates, if you count users per day, you can’t sum them up to find the total number of users over a week, a month, a year. In this example, we are just getting the total so we don’t need to worry about that.
New Users
New users are just distinguished by their visit number if visitNumber
is 1 then this is their first visit, i.e. they are a new user. So we will only count the user that has their visitNumber = 1.
Sessions
Given that the google analytics table is based on a session it is strange there is no one unique identifier to count the sessions that would match up with Google Analytics. There is one way to do this, which is to add up all the totals.visits
but this causes problems if we start to explore page and event data, so it is recommended to use the following method.
By adding together the fullvisitorid (the user), visitID (unique id for that user on when they came to the site, though not unique across users) and the date (as Google Analytics likes to make new sessions at midnight) we can get a unique id for each session. Some of these sessions are non-interactive, e.g. sending a payment from an offline source. This is why you need totals.visits =1 to make sure it was an interactive session.
Pageviews
There are a few ways to get the number of page views as well, you could use totals.pagviews
and sum that up but again this wouldn’t work when you start exploring pages and events in more details. The following will work even if you unnest other parts of the table.
Basically, counting each session and the number of pageviews in the session and adding it all together.
Bounces and Bounce Rate
To do this the first thing to do is get the total number of bounced sessions, and the total number of all sessions and then divide one by the other:
Average Session Duration
This one gets difficult, depending on how you have unnested your tables. If you don’t unnest anything you can use avg(totals.timeOnSite)
nice and easy. If you do this when you have unnested hits though you will get a double up for all the users who had more than one hit, the table will look like this:
So in this case where we are unnesting hits we need to divide the time on site by total hits so we don’t double count.
Number of Sessions per User
The rest are calculations of the other metrics, this is just sessions/users
Pages per Session
As the name suggests this is pageviews/sessions
.
Putting it all together
So now if you want them all in one query, here you go: