Meliorum | Melbourne Based Analytics Consultant

View Original

Matching Big Query Data with Google Analytics - Audience Overview

Whether your Google analytics is sampling too much or you are trying to blend GA data with offline data sets there are a few reasons why you would want to recreate reports from Google Analytics with BigQuery. Below is a list on how to get each metric separately and skip to the bottom if you want on query with them all together. I won’t go through SQL syntax, there are plenty of places that do that.

Below we will be replicating:

I’m using the free Google Analytics data set for the example, that is why it is set to 2017…ahh 2017 you seem so young and innocent now.

See this content in the original post

Users

Google BigQuery uses fullvisitorID to identify users. These are linked to cookies, so clear your cookies and become a new user. So to get the number of users we need to count the distinct fullvisitorids. BE CAREFUL, if you are using dates, if you count users per day, you can’t sum them up to find the total number of users over a week, a month, a year. In this example, we are just getting the total so we don’t need to worry about that.

See this content in the original post

New Users

New users are just distinguished by their visit number if visitNumber is 1 then this is their first visit, i.e. they are a new user. So we will only count the user that has their visitNumber = 1.

See this content in the original post

Sessions

Given that the google analytics table is based on a session it is strange there is no one unique identifier to count the sessions that would match up with Google Analytics. There is one way to do this, which is to add up all the totals.visits but this causes problems if we start to explore page and event data, so it is recommended to use the following method.

See this content in the original post

By adding together the fullvisitorid (the user), visitID (unique id for that user on when they came to the site, though not unique across users) and the date (as Google Analytics likes to make new sessions at midnight) we can get a unique id for each session. Some of these sessions are non-interactive, e.g. sending a payment from an offline source. This is why you need totals.visits =1 to make sure it was an interactive session.

See this content in the original post

Pageviews

There are a few ways to get the number of page views as well, you could use totals.pagviews and sum that up but again this wouldn’t work when you start exploring pages and events in more details. The following will work even if you unnest other parts of the table.

See this content in the original post

Basically, counting each session and the number of pageviews in the session and adding it all together.

See this content in the original post


Bounces and Bounce Rate

To do this the first thing to do is get the total number of bounced sessions, and the total number of all sessions and then divide one by the other:

See this content in the original post

Average Session Duration

This one gets difficult, depending on how you have unnested your tables. If you don’t unnest anything you can use avg(totals.timeOnSite) nice and easy. If you do this when you have unnested hits though you will get a double up for all the users who had more than one hit, the table will look like this:

See this content in the original post

So in this case where we are unnesting hits we need to divide the time on site by total hits so we don’t double count.

See this content in the original post

Number of Sessions per User

The rest are calculations of the other metrics, this is just sessions/users

See this content in the original post

Pages per Session

As the name suggests this is pageviews/sessions.

See this content in the original post

Putting it all together

So now if you want them all in one query, here you go:

See this content in the original post