Merging Google Analytics Data with Weather Data
I’ve always been interested in how external factors affect online shopping behaviour. Many of these we can’t measure, like the random chat you have with a friend that leads you to buying a new camera, but some we can, like the weather.
Google makes this even easier by providing a weather dataset for “free” (included with your Google Cloud costs). Details about this dataset is available here: https://console.cloud.google.com/marketplace/details/noaa-public/gsod and goes back to 1929.
For this you need a few things:
GA360 account so you can get Latitude and Longitude of your users
Python set up on your machine (follow this https://realpython.com/installing-python/) I like to use jupyter notebooks too (https://jupyter.org/install). The other option is to use notebooks in Kaggle - https://www.kaggle.com/notebooks
Cloud account (cloud.google.com)
First things first, we need to import the libraries we need:
The next step is to import the data you need, in this example we are looking at all the data for 2019. First getting the GA data:
Now the weather data
So now we have the two datasets we need a way to merge them. To do this we need to find the closest weather station to the user. Without getting into some hard core maths, this is done using by finding the distance between two points, and then we pick the smallest distance. So we create a function for this:
After just getting the location of each weather station, then we loop through all the locations in the GA data set and assign the closest weather station:
Once we make sure all the data is in the right format we merge them on date and location.
Now we have to do a bit of clean up. If the didn’t record a temperature it is set to 999.9 and rain fall at 99.9. Another fun point is the temperature is stored in Fahrenheit so need to convert it to Celsius.
As some basic analysis we split it by: Hot, Cold, Rainy and average:
And then we can see conversion rate by weather
Or we can get a bit more technical and see the conversion rate across temperature and rain fall.
If you are a shop that sells different categories it would be worth to break the transactions down by category, to prove that gum boots sell better when it rains, sun dresses when it’s hot and hopefully other things that aren’t so obvious.
Any questions or suggestions on how to make it better please let me know.