Can you please suggest ways to load my Google Analytics data from BigQuery into Redshift? Can a Cloud Function be used for this? Or, how do I trigger this from the BigQuery side instead of using a Python script to call BigQuery?
1 Answer
You can utilise two Cloud Functions to get the data into S3. Once it's in S3, you can use your own mechanism (e.g. a Lambda function) to import the data into Redshift.
Preamble: set up a Stackdriver export trigger
We will trigger our first Cloud Function whenever the latest Google Analytics daily sessions table becomes available. This is done by publishing a Pub/Sub message whenever Stackdriver Logging indicates that the latest table has been loaded. To set up this trigger, create a Stackdriver Logging export sink that routes the matching log entries to a Pub/Sub topic (refer to the "Pub/Sub & Stackdriver" section).
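As a rough sketch, the sink can also be created programmatically with the `google-cloud-logging` client. The topic and sink names below are placeholders, and the log filter is an assumption based on the BigQuery audit-log schema for load jobs (it matches daily `ga_sessions_*` tables):

```python
# Placeholder names; requires the google-cloud-logging package and
# permission to create sinks in the project.
LOG_FILTER = "\n".join([
    'resource.type="bigquery_resource"',
    'protoPayload.methodName="jobservice.jobcompleted"',
    'protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration'
    '.load.destinationTable.tableId:"ga_sessions"',
])


def create_sink(project_id, topic="bq-ga-table-loaded",
                sink_name="ga-table-loaded-sink"):
    # Lazy import so the filter above can be inspected without the package.
    from google.cloud import logging

    client = logging.Client(project=project_id)
    destination = "pubsub.googleapis.com/projects/{}/topics/{}".format(
        project_id, topic)
    sink = client.sink(sink_name, filter_=LOG_FILTER, destination=destination)
    # After creation, grant the sink's writer identity the
    # Pub/Sub Publisher role on the topic.
    sink.create()
    return sink
```

Every message published to that topic then invokes the first Cloud Function below.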
Cloud Function 1: Export BigQuery table
- Trigger: Pub/Sub message for when a new daily table has been loaded
- Workflow
- Export table as JSON (or Avro, Parquet)
- Save JSON in Google Cloud Storage
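The workflow above might look something like this. The bucket name is a placeholder, and the path used to pull the table reference out of the log entry is an assumption based on the audit-log payload shape; adjust it to the message you actually receive:

```python
import base64
import json


def build_destination_uri(bucket, table_id):
    # e.g. ga_sessions_20191128 -> gs://my-export-bucket/ga_sessions_20191128-*.json
    return "gs://{}/{}-*.json".format(bucket, table_id)


def export_ga_table(event, context):
    """Background Cloud Function triggered by the Pub/Sub message from the sink."""
    # Lazy import; add google-cloud-bigquery to requirements.txt.
    from google.cloud import bigquery

    # The Pub/Sub message body is the Stackdriver log entry; the exact
    # path to the destination table below is an assumption.
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    table = (entry["protoPayload"]["serviceData"]["jobCompletedEvent"]
             ["job"]["jobConfiguration"]["load"]["destinationTable"])

    table_ref = bigquery.TableReference.from_string(
        "{projectId}.{datasetId}.{tableId}".format(**table))

    job_config = bigquery.ExtractJobConfig()
    job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON

    client = bigquery.Client()
    client.extract_table(
        table_ref,
        build_destination_uri("my-export-bucket", table["tableId"]),  # placeholder bucket
        job_config=job_config,
    ).result()  # wait, so export errors surface in the function's logs
```

Swap `NEWLINE_DELIMITED_JSON` for `AVRO` or `PARQUET` if you prefer those formats; Redshift's `COPY` can load all three.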
Cloud Function 2: Transfer export file to S3
- Trigger: New file in Google Cloud Storage bucket
- Workflow
- Utilising boto, read the file from Google Cloud Storage
- Transfer file to S3
- Delete or archive file
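A minimal sketch of that second function, assuming the target S3 bucket name is supplied via an `S3_BUCKET` environment variable and AWS credentials are available to boto3 (e.g. via environment variables or Secret Manager); the key prefix is a hypothetical layout:

```python
import os


def s3_key_for(file_name, prefix="bigquery-exports"):
    # Hypothetical key layout: <prefix>/<original GCS object name>
    return "{}/{}".format(prefix, file_name)


def transfer_to_s3(event, context):
    """Triggered by an object-finalize event on the GCS export bucket."""
    # Lazy imports; add google-cloud-storage and boto3 to requirements.txt.
    from google.cloud import storage
    import boto3

    bucket_name = event["bucket"]
    file_name = event["name"]

    gcs = storage.Client()
    blob = gcs.bucket(bucket_name).blob(file_name)
    data = blob.download_as_string()  # download_as_bytes() on newer versions

    s3 = boto3.client("s3")  # credentials from env vars / Secret Manager
    s3.put_object(Bucket=os.environ["S3_BUCKET"],
                  Key=s3_key_for(file_name),
                  Body=data)

    blob.delete()  # or copy to an archive bucket instead of deleting
```

Note that this buffers the file in the function's memory, which is fine for daily GA exports but worth revisiting for very large files.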
- @vinoaj, thanks for this! I tried following the Pub/Sub steps and got stuck at Cloud Function 1, exporting the BigQuery table to GCS. I get an error (updated in the question). – Justine, Nov 29, 2019 at 15:37
- @Justine, that looks like it might be a syntax error. Are you able to provide your whole code to debug? – vinoaj, Nov 29, 2019 at 20:17
- @Justine, re-looking at the error message, I suspect you have a multiline query statement. In that case your Python syntax should use triple quotation marks: QUERY="""......""" – vinoaj, Nov 29, 2019 at 20:48
- @vinoaj, thank you, that helps, but I'm getting another error. I've updated my question above to include the code. Do I need to do additional setup? – Justine, Nov 30, 2019 at 21:19