
I have a schema (dataset) stored in BigQuery and I need to transfer it to an Amazon S3 bucket. What is the process? The BigQuery data belongs to another organization that is willing to grant the access needed for the transfer. I found ways to import from an S3 bucket into GCP, but no way to export from BigQuery to S3 directly, and I'm really stuck on this part. I also need to schedule the process, since the data in BigQuery keeps changing and I want it copied to my S3 bucket every day. Please mention references for that. Note that each day's data will be in the TBs.

2 Answers


AFAIK there is no direct BigQuery-to-S3 pipeline, but the workaround is BigQuery to Google Cloud Storage, and then Cloud Storage to S3:

  1. The first step is to set up the gcloud SDK on the machine where the job will be scheduled.

  2. Configure gcloud, paying special attention to the boto configuration, which is where gsutil picks up AWS credentials. You can skip this step if the process will run on an EC2 instance with an attached role that allows it to write to S3 (also a better security practice than using keys).

  3. Copy the data from BigQuery to GCS (steps 3 and 4 are combined into a schedulable sketch after this list):

     bq --location=US extract --destination_format=AVRO --compression=SNAPPY [source_table] [target location on GCS]
    
  4. Copy the files from GCS to S3:

     gsutil rsync -r [target location on GCS] [target location on S3]
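
Here is a minimal sketch of steps 3 and 4 as a single script you could schedule with cron. Every project, dataset, and bucket name below is a placeholder; it assumes the gcloud SDK (`bq`, `gsutil`) is installed and that AWS credentials are available through `~/.boto` or an attached EC2 role:

    #!/usr/bin/env bash
    # daily_bq_to_s3.sh -- sketch only; every name below is a placeholder.
    set -euo pipefail

    RUN_DATE="$(date -u +%Y%m%d)"
    SOURCE_TABLE="my_project:my_dataset.my_table"
    GCS_PATH="gs://my-staging-bucket/exports/${RUN_DATE}"
    S3_PATH="s3://my-bucket/exports/${RUN_DATE}"

    # BigQuery -> GCS. The wildcard shards the output across many files,
    # which BigQuery requires for exports over 1 GB -- relevant here since
    # the daily volume is in the TBs.
    bq --location=US extract \
      --destination_format=AVRO \
      --compression=SNAPPY \
      "${SOURCE_TABLE}" \
      "${GCS_PATH}/part-*.avro"

    # GCS -> S3. gsutil reads AWS keys from the [Credentials] section of
    # ~/.boto (aws_access_key_id / aws_secret_access_key) unless an EC2
    # role is attached; -m parallelizes the copy, which matters at TB scale.
    gsutil -m rsync -r "${GCS_PATH}" "${S3_PATH}"

A crontab entry such as `0 3 * * * /path/to/daily_bq_to_s3.sh` would then run it daily at 03:00.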
    

Note: this process works fine for GA360 data because it is already partitioned by day. If that is not the case and you can't separate the tables, you can schedule a query on GCP to create a table with incremental logic, and use that table as the [source_table].
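
For illustration, such a scheduled query might look like the following. The table names are hypothetical; `@run_date` is the date parameter that BigQuery scheduled queries supply, and the schedule's destination table can be templated (e.g. `daily_snapshot_{run_date}`) so each run writes its own day's table:

    -- Hypothetical scheduled query: with the destination table configured
    -- as exports.daily_snapshot_{run_date}, each daily run materializes
    -- only the previous day's rows for the extract job to pick up.
    SELECT *
    FROM `my_project.src.events`
    WHERE DATE(event_ts) = DATE_SUB(@run_date, INTERVAL 1 DAY);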

Update: AWS has released a Glue connector for BigQuery that can do this.


I guess this is possible directly using the EXPORT DATA statement. Ref: Transferring BigQuery data to amazon s3 bucket

    EXPORT DATA WITH CONNECTION `CONNECTION_REGION.CONNECTION_NAME`
    OPTIONS(uri="s3://BUCKET_NAME/PATH", format="FORMAT", ...)
    AS QUERY
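
As a concrete sketch (the connection, bucket, and table names are placeholders; this relies on a BigQuery connection resource to AWS, which is a BigQuery Omni feature, and the export URI must contain a wildcard):

    -- Placeholder names throughout; assumes an existing connection to AWS
    -- named my_s3_connection in the aws-us-east-1 region.
    EXPORT DATA WITH CONNECTION `aws-us-east-1.my_s3_connection`
    OPTIONS(
      uri = "s3://my-bucket/exports/part-*",
      format = "PARQUET"
    )
    AS SELECT * FROM `my_project.my_dataset.my_table`;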
