
How do I upload data to Google BigQuery with gsutil, by using a Service Account I created in the Google APIs Console?

First I'm trying to upload data to Cloud Storage using gsutil, as that seems to be the recommended model. Everything works fine with Gmail user approval, but it does not allow me to use a service account.

It seems I can use the Python API to get an access token using signed JWT credentials, but I would prefer using a command-line tool like gsutil with support for resumable uploads etc.

EDIT: I would like to use gsutil in a cron to upload files to Cloud Storage every night and then import them to BigQuery.

Any help or directions to go would be appreciated.


6 Answers


To extend @Mike's answer, you'll need to:

  1. Download the service account's key file and put it somewhere, e.g. /etc/backup-account.json
  2. gcloud auth activate-service-account --key-file /etc/backup-account.json

And now all gcloud (and gsutil) calls use that service account.
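
For example, once the service account is activated you can verify that it is the active credential and run gsutil with it; the bucket name below is a placeholder:

# Show the active credential (should list the service account)
gcloud auth list

# Any subsequent gsutil command now runs as that account; the bucket is hypothetical
gsutil ls gs://my-backup-bucket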

  • This works only if you have gcloud installed, while this question is about gsutil only.
    – bfontaine
    Commented Dec 16, 2021 at 11:04
  • @bfontaine Sure but aren't they typically installed together as part of the Google Cloud SDK?
    – Stephen
    Commented Feb 2, 2022 at 14:49
  • @Stephen yes, but you can also install gsutil separately.
    – bfontaine
    Commented Feb 2, 2022 at 17:35

Google Cloud Storage just released a new version (3.26) of gsutil that supports service accounts (as well as a number of other features and bug fixes). If you already have gsutil installed you can get this version by running:

gsutil update

In brief, you can configure a service account by running:

gsutil config -e

See gsutil help config for more details about using the config command.
See gsutil help creds for information about the different flavors of credentials (and different use cases) that gsutil supports.
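
Once the service account is configured, the nightly upload from the question's cron job could look something like this sketch; the source directory and bucket name are placeholders:

# Parallel (-m), recursive copy of the night's export files to Cloud Storage;
# gsutil handles resumable uploads for large files automatically.
# /data/exports and gs://my-bucket are hypothetical paths.
gsutil -m cp -r /data/exports/* gs://my-bucket/nightly/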

Mike Schwartz, Google Cloud Storage Team


Service accounts are generally used to identify applications, but when using gsutil you're an interactive user, and it's more natural to use your personal account. You can always associate your Google Cloud Storage resources with both your personal account and a service account (via access control lists or the developer console Team tab), so my advice would be to use your personal account with gsutil and use a service account for your application.
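
For example, with current gsutil versions you could grant a service account write access to a bucket like this; the bucket name and service-account email are placeholders:

# Add the (hypothetical) service account to the bucket's ACL with write access
gsutil acl ch -u [email protected]:W gs://my-bucket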

  • If I use this in a cron job in a production environment, what will then happen when I leave the company? Will the cron job fail?
    – jonathan
    Commented Sep 15, 2012 at 10:17
  • Jonathan - yes, it will fail if your user account no longer exists. We should support service accounts with gsutil, but don't yet. You could: 1) probably add the feature quickly to gsutil/oauth2_plugin/oauth2_helper.py using the existing python oauth client implementation of service accounts, or 2) retrieve the access token externally and store it in the cache location specified in ~/.boto, or 3) create a role account yourself (via gmail.com or google apps) and grant permission to that account and use it for the oauth flow.
    – Ryan Boyd
    Commented Sep 16, 2012 at 8:19
  • @RyanBoyd: Thanks! Could you please post your comment as an answer so I can accept it?
    – jonathan
    Commented Sep 16, 2012 at 10:29

First of all, you should be using the bq command-line tool to interact with BigQuery from the command line.

I agree with Marc that it's a good idea to use your personal credentials with both gsutil and bq; that said, the bq command-line tool does support service accounts. The command to use service-account auth looks something like this:

bq --service_account [email protected] --service_account_credential_store keep_me_safe --service_account_private_key_file myfile.key query 'select count(*) from publicdata:samples.shakespeare' 

Type bq --help for more info.
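
For the nightly workflow described in the question, the files uploaded with gsutil can then be loaded into BigQuery with bq load; the dataset, table, bucket, and schema below are placeholders:

# Load a CSV that is already in Cloud Storage into a (hypothetical) table
bq load --source_format=CSV mydataset.mytable gs://my-bucket/nightly/export.csv name:string,count:integer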

It's also pretty easy to use service accounts in your code via Python or Java. Here's a quick example using some code from the BigQuery Authorization guide.

import httplib2

from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials

# REPLACE WITH YOUR Project ID
PROJECT_NUMBER = 'XXXXXXXXXXX'
# REPLACE WITH THE SERVICE ACCOUNT EMAIL FROM GOOGLE DEV CONSOLE
SERVICE_ACCOUNT_EMAIL = '[email protected]'

f = file('key.p12', 'rb')
key = f.read()
f.close()

credentials = SignedJwtAssertionCredentials(
    SERVICE_ACCOUNT_EMAIL,
    key,
    scope='https://www.googleapis.com/auth/bigquery')

http = httplib2.Http()
http = credentials.authorize(http)

service = build('bigquery', 'v2')
datasets = service.datasets()
response = datasets.list(projectId=PROJECT_NUMBER).execute(http)

print('Dataset list:\n')
for dataset in response['datasets']:
  print("%s\n" % dataset['id'])
  • Thanks for your help. But I understand that gsutil is the way to go for resumable and reliable uploads. I wish to use this in a cron and not be dependent upon my personal account (what happens when I leave my company?). The cron has to continue working.
    – jonathan
    Commented Sep 15, 2012 at 10:19
  • Yes, gsutil interacts with Cloud Storage, and bq will let you load data that lives in Cloud Storage into BigQuery. Both gsutil and bq will store your OAuth2 credentials, which allows the scripts to work without you having to log in every time, and yes, you can run a cron job without user input each time. If you decide to use service accounts in your script, then: (1) set the permissions of your storage bucket to include the service account address as well as your own, and (2) use this service account with bq as I described above.
    Commented Sep 15, 2012 at 23:34
  • Note that to run bq you need to have Google Cloud SDK installed. On Ubuntu, it is sudo apt-get install google-cloud-sdk
    – akhmed
    Commented Jun 1, 2015 at 23:42

Posting as an answer instead of a comment, based on Jonathan's request.

Yes, an OAuth grant made by an individual user will no longer be valid if the user no longer exists. So, if you use the user-based flow with your personal account, your automated processes will fail if you leave the company.

We should support service accounts with gsutil, but don't yet.

You could do one of:

  1. Probably add the feature quickly to gsutil/oauth2_plugin/oauth2_helper.py using the existing Python OAuth client implementation of service accounts
  2. Retrieve the access token externally via the service account flow and store it in the cache location specified in ~/.boto (slightly hacky)
  3. Create a role account yourself (via gmail.com or google apps) and grant permission to that account and use it for the OAuth flow.

We've filed the feature request to support service accounts for gsutil and have some initial positive feedback from the team (though I can't give an ETA).

  • This answer is obsolete; see Mike's answer for more up-to-date info.
    – Benson
    Commented Apr 3, 2013 at 18:59

As of today you don’t need to run any command to set up a service account to be used with gsutil. All you have to do is create ~/.boto with the following content:

[Credentials]
gs_service_key_file=/path/to/your/service-account.json

Edit: you can also tell gsutil where it should look for the .boto file by setting BOTO_CONFIG (docs).

For example, I use one service account per project with the following config, where /app is the path to my app directory:

  • .env:
    BOTO_CONFIG=/app/.boto
    
  • .boto:
    [Credentials]
    gs_service_key_file=/app/service-account.json
    
  • script.sh:
    export $(xargs < .env)
    gsutil ...
    

In the script above, export $(xargs < .env) loads the .env file. It tells gsutil the location of the .boto file, which in turn tells it the location of the service account key. When using the Google Cloud Python library you can do all of this with GOOGLE_APPLICATION_CREDENTIALS, but that variable is not supported by gsutil.
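
To run this nightly, as in the original question, a crontab entry can simply invoke the script; the schedule, script path, and log location are placeholders:

# Hypothetical crontab entry: run the upload script every night at 02:00
0 2 * * * /app/script.sh >> /var/log/nightly-upload.log 2>&1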
