Manage TPU resources

This page describes how to manage Cloud TPU resources using:

The Google Cloud CLI, which provides the primary CLI to Google Cloud.
The Google Cloud console, which provides an integrated management console for your Google Cloud resources.

Prerequisites

Before you run these procedures, you must install the Google Cloud CLI, create a Google Cloud project, and enable the Cloud TPU API. For instructions, see Set up the Cloud TPU environment.

If you are using the Google Cloud CLI, you can run commands using the Cloud Shell, a Compute Engine VM, or your local machine. The Cloud Shell lets you interact with Cloud TPUs without having to install any software. Because the Cloud Shell disconnects after a period of inactivity, we recommend installing the Google Cloud CLI on your local machine if you plan to run long-running commands. For more information on the Google Cloud CLI, see the gcloud Reference.

Provision Cloud TPUs

You can provision a Cloud TPU using gcloud, the Google Cloud console, or the Cloud TPU API.

There are two methods for provisioning TPUs using gcloud:

Using queued resources: gcloud compute tpus queued-resources create
Using the Create Node API: gcloud compute tpus tpu-vm create

The best practice is to provision TPUs using queued resources. When you request queued resources, the request is added to a queue maintained by the Cloud TPU service. When the requested resource becomes available, it's assigned to your Google Cloud project for your immediate exclusive use. For more information, see Manage queued resources.

When using Multislice, you must use queued resources and specify the following additional parameters:

export NODE_COUNT=node_count
export NODE_PREFIX=your_tpu_prefix # Optional

where:

${NODE_COUNT} is the number of slices to create
${NODE_PREFIX} is the prefix you specify to generate names for each slice. A number is appended to the prefix for each slice. For example, if you set ${NODE_PREFIX} to mySlice, the slices are named: mySlice-0, mySlice-1, continuing numerically for each slice.

For more information about Multislice, see the Multislice introduction.
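As a rough illustration of the naming rule described above, the following sketch prints the slice names that a given prefix and slice count would produce. The slice_names function is a hypothetical helper, not a gcloud command, and the values assigned to NODE_COUNT and NODE_PREFIX are example values only.

```shell
# Hypothetical helper (not part of gcloud): print the name that the
# prefix-plus-index rule described above would give each slice.
slice_names() {
  prefix=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s-%s\n' "$prefix" "$i"
    i=$((i + 1))
  done
}

export NODE_COUNT=3          # example value
export NODE_PREFIX=mySlice   # example value
slice_names "$NODE_PREFIX" "$NODE_COUNT"
```

With these example values, the sketch prints mySlice-0, mySlice-1, and mySlice-2, one per line.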

Create a Cloud TPU using the Create Node API

When creating a Cloud TPU, you must specify the TPU VM image (also called TPU software version). To determine which VM image you should use, see TPU VM images.

You also need to specify the TPU configuration in terms of TensorCores or TPU chips. For more information, see the section for the TPU version you are using in System architecture.

gcloud

To create a TPU using the Create Node API, use the gcloud compute tpus tpu-vm create command.

The following command uses a TensorCore-based configuration:

$ gcloud compute tpus tpu-vm create tpu-name \
  --zone=us-central2-b \
  --accelerator-type=v4-8 \
  --version=tpu-software-version

Command flag descriptions

zone: The zone where you plan to create your Cloud TPU.
accelerator-type: The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
version: The TPU software version.
shielded-secure-boot (optional): Specifies that the TPU instances are created with secure boot enabled. This implicitly makes them Shielded VM instances. See What is Shielded VM? for more details.

The following command creates a TPU with a specific topology:

$ gcloud compute tpus tpu-vm create tpu-name \
  --zone=us-central2-b \
  --type=v4 \
  --topology=2x2x1 \
  --version=tpu-software-version

Required flags

tpu-name: The name of the TPU VM you are creating.
zone: The zone where you are creating your Cloud TPU.
type: The TPU version you want to use. For more information, see TPU versions.
topology: The physical arrangement of TPU chips, specifying the number of chips in each dimension. For more information about supported topologies for each TPU version, see TPU versions.
version: The TPU software version you want to use. For more information, see TPU software versions.
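
The two commands above can describe the same slice size. As a minimal sketch, assuming that a v4 chip contains two TensorCores and that the number in a v4 accelerator type counts TensorCores, the following hypothetical helpers convert a topology string into a chip count and into the matching v4 accelerator type. Other TPU versions can count differently, so check System architecture before applying this to them.

```shell
# Hypothetical helpers for illustration only.
# Multiply the dimensions of a topology string such as 2x2x1 to get the
# number of chips.
topology_chips() {
  chips=1
  for dim in $(printf '%s' "$1" | tr 'x' ' '); do
    chips=$((chips * dim))
  done
  printf '%s\n' "$chips"
}

# Assumption: two TensorCores per v4 chip, and the accelerator-type suffix
# counts TensorCores.
v4_accelerator_type() {
  printf 'v4-%s\n' "$(( $(topology_chips "$1") * 2 ))"
}

topology_chips 2x2x1
v4_accelerator_type 2x2x1
```

Under these assumptions, the topology 2x2x1 has 4 chips and corresponds to the accelerator type v4-8 used in the first command.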

Console

In the Google Cloud console, go to the TPUs page:

Go to TPUs
Click Create TPU.
In the Name field, enter a name for your TPU.
In the Zone box, select the zone in which to create the TPU.
In the TPU type box, select an accelerator type. The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
In the TPU software version box, select a software version. When creating a Cloud TPU VM, the TPU software version specifies the version of the TPU runtime to install. For more information, see TPU VM images.
Click Create to create your resources.

curl

The following command uses curl to create a TPU.

$ curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" -d "{accelerator_type: 'v4-8', \
runtime_version:'tpu-vm-tf-2.16.1-pjrt', \
network_config: {enable_external_ips: true}, \
shielded_instance_config: { enable_secure_boot: true }}" \
https://tpu.googleapis.com/v2/projects/project-id/locations/us-central2-b/nodes?node_id=node_name

Required fields

runtime_version: The Cloud TPU runtime version that you want to use.
project: The name of your enrolled Google Cloud project.
zone: The zone where you're creating your Cloud TPU.
node_name: The name of the TPU VM you're creating.
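
The project, zone, and node_name fields do not appear in the request body; in the curl example they are part of the request URL. The following sketch shows how those three values fit into the URL. The tpu_create_url function is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: assemble the request URL used in the curl example
# from the project, zone, and node name.
tpu_create_url() {
  project=$1
  zone=$2
  node=$3
  printf 'https://tpu.googleapis.com/v2/projects/%s/locations/%s/nodes?node_id=%s\n' \
    "$project" "$zone" "$node"
}

tpu_create_url project-id us-central2-b node_name
```

Quoting the resulting URL when you pass it to curl keeps the shell from treating the ? character as a pattern.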

Run a startup script

You can run a startup script on each TPU VM by specifying the --metadata startup-script flag when creating the TPU VM. The following command creates a TPU VM using a startup script.

$ gcloud compute tpus tpu-vm create tpu-name \
    --zone=us-central2-b \
    --accelerator-type=tpu-type \
    --version=tpu-vm-tf-2.16.1-pjrt \
    --metadata startup-script='#! /bin/bash
      pip3 install numpy'
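
Because a startup script runs unattended on each TPU VM, it can help to check the script locally before you create the TPU. The following is a minimal sketch of such a check; check_startup_script is a hypothetical helper, and it verifies only that the file starts with a shebang line and that the shell can parse it, not that the commands succeed.

```shell
# Hypothetical pre-flight check: succeed only if the file is readable,
# starts with a shebang line, and parses without being executed.
check_startup_script() {
  script=$1
  [ -r "$script" ] || return 1
  head -n 1 "$script" | grep -q '^#!' || return 1
  sh -n "$script"
}

# Example: write a small script to a temporary file and check it.
script_file=$(mktemp)
cat > "$script_file" <<'EOF'
#! /bin/bash
pip3 install numpy
EOF
check_startup_script "$script_file" && echo "startup script looks valid"
```

A script that passes the check could then be supplied as the startup-script value, for example by substituting the file contents with "$(cat "$script_file")".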

Connect to a Cloud TPU

gcloud

Connect to your Cloud TPU using SSH:

$ gcloud compute tpus tpu-vm ssh tpu-name --zone=zone

When you request a slice larger than a single host, Cloud TPU creates a TPU VM for each host. The number of TPU chips per host depends on the TPU version.

To install binaries or run code, connect to each TPU VM using the tpu-vm ssh command.

$ gcloud compute tpus tpu-vm ssh tpu-name

To connect to a specific TPU VM using SSH, use the --worker flag, which takes a 0-based index:

$ gcloud compute tpus tpu-vm ssh tpu-name --worker=1

To run a command on all TPU VMs with a single command, use the --worker=all and --command flags:

$ gcloud compute tpus tpu-vm ssh tpu-name \
  --project=your_project_ID \
  --zone=zone \
  --worker=all \
  --command='pip install "jax[tpu]==0.4.20" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html'

For Multislice, you can run a command on a single VM by using the enumerated TPU name, which is the slice prefix with the slice number appended to it. To run a command on all TPU VMs in all slices, use the --node=all, --worker=all, and --command flags, with an optional --batch-size flag.

$ gcloud compute tpus queued-resources ssh ${QUEUED_RESOURCE_ID} \
  --project=project_ID \
  --zone=zone \
  --node=all \
  --worker=all \
  --command='pip install "jax[tpu]==0.4.20" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html' \
  --batch-size=4

Console

To connect to your TPUs in the Google Cloud console, use SSH-in-browser:

In the Google Cloud console, go to the TPUs page:

Go to TPUs
In the list of TPU VMs, click SSH in the row of the TPU VM that you want to connect to.

Note: When you connect to TPU VMs using the Google Cloud console, Compute Engine creates an ephemeral SSH key for you.

List your Cloud TPU resources

You can list all of your Cloud TPUs in a specified zone.

gcloud

$ gcloud compute tpus tpu-vm list --zone=zone

Console

In the Google Cloud console, go to the TPUs page:

Go to TPUs

Retrieve information about your Cloud TPU

You can retrieve information about a specified Cloud TPU.

gcloud

$ gcloud compute tpus tpu-vm describe tpu-name \
  --zone=zone
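
Scripts often need to wait until a TPU reaches a particular state before continuing. The following is a minimal sketch of a polling loop, assuming the describe output exposes a state field with values such as READY. The wait_for_state function is a hypothetical helper; the command that reports the state is passed in as arguments so the loop itself does not depend on gcloud.

```shell
# Hypothetical helper: poll a state-reporting command until it prints the
# wanted state or the attempts run out.
# $1 = wanted state, $2 = maximum attempts, $3 = seconds between attempts,
# remaining arguments = the command that prints the current state.
wait_for_state() {
  wanted=$1
  attempts=$2
  delay=$3
  shift 3
  n=0
  while [ "$n" -lt "$attempts" ]; do
    state=$("$@") || return 1
    [ "$state" = "$wanted" ] && return 0
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}

# Example use (assumes a top-level "state" field in the describe output):
# wait_for_state READY 30 10 \
#   gcloud compute tpus tpu-vm describe tpu-name --zone=zone --format='value(state)'
```

The function returns 0 as soon as the wanted state is seen, and returns 1 if the state command fails or the attempts are exhausted.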

Console

In the Google Cloud console, go to the TPUs page:

Go to TPUs
Click the name of your Cloud TPU. The console displays the Cloud TPU detail page.

Stop your Cloud TPU resources

You can stop a single Cloud TPU to stop incurring charges without losing your VM's configuration and software.

gcloud

$ gcloud compute tpus tpu-vm stop tpu-name \
  --zone=zone

Console

In the Google Cloud console, go to the TPUs page:

Go to TPUs
Select the checkbox next to your Cloud TPU.
Click Stop.

Start your Cloud TPU resources

You can start a Cloud TPU when it is stopped.

gcloud

$ gcloud compute tpus tpu-vm start tpu-name \
  --zone=zone

Console

In the Google Cloud console, go to the TPUs page:

Go to TPUs
Select the checkbox next to your Cloud TPU.
Click Start.

Delete a Cloud TPU

Delete your TPU VM slices at the end of your session.

gcloud

$ gcloud compute tpus tpu-vm delete tpu-name \
  --project=project-id \
  --zone=zone \
  --quiet

Command flag descriptions

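project: The ID of the Google Cloud project that contains your Cloud TPU.
zone: The zone that contains the Cloud TPU you want to delete.
quiet: Disables interactive prompts, so the deletion proceeds without asking for confirmation.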

Console

In the Google Cloud console, go to the TPUs page:

Go to TPUs
Select the checkbox next to your Cloud TPU.
Click Delete.

Advanced configurations

Specify custom network resources

When you create the TPU, you can choose to specify a network or subnetwork.

gcloud

To specify the network or subnetwork using the gcloud CLI, use the following command flags:

--network NETWORK --subnetwork SUBNETWORK

curl

To specify the network or subnetwork in a curl call, add the following parameters to the request body:

network_config: {network: 'NETWORK', subnet: 'SUBNETWORK', enable_external_ips: true}

Network

You can optionally specify the network to use for the TPU. If not specified, the default network is used.

Valid network formats:

https://www.googleapis.com/compute/{version}/projects/{proj-id}/global/networks/{network}
compute/{version}/projects/{proj-id}/global/networks/{network}
compute/{version}/projects/{proj-##}/global/networks/{network}
projects/{proj-id}/global/networks/{network}
projects/{proj-##}/global/networks/{network}
global/networks/{network}
{network}

Subnetwork

You can specify a specific subnetwork to use for the TPU. The specified subnetwork needs to be in the same region as the zone where the TPU runs.

Valid subnetwork formats:

https://www.googleapis.com/compute/{version}/projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
compute/{version}/projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
compute/{version}/projects/{proj-##}/regions/{region}/subnetworks/{subnetwork}
projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
projects/{proj-##}/regions/{region}/subnetworks/{subnetwork}
regions/{region}/subnetworks/{subnetwork}
{subnetwork}
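
Because the subnetwork must be in the same region as the TPU's zone, it can be convenient to derive the region from the zone name. The following sketch assumes that zone names have the form region-letter, as in us-central2-b, and builds one of the subnetwork formats listed above. Both functions are hypothetical helpers, and my-subnet is a placeholder name.

```shell
# Hypothetical helpers for illustration.
# Derive the region from a zone name by dropping the trailing zone suffix,
# for example us-central2-b -> us-central2.
region_from_zone() {
  printf '%s\n' "${1%-*}"
}

# Build the projects/.../regions/.../subnetworks/... form listed above.
subnetwork_path() {
  project=$1
  zone=$2
  subnet=$3
  printf 'projects/%s/regions/%s/subnetworks/%s\n' \
    "$project" "$(region_from_zone "$zone")" "$subnet"
}

region_from_zone us-central2-b
subnetwork_path project-id us-central2-b my-subnet
```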

Enable Private Google Access

To connect to TPU VMs using SSH, you need to either add access configurations for the TPU VMs, or turn on Private Google Access for the subnetwork to which the TPU VMs are connected.

To add access configurations, you must set enable_external_ips. When you create a TPU, enable_external_ips is set by default.

If you want to opt out of external IP addresses, enable internal IPs:

gcloud

Use the --internal-ips flag when creating a TPU:

--internal-ips

curl

Add the following parameters to the request body:

network_config: {enable_external_ips: false}

After you have configured Private Google Access, connect to the VM using SSH.

Attach a custom service account

Each TPU VM has an associated service account it uses to make API requests on your behalf. TPU VMs use this service account to call Cloud TPU APIs and access Cloud Storage and other services. By default, your TPU VM uses the default Compute Engine service account.

The service account must be defined in the same Google Cloud project where you create your TPU VM. Custom service accounts used for TPU VMs must have the TPU Viewer role to call the Cloud TPU API. If the code running in your TPU VM calls other Google Cloud services, it must have the roles necessary to access those services.
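
The same-project requirement can be checked before you create the TPU. The following is a rough sketch, assuming the custom service account uses the common email form NAME@PROJECT_ID.iam.gserviceaccount.com. The sa_in_project function and the example names are hypothetical, and the check does not apply to accounts that use a different email form, such as the default Compute Engine service account.

```shell
# Hypothetical check: succeed if the service account email has the form
# NAME@PROJECT_ID.iam.gserviceaccount.com for the given project ID.
sa_in_project() {
  email=$1
  project=$2
  case "$email" in
    ?*@"$project".iam.gserviceaccount.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with placeholder names.
sa_in_project tpu-runner@my-project.iam.gserviceaccount.com my-project \
  && echo "service account belongs to my-project"
```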

When you create a TPU, you can choose to specify a custom service account using the --service-account flag. For more information about service accounts, see Service Accounts.

Use the following commands to specify a custom service account.

gcloud

$ gcloud compute tpus tpu-vm create tpu-name \
  --zone=us-central2-b \
  --accelerator-type=tpu-type \
  --version=tpu-vm-tf-2.16.1-pjrt \
  --service-account=your-service-account

curl

$ curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" -d "{accelerator_type: 'v4-8', \
runtime_version:'tpu-vm-tf-2.16.1-pjrt', \
network_config: {enable_external_ips: true}, \
shielded_instance_config: { enable_secure_boot: true }, \
service_account: {email: 'your-service-account'}}" \
https://tpu.googleapis.com/v2/projects/project-id/locations/us-central2-b/nodes?node_id=node_name

Enable custom SSH methods

Set up a firewall for SSH.

The default network is preconfigured to allow SSH access to all VMs. If you don't use the default network, or you have changed the default network settings, you might need to explicitly enable SSH access by adding a firewall rule:
```
$ gcloud compute firewall-rules create allow-ssh \
  --network=network \
  --allow=tcp:22
```
Connect to the TPU VMs using SSH.
```
$ gcloud compute tpus tpu-vm ssh tpu-name \
  --zone=us-central2-b \
  --project=project-id
```
Required fields

tpu-name: The name of the TPU VM.
zone: The zone where you created the TPU VM.
project-id: The ID of your Google Cloud project.

For a list of optional fields, see the gcloud API documentation.