Google says its 'Trillium' TPUs are ready to power the next generation of AI models

Promises 4.7x performance boost over its older silicon

I/O Google blew the lid off its sixth-generation tensor processing unit (TPU), codenamed Trillium, which is designed to support a new generation of bigger, more capable large language and recommender models.

The matrix math accelerators were initially built to speed up Google's internal machine learning workloads, like those behind Gmail, Google Maps, and YouTube, but the search giant began making them available on its cloud in 2018.

Six generations later, Google's TPUs are central to the development of the Gemini large language models behind its growing portfolio of generative AI apps and services.

According to Google, Trillium boasts a 4.7x increase in peak compute performance and twice the high-bandwidth memory (HBM) capacity and bandwidth of its earlier TPU v5e design, which we looked at last summer. Google has also doubled the inter-chip interconnect bandwidth.

Looking at the v5e's spec sheet, Google's claims of a 4.7x boost suggest that the new chip is capable of roughly 926 teraFLOPS at BF16 and 1,847 teraOPS at INT8. However, that's assuming Google isn't relying on lower-precision INT4 or FP4 datatypes to achieve that score, like Nvidia is with its Blackwell chips.
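
For anyone who wants to check our homework, the math pencils out roughly like this, assuming Google's 4.7x figure is measured against the v5e's published peaks of around 197 teraFLOPS at BF16 and just under 394 teraOPS at INT8:

```python
# Back-of-the-envelope check of the 4.7x claim against the TPU v5e's
# published peak figures (~197 teraFLOPS BF16, just under 394 teraOPS INT8).
V5E_BF16_TFLOPS = 197
V5E_INT8_TOPS = 393
SPEEDUP = 4.7

trillium_bf16 = V5E_BF16_TFLOPS * SPEEDUP   # ~926 teraFLOPS
trillium_int8 = V5E_INT8_TOPS * SPEEDUP     # ~1,847 teraOPS

print(f"Estimated Trillium BF16: {trillium_bf16:,.0f} teraFLOPS")
print(f"Estimated Trillium INT8: {trillium_int8:,.0f} teraOPS")
```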

This would make Trillium about twice as fast as Google's TPU v5p accelerators, which it announced less than six months ago.

According to Google, these performance gains were achieved by increasing the size of the TPU's matrix multiply units (MXUs), the heart of the chip, and boosting the clock speed.

Alongside the MXU improvements, the chip also boasts Google's third-gen SparseCore, a specialized accelerator designed to process large embeddings commonly found in ranking and recommender systems.

Meanwhile, a doubling of bandwidth and capacity means that we're looking at 32GB of HBM operating at 1.6TB/s and a chip-to-chip interconnect capable of 3.2 Tbps.
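
Working from the v5e's published figures of 16GB of HBM at roughly 819GB/s and a 1,600Gbps inter-chip link, and assuming Google's doubling applies to each of those numbers, the arithmetic looks like this:

```python
# Doubling the v5e's memory and interconnect figures, per Google's claims.
V5E_HBM_GB = 16          # HBM capacity per chip, GB
V5E_HBM_GBPS = 819       # HBM bandwidth, GB/s
V5E_ICI_GBPS = 1600      # inter-chip interconnect, Gbps

print(f"Trillium HBM capacity:  {V5E_HBM_GB * 2} GB")                  # 32 GB
print(f"Trillium HBM bandwidth: {V5E_HBM_GBPS * 2 / 1000:.1f} TB/s")   # ~1.6 TB/s
print(f"Trillium ICI bandwidth: {V5E_ICI_GBPS * 2 / 1000:.1f} Tbps")   # 3.2 Tbps
```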

Google claims the higher memory capacity will enable the chip to support bigger models containing more weights and larger key-value caches — the latter being important for handling large numbers of concurrent users.

As a general rule, you need about 1GB of memory for every billion parameters when training or inferencing a model at 8-bit precision. So a 32GB TPUv6 would be able to support models up to about 30 billion parameters — double that when using models quantized to 4-bit precision.
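
A quick sketch of that rule of thumb, with the caveat that real deployments also need headroom for activations, key-value caches, and runtime overhead, and that the max_params_billion helper here is purely illustrative:

```python
# Rule of thumb: ~1 GB of memory per billion parameters at 8-bit precision.
def max_params_billion(hbm_gb: float, bits_per_weight: int) -> float:
    """Rough ceiling on model size (billions of parameters) for a given
    memory budget, ignoring activations, KV cache, and runtime overhead."""
    bytes_per_param = bits_per_weight / 8
    return hbm_gb / bytes_per_param

for bits in (8, 4):
    print(f"32 GB at {bits}-bit: ~{max_params_billion(32, bits):.0f}B parameters")
# 32 GB at 8-bit: ~32B parameters (the article rounds down to ~30B for overhead)
# 32 GB at 4-bit: ~64B parameters
```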

The higher chip-to-chip interconnect bandwidth, meanwhile, means that multiple TPUs can be strung together much more efficiently in order to support inferencing or training on much larger models.

In terms of scalability, Trillium looks quite similar to the v5e instances it replaces in that it supports pods with up to 256 chips. Multiple pods can then be networked using Google's Multislice tech and Titanium infrastructure processing units to support training workloads scaling to "tens of thousands of chips."
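
Stringing our per-chip estimates together, a hypothetical 256-chip Trillium pod would pencil out to something like this, though we'd stress these are our own back-of-the-envelope figures rather than Google's:

```python
# Aggregate capacity of a 256-chip pod, based on the per-chip estimates above.
CHIPS_PER_POD = 256
BF16_TFLOPS_PER_CHIP = 926   # estimated from the 4.7x claim, see earlier
HBM_GB_PER_CHIP = 32

pod_petaflops = CHIPS_PER_POD * BF16_TFLOPS_PER_CHIP / 1000
pod_hbm_tb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1024

print(f"Per pod: ~{pod_petaflops:,.0f} petaFLOPS BF16, ~{pod_hbm_tb:.0f} TB of HBM")
# ~237 petaFLOPS of BF16 compute and ~8 TB of HBM per pod
```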

Despite the boost in performance, Google claims its latest TPU is 67 percent more energy-efficient than the previous generation when delivering those FLOPS.

Google adds that several customers, including Nuro, Deep Genomics, Deloitte, and its own DeepMind team, will be among the first to put Trillium through its paces training and running their respective models.

However, it remains to be seen when the rest of us will be able to get our hands on Google's shiny new TPUs. For the moment, Google is soliciting interest via an online form which you can find here. ®
