AMD's DC chief happy to work with Intel and others to chip away at Nvidia's AI empire

'If everybody's got their own little ecosystem, it's very inefficient'

AMD and Intel have been rivals for decades, but there's at least one thing they can agree on: they have a common enemy in Nvidia. And the enemy of your enemy can be your friend.

"We are absolutely committed to open [ecosystems], even open to working with customers or others that are directly competing with us in the end. That is to everybody's benefit, ourselves included," AMD EVP of datacenter Forrest Norrod told The Register in an interview this month.

"If everybody's got their own little ecosystem – number one, it's very inefficient, and number two, that limits our ability to compete."

Over the past few years, both AMD and Intel have worked to bring ever more competitive accelerators to market in their bid to win share from Nvidia. At Computex, AMD revealed its next-gen accelerator – the MI325X, a memory-boosted version of the MI300X that'll pack 288 GB of HBM3e when it arrives in Q4 2024.

Intel, for its part, touted its Gaudi3 accelerators, which we've now learned have been priced to undercut Nvidia's parts.

However, when Norrod talks about establishing open ecosystems with AMD's competitors, he's not talking about the accelerators themselves, but rather the technologies used to stitch them together at scale.

Last week, the House of Zen revealed it was working with Broadcom, Cisco, Google, HPE, Intel, Meta, and Microsoft to develop a high-speed interconnect capable of challenging Nvidia's NVLink and NVSwitch offerings.

Nvidia uses the tech to make multiple GPUs behave as one. Traditionally, this has been done at the node level – usually with four or eight GPUs. AMD has been able to do the same using its own proprietary Infinity Fabric interconnect, while Intel has baked loads of high-speed Ethernet NICs into its accelerators.
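
For a sense of how that node-level peering surfaces to software, here's a minimal sketch – our illustration, not vendor sample code – that walks a node's GPUs and reports which NVLink links are live. It assumes an Nvidia box with the nvidia-ml-py bindings installed:

```python
# Illustrative only: enumerate the GPUs in a single node and report
# which NVLink links are active. Requires an Nvidia GPU system and
# the nvidia-ml-py package (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        print(f"GPU {i}: {pynvml.nvmlDeviceGetName(handle)}")
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
            except pynvml.NVMLError:
                break  # this part has no further NVLink links
            if state == pynvml.NVML_FEATURE_ENABLED:
                print(f"  NVLink {link}: up")
finally:
    pynvml.nvmlShutdown()
```

Run on an HGX-class box, this prints a handful of active links per GPU; on a PCIe-only system it reports none.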

Meanwhile, to scale across multiple nodes, Ethernet or InfiniBand is usually used. However, Nvidia is increasingly relying on NVLink to scale across multiple nodes in a rack – take the massive NVL72 it showed off at GTC, which uses the tech to stitch together 72 Blackwell GPUs spread across 18 nodes at 1.8 TB/sec per GPU.
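
Those headline numbers are easy to sanity-check. A back-of-the-envelope sketch – assuming, as Nvidia's own spec sheets do, that the 1.8 TB/sec is per-GPU NVLink bandwidth:

```python
# Back-of-the-envelope math on the NVL72 figures quoted above.
# Assumption: 1.8 TB/s is per-GPU NVLink bandwidth, as Nvidia quotes it.
gpus = 72
nodes = 18
per_gpu_tbps = 1.8  # TB/s of NVLink bandwidth per Blackwell GPU

print(f"GPUs per node: {gpus // nodes}")  # -> 4
print(f"Aggregate NVLink bandwidth: {gpus * per_gpu_tbps:.1f} TB/s")  # -> 129.6
```

That's roughly 130 TB/sec of aggregate scale-up bandwidth in a single rack – the bar any open alternative will have to clear.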

That's just what the members of the Ultra Accelerator Link, aka UALink, consortium aim to replicate. And because it's being developed as an open standard, the hope is it will ultimately win out the same way Ethernet did in the datacenter.

"We talked a lot about the Ultra Ethernet standard and Ultra Accelerator Link … We think those are critical elements to enabling the industry to come together and innovate – to still have optimized solutions, but not locking them down so as to be proprietary solutions," Norrod said in a jab at Nvidia's product stack.

"The importance of UA Link – and Ultra Ethernet for that matter – is allowing customers to build out system infrastructure that can accommodate choice."

You can find out more about UALink and the adjacent Ultra Ethernet Consortium over on our sibling site The Next Platform. But from Norrod's perspective, by finding common ground, chipmakers, system designers, hyperscalers, and networking vendors can combine their efforts to catch up with, and eventually overtake, Nvidia.

While open standards like Ethernet have historically won in the long run, that doesn't mean there's no room for a handful of proprietary technologies. InfiniBand – an Nvidia property since its Mellanox acquisition – has been around since 1999 and still accounts for the majority of AI network deployments, for example.

However, it seems Norrod believes Nvidia is bound to stumble eventually.

"Nvidia is a great company. Jensen has done a heck of a job. They made serious investments in AI much sooner," he noted, commending his rival's foresight. "I think that in the end, open systems, open ecosystems will tend to win out – in part because the proprietary closed system is a bet that your engineers are better than everybody else's engineers combined. That's a difficult proposition, particularly in the datacenter."

Networking and accelerator fabrics are a natural place to start, as they still leave plenty of room for differentiation at the platform level. Intel has pretty much wound down its switching business, and AMD outright lacks one.

Meanwhile, their biggest datacenter customers – the cloud providers and hyperscalers – generally try to avoid getting married to a single vendor, so standardizing on these technologies will no doubt win the chipmakers some favor. This is further evidenced by the hyperscalers' early involvement in the two consortiums.

Ultra Ethernet and UALink are only the latest examples of long-time rivals overcoming their differences to move the industry forward faster. Late last year, Arm CPU slinger Ampere Computing came together with a motley crew of AI upstarts to form the AI Platform Alliance in a bid to upset Nvidia's hegemony.

The group, which includes Cerebras Systems, Graphcore, Furiosa, Kalray, Kinara, Luminous, Neuchips, Rebellions, and Sapeon, aims to establish an open AI ecosystem.

Since then, Ampere and Cerebras have both announced collaborations with Qualcomm to fill gaps in their portfolios. The idea is that individual startups can focus on their strengths, rather than spreading themselves thin trying to do everything.

Whether these efforts will be enough to unseat Nvidia in the long run, only time will tell. ®
