Running AI And Edge Workloads On CPUs

Partner Content  For well over a decade, power and cooling in datacenters have been a challenge for enterprises trying to meet their compute needs while keeping a rein on costs. Those challenges have become more acute as the compute environment has stretched beyond the datacenter into the cloud and out to the edge, and as workloads have become more fractured, spanning everything from AI and HPC to microservices and the Internet of Things.

The rapid innovation and adoption of AI – particularly in this new era of generative AI – will have a significant impact. Datacenters in the United States used about 3 percent of the country’s power in 2022, a share that could triple by the end of the decade.

A report from the Electric Power Research Institute (EPRI) in May predicted that by 2030, datacenters could consume up to 9 percent of US electricity generation, fueled by the processing of AI workloads. According to the EPRI’s Powering Intelligence: Analyzing Artificial Intelligence and Datacenter Energy Consumption report, AI queries need about 10 times the electricity of traditional internet searches, with the generation of music, photos, and video requiring even more. For comparison, a traditional search on Google uses about 0.3 watt-hours (Wh), while a query on OpenAI’s ChatGPT requires about 2.9 Wh.
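To put those per-query figures in perspective, here is a quick back-of-the-envelope calculation; the daily query volume is a hypothetical assumption chosen for illustration, not a figure from the EPRI report:

```python
# Per-query energy figures cited above: ~0.3 Wh for a traditional search,
# ~2.9 Wh for a ChatGPT query. The query volume is a hypothetical assumption.
SEARCH_WH = 0.3
CHATGPT_WH = 2.9

queries_per_day = 1_000_000_000  # hypothetical: one billion queries per day

search_mwh = SEARCH_WH * queries_per_day / 1e6   # Wh -> MWh
chatgpt_mwh = CHATGPT_WH * queries_per_day / 1e6

print(f"Energy ratio per query: {CHATGPT_WH / SEARCH_WH:.1f}x")   # ~9.7x
print(f"Traditional search: {search_mwh:,.0f} MWh/day")
print(f"AI queries:         {chatgpt_mwh:,.0f} MWh/day")
```

At that hypothetical volume, the gap between the two per-query figures compounds into thousands of megawatt-hours per day, which is why the per-query difference matters at datacenter scale.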

The problem is exacerbated by hyperscale cloud providers concentrating compute power in massive datacenters, each of which can consume as much power as 80,000 to 800,000 homes. Much of the training of large language models (LLMs) and the processing of AI workloads happens in these cloud datacenters.

Enter Intel Xeon 6

Intel is addressing the broad range of challenges – not only the rapidly increasing compute needs of AI and HPC workloads at the high end but also the density requirements for workloads at the space-constrained edge – with the latest iterations of its venerable Xeon datacenter processors.

What the giant chipmaker introduced at the Intel Vision event in April was a reimagining of what Xeon processors should deliver to this rapidly evolving compute environment. Central to this is the introduction of two microarchitectures rather than a single core design for all the CPUs in the family.

The Intel Xeon 6 processors now come in two flavors. The Performance-core (P-core) chips are aimed at compute-intensive workloads such as AI and HPC, delivering industry-best memory bandwidth and throughput. The Efficiency-core (E-core) chips are focused on high-density and scale-out workloads – think edge and IoT devices and cloud-native and hyperscale workloads. The Intel Xeon 6 family will also cover the general-purpose use cases between the two extremes, from modeling and simulation to in-memory analytics, unstructured databases, scale-out analytics, and the 5G core.

The E-cores are arriving first, with the launch announced last week at Computex. The P-cores will follow soon.

Different Cores And A Common Platform

While Intel Xeon 6 features two core options rather than the traditional single core, all the chips share a common hardware platform and software stack, and both core types are built on the same Intel 3 FinFET process technology. Intel 3 delivers 18 percent higher performance per watt than its predecessor and uses extreme ultraviolet (EUV) lithography, a technique that allows Intel to build chips with increasingly smaller features, giving them the compute capabilities and power efficiency needed for future workloads.

The E-cores give enterprises the ability to plug more compute capacity into the growing number of power- and space-constrained datacenter, cloud, and edge locations. That includes a processor with 288 cores, providing greater throughput for distributed cloud-scale workloads and delivering almost 100 more cores than a competing chip for servers and cloud environments that tops out at 192 cores.

The chips include both a scalar engine and a vector engine, enabling them to address a broad array of workloads with high performance and efficiency, as well as an out-of-order engine that drives more parallelism for improved performance and speed.
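The scalar-versus-vector distinction is easiest to see by analogy. The NumPy sketch below contrasts element-at-a-time work with a single data-parallel operation; it illustrates the general principle that vector engines exploit, not the Xeon 6 microarchitecture itself:

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Scalar style: one element per step, the pattern a scalar engine handles.
t0 = time.perf_counter()
out = np.empty_like(a)
for i in range(len(a)):
    out[i] = a[i] * b[i]
scalar_s = time.perf_counter() - t0

# Vector style: one operation applied across many elements at once, the
# data-parallel pattern a vector (SIMD) engine accelerates in hardware.
t0 = time.perf_counter()
out_vec = a * b
vector_s = time.perf_counter() - t0

print(f"scalar loop: {scalar_s:.3f}s, vectorized: {vector_s:.4f}s")
```

The same workload expressed as one wide operation runs far faster than the element-by-element loop, which is the payoff of pairing a scalar engine with a vector engine on the same core.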

They also deliver significant upgrades in efficiency and density, key factors when talking about the edge, the cloud, and other space- and power-constrained environments. According to Intel benchmark testing and architectural projections, E-core chips deliver 2.4x the performance per watt and 2.5x the rack density of their 4th Gen predecessors, essentially providing better performance and power efficiency in smaller systems.

CPUs For AI

The P-cores will include capabilities that enable enterprises to use Xeon CPUs for generative AI, LLM, and hyperscale workloads rather than having to rely on GPUs and other accelerators, bringing with them the same management, networking, and security that organizations have grown accustomed to over the past decades.

The chips support the MXFP4 data format, which can reduce next-token latency by as much as 6.5 times compared with 4th Gen Xeons using FP16, shortening the time it takes the LLM to produce each token and speeding up AI processing overall. The faster that work is done, the faster an enterprise can put its AI workloads into production.
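MXFP4 is one of the OCP microscaling (MX) formats: small blocks of values share a single power-of-two scale, and each value is stored in just 4 bits (E2M1). The NumPy sketch below illustrates the quantization idea conceptually; it is not how the Xeon 6 hardware implements the format:

```python
import numpy as np

# Representable magnitudes of the FP4 (E2M1) element format used by MXFP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(block):
    """Quantize one block to MXFP4-style values: a shared power-of-two
    scale for the block plus a 4-bit (E2M1) value per element.
    Conceptual sketch only, not a hardware-accurate implementation."""
    amax = np.max(np.abs(block))
    if amax == 0:
        return np.zeros_like(block)
    # Shared scale: power of two that maps the block max near FP4's max (6.0);
    # anything still above 6.0 after scaling saturates to 6.0.
    scale = 2.0 ** np.floor(np.log2(amax / FP4_GRID[-1]))
    scaled = block / scale
    signs = np.sign(scaled)
    # Round each element to the nearest representable FP4 magnitude.
    idx = np.abs(FP4_GRID[None, :] - np.abs(scaled)[:, None]).argmin(axis=1)
    return signs * FP4_GRID[idx] * scale

weights = np.random.randn(32).astype(np.float32)  # MX spec block size is 32
quantized = mxfp4_quantize_block(weights)
print("max abs error:", np.max(np.abs(weights - quantized)))
```

Storing 4 bits per weight instead of 16 quarters the memory traffic for model weights, which is where much of the next-token latency gain comes from on bandwidth-bound LLM inference.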

Being able to support Llama 2 models with up to 70 billion parameters is another proof point that enterprises now have a family of Intel x86 architecture-based CPUs for this new world of generative AI and LLMs, rather than having to adopt more expensive accelerators. Parameters are the variables that AI models learn during the training process and are crucial in determining a model’s performance and its ability to generate accurate responses to inputs.

The more parameters there are, the better the model can understand complex underlying data patterns, and the number of parameters in the largest LLMs is growing, with some having more than 100 billion. However, there are drawbacks to too many parameters, including how power-hungry those bigger models are. The P-cores give enterprises the compute power to run these larger models and the efficiency to better control their power consumption.

In addition, test results indicate that the P-cores increase vector database throughput by 2.35 times over the previous generation of Xeon chips, which significantly boosts generative AI and retrieval-augmented generation (RAG) performance. RAG allows enterprises to include facts from external sources to improve the reliability, accuracy, and relevancy of LLMs.

Essentially, RAG is one of the techniques that helps organizations incorporate their own data into the responses of pre-trained LLMs – without retraining them – enabling generative AI query outputs to include such corporate information in everything from enterprise search to data analysis to customer service. It’s also a way for organizations to keep control of their data even as they use pre-trained LLMs.

Much of the data brought in by RAG comes from massive stores of unstructured data that is encoded and stored as vectors, which capture context and relationships between pieces of data. Those vector databases are only getting larger, so the challenge is enabling them to load data and build indices faster to accelerate retrieval.
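A minimal sketch of the retrieval step at the heart of RAG is below. The embed() function is a hypothetical stand-in for a real embedding model, and the in-memory index stands in for a purpose-built vector database:

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 128) -> np.ndarray:
    """Hypothetical embedding stub: a real system would call an embedding
    model here. Deterministic random vectors keep the sketch runnable."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-normalize for cosine similarity

documents = [
    "Q3 revenue grew 12 percent year over year.",
    "The support team resolved 95 percent of tickets within 24 hours.",
    "Datacenter power usage fell after the server consolidation project.",
]
index = np.stack([embed(d) for d in documents])  # one vector per document

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose vectors are most similar to the query."""
    scores = index @ embed(query)        # dot product = cosine on unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

# The retrieved passages would be prepended to the LLM prompt as grounding facts.
print(retrieve("How did consolidation affect power consumption?"))
```

Every query triggers a similarity search like this across the whole index, which is why faster index builds and higher vector database throughput translate directly into better RAG performance.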

Doing More With Less

As noted above, the explosive innovation and enterprise adoption of generative AI promise to push the amount of compute power needed in datacenters even further upward, so organizations need datacenter architectures that can run these workloads while holding down electricity consumption. The Intel Xeon 6 processors are designed to do that.

Enterprises will be able to replace their older systems with fewer servers running the newest chips and still run the same workloads while meeting organizational sustainability goals. They will be able to do the work that 200 racks of 2nd Gen Xeon servers do now with only 72 racks of Intel Xeon 6-powered servers – a consolidation ratio of nearly three to one – at a savings of one megawatt of power.
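The consolidation arithmetic is simple to check. In the sketch below, the rack counts come from the claim above, while the per-rack power draws are hypothetical figures chosen only to show how a roughly one-megawatt saving could fall out:

```python
# Rack counts from the claim above: 200 racks of 2nd Gen Xeon servers
# replaced by 72 racks of Intel Xeon 6 servers.
old_racks, new_racks = 200, 72

print(f"Consolidation ratio: {old_racks / new_racks:.2f} to 1")  # ~2.78:1

# Per-rack power draws below are hypothetical assumptions, not Intel figures,
# picked to be roughly consistent with the cited one-megawatt saving.
old_kw_per_rack, new_kw_per_rack = 10.0, 15.0
saved_kw = old_racks * old_kw_per_rack - new_racks * new_kw_per_rack
print(f"Estimated saving: {saved_kw / 1000:.2f} MW")             # ~0.92 MW
```

The newer racks can draw more power each and still come out well ahead, because so many fewer of them are needed for the same work.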

We’re in the very early stages of running AI in the enterprise and in many ways are still figuring out what it will look like in another year or two. However, we know that there will continue to be furious innovation around LLMs and small language models and what they can do, and organizations are hungry to incorporate the technologies into their business models.

We also know that it will consume huge amounts of power and that the associated costs will continue to grow. With Intel Xeon 6 P-core and E-core processors, Intel is giving enterprises multiple options for running everything from AI and hyperscale workloads down to workloads in space- and power-constrained environments in the cloud and at the edge, and to do all of that on a familiar CPU architecture that won’t require organizations to make major changes in their operations.
