
Google’s RecurrentGemma brings advanced language AI to edge devices

Credit: VentureBeat using Midjourney



Google yesterday unveiled RecurrentGemma, a new open language model that brings advanced AI text processing and generation to resource-constrained devices like smartphones, IoT systems and personal computers. The model continues Google’s recent push into small language models (SLMs) and edge computing: its novel architecture drastically reduces memory and processing requirements while delivering performance comparable to that of larger language models (LLMs). That combination makes RecurrentGemma well suited to applications that demand real-time responses, such as interactive AI systems and real-time translation services.

Why today’s language models are resource pigs

Today’s state-of-the-art language models, like OpenAI’s GPT-4, Anthropic’s Claude and Google’s Gemini, rely on the Transformer architecture, whose memory and computational demands grow with the length of the input being processed. This is because attention compares each new token against every token that came before it, so compute scales roughly quadratically with sequence length and the cache of intermediate results grows with every token. Consequently, these large language models are ill-suited for deployment on resource-constrained devices and must rely on remote servers, hindering the development of real-time edge applications.
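To see the scaling problem in miniature, consider the toy Python sketch below. It is illustrative only (real Transformers cache keys and values separately, per layer and per attention head), but it shows how the cache a decoder keeps grows by one entry for every token processed:

```python
# Toy full-attention decoder loop: the cache grows with every token.
# Sizes are illustrative; real models store keys/values per layer and head.
import numpy as np

d_model = 64        # embedding size (illustrative)
kv_cache = []       # one entry appended per token, never evicted

def attend(query, cache):
    """Compare the new token against every cached token."""
    keys = np.stack(cache)                  # (tokens_so_far, d_model)
    scores = keys @ query                   # one score per past token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys                   # weighted mix of the whole history

for _ in range(1000):                       # a 1,000-token sequence
    token = np.random.randn(d_model)
    kv_cache.append(token)
    _ = attend(token, kv_cache)

print(f"cache entries after 1,000 tokens: {len(kv_cache)}")  # -> 1000
```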

How RecurrentGemma works 

RecurrentGemma achieves its efficiency by attending only to a fixed-size window of recent tokens at any given time, rather than comparing every token against all the others in parallel as Transformer-based models do. This local attention lets RecurrentGemma process long text sequences without storing and re-reading an ever-growing pile of intermediate data, which is the main reason Transformers are so memory-hungry. The approach reduces the computational load and speeds up processing without significantly compromising performance.
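A minimal sketch of that idea (again with illustrative sizes, not RecurrentGemma’s actual window) replaces the unbounded cache with a fixed-length one, so memory stays flat however long the input runs:

```python
# Toy local (sliding-window) attention: the cache is capped at a fixed window,
# so memory use is constant regardless of sequence length.
# The window size is an illustrative assumption, not RecurrentGemma's value.
from collections import deque
import numpy as np

d_model, window = 64, 128
local_cache = deque(maxlen=window)   # oldest tokens fall out automatically

def attend_local(query, cache):
    keys = np.stack(cache)           # at most `window` entries
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys

for _ in range(10_000):              # ten times the sequence, same memory
    token = np.random.randn(d_model)
    local_cache.append(token)
    _ = attend_local(token, local_cache)

print(f"cache entries after 10,000 tokens: {len(local_cache)}")  # -> 128
```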

RecurrentGemma uses techniques that are conceptually older than those used in modern Transformer-based models. The core of RecurrentGemma’s efficiency comes from linear recurrences, a fundamental component of traditional recurrent neural networks (RNNs). 

RNNs were the standard for processing sequential data before the advent of Transformers. They operate by maintaining a hidden state that updates as each new data point is processed, effectively “remembering” previous information in a sequence. 

This approach is well-suited to sequential data such as language. Because the hidden state has a fixed size, resource usage stays constant no matter how long the input grows, so RecurrentGemma can handle extended text processing tasks while keeping memory and computational requirements in check. That makes it a good fit for resource-limited edge devices and removes the dependency on remote cloud computing resources.
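The recurrence itself can be sketched in a few lines. The decay coefficient below is a toy stand-in for the learned, per-channel gates a production model would use, but the key property is visible: the only thing carried between tokens is a fixed-size state vector.

```python
# Toy linear recurrence: a fixed-size hidden state updated once per token.
# The decay value is a stand-in for learned gates, not a real parameter.
import numpy as np

d_state = 64
decay = 0.9                  # how much of the previous state to keep
h = np.zeros(d_state)        # the only state carried between tokens

def step(h, x):
    """h_t = decay * h_{t-1} + (1 - decay) * x_t"""
    return decay * h + (1.0 - decay) * x

for _ in range(100_000):     # 100,000 tokens; memory never grows
    x = np.random.randn(d_state)
    h = step(h, x)

print(f"state carried after 100,000 tokens: {h.size} floats")  # -> 64
```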

The model effectively combines the strengths of both RNNs and attention mechanisms to address the shortcomings of Transformers in situations where efficiency is critical. This makes RecurrentGemma not just a throwback to earlier models but a significant step forward.
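Putting the two pieces together, a hybrid stack might interleave recurrence layers (a compact running summary of everything seen so far) with local-attention layers (fine-grained context over a recent window). The layer mix and sizes below are assumptions for illustration, not RecurrentGemma’s published configuration:

```python
# Conceptual hybrid stack: recurrence layers carry a fixed-size summary of the
# whole sequence; local-attention layers handle a bounded recent window.
# Layer counts, sizes and the decay value are illustrative assumptions.
from collections import deque
import numpy as np

d_model, window, decay = 64, 128, 0.9

class RecurrentLayer:
    def __init__(self):
        self.h = np.zeros(d_model)
    def __call__(self, x):
        self.h = decay * self.h + (1 - decay) * x   # fixed-size state
        return self.h

class LocalAttentionLayer:
    def __init__(self):
        self.cache = deque(maxlen=window)           # bounded cache
    def __call__(self, x):
        self.cache.append(x)
        keys = np.stack(self.cache)
        scores = keys @ x
        weights = np.exp(scores - scores.max())
        return (weights / weights.sum()) @ keys

# Alternate the two layer types, as a hybrid design might.
layers = [RecurrentLayer(), LocalAttentionLayer(),
          RecurrentLayer(), LocalAttentionLayer()]

for _ in range(5_000):                              # long input, flat memory
    out = np.random.randn(d_model)
    for layer in layers:
        out = layer(out)
```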

What it means for the edge, GPUs and AI processors 

RecurrentGemma’s design minimizes the need to continuously reprocess large volumes of intermediate data, exactly the kind of massively parallel workload that makes GPUs so valuable for AI tasks. By shrinking that working set, RecurrentGemma can operate efficiently on more modest hardware, potentially diminishing the need for high-powered GPUs in many scenarios.

The reduced hardware demands make models like RecurrentGemma more suitable for edge computing applications, where local processing power is typically lower than in servers designed for hyperscale clouds. This enables the deployment of advanced AI language processing directly on edge devices like smartphones, IoT devices, or embedded systems without relying on constant cloud connectivity.

While RecurrentGemma and other SLMs may not completely eliminate the need for GPUs or specialized AI processors in all cases, this shift towards smaller, faster models could accelerate the development and deployment of AI use cases at the edge, transforming the way we interact with technology on the devices we use every day.

The introduction of RecurrentGemma marks an exciting new development in the evolution of language AI, bringing the power of advanced text processing and generation to the edge. As Google continues to refine this technology, it’s clear that the future of AI lies not just in the cloud but also in the palms of our hands.