
Microsoft, Beihang release MoRA, an efficient LLM fine-tuning technique


Researchers from Microsoft and Beihang University have introduced a new technique for fine-tuning large language models (LLMs) at a fraction of the usual cost.

The new technique, called MoRA, is a parameter-efficient fine-tuning (PEFT) method that addresses some of the limitations of other popular techniques such as low-rank adaptation (LoRA). MoRA is especially useful for fine-tuning tasks that require the model to acquire new knowledge. With PEFT methods becoming increasingly popular in the enterprise, MoRA could become an important addition to the growing toolset of LLM application developers.

The limitations of LoRA

Classic fine-tuning requires updating all the parameters of an LLM. When the model contains billions of parameters, full fine-tuning becomes costly and slow. Parameter-efficient fine-tuning techniques are based on the premise that fine-tuning an LLM for a downstream application does not require updating all of its parameters. PEFT methods instead train a small subset of parameters that is enough to configure the model for the target task.
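
For a sense of scale (illustrative numbers, not from the paper): a 7-billion-parameter model stored in 16-bit precision takes roughly 14 GB for the weights alone, and full fine-tuning with an optimizer such as Adam must also hold gradients plus two optimizer states per parameter, multiplying that footprint several times over. A PEFT method that trains well under 1% of the parameters shrinks the trainable footprint accordingly.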

LoRA has gained popularity as a PEFT technique because it represents the weight update as the product of two low-rank matrices, confining the update to a small subspace of the full weight matrix. LoRA significantly reduces memory requirements and facilitates the storage and deployment of fine-tuned models.
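
For intuition, here is a minimal PyTorch sketch of a LoRA-style linear layer; the class name, initialization, and scaling convention are illustrative rather than taken from any particular library:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W plus a trainable low-rank update B @ A."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: int = 16):
        super().__init__()
        # Pretrained weight, frozen during fine-tuning.
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        # Trainable low-rank factors: A is r x d_in, B is d_out x r.
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to x @ (W + scale * B @ A).T, but never materializes B @ A.
        return x @ self.weight.T + (x @ self.A.T) @ self.B.T * self.scale
```

With d_in = d_out = 4,096 and r = 8, the trainable factors hold 2 × 8 × 4,096 = 65,536 parameters, versus roughly 16.8 million in the frozen weight.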

However, while LoRA performs well on tasks such as text classification and instruction tuning, it struggles with more complex tasks that require enhancing the knowledge and capabilities of LLMs, such as mathematical reasoning and continual pre-training. Several studies have found that LoRA’s low-rank updating mechanism may limit the ability of large language models to effectively learn and memorize new knowledge.

Since the rank of the LoRA adapter is significantly smaller than the full rank of the model, “this limitation restricts capacity to store new information via fine-tuning,” the researchers write.
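
To make that concrete with illustrative dimensions: the product B @ A in the sketch above can never exceed rank r, so with r = 8 on a 4,096 × 4,096 weight, fine-tuning updates are confined to a rank-8 slice of a matrix whose own rank can reach 4,096.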

MoRA

LoRA (left) uses low-rank matrices, while MoRA (right) uses a single square matrix for parameter-efficient fine-tuning (source: arXiv)

To address the limitations of LoRA, the researchers introduce MoRA, a PEFT technique that replaces the low-rank matrices with a single square matrix. The main idea behind MoRA is to spend the same trainable-parameter budget in a way that achieves the highest possible rank within the model's original dimensions.

Unlike LoRA's matrices, the square matrix's input and output dimensions do not match those of the original model, so it cannot simply be applied to a layer's activations. To bridge this gap, the researchers developed compression and decompression functions that transform inputs between the two spaces, which allows MoRA to be easily plugged into LLMs of different sizes.
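
Here is a simplified sketch of that mechanism in the same PyTorch style. The paper describes several compression/decompression variants (including truncation, sharing, and rotation); this version uses the simplest possible stand-in, folding the input into chunks of size r_hat, and the class name and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class MoRASketch(nn.Module):
    """A trainable square matrix applied chunk-wise as a high-rank update.

    Simplified stand-in for MoRA's compression/decompression operators;
    assumes d_model is divisible by r_hat.
    """

    def __init__(self, d_model: int, r_hat: int):
        super().__init__()
        assert d_model % r_hat == 0
        self.r_hat = r_hat
        # Zero init so the update starts as a no-op, as with LoRA's B matrix.
        self.M = nn.Parameter(torch.zeros(r_hat, r_hat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # "Compress": view the d_model features as chunks of size r_hat.
        *batch, d = x.shape
        chunks = x.reshape(*batch, d // self.r_hat, self.r_hat)
        # Apply the square matrix to every chunk, then "decompress" by
        # flattening back; the result is added to the frozen layer's output.
        return (chunks @ self.M.T).reshape(*batch, d)
```

The payoff is rank for the same budget: on a 4,096 × 4,096 weight, LoRA with r = 8 trains 2 × 8 × 4,096 = 65,536 parameters and caps its update at rank 8, while a square matrix of the same size is 256 × 256 (256² = 65,536) and can reach rank 256.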

The square weight matrix gives MoRA a stronger capacity to learn new knowledge than a LoRA adapter with the same number of trainable parameters, according to the researchers.

MoRA in action

The researchers compared LoRA and MoRA adapters with equal numbers of trainable parameters across various tasks and settings. On memorization tasks, MoRA significantly outperformed LoRA and came much closer to the performance of a fully fine-tuned model while using fewer parameters and training steps.

MoRA's loss curve closely tracks full fine-tuning on knowledge-memorization tasks (source: arXiv)

“Our method shows significant improvements over LoRA with the same number of trainable parameters, benefiting from high-rank updating,” the researchers write.

In instruction tuning and mathematical reasoning tasks, MoRA performed almost on par with LoRA. For continual pre-training in biomedical and financial domains, however, MoRA outperformed LoRA, its high-rank updates helping it memorize new knowledge.

The researchers also found that increasing the rank of the MoRA adapter can eliminate the performance gap between PEFT and full fine-tuning in mathematical reasoning tasks, though it comes at higher training and storage costs.

PEFT for the enterprise

Fine-tuning is an important use case for enterprise LLM applications. In addition to improving the capabilities and accuracy of LLMs on proprietary knowledge, fine-tuning can enable companies to use smaller models for tasks that previously required expensive frontier models.

Currently, LoRA and its variants are the gold standard for parameter-efficient fine-tuning, and there is a rich ecosystem of tools and platforms for creating LoRA adapters. For example, S-LoRA is a framework that enables developers to run thousands of LoRA adapters on a single GPU, unlocking applications that require many fine-tuned LLMs, such as serving a model customized for each user.

The researchers at Microsoft and Beihang have released an open-source implementation of MoRA that is compatible with LoRA. It could prove to be an important tool for enterprise applications that need to add new knowledge to base models.