Introducing: Vera Model Routing

The Vera Engineering Blog
3 min read · Nov 27, 2023

--

Hello, world, and welcome to Vera’s Engineering Blog. For our first post, we’re excited to share more of what we’ve been building… the latest and greatest new Vera feature: model routing and optimization!

1) What

tl;dr — Model Routing sends prompt requests to the optimal model for the task at hand. That could mean the one with the best performance, the lowest latency, or the lowest cost, depending on your preferences.

In the example below, we already have a global policy in place: Don’t let users share PII (personally identifiable information)!

Our user:

  1. First, sends a simple request that has no business spending money on GPT-4.
  2. Then, sends a more complicated question in Latin, a language that OpenAI’s models struggle with.
  3. In the process, our user forgets to redact a customer’s phone number from their question.

Vera’s platform:

  1. Sends the simple question to a free, internally hosted instance of LLaMA, understanding that GPT-4 is too pricey for the task.
  2. Detects that the question is in a non-English language…
  3. Detects the PII (phone number)…
  4. Redacts it, and…
  5. Automatically sends the Latin question (with redactions) to AI21!
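The routing flow above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Vera’s actual implementation: the model names, costs, and heuristics here are all hypothetical stand-ins (a real system would use a trained language detector and a proper PII classifier, not regexes and word lists).

```python
import re

# Hypothetical model catalog — names and traits are illustrative only.
MODELS = {
    "llama-internal": {"cost_per_1k_tokens": 0.0, "multilingual": False},
    "gpt-4": {"cost_per_1k_tokens": 0.06, "multilingual": False},
    "ai21": {"cost_per_1k_tokens": 0.01, "multilingual": True},
}

# Toy PII pattern: US-style phone numbers.
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")


def redact_pii(prompt: str) -> str:
    """Global policy: PII never leaves the organization."""
    return PHONE_RE.sub("[REDACTED]", prompt)


def looks_non_english(prompt: str) -> bool:
    """Toy heuristic standing in for a real language detector."""
    latin_markers = {"quid", "est", "cur", "quod", "sunt"}
    return any(w.lower().strip("?.,!") in latin_markers for w in prompt.split())


def route(prompt: str, complex_task: bool = False) -> tuple[str, str]:
    """Return (model_name, sanitized_prompt) for a request."""
    clean = redact_pii(prompt)            # step: detect + redact PII
    if looks_non_english(clean):
        return "ai21", clean              # step: non-English → multilingual model
    if not complex_task:
        return "llama-internal", clean    # step: simple ask → free internal model
    return "gpt-4", clean
```

With this sketch, a synonym question lands on the free internal model, while the Latin question (phone number redacted) is routed to the multilingual model — mirroring steps 1–5 above.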

2) Why?!

Imagine you’re a team leader at a Generative-AI-enabled company. You’re plugging along throughout the day as your teammates ask ChatGPT to summarize a project plan, debug some code, or rephrase marketing copy for higher impact. You’ve already got good trust & safety policies in place, and the Vera platform is moderating model inputs and outputs; your company’s compliance and risk teams are happy.

After months of playing, you and your team are finding out that OpenAI, in all its glory, isn’t actually the best at everything. Maybe you asked it to multiply some really big numbers, or translate text into Mandarin. There’s a new big model announcement today (and every day, actually) that’s outperforming OpenAI in some category, and today that category is the one that matters. Maybe you’re just looking to answer a question about current events, but ChatGPT doesn’t know about anything past 2021.

PLUS… the token invoices are starting to pile up. Remember that time you asked ChatGPT to find a synonym for the word “pleasant”? Did that question really warrant invoking GPT-4’s all-powerful eye? Or would a smaller, open-source model have done the trick?

Generative AI models are SO BIG, and yet no single one is good at everything. There are dozens of LLM evaluations — so many, in fact, that you can make yourself crazy trying to find one that suits all of your needs. Luckily, with a little guidance, we can help you set one policy for all of the LLMs you’d like to use.

3) HOW?!!!!

Great question.

Vera’s incredible ML research team, led by our CTO Justin Norman, uses the latest and greatest evaluation techniques from academia to rank LLMs by their strengths and weaknesses.

Some are good at summaries, others better at non-English languages. Some are fast, many are slow, and yet others cost an arm and a leg. For those worried about model stereotyping, bias, and copyright risk (!!), our platform lets you choose which metrics matter most to you, and prioritize them.

You can assign flexible token budgets to teams, roles, and levels, minimizing bias or copyright risks for customer-facing communications. Meanwhile, Vera’s granular RBAC can block groups from sending input types that may be too risky for now (code or images, for example). Vera’s global policy manager ensures no one sends confidential information outside your organization, while aligning model responses to your company’s values.
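To make the budget-and-RBAC idea concrete, here’s a tiny sketch of the kind of policy a team lead might express. Every field name, team, role, and number below is hypothetical — this is not Vera’s actual schema, just an illustration of checking a request against a token budget and a role-based input rule.

```python
# Hypothetical policy document — field names and values are illustrative only.
POLICY = {
    "token_budgets": {            # per-team monthly token allowances
        "marketing": 500_000,
        "engineering": 2_000_000,
    },
    "blocked_input_types": {      # RBAC: input types a role may not send
        "interns": {"code", "image"},
    },
}


def request_allowed(team: str, role: str, input_type: str,
                    tokens_used: int, tokens_requested: int) -> bool:
    """Check a request against the team budget and role-based input rules."""
    budget = POLICY["token_budgets"].get(team, 0)
    if tokens_used + tokens_requested > budget:
        return False                       # over budget: route to a cheaper model or deny
    if input_type in POLICY["blocked_input_types"].get(role, set()):
        return False                       # input type blocked for this role
    return True
```

A real platform would enforce rules like these centrally, so individual users never have to think about them.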

And so, we’re thrilled to share this work with you today! If you have questions, suggestions, or even candid feedback, never hesitate to get in touch with us: info@askvera.io.

And don’t forget to sign up for our waitlist while you’re there.
