

Islands of Confidence: Make LLM apps more reliable by running *fewer* LLM calls

In this article, I want to share a method for improving your LLM app’s reliability, making it produce consistent results for particular inputs, by creating something I call “Islands of Confidence”.

An island of confidence is basically a set of inputs for which we choose NOT to run an LLM. Instead, we run plain, deterministic code.

The approach is model-agnostic: the island sits entirely in front of the model, so it works the same whether you call a hosted API or a self-hosted open-source LLM.

An Island Surrounded by Sharks

We’ll start with a very simple example and build it from there, step by step.

STEP 1: Exact Match

Let’s say we have a customer support chatbot, where users frequently ask: “How do I create a new account?”

Since this question is so frequent, there’s no reason to run the LLM. Instead, we can simply add an ‘if’ statement before the model that checks whether the user input exactly matches the question above. If it does, we return a cached, verified answer.
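
In code, the island is nothing more than a lookup in front of the model. Here’s a minimal sketch, assuming a dictionary of hand-verified answers and an `llm` callable (both placeholders):

```python
# Hand-verified answers for inputs we trust completely.
CACHED_ANSWERS = {
    "How do I create a new account?":
        "Click 'Sign up' in the top-right corner and follow the steps.",  # placeholder answer
}

def answer(user_input: str, llm) -> str:
    # Island of confidence: for known inputs, skip the LLM entirely.
    cached = CACHED_ANSWERS.get(user_input.strip())
    if cached is not None:
        return cached
    # Everything outside the island still goes to the model.
    return llm(user_input)
```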

But this exact match is brittle, because another user might ask the same question a little bit differently. Let’s fix that.

STEP 2: Paraphrased Input

What if the user asks the same question but a little bit differently: “yo how to register for new account?”.

In this case, we still want to detect it and run the same logic. Fortunately, there’s a simple solution: we can fine-tune a small binary classification model to detect paraphrases of our question.

One method is to use sentence transformers and a model such as paraphrase-mpnet-base-v2. From what I’ve seen, you only need around 10-20 examples for good results. Check out the SetFit library by Hugging Face.
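
Here’s a minimal sketch following SetFit’s documented quickstart (note that newer setfit versions rename `SetFitTrainer` to `Trainer`; the training examples below are illustrative):

```python
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# A few labeled examples: 1 = paraphrase of our question, 0 = anything else.
train_dataset = Dataset.from_dict({
    "text": [
        "How do I create a new account?",
        "yo how to register for new account?",
        "what's the process for signing up?",
        "How do I cancel my subscription?",  # negative example
        "What are your pricing plans?",      # negative example
    ],
    "label": [1, 1, 1, 0, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(model=model, train_dataset=train_dataset)
trainer.train()

# The island check now runs this tiny classifier instead of the LLM.
is_on_island = bool(model.predict(["how can i make an account?"])[0])
```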

Now, our island isn’t just a single string—it’s any paraphrase of the question “How do I create a new account?”.

But we’ve ignored one important detail: our chatbot is probably a RAG system, and to answer the question it usually needs to retrieve context from the knowledge base.

STEP 3: RAG on an Infrequently-Modified KB

By creating an island that simply returns a string, we basically ignore the retrieval part. This creates a problem: what if a new version of the web app is deployed and the process of creating a new account changes?

Even though the KB would probably be updated, our cached answer is now stale.

To solve this, instead of just returning a string, we can check whether the context that was originally used to generate the answer is still the same in the KB. If it isn’t, we invalidate the island and fall back to the normal RAG pipeline.
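
One simple implementation is to fingerprint the KB chunks the answer was built from, and re-check that fingerprint on every island hit. A minimal sketch, assuming a `retrieve` function that returns the chatbot’s usual context chunks (hypothetical):

```python
import hashlib

def context_fingerprint(chunks: list[str]) -> str:
    # Hash the retrieved chunks so KB changes are cheap to detect.
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk.encode("utf-8"))
    return h.hexdigest()

# Stored when the cached answer was last verified.
island = {
    "question": "How do I create a new account?",
    "answer": "Click 'Sign up' in the top-right corner and follow the steps.",
    "fingerprint": "9f2c...",  # context_fingerprint(...) computed at verification time
}

def answer_from_island(island, retrieve):
    chunks = retrieve(island["question"])  # the chatbot's normal RAG retrieval
    if context_fingerprint(chunks) == island["fingerprint"]:
        return island["answer"]  # KB unchanged: the cached answer is still valid
    return None  # KB changed: invalidate the island, fall back to the full pipeline
```

Hashing is a deliberately strict check: any edit to the relevant KB pages invalidates the island, which errs on the side of re-running the LLM rather than serving a stale answer.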

STEP 4: Talk-to-your-Data with a Constant Question

Talk-to-your-data use cases are really useful. Here’s how they typically work:

  1. A user asks a question (e.g. “How many customers do we have and what’s the average ARR?”)
  2. A prompt is generated with the relevant part of the database schema to be used as context
  3. LLM generates a SQL query
  4. The SQL query is executed against the data warehouse (e.g. Snowflake)
  5. The application UI shows the results of the query

Unfortunately, a hallucination here can produce an incorrect SQL statement, which in turn yields completely incorrect data, and the user might have no way to tell.

Fortunately, our technique works out of the box here! As long as the database schema doesn’t change, the island of confidence can return the verified SQL query, very much like the way we handled the RAG case.
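
Here’s a minimal sketch of the SQL island, reusing the paraphrase check from Step 2 and the fingerprint idea from Step 3 (`is_paraphrase` and `llm_generate_sql` are placeholder callables):

```python
import hashlib

def schema_fingerprint(schema_ddl: str) -> str:
    # Hash the schema DDL the verified query was written against.
    return hashlib.sha256(schema_ddl.encode("utf-8")).hexdigest()

island = {
    "sql": "SELECT COUNT(*) AS customers, AVG(arr) AS avg_arr FROM customers",  # verified by hand
    "schema_fingerprint": "4b1d...",  # schema_fingerprint(...) at verification time
}

def sql_for(question: str, schema_ddl: str, is_paraphrase, llm_generate_sql) -> str:
    on_island = is_paraphrase(question)  # the Step 2 classifier
    schema_unchanged = schema_fingerprint(schema_ddl) == island["schema_fingerprint"]
    if on_island and schema_unchanged:
        return island["sql"]  # verified query: no LLM call, no hallucination risk
    return llm_generate_sql(question, schema_ddl)  # outside the island: fall back to the LLM
```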

In a future post, I’ll discuss how islands of confidence can work with more complex variations of a question (not just paraphrases), as well as tools. Let me know what you think!
