
Last updated: June 27, 2024

How to build LLM-based phone assistants with OpenAI, Twilio, & Aporia

With the release of Large Language Models (LLMs), like GPT-4 and Llama, voice assistants have become more capable and accurate at performing complex tasks. Developers can use APIs to build a custom voice assistant experience for almost any use case.

From setting reminders and playing music to controlling smart home devices, voice assistants like Siri, Alexa, and Google Assistant have integrated seamlessly into our daily lives over the last decade. Their popularity, usage, and consumer adoption are increasing rapidly.

In the US alone, the number of voice assistant users is expected to reach 157.1 million by 2026, compared to 142 million in 2022, with global estimates reaching 8.4 billion units in 2024. 

In addition to proprietary APIs, various open-source LLMs offer customizable solutions, making them an attractive alternative for developers seeking flexibility in their projects.

In this article, we will show you how to: 

  • Build a phone assistant using the GPT-3.5 Turbo model via the OpenAI API.
  • Handle user-assistant conversations using Twilio.
  • Mitigate AI hallucinations in real time with Aporia.  

Code setup & phone assistant architecture overview

You can also follow along with the code setup on the YouTube tutorial here. To follow this step-by-step tutorial, you need a basic understanding of the following code components:

  • Hono library: Hono is a small and super fast web framework for building full-stack applications, web APIs, edge applications, etc. It is written in TypeScript and works with a range of runtimes. We'll start from an empty HTTP server running on Node.js (see the minimal sketch after this list).
  • Twilio phone number: When someone calls your Twilio phone number, Twilio sends an HTTP request to your server. You can buy a phone number in the Twilio Console, then configure its voice webhook to point to the server where your application is hosted.
  • Twilio API: Twilio provides APIs for building Programmable Voice applications. We'll use its TwiML XML instruction set (tags) to set up responses for incoming calls.
  • OpenAI API: OpenAI provides many text processing and generation capabilities through its API. We'll use its Chat Completions API to generate the conversation between the user and the assistant.
  • ngrok: A secure application delivery platform. We will use it to tunnel the local HTTP server to the internet so Twilio can reach it.
  • Aporia API: Aporia provides APIs to control AI chatbot and assistant performance. Here, we'll use its Guardrails solution to mitigate LLM hallucinations in real time.
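
Before wiring up any routes, it helps to have the empty server running and reachable from the internet. Below is a minimal sketch of that starting point, assuming the default port 3000 and the @hono/node-server adapter used later in this tutorial; the ngrok command in the comment is just one way to expose the server.

import { serve } from '@hono/node-server'
import { Hono } from 'hono'

// An empty Hono app with no routes yet
const app = new Hono()

const port = 3000
console.log(`Server is running on port ${port}`)

serve({ fetch: app.fetch, port })

// In a separate terminal, run `ngrok http 3000` and point your Twilio
// phone number's voice webhook at the HTTPS URL that ngrok prints.
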
Overview of a basic phone assistant architecture using Twilio and OpenAI

Step-by-step guide to build an AI phone assistant

1. Implement the POST endpoint for incoming calls

Set Up a Basic Call Response

When someone calls your Twilio number, the server receives a POST HTTP request for the /incoming-call API.

Let’s implement the /incoming-call API to return a basic Twilio response. In the code snippet below, we create a new TwiML VoiceResponse(), add a simple say() message ("Hello, how are you?"), and return it as XML for the incoming call.

app.post('/incoming-call', (c) => {
  // Build a TwiML response that greets the caller
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say("Hello, how are you?")

  // Twilio expects the instructions back as XML
  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

Now, start your server, expose it with ngrok, and call your Twilio number. You should hear the response you just configured.

Speech detection & transcription – Listen to the user

Now, start listening to the user's speech using TwiML's gather() method. It takes a few arguments, such as:

  • input: the type of input to capture, such as speech or dtmf (keypad tones)
  • speechTimeout: detects when the user pauses in their speech ("auto" lets Twilio decide)
  • speechModel: determines the speech model Twilio uses to transcribe the audio to text
  • enhanced: requests Twilio's enhanced (premium) speech recognition
  • action: determines the next API call (/respond in this case) once the transcription is complete

Add the following code snippet to your /incoming-call API after the say() method.

  // Listen for the caller's speech and send the transcription to /respond
  voiceResponse.gather({
    input: ["speech"],
    speechTimeout: "auto",
    speechModel: 'experimental_conversations',
    enhanced: true,
    action: '/respond',
  })

2. Implement the /respond API endpoint

Twilio passes the speech recognition result to this endpoint in the HTTP request. First, read the transcription from the request's form data. Then use the OpenAI API to continue the conversation based on what the user said. For that, import the OpenAI library and initialize a new OpenAI instance with your API key so you can use its Chat Completions API.

When the user says something, the Chat Completions API will generate a response using the GPT-3.5 Turbo model. We'll then speak this response back to the user with Twilio's say() method and redirect the call to the /incoming-call API to continue the conversation.

The code snippet below demonstrates this process. 

import OpenAI from 'openai'

// Reads the OPENAI_API_KEY environment variable by default
const openai = new OpenAI()

app.post('/respond', async (c) => {
  // Twilio sends the transcription as form data in the SpeechResult field
  const formData = await c.req.formData()
  const voiceInput = formData.get("SpeechResult")?.toString()!

  // Generate a reply to the user's utterance
  const chatCompletion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "user", content: voiceInput }
    ],
    temperature: 0,
  })

  const assistantResponse = chatCompletion.choices[0].message.content

  // Speak the reply, then loop back to /incoming-call to keep listening
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say(assistantResponse!)
  voiceResponse.redirect({ method: "POST" }, "/incoming-call")

  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

3. Maintain the conversation state

So far, we have built a basic conversational assistant. The problem is that it cannot keep track of the entire user-assistant conversation, so we need to maintain state to preserve the conversation context.

For this purpose, we'll use an HTTP cookie from the hono/cookie library to store the conversation history between the /incoming-call and /respond APIs, using its getCookie() and setCookie() methods.

First, in the /incoming-call API, create a new cookie called "messages" (if one does not already exist) to store a new conversation. Use the setCookie() method to store the initial conversation state.

Then, in the /respond API endpoint, read the cookie with getCookie(). Push the current user message into the messages array, pass the messages to the OpenAI Chat Completions API, push the assistant response generated by the API onto the array as well, and finally set the cookie with the updated messages.

Here are the complete /incoming-call and /respond API endpoints with state maintained for conversation history.

import { serve } from '@hono/node-server'
import { Hono } from 'hono'
import { logger } from 'hono/logger'
import { twiml } from 'twilio'
import OpenAI from 'openai'
import { getCookie, setCookie } from 'hono/cookie'

// Reads the OPENAI_API_KEY environment variable by default
const openai = new OpenAI()

const app = new Hono()
app.use('*', logger())

const INITIAL_MESSAGE = "Hello, how are you?"

app.post('/incoming-call', (c) => {
  const voiceResponse = new twiml.VoiceResponse()

  if (!getCookie(c, "messages")) {
    // This is a new conversation!
    voiceResponse.say(INITIAL_MESSAGE)
    setCookie(c, "messages", JSON.stringify([
      {
        role: "system",
        content: `
          You are a helpful phone assistant for a pizza restaurant.
          The restaurant is open between 10-12 pm.
          You can help the customer reserve a table for the restaurant.
        `
      },
      { role: "assistant", content: INITIAL_MESSAGE }
    ]))
  }

  // Listen for the caller's speech and send the transcription to /respond
  voiceResponse.gather({
    input: ["speech"],
    speechTimeout: "auto",
    speechModel: 'experimental_conversations',
    enhanced: true,
    action: '/respond',
  })

  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

app.post('/respond', async (c) => {
  // Twilio sends the transcription as form data in the SpeechResult field
  const formData = await c.req.formData()
  const voiceInput = formData.get("SpeechResult")?.toString()!

  // Restore the conversation history and append the user's message
  let messages = JSON.parse(getCookie(c, "messages")!)
  messages.push({ role: "user", content: voiceInput })

  const chatCompletion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages,
    temperature: 0,
  })

  // Append the assistant's reply and persist the updated history
  const assistantResponse = chatCompletion.choices[0].message.content
  messages.push({ role: "assistant", content: assistantResponse })
  console.log(messages)

  setCookie(c, "messages", JSON.stringify(messages))

  // Speak the reply, then loop back to /incoming-call to keep listening
  const voiceResponse = new twiml.VoiceResponse()
  voiceResponse.say(assistantResponse!)
  voiceResponse.redirect({ method: "POST" }, "/incoming-call")

  c.header("Content-Type", "application/xml")
  return c.body(voiceResponse.toString())
})

const port = 3000
console.log(`Server is running on port ${port}`)

serve({
  fetch: app.fetch,
  port
})

4. Implement guardrails to mitigate risks

Now that our phone assistant is capable of understanding and responding to user queries, it’s key to ensure that these interactions are not just intelligent, but also secure and reliable. 

Layered between the OpenAI API and Twilio interface, Aporia Guardrails acts as a robust safeguard, preventing risks like hallucinations, data leakage, and inappropriate responses that could undermine the assistant’s effectiveness.

To integrate Aporia Guardrails with your codebase, a one-line change is all that’s needed: 

const openai = new OpenAI({
  baseURL: aporia_guardrails_url,
  defaultHeaders: { "X-APORIA-API-KEY": aporia_guardrails_api_key }
})
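
Here, aporia_guardrails_url and aporia_guardrails_api_key come from your Aporia project settings. A common pattern is to read them from environment variables; the variable names below are illustrative rather than prescribed by Aporia:

import OpenAI from 'openai'

// Hypothetical environment variable names; use whatever your deployment provides
const openai = new OpenAI({
  baseURL: process.env.APORIA_GUARDRAILS_URL,
  defaultHeaders: { "X-APORIA-API-KEY": process.env.APORIA_GUARDRAILS_API_KEY },
})

The rest of the code, including every openai.chat.completions.create() call, stays exactly the same; requests are simply routed through the guardrails proxy on their way to OpenAI.
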
  • Setup: Easily integrate Aporia Guardrails into the development environment. This step ensures every interaction is analyzed for safety and accuracy, aligning with predefined standards.
  • Customization: Tailor the guardrails to suit the assistant’s needs. This could involve setting parameters to guard against specific risks, like prompt injections or unintended data leaks.
  • Continuous detection & mitigation: Aporia not only provides app security but continuously monitors interactions, adapting to new threats and ensuring the assistant’s responses remain within the safety guidelines and aligned with business KPIs.
  • Deployment: With Aporia Guardrails in place, deploy the assistant with confidence, knowing that it’s equipped to handle interactions securely, maintaining user trust and regulatory compliance.

What’s Next?

If you have followed along with this tutorial, you have built a working voice assistant that can be customized for many use cases and is safeguarded against hallucinations and other AI risks. You can now connect this assistant to your calendar, your reservation system, or whatever application you need; a hypothetical sketch of one approach follows.
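
As an illustration of one possible next step, here is a rough sketch of letting the model trigger a table reservation via the OpenAI tools parameter; the reserve_table function and the reservation-system call are hypothetical placeholders, not part of the tutorial code.

const chatCompletion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages,
  temperature: 0,
  // Describe a hypothetical reservation tool the model may decide to call
  tools: [{
    type: "function",
    function: {
      name: "reserve_table",
      description: "Reserve a table at the restaurant",
      parameters: {
        type: "object",
        properties: {
          time: { type: "string", description: "Reservation time, e.g. 11:30" },
          party_size: { type: "integer", description: "Number of guests" },
        },
        required: ["time", "party_size"],
      },
    },
  }],
})

// If the model chose to call the tool, forward the arguments to your own system
for (const call of chatCompletion.choices[0].message.tool_calls ?? []) {
  if (call.function.name === "reserve_table") {
    const args = JSON.parse(call.function.arguments)
    // await reservationSystem.book(args.time, args.party_size) // hypothetical helper
  }
}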

More of a visual learner? Check out the video:

Learn more about mitigating hallucinations in real time with Aporia Guardrails:

Book a demo today.
