
Exclusive: Powerful new AI model accurately converts speech to text, even your company’s jargon



The power of turning spoken words into text may be seriously underrated — especially when it works as quickly and accurately as the new AdaKWS model from aiOla, an Israeli tech startup specializing in speech recognition founded in 2020.

AdaKWS builds on OpenAI’s Whisper speech-to-text model, which debuted in 2022, improving its keyword-detection accuracy by 6.2 percentage points overall across 16 languages, and by more than 16% in English alone.

It achieves a remarkable 94.6% accuracy in keyword spotting generally, outperforming OpenAI’s widely acclaimed Whisper model with its 88.4% accuracy, according to metrics shared by aiOla. It also works across 100 different languages and transcribes the text in near real time.

Those numbers may not sound impressive at first, but it’s the difference between accuracy in the high 80s and the mid-90s. That moves the technology from helpful in certain circumstances to broadly applicable across a range of niche use cases, even in highly regulated and mission-critical spaces like healthcare and food and beverage.
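One way to see why a six-point gain matters is to look at errors rather than accuracy. The short Python sketch below uses only the two figures aiOla reported; it assumes "accuracy" means the share of keywords spotted correctly, so the error rate is simply 100 minus that figure. The variable names are illustrative.

```python
# Accuracy figures as reported by aiOla in the article.
WHISPER_ACCURACY = 88.4  # percent
ADAKWS_ACCURACY = 94.6   # percent

# Absolute gain, in percentage points.
gain_points = ADAKWS_ACCURACY - WHISPER_ACCURACY

# Assumption: the error rate is whatever share the model gets wrong.
whisper_error = 100.0 - WHISPER_ACCURACY
adakws_error = 100.0 - ADAKWS_ACCURACY

# Relative reduction in errors: the share of Whisper's misses that go away.
error_reduction = (whisper_error - adakws_error) / whisper_error * 100.0

print(f"Gain: {gain_points:.1f} percentage points")
print(f"Error rate: {whisper_error:.1f}% -> {adakws_error:.1f}%")
print(f"Relative error reduction: {error_reduction:.1f}%")
```

Under that reading, the error rate falls from 11.6% to 5.4%, meaning a little more than half of the missed keywords disappear.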




Moreover, according to data shared by the company, aiOla’s AdaKWS transcribes speech much faster than Whisper: about 160 times faster than the Whisper-Large V2 model.

“The ability to spot keywords enables automation of everyday processes across a wide range of industries, from filing a parcel damage report to completing a safety inspection in a food manufacturing plant, transforming speech into actions,” said aiOla’s CEO and co-founder, Amir Haramaty, in a written statement provided to VentureBeat.

Myriad enterprise use cases

Even though the mind may naturally gravitate toward using speech-to-text AI models for tasks that rely on speech already, say transcribing customer service calls, aiOla is already seeing the impact in several less intuitive sectors. 

Haramaty showed VentureBeat demos of the technology over a video call interview, including a voice recording of a woman speaker reading out numerals and metrics from health monitoring equipment for a patient.

The speaker, a health tech, simply read the measurements aloud, and the AdaKWS model automatically used them to fill out a written form with dozens of complex and specific fields. She did not have to type them by hand, and no support staff had to listen to the recording and transcribe it later. The form was filled in within seconds.
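The form-filling step can be pictured with a toy example. The Python sketch below is not aiOla’s implementation; it assumes the speech has already been transcribed and simply pairs each spotted keyword with the first number spoken after it. The function name `fill_form`, the field names and the sample transcript are all hypothetical.

```python
import re
from typing import Dict, Optional


def fill_form(transcript: str, fields: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Map each form field to the first number spoken after its keyword.

    `fields` maps a form field name to the spoken keyword that announces it.
    Fields whose keyword never appears are left as None.
    """
    form: Dict[str, Optional[float]] = {}
    for field, keyword in fields.items():
        # Keyword, then any non-digit filler, then an integer or decimal.
        pattern = re.escape(keyword) + r"\D*?(\d+(?:\.\d+)?)"
        match = re.search(pattern, transcript, flags=re.IGNORECASE)
        form[field] = float(match.group(1)) if match else None
    return form


# Hypothetical transcript and field list, for illustration only.
transcript = "Heart rate 72, temperature 98.6, oxygen saturation 97"
fields = {
    "heart_rate": "heart rate",
    "temperature": "temperature",
    "spo2": "oxygen saturation",
    "respiratory_rate": "respiratory rate",
}
form = fill_form(transcript, fields)
print(form)
```

A production system would need to handle units, spoken-out numbers and keywords mentioned without a value, which this sketch ignores.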

A video clip demo provided by the company shows how it can be used for completing product and package inspection reports.

In another example, Haramaty told VentureBeat that the tech is being used to log the temperatures of a large supermarket chain’s refrigerators: a human monitor regularly reads each temperature aloud instead of entering it manually.

aiOla says this use case alone saves the client more than 110,000 hours annually, which would otherwise be used for manual entry. 

The CEO also told VentureBeat he’d personally received a phone call from Oracle co-founder and chairman Larry Ellison, who was said to be highly interested in using the technology for healthcare records. aiOla is exploring such uses now but does not have an official partnership with Oracle to announce (yet).

How AdaKWS speech-to-text works

The AdaKWS model uses a novel keyword-spotting method that integrates seamlessly into business workflows, allowing for the automation of reports and inspections via spoken commands.

It is a machine learning model trained to augment an existing speech-to-text model, such as OpenAI’s Whisper or any other model of the customer’s choosing. It fits between the host model’s encoder, the part that listens to the speaker’s voice and words, and the decoder, which turns the audio into the correct text output.

“Our play is tuning those,” said Joseph Keshet, aiOla’s chief scientist, in an interview with VentureBeat.

Unlike traditional models that require extensive training on new keywords, aiOla’s solution adapts quickly and efficiently, accommodating over 100 languages and various dialects without the need for retraining. This makes it ideal for enterprise use cases.

“Industry jargon is everywhere and in many fields, it dominates communication, comprising up to half of workers’ speech,” said Haramaty.

“This system is trained so as the overall performance on those keywords will be 100% correct,” explained Keshet. “Those keywords are represented to this latent space…you can have representation that generalizes to any language.”

It works especially well for companies whose customers or employees speak multiple languages about the same subject, and it can be quickly adapted to a specific company’s or industry’s jargon. The customer simply feeds it a list of text keywords, and the model learns from these to detect, or “spot,” them in speech, even without first hearing a speaker say them aloud.
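The idea of spotting keywords from a text list alone can be illustrated with a toy comparison in a shared "latent space," the term Keshet uses. The Python sketch below is an assumption-laden illustration, not aiOla’s method: it supposes that some trained text encoder and audio encoder (not shown) produce vectors in the same space, and it flags a keyword when the two vectors point in nearly the same direction. The vectors, the threshold and the names `cosine` and `spot_keywords` are all made up for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def spot_keywords(
    audio_embedding: Sequence[float],
    keyword_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cutoff, chosen for illustration
) -> List[str]:
    """Return the keywords whose text embedding is close to the audio embedding."""
    return [
        keyword
        for keyword, vector in keyword_embeddings.items()
        if cosine(audio_embedding, vector) >= threshold
    ]


# Made-up 3-dimensional embeddings; real encoders would produce far larger ones.
keyword_embeddings = {
    "torque wrench": [0.9, 0.1, 0.0],
    "pallet jack": [0.0, 0.2, 0.9],
}
audio_embedding = [0.8, 0.2, 0.1]  # stands in for an encoded audio clip

spotted = spot_keywords(audio_embedding, keyword_embeddings)
print(spotted)
```

Because the keywords enter only as text vectors, adding a new term in this toy setup means adding one more entry to the dictionary, with no audio recording of the term required.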

The model can be trained for a specific customer “in hours,” according to Haramaty, learning “a new language, a new process, a new industry, new keywords we never touched. And we’ll be ready.”

In a head-to-head benchmark test involving 16 languages, aiOla’s AdaKWS not only surpassed Whisper’s accuracy but also demonstrated the ability to handle complex, industry-specific terminologies with fewer computational resources.

The company published the research behind AdaKWS in a scientific paper dated September 2023.

“Our model consistently surpassed the OpenAI Whisper baselines by a significant margin, achieving a substantial improvement compared to the top-performing baseline,” Keshet noted.

Augmentation and efficiency boosting of existing business processes, not disruption

As businesses continue to seek efficient and accurate tools to handle complex datasets and communication needs, aiOla’s latest advancement with AdaKWS presents a significant opportunity to streamline operations and reduce overhead.

The company is offering its technology through web and mobile apps, and charges on a software-as-a-service (SaaS) subscription model on a per-user and per-use case basis.

The company’s breakthrough in speech AI not only sets a new standard in the field but also illustrates the potential for future innovations that could further enhance the integration of AI in everyday business processes.

“I love to disrupt, but maybe it took me too long to realize that most people in the world don’t like to be disrupted,” said Haramaty, noting that AdaKWS was poised to augment and streamline existing business processes and human performance, rather than replace them.