Teradata takes plunge into lakehouse waters, but not everyone is convinced

We have not changed our minds, the industry has evolved, data warehouse stalwart claims


With its vision of a unified enterprise data warehouse, Teradata attracted globally dominant customers including HSBC, Unilever and Walmart. But earlier this month, it confirmed its backing for the lakehouse concept, which combines messy data lakes and structured data warehouses, together with the idea of analytics anywhere, supported by object storage and open table formats.

Although its hand may have been forced, observers pointed out that there is still a place for Teradata's mainstay high-performance, block storage-based analytics.

The 45-year-old company previously announced support for open table formats (OTFs) Apache Iceberg and Linux Foundation Delta Lake. In doing so, it embraced an industry trend towards performing analytics on data in situ rather than moving it to a single store for BI and other analysis.
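For the uninitiated, the practical upshot of an open table format is that any engine that understands it can read the data where it sits in object storage, rather than loading it into a warehouse first. Below is a minimal sketch using the open source PyIceberg library – the catalog endpoint, bucket, and table names are hypothetical placeholders, and this illustrates the general idea rather than Teradata's own implementation.

```python
# Minimal sketch: querying an Iceberg table in place on object storage
# with the open source PyIceberg library. The catalog endpoint, warehouse
# path, and table name are hypothetical placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lake",  # logical catalog name
    **{
        "uri": "http://iceberg-rest.example.com:8181",  # REST catalog endpoint (assumed)
        "warehouse": "s3://example-bucket/warehouse",   # object storage location (assumed)
    },
)

orders = catalog.load_table("sales.orders")  # namespace.table, hypothetical

# Push a filter and column projection down to the table scan, then pull
# the result back as a pandas DataFrame -- no copy into a warehouse needed.
df = (
    orders.scan(
        row_filter="order_date >= '2024-01-01'",
        selected_fields=("region", "amount"),
    )
    .to_pandas()
)

print(df.groupby("region")["amount"].sum())
```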

Teradata also spoke approvingly for the first time about the lakehouse architecture, a term introduced by rival Databricks to describe an environment that supports both machine learning and data exploration, and the traditional BI and analytics usually done in the more regimented setting of an enterprise data warehouse.

AI adoption, or so Teradata claimed, had consolidated data warehouses, analytics, and data science workloads into unified lakehouses. "OTF support further enhances Teradata's lakehouse capabilities, providing a storage abstraction layer that's designed to be flexible, cost-efficient, and easy-to-use," it said in a corporate missive.

Speaking to The Register, Louis Landry, a Teradata engineering fellow, said support for OTFs did not mean the company no longer believed in the enterprise data warehouse.

"It's complementary," he told us. "We believe that we need to be able to play data where it lies. In a lot of cases, that's going to mean highly efficient block storage, for low latency and all that kind of good stuff. But in a lot of cases that's not how the data is going to be laid out. Different customers have different needs. Our goal is always is to make sure that they get the best value out of integrated data."

He said the data warehouse and lakehouse ideas were architectures more than just technologies and that customers would pick and choose which approach works for them.

"That means continuing to offer the level of service we do around that high throughput work that can really only be serviced out of block storage. But we also need to be able to address data that's sitting in an object store or some sort of external storage, so that we provide a holistic, singular view of what's available and what's accessible and security and all the things that people have come to expect out of [a] Teradata system."

Teradata has been performing analytics on data external to its main data warehouse since 2020, when it updated Teradata QueryGrid and partnered with Starburst Data to integrate the Presto connector so that users of Teradata's Vantage analytics platform could access and query a gamut of cloud and on-premises data sources.

But it was adamant that it would not endorse the lakehouse concept. Speaking to The Register in 2022, then CTO Stephen Brobst said data lakes and data warehouses were part of a unified architecture but discrete concepts. "There is a difference between the raw data, which is really data lake, and the data product, which is the enterprise data warehouse," he said.

Although Teradata launched its own data lake in August 2022, Brobst said there was an important distinction between where businesses put their raw data and the data warehouse, which optimizes query performance and controls governance. Creating a hybrid lakehouse was "actually not very useful because you don't want to have more copies of data than is necessary."

Landry said he and Brobst, who left Teradata in January this year, "have had a fun relationship and been debating various ideas over the course of my ten-year tenure here."

"I don't think we've changed our minds on the approach. The technology industry evolves and our goal is to provide the best possible integrated data solution for our customers. This is not new, we haven't just started working on this in the last couple of months."

However, one seasoned Teradata support engineer, who asked not to be named, told The Register he feared the company had lost its way.

"Teradata has to back this horse whether they like it or not, and whether they mean it or not," he said.

The source pointed to a precedent: Teradata had first resisted, then adopted, Hadoop during the big data boom more than a decade ago.

Meanwhile, cloud vendors with data warehouse and data lake systems – particularly Google and Microsoft – were writing "blank checks" to try to attract Teradata's largest customers to their systems.

Although Teradata might have a superior data warehouse product in terms of user concurrency and query optimization, customers were increasingly satisfied with a dumbed-down solution so long as it got them to the cloud, he said.

At the same time, moving onto object storage and OTFs might not improve efficiency, but it would put users in the driving seat, he said.

"People are basically saying, 'I don't care whether you call it a lakehouse or whatever.' They're saying we just want to dump our data into object storage, then the next evolution of that is we want to process it where it is. Then they want an overlay that anyone can use so it's not a proprietary format in object storage. I think this creates major trouble for all of the vendors. Let's just choose Iceberg as the winner … it means your data is now in an open format in the cheapest storage you can possibly get your hands on. It's a winner from an end user perspective."

Hyoun Park, CEO and chief analyst at Amalgam Insights, agreed that Teradata's hand had been forced in adopting the lakehouse concept and OTFs, but he said customers still value high-performance data warehouse systems.

"Teradata has been forced to embrace the data lakehouse concept because of the importance of data lakes and unstructured data in AI and machine learning. Teradata is still a top choice for data warehouse, although of course they have to deal with the aggressiveness of Snowflake there. But nobody really doubts that Teradata can support high quality enterprise data warehouse."

Park said an enterprise data warehouse was still a "superior concept," but the problem was that the number of data/analytics applications businesses were expected to support had expanded rapidly.

"There will always be a place for data warehouse that supports your top 50 apps in the enterprise because you are going to want a high-performance data store to support analytics as fast as possible and a data warehouse is the best way to do that.

"However, the challenge is that the current enterprise of a billion dollar-plus revenue typically has over 1,000 apps. The sheer effort to bring those other apps into a data warehouse is just crippling. You have to put the rest of that data somewhere if you want to use it for anything from analytics to AI, so that's where the data lake comes in. That forces with this two-tier approach."

The expansion of data-reliant applications – such as machine learning and AI – together with the introduction of cloud computing and object storage, has transformed the enterprise data management and analytics landscape.

While Snowflake shook things up by separating storage and compute, Databricks attached SQL-style BI workloads to its data lake machine learning environments.

Data lake company Cloudera and Tabular, the "headless" data warehouse vendor, both have different visions of the market, as do the powerful cloud platform providers, which similarly claim to offer an all-things-to-all-data product suite. Whether Teradata can thrive in this complex and changing market is still unclear. ®
