Let me explain why I am so excited about Arraylake plus Tigris Data.
First, let’s agree: commoditized cloud object storage is the future of data infrastructure. That’s a consensus at this point. But ever since I first learned about cloud object storage, I've heard one big objection from scientific data users, particularly academics: "I don't want to pay to download my own data" (egress fees). This is not just a psychological block: today, organizations of all sizes are chasing GPUs wherever they can find them, which means moving training data from one data center to another. If your data resides with a traditional cloud provider, you pay massive egress fees every time you move it.
Tigris is building its own S3-compatible, globally distributed object storage service with no egress fees. This is a real game changer. By storing data in Tigris, organizations retain enormous flexibility about where and how they run their compute. For example, they can run their production apps in AWS, run their ML training on Lambda Labs, and serve data directly to their users’ browsers. Cloud compute providers are forced to compete on price and quality.
Of course, just dumping data into an S3 bucket—no matter where it lives—eventually leads to an unmanageable data swamp. That’s where Arraylake comes in. Arraylake helps teams build and maintain scientific data lakes via an integrated data catalog, ACID transactions, access controls, and support for all major array-based scientific data formats, while still enabling direct access to the raw bytes in object storage. Arraylake works great with Tigris.
Come to our webinar to see why we are so excited about this combination!
Register 👇 for our webinar on June 18th, co-hosted with Tigris Data and presented by Ryan Abernathey and Garren Smith: https://lnkd.in/g3nyMvBx
We will discuss the integration between Arraylake and Tigris, a globally distributed S3-compatible object storage service. The benefit of Tigris for Arraylake users is the ability to make data available anywhere in the world without paying egress fees to a cloud provider. This enables organizations to provide data to diverse global audiences and to take their data to the most cost-effective compute provider, including AI-specialized GPU data centers. We will conclude with a vision for how this architecture could provide the foundation for a globally distributed shared body of AI-ready, cloud-optimized scientific data.