Cassandra In A Nutshell

This presentation summarizes Cassandra, an open source distributed database. It traces the project's history from Facebook through Google Code to Apache, then describes Cassandra as a massively scalable, distributed (peer-to-peer), structured data store with tunable consistency and fast reads and writes. Values are structured and indexed as columns and supercolumns, and can be sliced with predicates. Other features covered are hinted hand-off, the Thrift API, rack- and data-center-aware partitioning, pluggable comparators, and key enumeration and range queries.

History
Description
Who

Cassandra In A Nutshell

Eric Evans
eevans@rackspace.com

NoSQL Oakland
November 2, 2009

Eric Evans eevans@rackspace.com Cassandra In A Nutshell

A prophetess in Troy during the Trojan War. Her predictions were
always true, but never believed.

A massively scalable, distributed (peer-to-peer), structured data
store (aka database).

Outline

1. History
2. Description
3. Who

Facebook

Google Code

Apache

Cassandra is...

O(1) DHT
Eventual consistency
Tunable trade-offs, consistency vs. latency
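
A minimal sketch of the three points above, not Cassandra's actual code. It assumes "O(1)" means a constant number of network hops: every node holds the full token ring, so any node can name a key's replicas locally. The `Ring` class, the MD5-based token assignment, and the `overlaps` helper are hypothetical names chosen for illustration. The consistency trade-off is shown with the usual quorum rule: with N replicas, reads of R replicas and writes to W replicas always intersect when R + W > N, and smaller R or W buys lower latency at the cost of that guarantee.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring: the full ring is known locally, so lookup needs no routing hops."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Assumption for illustration: a node's token is the hash of its name.
        self.tokens = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, key):
        """Return the nodes responsible for key, walking clockwise from its token."""
        start = bisect.bisect_left(self.tokens, (self._hash(key), ""))
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


def overlaps(n, r, w):
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n
```

With N = 3, `overlaps(3, 2, 2)` holds (quorum reads and writes), while `overlaps(3, 1, 1)` does not: single-replica reads and writes are faster but only eventually consistent.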

But...

Values are structured, indexed
Columns, Supercolumns
Slicing w/ predicates
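
A rough sketch of what "structured, indexed" values and slicing could look like, using plain Python dicts rather than any real client API. The function `slice_columns` and its parameters (`start`, `finish`, `count`, `reverse`) are hypothetical, as are the sample rows. A row maps column names to values, kept in sorted name order; a supercolumn adds one level of nesting, so its value is itself a map of columns.

```python
import bisect


def slice_columns(row, start="", finish="", count=100, reverse=False):
    """Return (name, value) pairs whose names fall in [start, finish].

    An empty start or finish leaves that side of the range open.
    """
    names = sorted(row)
    lo = bisect.bisect_left(names, start) if start else 0
    hi = bisect.bisect_right(names, finish) if finish else len(names)
    selected = names[lo:hi]
    if reverse:
        selected.reverse()
    return [(name, row[name]) for name in selected[:count]]


# Illustrative data only: a plain row and a row of supercolumns.
row = {"email": "e@example.org", "name": "Example", "phone": "555-0100"}
super_row = {"home": {"city": "Oakland", "zip": "94607"}, "work": {"city": "Austin"}}

print(slice_columns(row, start="f", finish="o"))   # [('name', 'Example')]
print(slice_columns(super_row["home"], count=1))   # [('city', 'Oakland')]
```

The predicate here is a name range plus a count limit; because columns are sorted by name, the slice is a contiguous run rather than a scan of the whole row.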

And...

Hinted hand-off
Thrift API
Rack/data-center aware partitioning
Pluggable comparators
Key enumeration, range queries
Reads are fast, writes are nutty fast
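
To make the first item in this list concrete, here is a toy model of hinted hand-off. It is a simplification under stated assumptions: the hypothetical `Coordinator` class keeps hints itself, whereas a real cluster would store them on another live node, and replica liveness is a flag set by hand rather than detected. The idea it illustrates is that a write aimed at a replica that is down is not lost; it is parked as a hint and replayed when the replica returns.

```python
from collections import defaultdict


class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replicas):
        self.stores = {name: {} for name in replicas}  # per-replica data
        self.alive = {name: True for name in replicas}
        self.hints = defaultdict(list)                 # target -> pending writes

    def write(self, key, value):
        """Apply the write to live replicas; queue a hint for each dead one.

        Returns the number of replicas that acknowledged the write.
        """
        acked = 0
        for name, store in self.stores.items():
            if self.alive[name]:
                store[key] = value
                acked += 1
            else:
                self.hints[name].append((key, value))
        return acked

    def mark_alive(self, name):
        """Bring a replica back and replay its hints in the order they were queued."""
        self.alive[name] = True
        for key, value in self.hints.pop(name, []):
            self.stores[name][key] = value
```

The write path never blocks on the dead replica, which fits the "writes are nutty fast" claim; the cost is that the recovering replica is stale until its hints are replayed.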

Droppin’ Names

Facebook
Digg
Rackspace
Twitter
IBM Research
