The document discusses Cassandra's topology and how it is moving from a single token per node model to a virtual node model where each node is assigned multiple tokens. This improves load balancing and data distribution in the cluster. Specifically, it addresses problems with the single token approach like poor load distribution when nodes fail and inefficient data movement when adding or replacing nodes. The virtual node model with random token assignment provides better scaling properties as the number of nodes and data size increases.
3. DHT 101
partitioning
[diagram: the ring, a keyspace spanning A through Z]
The keyspace, a namespace encompassing all possible keys
4. DHT 101
partitioning
[diagram: ring with nodes placed evenly around it: A, B, C, ..., Y, Z]
The namespace is divided into N partitions (where N is the number of nodes). Partitions are
mapped to nodes and placed evenly throughout the namespace.
5. DHT 101
partitioning
[diagram: key “Aaa” positioned on the ring, stored on the next node clockwise]
A record, stored by key, is positioned on the next node (working clockwise) from where it
sorts in the namespace
6. DHT 101
replica placement
[diagram: replicas of key “Aaa” stored on the nodes following its position]
Additional copies (replicas) are stored on other nodes. Commonly the next N-1 nodes, but
anything deterministic will work.
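The placement and replication rules above fit in a few lines. A minimal sketch in Python, assuming MD5-derived tokens (in the spirit of RandomPartitioner), made-up node names, and a replication factor of 3; this is an illustration, not Cassandra's implementation:

import bisect
import hashlib

def token(key: str) -> int:
    # Illustrative: derive a 128-bit token by hashing, so keys and
    # nodes share a single namespace.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, rf=3):
        self.rf = rf  # replication factor: rf copies of every record
        self.ring = sorted((token(n), n) for n in nodes)

    def replicas(self, key):
        # The record lands on the next node clockwise from where its
        # token sorts; additional copies go to the following rf-1 nodes.
        tokens = [t for t, _ in self.ring]
        i = bisect.bisect_right(tokens, token(key)) % len(self.ring)
        return [self.ring[(i + j) % len(self.ring)][1]
                for j in range(self.rf)]

ring = Ring(["A", "B", "C", "Y", "Z"])
print(ring.replicas("Aaa"))  # three distinct nodes, in clockwise order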
7. DHT 101
consistency
Consistency
Availability
Partition tolerance
With multiple copies comes a set of trade-offs, commonly articulated using the CAP theorem: at any given point, we can only guarantee two of Consistency, Availability, and Partition tolerance.
8. DHT 101
scenario: consistency level = one
[diagram: write W reaches node A; the other two replicas unreachable]
Writing at consistency level ONE provides very high availability; only one of the 3 member nodes need be up for the write to succeed.
9. DHT 101
scenario: consistency level = all
[diagram: read R at node A; the other two replicas must also respond]
If strong consistency is required, reads at consistency ALL can be used if writes are performed at ONE. The trade-off is availability: all 3 member nodes must be up, else the read fails.
10. DHT 101
scenario: quorum write
[diagram: write W acknowledged by nodes A and B, satisfying R+W > N; the third replica down]
Using QUORUM consistency, we only require floor((N/2)+1) nodes.
11. DHT 101
scenario: quorum read
[diagram: read R answered by nodes B and C, satisfying R+W > N; the third replica down]
Using QUORUM consistency, we only require floor((N/2)+1) nodes.
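The quorum sizing on these two slides can be verified directly: whenever R + W > N, the read and write quorums must overlap in at least one node, which is what makes the combination strongly consistent. A small sketch, where N = 3 is just an example:

from math import floor

def quorum(n: int) -> int:
    return floor(n / 2) + 1  # floor((N/2)+1), as on the slide

n = 3
r = w = quorum(n)  # 2 of 3 nodes, for both reads and writes
assert r + w > n   # overlapping quorums: every read covers the latest write
print(f"N={n}: quorum size {r}, R+W = {r + w}")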
14. Problem:
Poor load distribution
15. Distributing Load
[diagram: ring of nodes A, B, C, M, Y, Z; replicas of A highlighted]
B and C hold replicas of A
16. Distributing Load
[diagram: the same ring; replicas of Z highlighted]
A and B hold replicas of Z
17. Distributing Load
[diagram: the same ring; replicas of Y highlighted]
Z and A hold replicas of Y
18. Distributing Load
[diagram: the same ring, with node A down]
Disaster strikes!
19. Distributing Load
[diagram: the ring with node A down; all three replica sets containing A affected]
Sets [Y,Z,A], [Z,A,B], [A,B,C] all suffer the loss of A; Results in extra load on neighboring
nodes
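That claim is easy to reproduce. A sketch assuming one token per node, RF = 3, and the node names from the slides: a node appears in exactly RF replica sets, so its loss is absorbed entirely by its immediate neighbors.

nodes = ["A", "B", "C", "M", "Y", "Z"]  # clockwise ring order, illustrative
rf = 3

# Replica set for the range owned by nodes[i]: the node itself plus its
# next rf-1 clockwise neighbors.
replica_sets = [[nodes[(i + j) % len(nodes)] for j in range(rf)]
                for i in range(len(nodes))]

failed = "A"
affected = [s for s in replica_sets if failed in s]
print(affected)
# Three sets: [A,B,C], [Y,Z,A], and [Z,A,B]; only A's neighbors pick up
# the extra read/write and streaming load.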
20. Distributing Load
[diagram: replacement node A1 joining in A's position]
Solution: Replace/repair down node
22. Distributing Load
[diagram: neighboring nodes streaming data to the replacement node A1]
Neighboring nodes are needed to stream missing data to the replacement (A1); this results in even more load on neighboring nodes.
23. Problem:
Poor data distribution
24. Distributing Data
[diagram: nodes A, B, C, D dividing the ring into four equal ranges]
Ideal distribution of keyspace
25. Distributing Data
[diagram: node E bootstrapped, bisecting one of the four ranges]
Bootstrapping a node, bisecting one partition; Distribution is no longer ideal
26. Distributing Data
[diagram: before and after rings, with existing nodes moved to rebalance ownership]
Moving existing nodes means moving corresponding data; Not ideal
28. Distributing Data
[diagram: cluster doubled to eight nodes (E through H), bisecting every range]
Frequently cited alternative: Double the size of your cluster, bisecting all ranges
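A toy calculation shows the imbalance these slides describe. The 0-100 token space and the token values are illustrative:

def ownership(tokens, space=100):
    # Each node owns the range from its predecessor's token up to its own.
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    return {node: (tok - ordered[i - 1][1]) % space
            for i, (node, tok) in enumerate(ordered)}

before = {"A": 0, "B": 25, "C": 50, "D": 75}
after = dict(before, E=87)  # E bisects only the D-to-A range

print(ownership(before))  # every node owns 25% of the ring
print(ownership(after))   # A: 13%, E: 12%, while B, C, D still own 25% each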
31. In a nutshell...
[diagram: many virtual nodes on the ring, mapped to a handful of physical hosts]
Basically: “nodes” on the ring are virtual, and many of them are mapped to each “real” node
(host)
32. Benefits
• Operationally simpler (no token management)
• Better distribution of load
• Concurrent streaming involving all hosts
• Smaller partitions mean greater reliability
• Supports heterogeneous hardware
33. Strategies
• Automatic sharding
• Fixed partition assignment
• Random token assignment
34. Strategy
Automatic Sharding
• Partitions are split when data exceeds a threshold
• Newly created partitions are relocated to a host with lower data load
• Similar to sharding performed by Bigtable, or Mongo auto-sharding
35. Strategy
Fixed Partition Assignment
• Namespace divided into Q evenly-sized partitions
• Q/N partitions assigned per host (where N is the number of hosts)
• Joining hosts “steal” partitions evenly from existing hosts
• Used by Dynamo and Voldemort (described in the Dynamo paper as “strategy 3”)
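A sketch of the “stealing” step under fixed assignment; Q = 12 and the host names are made up for illustration:

Q = 12  # partition count, fixed for the life of the cluster
hosts = {"host1": list(range(0, 6)), "host2": list(range(6, 12))}

def join(hosts, new_host):
    # The joining host steals partitions until every host holds Q/N.
    hosts[new_host] = []
    target = Q // len(hosts)
    for h, parts in hosts.items():
        while h != new_host and len(parts) > target:
            hosts[new_host].append(parts.pop())
    return hosts

print(join(hosts, "host3"))
# host1 and host2 each surrender two partitions; all three end with Q/N = 4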
36. Strategy
Random Token Assignment
• Each host assigned T random tokens
• T random tokens generated for joining hosts; new tokens divide existing ranges
• Similar to libketama; identical to classic Cassandra when T=1
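And a sketch of random token assignment, with an assumed 64-bit token space and T = 256 tokens per host; load evens out statistically as T grows, and T = 1 degenerates to classic Cassandra:

import random

SPACE = 2**64  # assumed token space
T = 256        # tokens per host

tokens = {h: [random.randrange(SPACE) for _ in range(T)]
          for h in ["host1", "host2", "host3"]}
ring = sorted((t, h) for h, ts in tokens.items() for t in ts)

# A host's ownership is the sum of the range sizes ending at its tokens.
owned = {}
for i, (t, h) in enumerate(ring):
    owned[h] = owned.get(h, 0) + (t - ring[i - 1][0]) % SPACE
for h, share in owned.items():
    print(h, round(100 * share / SPACE, 1), "%")  # roughly 33% each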
37. Considerations
1. Number of partitions
2. Partition size
3. How 1 changes with more nodes and data
4. How 2 changes with more nodes and data
38. Evaluating
Strategy        No. partitions   Partition size
Random          O(N)             O(B/N)
Fixed           O(1)             O(B)
Auto-sharding   O(B)             O(1)
(B ~ total data size, N ~ number of hosts)
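To put assumed numbers on the table: with B = 10 TB and N = 10 hosts, random assignment at T = 256 gives about 2,560 partitions of roughly 4 GB each; a fixed scheme with Q = 1,024 gives 1,024 partitions of about 10 GB, a size that doubles whenever the data does; and auto-sharding with a 1 GB split threshold gives about 10,000 partitions, a count that doubles whenever the data does.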
39. Evaluating
• Automatic sharding
  • partition size constant (great)
  • number of partitions scales linearly with data size (bad)
• Fixed partition assignment
• Random token assignment
40. Evaluating
• Automatic sharding
• Fixed partition assignment
  • Number of partitions is constant (good)
  • Partition size scales linearly with data size (bad)
  • Higher operational complexity (bad)
• Random token assignment
41. Evaluating
• Automatic sharding
• Fixed partition assignment
• Random token assignment
  • Number of partitions scales linearly with number of hosts (ok)
  • Partition size increases with more data; decreases with more hosts (good)
42. Evaluating
• Automatic sharding
• Fixed partition assignment
• Random token assignment
44. Configuration
conf/cassandra.yaml
# Comma separated list of tokens,
# (new installs only).
initial_token: <token>,<token>,<token>
or
# Number of tokens to generate.
num_tokens: 256
Two params control how tokens are assigned. The initial_token param now optionally
accepts a csv list, or (preferably) you can assign a numeric value to num_tokens
45. Configuration
nodetool info
Token : (invoke with -T/--tokens to see all 256 tokens)
ID : 64090651-6034-41d5-bfc6-ddd24957f164
Gossip active : true
Thrift active : true
Load : 92.69 KB
Generation No : 1351030018
Uptime (seconds): 45
Heap Memory (MB): 95.16 / 1956.00
Data Center : datacenter1
Rack : rack1
Exceptions : 0
Key Cache : size 240 (bytes), capacity 101711872 (bytes ...
Row Cache : size 0 (bytes), capacity 0 (bytes), 0 hits, ...
To keep the output readable, nodetool info no longer displays tokens (if there are more than
one), unless the -T/--tokens argument is passed
46. Configuration
nodetool ring
Datacenter: datacenter1
==========
Replicas: 2
Address Rack Status State Load Owns Token
9022770486425350384
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -9182469192098976078
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -9054823614314102214
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8970752544645156769
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8927190060345427739
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8880475677109843259
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8817876497520861779
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8810512134942064901
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8661764562509480261
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8641550925069186492
127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8636224350654790732
...
...
nodetool ring is still there, but the output is significantly more verbose, and it is less useful as the go-to command.
47. Configuration
nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.0.0.1 97.2 KB 256 66.0% 64090651-6034-41d5-bfc6-ddd24957f164 rack1
UN 10.0.0.2 92.7 KB 256 66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c rack1
UN 10.0.0.3 92.6 KB 256 67.7% e4eef159-cb77-4627-84c4-14efbc868082 rack1
The new go-to command is nodetool status.
48. Configuration
nodetool status
[same nodetool status output as above]
Of note, since it is no longer practical to name a host by its token (because it can have many), each host has a unique ID.
49. Configuration
nodetool status
[same nodetool status output as above]
Note the per-node token count.
51. Migration
edit conf/cassandra.yaml and restart
# Number of tokens to generate.
num_tokens: 256
Step 1: Set num_tokens in cassandra.yaml, and restart node
52. Migration
convert to T contiguous tokens in existing ranges
[diagram: each node's single range converted to many contiguous tokens; placement unchanged]
This will cause the existing range to be split into T contiguous tokens. This results in no
change to placement
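What that conversion does to one node's range can be sketched in a toy 0-100 token space (T = 8 here for readability; the deck uses 256):

T = 8

def split_range(start, end, t=T):
    # Replace the single range (start, end] with t contiguous tokens
    # covering exactly the same span; ownership does not move.
    width = (end - start) / t
    return [round(start + width * (i + 1)) for i in range(t)]

# Node A previously owned the single range (0, 40]
print(split_range(0, 40))  # [5, 10, 15, 20, 25, 30, 35, 40]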
53. Migration
shuffle
[diagram: the ring after shuffling; tokens from A, B, and C interleaved at random]
Step 2: Initialize a shuffle operation. Nodes randomly exchange ranges.
54. Shuffle
• Range transfers are queued on each host
• Hosts initiate transfer of ranges to self
• Pay attention to the logs!
55. Shuffle
bin/shuffle
Usage: shuffle [options] <sub-command>
Sub-commands:
create Initialize a new shuffle operation
ls List pending relocations
clear Clear pending relocations
en[able] Enable shuffling
dis[able] Disable shuffling
Options:
-dc, --only-dc Apply only to named DC (create only)
-tp, --thrift-port Thrift port number (Default: 9160)
-p, --port JMX port number (Default: 7199)
-tf, --thrift-framed Enable framed transport for Thrift (Default: false)
-en, --and-enable Immediately enable shuffling (create only)
-H, --help Print help information
-h, --host JMX hostname or IP address (Default: localhost)
-th, --thrift-host Thrift hostname or IP address (Default: JMX host)
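Based on the usage above, a typical sequence would be shuffle create to queue the relocations, shuffle en to begin them, and shuffle ls to watch the queue drain, while keeping an eye on the Cassandra logs on each host.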