We will show the advantages of having a geo-distributed database cluster and how to create one using Galera Cluster for MySQL. We will also discuss the configuration and status variables that are involved and how to deal with typical situations on the WAN such as slow, untrusted or unreliable links, latency and packet loss. We will demonstrate a multi-region cluster on Amazon EC2 and perform some throughput and latency measurements in real-time (video http://galeracluster.com/videos/using-galera-replication-to-create-geo-distributed-clusters-on-the-wan-webinar-video-3/)
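Below is a minimal monitoring sketch, assuming a reachable cluster node, a monitoring account, and the PyMySQL driver (all placeholders), that polls the Galera status variables most relevant on a WAN: cluster membership, the replication queues, and the fraction of time flow control paused replication because a remote node could not keep up.

```python
# Poll the wsrep status variables that reveal WAN trouble: growing queues or a
# non-zero flow-control fraction usually mean link latency or packet loss is
# throttling the whole cluster. Host and credentials below are placeholders.
import pymysql

WSREP_VARS = (
    "wsrep_cluster_size",         # nodes currently in the cluster
    "wsrep_cluster_status",       # Primary / non-Primary component
    "wsrep_local_recv_queue",     # writesets waiting to be applied locally
    "wsrep_local_send_queue",     # writesets waiting to be replicated out
    "wsrep_flow_control_paused",  # fraction of time replication was paused
)

conn = pymysql.connect(host="10.0.0.11", user="monitor", password="secret")
with conn.cursor() as cur:
    cur.execute(
        "SHOW GLOBAL STATUS WHERE Variable_name IN (%s, %s, %s, %s, %s)",
        WSREP_VARS,
    )
    for name, value in cur.fetchall():
        print(f"{name}: {value}")
conn.close()
```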
MySQL clustering on top of the InnoDB storage engine has matured considerably over the last decade. Galera brought replication to InnoDB early on, and Group Replication arrived later; both feature sets are now rich and robust. This presentation offers a technical comparison of the two.
Storm is a distributed and fault-tolerant realtime computation system. It was created at BackType/Twitter to analyze tweets, links, and users on Twitter in realtime. Storm provides scalability, reliability, and ease of programming, and builds on components such as Zookeeper, ØMQ, and Thrift. A Storm topology defines the flow of data between spouts, which read data, and bolts, which process it. Through its reliability API, Storm guarantees that every tuple is processed, so no data is lost even during failures.
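To make the spout/bolt terminology concrete, here is a toy, plain-Python illustration of that dataflow; it deliberately does not use the real Storm API, it only mimics the roles of a spout emitting tuples and bolts transforming them.

```python
# Toy spout -> bolt pipeline: a spout emits raw tuples (tweets), one bolt splits
# them into words, a second bolt keeps running counts. In Storm the same wiring
# would be declared as a topology and executed across the cluster.
def tweet_spout():
    """Spout: the data source; here just a hard-coded stream of fake tweets."""
    for tweet in ["storm analyzes tweets in realtime", "storm guarantees processing"]:
        yield tweet

def split_bolt(tweets):
    """Bolt: split each incoming tweet into words and emit them downstream."""
    for tweet in tweets:
        for word in tweet.split():
            yield word

def count_bolt(words):
    """Bolt: terminal step of this topology, keeping running word counts."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Wiring: spout -> split bolt -> count bolt.
print(count_bolt(split_bolt(tweet_spout())))
```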
This document introduces HBase, an open-source, non-relational, distributed database modeled after Google's BigTable. It describes what HBase is, how it can be used, and when it is applicable. Key points include that HBase stores data in rows and columns accessed by row key, integrates with Hadoop for MapReduce jobs, and is well suited to large datasets, fast random access, and write-heavy applications. Common use cases include log analytics, real-time analytics, and message-centered systems.
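The row-key access pattern is easy to see from a client: here is a short sketch assuming an HBase Thrift gateway on localhost, the happybase Python client, and a pre-created table named page_views with a column family cf (all of which are assumptions for illustration).

```python
# Every read and write in HBase is addressed by a row key plus
# column-family:qualifier columns; there is no secondary index by default.
import happybase

connection = happybase.Connection("localhost")   # HBase Thrift gateway (assumed)
table = connection.table("page_views")           # assumed to exist with family 'cf'

# Write: one row keyed by user id + timestamp, values in column family 'cf'.
table.put(b"user42#20240101", {b"cf:url": b"/index.html", b"cf:latency_ms": b"123"})

# Read: fast random access straight back by the same row key.
row = table.row(b"user42#20240101")
print(row[b"cf:url"])

connection.close()
```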
The document discusses two MySQL high availability solutions: MySQL InnoDB Cluster and MySQL NDB Cluster. MySQL InnoDB Cluster provides easy high availability built into MySQL with write consistency, read scalability, and application failover using MySQL Router. MySQL NDB Cluster is an in-memory database that provides automatic sharding, native access via several APIs, read/write consistency, and read/write scalability using the NDB storage engine. The document compares the two solutions and discusses their architectures and key features.
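For the InnoDB Cluster side, the workflow looks roughly like the sketch below. It is meant to run inside MySQL Shell's Python mode (mysqlsh --py), where the shell and dba objects are predefined; the hostnames and the clusteradmin account are placeholders, not part of the original document.

```python
# Build a three-node InnoDB Cluster with the AdminAPI; MySQL Router can then be
# bootstrapped against it so applications fail over transparently.
shell.connect("clusteradmin@node1:3306")          # connect to the seed instance
cluster = dba.create_cluster("myCluster")         # make it a one-node cluster
cluster.add_instance("clusteradmin@node2:3306")   # grow to three nodes
cluster.add_instance("clusteradmin@node3:3306")
print(cluster.status())                           # topology and health overview
```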
- Galera is a MySQL clustering solution that provides true multi-master replication with synchronous replication and no single point of failure.
- It allows high availability, data integrity, and elastic scaling of databases across multiple nodes.
- Companies like Percona and MariaDB have integrated Galera to provide highly available database clusters.
MaxScale is a database proxy that provides load balancing, connection pooling, and replication capabilities for MariaDB and MySQL databases. It can be used to scale databases horizontally across multiple servers for increased performance and availability. The document provides an overview of MaxScale concepts and capabilities such as routing, filtering, and security features, and how it can be used for operational tasks like query caching, logging, and data streaming. It also includes instructions on setting up MaxScale, with a basic example of configuring read/write splitting between master and slave database servers.
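From the application's point of view, read/write splitting is largely invisible: the app connects to MaxScale's listener instead of to any one database server. The sketch below assumes a readwritesplit listener on port 4006 and uses the PyMySQL driver; the host, port, schema, and credentials are placeholders that depend on the actual maxscale.cnf.

```python
# The application talks only to MaxScale; MaxScale routes the INSERT to the
# master and the SELECT to a slave behind the scenes.
import pymysql

conn = pymysql.connect(host="maxscale-host", port=4006,
                       user="app", password="secret", database="shop")
with conn.cursor() as cur:
    cur.execute("INSERT INTO orders (item) VALUES ('book')")  # routed to the master
    conn.commit()
    cur.execute("SELECT COUNT(*) FROM orders")                # routed to a slave
    print(cur.fetchone())
conn.close()
```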
Galera Cluster for MySQL, Percona XtraDB Cluster and MariaDB Cluster (the three “flavours” of Galera Cluster) make use of the Galera WSREP libraries to handle synchronous replication. MySQL Cluster is the official clustering solution from Oracle, while Galera Cluster for MySQL is slowly but surely establishing itself as the de facto clustering solution in the wider MySQL ecosystem. In this webinar, we will look at all these alternatives and present an unbiased view of their strengths/weaknesses and the use cases that fit each alternative. This webinar will cover the following:
- MySQL Cluster architecture: strengths and limitations
- Galera architecture: strengths and limitations
- Deployment scenarios
- Data migration
- Read and write workloads (optimistic/pessimistic locking; see the retry sketch below)
- WAN/geographical replication
- Schema changes
- Management and monitoring
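One practical consequence of Galera's optimistic approach is worth a sketch: a transaction that loses certification against a conflicting write-set from another node is rolled back and reported to the client as a deadlock (MySQL error 1213), so applications are expected to retry. The example below assumes the PyMySQL driver and a toy accounts table; both are placeholders.

```python
# Retry loop for Galera certification failures, which surface as deadlock
# errors (1213) at commit time under optimistic locking.
import pymysql

def transfer(conn, amount, retries=3):
    for attempt in range(retries):
        try:
            with conn.cursor() as cur:
                cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = 1", (amount,))
                cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = 2", (amount,))
            conn.commit()
            return
        except pymysql.MySQLError as exc:
            conn.rollback()
            code = exc.args[0] if exc.args else None
            if code != 1213 or attempt == retries - 1:  # not a deadlock, or retries exhausted
                raise
```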
This presentation briefly describes the key features of Apache Cassandra. It was held at the Apache Cassandra Meetup in Vienna in January 2014. You can access the meetup here: http://www.meetup.com/Vienna-Cassandra-Users/
A coordination service like Zookeeper helps distributed applications coordinate by providing common services like synchronization, configuration sharing, naming, and leader election. Zookeeper uses an ensemble of servers running as a cluster. It stores data in a hierarchical namespace of znodes. Clients can read and write znodes, set watches on znodes to get notified of changes, and rely on Zookeeper to handle session and server failures in a transparent way. Some common usage recipes for Zookeeper include barriers for synchronization, cluster management using ephemeral znodes, queues using sequential znodes, locks for mutual exclusion, and leader election.
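A brief sketch of those primitives, assuming a local ZooKeeper ensemble on 127.0.0.1:2181 and the kazoo Python client (the paths and identifiers are made up for illustration):

```python
# Hierarchical znodes, an ephemeral znode for cluster membership, a data watch
# for change notification, and the lock recipe for mutual exclusion.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Ephemeral + sequential member znode: vanishes if this client's session dies.
zk.ensure_path("/app/workers")
zk.create("/app/workers/worker-", b"host1", ephemeral=True, sequence=True)

# Watch: get called back whenever the configuration znode changes.
zk.ensure_path("/app/config")

@zk.DataWatch("/app/config")
def on_config_change(data, stat):
    print("config changed:", data)

# Lock recipe for mutual exclusion across processes.
with zk.Lock("/app/locks/resource", "worker-1"):
    pass  # critical section

zk.stop()
```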
Nowadays, people are creating, sharing and storing data at a faster pace than ever before, and effective data compression/decompression can significantly reduce the cost of data usage. Apache Spark is a general distributed computing engine for big data analytics, and because it stores and shuffles large amounts of data across the cluster at runtime, the compression/decompression codecs can affect end-to-end application performance in many ways. However, there is a trade-off between storage size and compression/decompression throughput (CPU computation). Balancing compression speed and ratio is an interesting problem, particularly while both the software algorithms and the CPU instruction set keep evolving. Apache Spark provides a very flexible compression codec interface with default implementations such as GZip, Snappy, LZ4 and ZSTD, and the Intel Big Data Technologies team has also implemented additional codecs for Apache Spark based on the latest Intel platform, such as ISA-L (igzip), LZ4-IPP, Zlib-IPP and ZSTD. In this session, we compare the characteristics of those algorithms and implementations by running different micro workloads as well as end-to-end workloads on different generations of Intel x86 platforms and disks. The result is intended as a best-practice guide for big data software engineers choosing the proper compression/decompression codecs for their applications, and we also present methodologies for measuring and tuning the performance bottlenecks of typical Apache Spark workloads.
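As a small illustration of where that choice is actually made, the PySpark sketch below sets the two most common codec knobs; the codec names shown are the stock Spark options, and which one wins depends on the CPU/disk trade-off measured for the workload at hand.

```python
# spark.io.compression.codec covers shuffle/spill/broadcast data;
# spark.sql.parquet.compression.codec covers Parquet output files.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("codec-comparison")
    .config("spark.io.compression.codec", "zstd")              # or lz4, snappy, lzf
    .config("spark.sql.parquet.compression.codec", "snappy")   # or gzip, zstd, ...
    .getOrCreate()
)

df = spark.range(1_000_000)                                    # toy dataset
df.write.mode("overwrite").parquet("/tmp/codec_test")          # compressed output
spark.stop()
```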
Alluxio Day VIII, December 14, 2021 (https://www.alluxio.io/alluxio-day/). Speaker: Ryan Blue, Apache Iceberg.
This document provides an overview of Apache Kafka. It begins by defining Kafka as a distributed streaming platform and messaging system, then lists the agenda: what Kafka is, why it is used, common use cases, major companies that use it, how it achieves high performance, and core concepts. The core concepts explained include topics, partitions, brokers, replication, leaders, and producers and consumers, with examples to illustrate each.
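A minimal producer/consumer sketch ties those concepts together; it assumes a broker on localhost:9092, the kafka-python client, and a page-views topic, all placeholders for illustration.

```python
# A producer writes keyed records to a topic (same key -> same partition);
# a consumer joins a group and reads the records back with partition/offset.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("page-views", key=b"user-42", value=b"/index.html")
producer.flush()

consumer = KafkaConsumer(
    "page-views",
    bootstrap_servers="localhost:9092",
    group_id="analytics",              # consumers in one group share the partitions
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.partition, record.offset, record.key, record.value)
    break
```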