Nadav Har'El of the ScyllaDB team, on the path to 10x Cassandra performance. A Seastar / ScyllaDB introduction at the Generalist Engineer meetup.
Storm is a distributed and fault-tolerant realtime computation system. It was created at BackType/Twitter to analyze tweets, links, and users on Twitter in realtime. Storm provides scalability, reliability, and ease of programming. It uses components like ZooKeeper, ØMQ, and Thrift. A Storm topology defines the flow of data between spouts, which read data in, and bolts, which process it. Through its reliability APIs, Storm guarantees that all data is processed, with no data loss even during failures.
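For a concrete picture of the spout/bolt model, here is a minimal topology sketch in Java, assuming the Storm 2.x API (org.apache.storm); SentenceSpout, SplitSentenceBolt, and WordCountBolt are hypothetical placeholder classes:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Spout: reads data into the topology (hypothetical sentence source).
        builder.setSpout("sentences", new SentenceSpout(), 2);
        // Bolt: splits sentences; shuffleGrouping spreads tuples randomly across tasks.
        builder.setBolt("split", new SplitSentenceBolt(), 4)
               .shuffleGrouping("sentences");
        // Bolt: counts words; fieldsGrouping routes the same word to the same task.
        builder.setBolt("count", new WordCountBolt(), 4)
               .fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("word-count", conf, builder.createTopology());
            Thread.sleep(10_000); // let the local topology run briefly
        }
    }
}
```

The grouping choice is the interesting part: shuffleGrouping balances load, while fieldsGrouping guarantees all tuples sharing a field value reach the same bolt task, which is what makes stateful per-word counting possible.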
My short talk at a RocksDB meetup on 2/3/16 about the new Ceph OSD backend, BlueStore, and its use of RocksDB.
This document discusses replacing external caching solutions with ScyllaDB's internal caching capabilities. It gives examples of companies that improved performance and reduced cost and complexity by moving from external caches such as Redis, or an external cache in front of Elasticsearch, to ScyllaDB's embedded cache. It also outlines the advantages of ScyllaDB's cache over external caching layers, such as improved latency, coherency with the database, and observability.
Apache HBase is the Hadoop open-source, distributed, versioned storage manager, well suited for random, realtime read/write access. This talk gives an overview of how HBase achieves random I/O, focusing on the storage layer internals: starting from how the client interacts with Region Servers and the Master, then going into WAL, MemStore, compactions, and on-disk format details. It also looks at how the storage is used by features like snapshots, and how it can be improved to gain flexibility, performance, and space efficiency.
This document summarizes a presentation about Ceph, an open-source distributed storage system. It discusses Ceph's introduction and components, benchmarks Ceph's block and object storage performance on Intel architecture, and describes optimizations like cache tiering and erasure coding. It also outlines Intel's product portfolio in supporting Ceph through optimized CPUs, flash storage, networking, server boards, software libraries, and contributions to the open source Ceph community.
Felipe Cardeneti Mendes, Solutions Architect at ScyllaDB: navigating workload-specific performance challenges and tradeoffs. Felipe Mendes covers how to navigate the top performance challenges and tradeoffs that you’re likely to face with your project’s specific workload characteristics and technical/business requirements.
How Coralogix shrank query processing times from 30 seconds (not a typo) to 86 ms by moving from Postgres to ScyllaDB.
DataSourceV2 is Spark's new API for working with data from tables and streams, but "v2" also includes a set of changes to SQL internals, the addition of a catalog API, and changes to the data frame read and write APIs. This talk will cover the context for those additional changes and how "v2" will make Spark more reliable and predictable for building enterprise data pipelines. This talk will include:
* Problem areas where the current behavior is unpredictable or unreliable
* The new standard SQL write plans (and the related SPIP)
* The new table catalog API and a new Scala API for table DDL operations (and the related SPIP)
* Netflix's use case that motivated these changes
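As a hedged illustration of where this work landed, here is what the catalog-aware write API eventually looked like in Spark 3.x; the catalog, namespace, and table names are placeholders and require a configured v2 catalog:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class V2WriteExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("dsv2-write")
                .getOrCreate();

        Dataset<Row> df = spark.range(100).toDF("id");

        // The v2 writeTo API resolves "catalog.db.events" through the table
        // catalog, so the behavior (create vs. append vs. replace) is explicit
        // and predictable instead of depending on a source-specific save mode.
        df.writeTo("catalog.db.events").createOrReplace();
    }
}
```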
Anti-entropy repairs are known to be a peculiar maintenance operation of Cassandra clusters. They are problematic mostly because of their potential to negatively impact the cluster's performance, and because of the difficulty of managing repairs carefully enough to prevent that impact. Based on the long-term pain we have experienced managing the repairs of nearly 100 Cassandra clusters, and being unable to find a solution that met our needs, we developed an open-source tool named Cassandra Reaper [1] for easy management of Cassandra repairs. Cassandra Reaper automates the management of anti-entropy repairs of Cassandra clusters in a smart, efficient, and careful manner while requiring minimal Cassandra expertise. I will cover some basics of Cassandra's eventual consistency mechanisms, and then focus on the features of Cassandra Reaper and our six months of experience with the tool managing the repairs of our production clusters.
Apache Spark 1.4 introduced support for Apache ORC. However, initially it did not take advantage of the full power of ORC. For instance, it was slow because ORC vectorization was not used, and push-down predicates were also not supported on DATE types. Recently the Apache Spark community has started to use the latest Apache ORC, which includes new enhancements to address these limitations. In this talk, we show the result of integrating the latest Apache ORC with Apache Spark. We will also review the latest enhancements and roadmap. Speakers: Owen O'Malley, Co-founder & Technical Fellow, Hortonworks; Dongjoon Hyun, Staff Software Engineer, Hortonworks.
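For reference, a sketch of how the newer ORC integration is switched on, assuming Spark 2.3+ where these settings landed; the path and column names are made up for illustration:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OrcNativeRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("orc-native")
                // Use the newer Apache ORC reader/writer instead of the bundled Hive 1.2 one.
                .config("spark.sql.orc.impl", "native")
                // Enable the vectorized ORC reader (row batches instead of row-at-a-time).
                .config("spark.sql.orc.enableVectorizedReader", "true")
                // Push filters (including DATE predicates) down into ORC indexes.
                .config("spark.sql.orc.filterPushdown", "true")
                .getOrCreate();

        Dataset<Row> df = spark.read().orc("/data/events.orc");
        df.filter("event_date >= date '2018-01-01'").show();
    }
}
```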
Optimizing MapReduce job performance is often seen as something of a black art. In order to maximize performance, developers need to understand the inner workings of the MapReduce execution framework and how they are affected by various configuration parameters and MR design patterns. The talk will illustrate the underlying mechanics of job and task execution, including the map-side sort/spill, the shuffle, and the reduce-side merge, and then explain how different job configuration parameters and job design strategies affect the performance of these operations. Though the talk will cover internals, it will also provide practical tips, guidelines, and rules of thumb for better job performance. The talk is primarily targeted at developers directly using the MapReduce API, though it will also include some tips for users of higher-level frameworks.
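As a taste of the knobs involved, here is a hedged Java sketch setting a few of the standard MR2 sort/spill and shuffle parameters; the values shown are illustrative, not recommendations:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TunedJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Bigger map-side sort buffer -> fewer spills during the sort/spill phase.
        conf.setInt("mapreduce.task.io.sort.mb", 256);
        // Start spilling later (at 90% full) instead of the 80% default.
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.90f);
        // Merge more spill files per pass during the reduce-side merge.
        conf.setInt("mapreduce.task.io.sort.factor", 64);
        // More parallel fetchers during the shuffle.
        conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 10);
        // Compress map output to cut shuffle I/O.
        conf.setBoolean("mapreduce.map.output.compress", true);

        Job job = Job.getInstance(conf, "tuned-job");
        // ... set mapper/reducer/input/output as usual, then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```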
This document discusses techniques for improving latency in HBase. It analyzes the write and read paths, identifying sources of latency such as networking, HDFS flushes, garbage collection, and machine failures. For writes, it finds that single puts can achieve millisecond latency while streaming puts can hide latency spikes. For reads, it notes cache hits are sub-millisecond while cache misses and seeks add latency. GC pauses of 25-100ms are common, and failures hurt locality and require cache rebuilding. The document outlines ongoing work to reduce GC, use off-heap memory, improve compactions and caching to further optimize for low latency.
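The single-put versus streaming-put distinction looks like this in client code; a minimal sketch assuming the HBase 1.x+ client API, with a placeholder table "t" and column family "f":

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PutLatency {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            TableName name = TableName.valueOf("t");

            // Single put: one synchronous RPC, typically low-millisecond latency.
            try (Table table = conn.getTable(name)) {
                Put p = new Put(Bytes.toBytes("row1"));
                p.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
                table.put(p);
            }

            // Streaming puts: client-side buffering batches RPCs, hiding latency
            // spikes at the cost of weaker per-row latency guarantees.
            try (BufferedMutator mutator = conn.getBufferedMutator(name)) {
                for (int i = 0; i < 10_000; i++) {
                    Put p = new Put(Bytes.toBytes("row" + i));
                    p.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
                    mutator.mutate(p);
                }
            } // close() flushes any remaining buffered mutations
        }
    }
}
```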
The document summarizes new features and updates in Ceph's RBD block storage component. Key points include: improved live migration support using external data sources; built-in LUKS encryption; up to 3x better small I/O performance; a new persistent write-back cache; snapshot quiesce hooks; kernel messenger v2 and replica read support; and initial RBD support on Windows. Future work planned for Quincy includes encryption-formatted clones, cache improvements, usability enhancements, and expanded ecosystem integration.
When a new node is added or removed, Scylla has to transfer part of the existing data from some nodes to their neighbors. When a node fails, Scylla has to repopulate its data from the surviving replicas. Those operations are collectively referred to as "streaming" operations, since they simply stream data from one node to another, without using the opportunity to also fix discrepancies in the data. This is in contrast with the repair operation, which looks at all existing replicas and reconciles their contents. Scylla is moving towards unifying these two operations. In this talk we will discuss why this is considered beneficial, and what other possibilities it opens to users.
Hive’s RCFile has been the standard format for storing Hive data for the last 3 years. However, RCFile has limitations because it treats each column as a binary blob without semantics. The upcoming Hive 0.11 will add a new file format named Optimized Row Columnar (ORC) file that uses and retains the type information from the table definition. ORC uses type-specific readers and writers that provide lightweight compression techniques such as dictionary encoding, bit packing, delta encoding, and run length encoding, resulting in dramatically smaller files. Additionally, ORC can apply generic compression using zlib, LZO, or Snappy on top of the lightweight compression for even smaller files. However, storage savings are only part of the gain. ORC supports projection, which selects subsets of the columns for reading, so that queries reading only one column read only the required bytes. Furthermore, ORC files include lightweight indexes that record the minimum and maximum values for each column in each set of 10,000 rows and for the entire file. Using pushdown filters from Hive, the file reader can skip entire sets of rows that aren’t important for this query. Columnar storage formats like ORC reduce I/O and storage use, but it’s just as important to reduce CPU usage. A technical breakthrough called vectorized query execution works nicely with column store formats to do this. Vectorized query execution has proven to give dramatic performance speedups, on the order of 10X to 100X, for structured data processing. We describe how we’re adding vectorized query execution to Hive, coupling it with ORC through a vectorized iterator.
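To make the type-aware, vectorized writer concrete, here is a small sketch using the standalone Apache ORC core Java API (the descendant of the Hive writer described above, not the Hive 0.11 API itself); the file path and schema are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Column types come from the schema, so ORC can pick type-specific
        // encodings (run-length for longs, dictionary for strings, ...).
        TypeDescription schema =
                TypeDescription.fromString("struct<id:bigint,name:string>");
        Writer writer = OrcFile.createWriter(new Path("/tmp/example.orc"),
                OrcFile.writerOptions(conf).setSchema(schema));

        // Rows are written in vectorized batches, mirroring how they are read.
        VectorizedRowBatch batch = schema.createRowBatch();
        LongColumnVector id = (LongColumnVector) batch.cols[0];
        BytesColumnVector name = (BytesColumnVector) batch.cols[1];
        for (int r = 0; r < 10_000; r++) {
            int row = batch.size++;
            id.vector[row] = r;
            name.setVal(row, ("name-" + r).getBytes());
            if (batch.size == batch.getMaxSize()) {
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        if (batch.size != 0) {
            writer.addRowBatch(batch);
        }
        writer.close();
    }
}
```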
This presentation provides an overview of Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of running Red Hat Ceph Storage on Dell servers with proven hardware components that provide high scalability, enhanced cost and ROI benefits, and support for unstructured data.
This document provides an overview of new features in Oracle Real Application Clusters (RAC) 12c Release 2, including:
1. The Cluster Domain architecture, which improves scalability by assigning each pluggable database a unique domain ID.
2. Flex diskgroups, which allow database files to be grouped and managed at the file group level, with quota groups enabling quota management.
3. The Autonomous Health Framework, which automates monitoring and problem resolution to reduce downtime.
This document discusses bulk loading and unloading data into and from Cassandra. It describes using CQL INSERT statements via Java drivers or CQLSH COPY FROM for loading, as well as using SSTable files via sstableloader or custom code. For unloading, it recommends using parallel CQL SELECT queries by splitting the token range across multiple connections. Testing showed Java asynchronous INSERTs to be the fastest loading method in most cases, while sstableloader requires all nodes be online. Batching INSERTs can improve throughput but increases latency.
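A minimal sketch of the asynchronous-INSERT approach, assuming the DataStax Java driver 3.x; the contact point, keyspace, and table are placeholders. The semaphore is the important detail: it bounds in-flight requests so the client does not overrun the cluster:

```java
import com.datastax.driver.core.*;
import java.util.concurrent.Semaphore;

public class AsyncLoader {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks")) {

            PreparedStatement ps = session.prepare(
                    "INSERT INTO events (id, payload) VALUES (?, ?)");

            // Bound the number of concurrent requests so the async firehose
            // doesn't overwhelm the cluster (or the client's heap).
            Semaphore inFlight = new Semaphore(512);
            for (long i = 0; i < 1_000_000; i++) {
                inFlight.acquire();
                ResultSetFuture f = session.executeAsync(ps.bind(i, "payload-" + i));
                // Release the permit when the write completes.
                f.addListener(inFlight::release, Runnable::run);
            }
            inFlight.acquire(512); // wait for the tail of in-flight writes
        }
    }
}
```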
Mesos is an open source cluster manager that improves resource utilization. It allows Spark Streaming jobs to leverage Mesos fault-tolerance features, such as driver supervision via Marathon. Backpressure is also supported in Spark Streaming to prevent scheduling delays when data arrives faster than it can be processed. Reactive Streams provide more direct backpressure control and are expected in future Spark versions.
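Enabling backpressure is a one-line configuration change; a sketch assuming Spark Streaming 1.5+, with a placeholder socket source:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class BackpressureApp {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setAppName("backpressure-demo")
                // Let Spark Streaming adapt the ingestion rate to the processing rate.
                .set("spark.streaming.backpressure.enabled", "true")
                // Optional cap on the rate while the rate controller warms up.
                .set("spark.streaming.backpressure.initialRate", "1000");

        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        // Placeholder source; any receiver or direct stream works the same way.
        jssc.socketTextStream("localhost", 9999).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```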
Thread-per-core programming models are well known in software domains where latency is important. Pinning application threads to physical cores and handling data in a core-affine way can yield impressive results, but what is it like to build a large scale general purpose software product within these constraints? Redpanda is a high performance persistent stream engine, built on the Seastar framework. This session will describe practical experience of building high performance systems with C++20 in an asynchronous runtime, the unexpected simplicity that can come from strictly mapping data to cores, and explore the challenges & tradeoffs in adopting a thread-per-core architecture.
Cassandra at Pollfish. Athens Cassandra Users Meetup (DataStax), 11 March 2015. http://www.meetup.com/Athens-Cassandra-Users/events/220922960/
Pollfish is a survey platform which provides access to millions of targeted users. Pollfish allows easy distribution and targeting of surveys through existing mobile apps (https://www.pollfish.com/). At Pollfish we use Cassandra for different use cases, e.g. as an application data store to maximize write throughput where appropriate, and for our analytics project to find insights in application-generated data. As a medium to accomplish our success so far, we use DataStax's DSE 4.6 environment, which integrates Apache Cassandra, Spark, and a Hadoop-compatible file system (CFS). We will discuss how we started, what the journey was like, and the impressions gained so far, along with some tips learned the hard way. This is the result of joint work by an excellent team here at Pollfish.
This community update from Sage Weil at Red Hat provides information on the current and upcoming releases of Ceph. The document summarizes that Luminous is the current stable release, BlueStore is now stable and default, and there have been significant performance improvements for hardware like HDDs. It also outlines many new features and improvements planned or in development for Ceph components like RBD, RGW, CephFS, erasure coding, and more in upcoming releases like Mimic.
This document summarizes new features and upcoming releases for Ceph. In the Jewel release in April 2016, CephFS became more stable with improvements to repair and disaster recovery tools. The BlueStore backend was introduced experimentally to replace Filestore. Future releases Kraken and Luminous will include multi-active MDS support for CephFS, erasure code overwrites for RBD, management tools, and continued optimizations for performance and scalability.
What can you expect when migrating from Cassandra to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compares to Cassandra’s. Then, hear about your Cassandra to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
Ceph is a mature open source software-defined storage solution that was created over a decade ago. During that time, new, faster storage technologies have emerged, including NVMe and persistent memory. The crimson project's aim is to create a better Ceph OSD, one well suited to those faster devices. The crimson OSD is built on the Seastar C++ framework and can leverage these devices by minimizing latency, CPU overhead, and cross-core communication. This talk will discuss the project's design, our current status, and our future plans.
This document discusses Linux huge pages, including:
- What huge pages are and how they can reduce memory management overhead by allocating larger blocks of memory
- How to configure huge pages on Linux, including installing required packages, mounting the huge page filesystem, and setting kernel parameters
- When huge pages should be configured, such as for data-intensive or latency-sensitive applications like databases, noting that testing is required due to disadvantages like reduced swappability
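Since huge pages are configured at the OS level, application code mostly just needs to verify them; a small Java sketch that inspects the pool via /proc/meminfo (note that the JVM itself only uses the pool when started with -XX:+UseLargePages):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class HugePageCheck {
    public static void main(String[] args) throws Exception {
        // /proc/meminfo exposes the pool configured via the vm.nr_hugepages
        // kernel parameter; HugePages_Free shows how much of it is unused.
        for (String line : Files.readAllLines(Paths.get("/proc/meminfo"))) {
            if (line.startsWith("HugePages_") || line.startsWith("Hugepagesize")) {
                System.out.println(line);
            }
        }
    }
}
```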
Ceph is a highly scalable open source distributed storage system that provides object, block, and file interfaces on a single platform. Although Ceph RBD block storage has dominated OpenStack deployments for several years, maturing object (S3, Swift, and librados) interfaces and stable CephFS (file) interfaces now make Ceph the only fully open source unified storage platform. This talk will cover Ceph's architectural vision and project mission and how our approach differs from alternative approaches to storage in the OpenStack ecosystem. In particular, we will look at how our open development model dovetails well with OpenStack, how major contributors are advancing Ceph capabilities and performance at a rapid pace to adapt to new hardware types and deployment models, and what major features we are prioritizing for the next few years to meet the needs of expanding cloud workloads.
This document provides an introduction to Docker and Openshift including discussions around infrastructure, storage, monitoring, metrics, logs, backup, and security considerations. It describes the recommended infrastructure for a 3 node Openshift cluster including masters, etcd, and nodes. It also discusses strategies for storage, monitoring both internal pod status and external infrastructure metrics, collecting and managing logs, backups, and security features within Openshift like limiting resource usage and isolating projects.
Introduction to Apache Spark, with an emphasis on the RDD API, Spark SQL (DataFrame and Dataset APIs), and Spark Streaming. Presented at the Desert Code Camp: http://oct2016.desertcodecamp.com/sessions/all
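A compact Java sketch contrasting the RDD and DataFrame styles of word counting, runnable locally; the inline input data is purely for illustration:

```java
import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class SparkIntro {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("intro").master("local[*]").getOrCreate();
        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

        // RDD API: low-level, functional transformations over partitioned data.
        JavaRDD<String> lines = sc.parallelize(Arrays.asList("to be", "or not to be"));
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(l -> Arrays.asList(l.split(" ")).iterator())
                .mapToPair(w -> new Tuple2<>(w, 1))
                .reduceByKey(Integer::sum);
        counts.collect().forEach(System.out::println);

        // DataFrame API: declarative, optimized by the Catalyst planner.
        Dataset<Row> df = spark
                .createDataset(Arrays.asList("to be", "or not to be"), Encoders.STRING())
                .toDF("line");
        df.selectExpr("explode(split(line, ' ')) AS word")
          .groupBy("word").count().show();

        spark.stop();
    }
}
```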