A presentation at ApacheCon Asia 2022 by Dan Wang and Yingchun Lai.
Apache Pegasus is a horizontally scalable, strongly consistent and high-performance key-value store.
Learn more about Pegasus: https://pegasus.apache.org, https://github.com/apache/incubator-pegasus
How does Apache Pegasus (incubating) community develop at SensorsData
1. How does Apache Pegasus (incubating)
community develop at SensorsData
Dan Wang & Yingchun Lai
2022.07.29
2. Outline
• Overview of Apache Pegasus
• Architecture, Data Model, User Interface, Performance, Important Features
• Why did SensorsData choose Apache Pegasus?
• Evolution and Current Situation
• Contributions to Apache Pegasus by SensorsData
• Features, Improvements, Bugfixes
• What's going on in the Pegasus community?
• Development, New Release and Activities
3. Overview of Apache Pegasus
Architecture, Data Model, User Interface, Performance, Important Features
4. What is Pegasus?
Apache Pegasus is a horizontally scalable, strongly consistent and
high-performance key-value store
• C++ implemented
• Local persistent storage engine by RocksDB
• Strongly consistent by PacificA
• High performance
• Horizontally scalable
• Flexible data model
• Easy to use ecosystem tools
5. Architecture
MetaServer
• Cluster controller
• Configuration manager
• Doesn't store data on itself
ReplicaServer
• Data node
• Hash partition
• PacificA (strongly consistent)
• One RocksDB instance for each replica
ZooKeeper
• Meta server election
• Metadata storage
ClientLib
• Request routing table from MetaServer once
• Cache routing table
• Interact directly with ReplicaServers for R/W requests
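The ClientLib flow above can be sketched in a few lines of Python. This is a hypothetical in-memory stand-in, not the real client API; the `MetaServer` class and the shape of the routing table are assumptions for illustration:

```python
# Hypothetical sketch of the ClientLib flow: fetch the routing table from
# the MetaServer once, cache it, then route requests straight to the
# owning ReplicaServer with no further MetaServer round-trips.
class MetaServer:
    def fetch_routing_table(self):
        # partition_id -> ReplicaServer address (illustrative values)
        return {0: "replica-a:34801", 1: "replica-b:34801"}

class ClientLib:
    def __init__(self, meta):
        self.routing = meta.fetch_routing_table()  # fetched once, cached

    def server_for(self, partition_id):
        # Hot path: only the cached table is consulted.
        return self.routing[partition_id]

client = ClientLib(MetaServer())
print(client.server_for(1))
```

The point of the design is that the MetaServer stays off the read/write path: clients only go back to it when the cached table turns out to be stale.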
6. Data Model
SortKey
• Extend user's usage scenario
• Sorted in a specified HashKey
HashKey
• Decide which partition it belongs to
• hash(HashKey) % kPartitionCount → partition_id
Value
• User's data
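The HashKey-to-partition rule on this slide can be written out directly. Pegasus uses its own hash function; `crc32` below is only a stand-in to make the sketch runnable:

```python
# Sketch of the slide's rule: hash(HashKey) % kPartitionCount -> partition_id.
# crc32 stands in for Pegasus's real hash function.
import zlib

def partition_id(hash_key: bytes, partition_count: int = 64) -> int:
    return zlib.crc32(hash_key) % partition_count

# All SortKeys under the same HashKey hash to the same partition,
# which is what makes per-HashKey atomic operations possible.
p1 = partition_id(b"user_42")
p2 = partition_id(b"user_42")
assert p1 == p2 and 0 <= p1 < 64
```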
7. User Interface
note: * means uncertain count
• Supported language: Java, C++, Go, Python, Node-js, Scala
• Multiple SortKeys under one HashKey can be atomically accessed
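The atomic multi-SortKey access can be modelled with a small in-memory stand-in (not a real Pegasus client; `multi_set`/`multi_get` here are illustrative names): because all SortKeys under one HashKey live in the same partition, they can be written and read as a unit.

```python
# In-memory model of the data model: hash_key -> {sort_key: value}.
# A real server can apply multi_set atomically because the whole
# HashKey lives on a single partition.
class Table:
    def __init__(self):
        self._rows = {}  # hash_key -> {sort_key: value}

    def multi_set(self, hash_key, kvs):
        self._rows.setdefault(hash_key, {}).update(kvs)

    def multi_get(self, hash_key, sort_keys):
        row = self._rows.get(hash_key, {})
        return {sk: row[sk] for sk in sort_keys if sk in row}

t = Table()
t.multi_set(b"user_42", {b"name": b"Pegasus", b"lang": b"C++"})
print(t.multi_get(b"user_42", [b"name", b"lang"]))
```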
8. How to adapt to RocksDB
For one table in Pegasus
• The whole key space is hash split into N partitions
• Each partition has 3 replicas typically
• Distribute all these (3*N) replicas to M Replica Servers
• Load balance between Replica Servers in cluster
• Both for replicas and primary replicas
• Both consider replica count and disk space
• Load balance between data directories on a Replica Server
• Same considerations as above (replica count and disk space)
• Each replica corresponds to a RocksDB instance
• How does Pegasus key-value map to RocksDB key-value?
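The distribution step above can be sketched with an assumed placement policy (this is not Pegasus's actual balancer, just a minimal rule that satisfies the constraints on the slide): spread the 3*N replicas across M Replica Servers so that no server holds two replicas of the same partition.

```python
# Assumed placement policy: replica r of partition p goes to server
# (p + r) % M, so a partition's replicas always land on distinct servers
# (requires M >= replica count) and load spreads roughly evenly.
def place_replicas(n_partitions, n_servers, n_replicas=3):
    assert n_servers >= n_replicas
    return {p: [(p + r) % n_servers for r in range(n_replicas)]
            for p in range(n_partitions)}

placement = place_replicas(n_partitions=8, n_servers=5)
# Every partition's replicas sit on distinct servers.
assert all(len(set(servers)) == 3 for servers in placement.values())
```

The real balancer also weighs disk space and the primary/secondary split, which a round-robin rule like this ignores.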
9. How to adapt to RocksDB
RocksDB Key
• Length of HashKey: 2 bytes, for encoding and decoding key
• HashKey: variable length, defined by user
• SortKey: variable length, defined by user
RocksDB Value
• Value Header: 13 bytes
• Flag bit: 1 bit, always set to 1
• Data version: 7 bits
• Expire timestamp: 4 bytes, in seconds, since epoch
• Time tag: 8 bytes, designed for duplication
• Timestamp: 56 bits, in micro-seconds
• Cluster ID: 7 bits
• Deleted tag: 1 bit
• Value: variable length, defined by user
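The key/value layout above can be sketched as encode/decode helpers. This is a sketch only: the field order follows the slide, but the exact bit packing inside the time tag is an assumption, not Pegasus's wire format.

```python
import struct


def encode_key(hash_key: bytes, sort_key: bytes) -> bytes:
    # 2-byte HashKey length, then HashKey, then SortKey.
    return struct.pack(">H", len(hash_key)) + hash_key + sort_key


def decode_key(key: bytes):
    (hlen,) = struct.unpack_from(">H", key, 0)
    return key[2:2 + hlen], key[2 + hlen:]


def encode_value(user_value: bytes, expire_ts: int, ts_us: int,
                 cluster_id: int, deleted: bool, version: int = 0) -> bytes:
    # Header byte: flag bit (always 1) plus 7-bit data version.
    head = 0x80 | (version & 0x7F)
    # Time tag: 56-bit timestamp (microseconds), 7-bit cluster ID, 1-bit deleted.
    time_tag = ((ts_us & ((1 << 56) - 1)) << 8) | ((cluster_id & 0x7F) << 1) | int(deleted)
    # 13-byte header (1 + 4 + 8), then the user's value.
    return struct.pack(">BIQ", head, expire_ts, time_tag) + user_value


hk, sk = decode_key(encode_key(b"user_42", b"2022-07-29"))
```

Storing the HashKey length up front lets a reader split the composite RocksDB key back into HashKey and SortKey without any delimiter scanning.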
10. Performance
YCSB on Pegasus 2.3.0 (the latest release)
• CPU: 2.4 GHz, 24 cores
• Memory: 128 GB
• Disk: 480 GB SSD × 8
• Network card: 10 Gb/s
• 5 Replica Servers
• 64 partitions on test table
11. Important Features
Cold Backup
• Create checkpoint for a table
• Store data remotely on HDFS
• Restore table to the original or another cluster
Duplication
• Asynchronous duplication
• To achieve high write throughput
• To tolerate high latency
• The two clusters can be deployed in different regions
• Support pipeline duplication, multi-master duplication, and master-master duplication
12. Important Features
Bulk Load
• Generate SST files from user's original data
• via Pegasus-Spark, in Pegasus rule
• Store generated SST files to HDFS
• Download SST files to Pegasus ReplicaServer
• Ingest SST files to RocksDB
• Reject client writes while ingesting, then resume full read & write service
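The bulk load steps above form a small state machine; the sketch below models it. The state names and the single-transition rule are illustrative, not Pegasus's actual internal states.

```python
from enum import Enum, auto


class BulkLoadState(Enum):
    GENERATING = auto()   # SST files generated from user data (e.g. via Pegasus-Spark)
    UPLOADED = auto()     # SST files stored on HDFS
    DOWNLOADING = auto()  # ReplicaServers fetch the SST files
    INGESTING = auto()    # SST files ingested into RocksDB; client writes rejected
    SUCCEED = auto()      # table serves reads & writes again


def accepts_writes(state: BulkLoadState) -> bool:
    # Per the slide, writes are only rejected during ingestion.
    return state is not BulkLoadState.INGESTING


assert not accepts_writes(BulkLoadState.INGESTING)
assert accepts_writes(BulkLoadState.SUCCEED)
```

Separating generation and upload from ingestion means the expensive data preparation happens entirely off the serving path; only the final ingest step briefly affects writes.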
Partition Split
• Divide one replica into two
• Copy the checkpoint, then duplicate the WAL
• Register on Meta Server when the new replicas are ready
• Reject client R/W requests while registering, then resume serving them
• GC redundant data that doesn't belong to the new partition in the background
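Why the background GC step is needed follows from the hash mapping: when the partition count doubles, every key either stays in its old partition `i` or moves to exactly one child partition `i + N`, so each child replica ends up holding some keys that no longer belong to it. A sketch with a placeholder hash function:

```python
import hashlib

N = 4  # partition count before the split (hypothetical)


def pid(hash_key: bytes, partition_count: int) -> int:
    # Placeholder for Pegasus's hash; any fixed hash shows the same property.
    h = int.from_bytes(hashlib.md5(hash_key).digest()[:8], "big")
    return h % partition_count


for key in (b"a", b"b", b"c", b"d", b"e"):
    old, new = pid(key, N), pid(key, 2 * N)
    # h mod 2N is congruent to h mod N modulo N, so a key can only stay
    # in `old` or land in its child partition `old + N`, never elsewhere.
    assert new in (old, old + N)
```

This is why splitting can start from a straight copy of the parent's checkpoint: the misplaced half of the data is simply garbage-collected later instead of being filtered up front.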
13. Important Features
Backup Request
• Only for read requests
• Usage scenarios:
• Load imbalance
• Network problem
• Single point of failure
Hotkey Detection
• Detect badly designed user keys
• Resolve single points of failure caused by hotkeys
14. Important Features
• Access control
• Authentication: Kerberos
• Authorization: table-level ACL
• Usage scenario option templates
• Set RocksDB options in table level
• Manual compaction
• Fast GC, fast sort
• Integration with BigData ecosystem
• HDFS: cold backup, bulk load
• Spark: bulk load, analysis on Hive
• MetaProxy
• Access unification
17. KV Evolution in SensorsData
• Distributed Redis (2016)
• Redis sentinel → Redis cluster
• Pros
• Scale out
• Cons
• Frequent OOM (several hundred million keys)
18. KV Evolution in SensorsData
• SSDB (2017)
• master-slave
• Pros
• Reduce memory consumption
• Compatible with Redis, thus easy for migration
• Persistence
• Cons
• Cannot scale out
• Cannot keep up with more data and more businesses (I/O utilization is nearly 100%)
19. Introduce Apache Pegasus (2020)
• Scale out
• High Availability
• Strong consistency
• Persistence
• High performance
• Stability
• Tools for monitoring and operations
• Support mget & mset
• Documents & community
• Cost for migration
20. Apache Pegasus in SensorsData
• Pegasus has been deployed on over 1,300 clusters to date
• About 20 products have chosen Pegasus to store their business data
22. Characteristics of Product Environment
• Operating private clusters is difficult
• A large number of clusters
• Some clusters have to be operated on site
• Some clusters are very small
• Even single node
• Hardware configurations are limited
• Small memory
• HDD
• Multiple services are deployed on one node
• Have to limit resource usage, such as memory
23. New Functions
• Support single replica
• Connect Zookeeper secured with Kerberos
• Change the replication factor of each table
• Implement new system of metrics
Improve Memory Usage
• Limit RocksDB memory usage
• Support jemalloc
Refactor
• Merge sub-projects
Contributions
Optimize Performance
• Support batchGetByPartitions to improve batch get
• Use multi_set to speed up copy_data
Compatibility
• Support building on macOS
• Support building and running on the AArch64 architecture
Bugfixes
• Fix risk of replica metadata loss on XFS after a power outage
• Fix message body size left unset after parsing, which led to excessive I/O throughput
24. Change the Replication Factor
• Motivation
• Scale out, e.g., 1 → 3 or 2 → 3
• Migration
• Increase partitions offline
• Process
• Check the new replication factor
• Update metadata asynchronously
• Missing/redundant replicas are added/dropped, typically within several seconds
• Clearing redundant data can be triggered by setting the meta function level to lively
25. New System of Metrics
Perf-counter
• Verbose naming
• Overlapping metric types
• Unreasonable abstract interfaces
• Memory leaks caused by outdated metrics
• Potential performance problems
New metrics
• Use labels to simplify naming
• Redefine metric types
• Clear outdated metrics after a configurable retention period
• Improve performance
26. Framework
• Gauge: set/get, increment/decrement
• Counter: increment monotonically
• Percentile: P90/P95/P99/..., for a fixed window size
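The three metric types above can be sketched minimally in Python. This is an illustration of the interfaces the slide describes, not the framework's C++ implementation; window size and selection strategy are placeholders.

```python
import heapq


class Gauge:
    """set/get plus increment/decrement."""
    def __init__(self, value=0):
        self.value = value

    def set(self, v):
        self.value = v

    def get(self):
        return self.value

    def increment(self, d=1):
        self.value += d

    def decrement(self, d=1):
        self.value -= d


class Counter:
    """Increments monotonically; never decreases."""
    def __init__(self):
        self.value = 0

    def increment(self, d=1):
        assert d >= 0
        self.value += d


class Percentile:
    """P90/P95/P99/... over a fixed-size sample window."""
    def __init__(self, window=5000):
        self.window = window
        self.samples = []

    def record(self, v):
        self.samples.append(v)
        if len(self.samples) > self.window:
            self.samples.pop(0)

    def value(self, p):
        # Selection in the spirit of C++ std::nth_element: only the
        # k smallest samples are needed, not a full sort of the window.
        k = max(1, round(len(self.samples) * p / 100))
        return heapq.nsmallest(k, self.samples)[-1]
```

Keeping percentiles over a fixed window bounds both memory and the cost of each selection, which matters when thousands of metrics are computed per server.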
27. Performance
[Chart: Latency of counters (seconds, 1 billion operations per thread) at 2/4/8/16 threads, old counter vs. new counter]
[Chart: Latency of percentiles (seconds, window size 5000) at 10,000/50,000/100,000 operations, old percentile vs. new percentile]
The new counter is based on a long adder. The new percentile is based on nth_element instead of median-of-medians selection.
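A long adder avoids contention on one hot counter by splitting it into per-thread cells that are only summed on read. A minimal Python sketch of the idea; the stripe count and locking strategy are illustrative, not the framework's C++ implementation.

```python
import threading


class LongAdder:
    """Striped counter in the spirit of Java's LongAdder: each thread
    increments its own cell, so concurrent increments rarely contend."""

    def __init__(self, stripes=16):
        self._cells = [0] * stripes
        self._locks = [threading.Lock() for _ in range(stripes)]

    def increment(self, delta=1):
        # Hash the thread ID to a cell; threads mostly hit different locks.
        i = threading.get_ident() % len(self._cells)
        with self._locks[i]:
            self._cells[i] += delta

    def sum(self):
        # Reads pay the cost of aggregation; writes stay cheap.
        return sum(self._cells)
```

This trades a slightly more expensive read for much cheaper writes, which fits metrics well: increments happen on every request, while values are only read when scraped.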
32. jemalloc vs. tcmalloc
• Both memtables and index & filter blocks are capped by block cache
• rocksdb_block_cache_capacity=12GB
• rocksdb_total_size_across_write_buffer=8GB
37. What's going on in the Pegasus community?
Development, New Release and Activities
38. Development
• New metrics framework
• Higher performance and easier to use
• Enhance backup & restore
• Enhance duplication
• Enhance authorization
• Easy to use admin tools
• Use Go tools to replace C++ tools
• Refactor
• Support more CPU architectures
• x86, ARMs, Apple Silicon
• Support more operating systems
• Linux: RHEL/CentOS (6, 7, 8, 9), Ubuntu (16.04, 18.04, 20.04, 22.04)
• macOS: 12.4
• Website & Documents
39. New Release
• Pegasus 2.4.0
• Performance improvement
• Refactor dual-WAL to single WAL
• New features
• Change table's replication factor
• Read request limiter
• Enhancement
• Bulk load
• Duplication
• Manual compaction
• API
• Add batchGetByPartitions()
• Tools
• admin-cli support more operations
40. Activities
• The 1st meetup was held in September 2021
• Planning to hold the 2nd meetup this autumn
• Small online meetings are held on an unscheduled basis