There’s a lot of buzz around DevOps tools, and it can be difficult to break through the noise. We plan to share our success story of what to do, and what not to do, while powering your software with the most acclaimed DevOps technologies. From provisioning clusters with Kubernetes to scaling the product for a global user base; from streaming live data with Kafka and Spark to consolidating it in Athena; from monitoring with Kibana to continuous integration and deployment with CircleCI, we promise you a smooth ride. Come hear the story of our journey from a monolith to elastic infrastructure.
This document summarizes Haitao Wang's experience working on streaming platforms at Alibaba and Microsoft. It describes Alibaba's data infrastructure challenges in handling large volumes of streaming data and introduces Alibaba Blink, a distribution of Apache Flink developed to meet Alibaba's scale requirements. Blink has achieved unprecedented throughput of 472 million events per second with latencies in the tens of milliseconds. The document outlines improvements made in Blink's runtime, its declarative SQL support, and use cases at Alibaba including real-time A/B testing, search index building, and online machine learning.
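For a flavor of the declarative SQL support mentioned above, here is a minimal sketch in PyFlink (modern Apache Flink, into which Blink's runtime and planner were merged); the Kafka topic, schema, and A/B-testing aggregation are hypothetical illustrations, not Alibaba's actual pipeline.

```python
# Minimal sketch: windowed A/B-test counts in Flink SQL via PyFlink.
# Topic name, schema, and query are hypothetical; the Kafka connector
# jar must be on the classpath.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Declare a Kafka-backed source table with an event-time watermark.
t_env.execute_sql("""
    CREATE TABLE clicks (
        variant STRING,
        user_id STRING,
        ts TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'ab-test-clicks',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# Declarative one-minute tumbling-window counts per experiment variant.
result = t_env.sql_query("""
    SELECT variant,
           TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
           COUNT(*) AS clicks
    FROM clicks
    GROUP BY variant, TUMBLE(ts, INTERVAL '1' MINUTE)
""")
result.execute().print()
```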
The Apache Kafka ecosystem is rich with components that make it possible to design and implement secure, efficient, fault-tolerant, and scalable event stream processing (ESP) systems. Using real-world examples, this talk covers why Apache Kafka is an excellent choice for cloud-native and hybrid architectures; how to design, implement, and maintain ESP systems; best practices and patterns for migrating to cloud or hybrid configurations; when to choose PaaS versus IaaS; what options are available for running Kafka in cloud or hybrid environments; and what you need to build and maintain successful ESP systems that are secure, performant, reliable, highly available, and scalable.
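Because the abstract stresses security, here is a minimal sketch of what a secured client configuration might look like with the confluent-kafka Python client; the broker address and credentials are placeholders, and the exact settings depend on your cluster.

```python
# Minimal sketch of a producer configured for an authenticated, encrypted
# connection (SASL_SSL with PLAIN credentials); all values are placeholders.
from confluent_kafka import Producer

producer = Producer({
    'bootstrap.servers': 'broker.example.com:9092',  # placeholder
    'security.protocol': 'SASL_SSL',                 # TLS plus SASL auth
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': 'app-user',                     # placeholder
    'sasl.password': 'app-secret',                   # placeholder
    'acks': 'all',                                   # durability over latency
    'enable.idempotence': True,                      # exactly-once-style delivery
})

producer.produce('orders', key='order-42', value='{"total": 99.5}')
producer.flush()
```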
Managing Apache Kafka can sometimes be cumbersome, and that's something we would like to avoid, especially for developers and data engineers who need to build and develop data pipelines. Luckily, the combination of Kubernetes and Kafka helps us reduce everyday tasks tremendously by adding myriad capabilities that lessen the complexity of managing clusters. Kafka Connect and ksqlDB are a fantastic combo to add to your streaming stack. These two soldiers can facilitate data acquisition and processing and also provide outstanding real-time ETL capabilities. But what if you need an OLAP datastore to answer complex queries with low-latency responses? That's where Apache Pinot comes into play. In this session, you're going to learn:
- How to deploy Kafka effectively on Kubernetes
- How to properly configure Kafka Connect and ksqlDB
- How to integrate Apache Pinot to answer OLAP queries
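As a hedged taste of the stack described above, the sketch below submits a hypothetical streaming-ETL statement to ksqlDB's REST API; the server address, topic, stream, and column names are all assumptions.

```python
# Minimal sketch: submit a streaming-ETL statement to ksqlDB's /ksql REST
# endpoint. Assumes a source stream raw_orders has already been declared
# over a Kafka topic; all names are hypothetical.
import requests

KSQLDB = "http://localhost:8088"  # assumes a local ksqlDB server

statement = """
    CREATE STREAM clean_orders WITH (KAFKA_TOPIC='clean_orders') AS
      SELECT order_id, customer_id, CAST(amount AS DOUBLE) AS amount
      FROM raw_orders
      WHERE amount > 0
      EMIT CHANGES;
"""

resp = requests.post(f"{KSQLDB}/ksql",
                     json={"ksql": statement, "streamsProperties": {}})
resp.raise_for_status()
print(resp.json())
```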
In 2015, Google open sourced the core of their internal container clustering system under the name Kubernetes. Teams that previously relied upon IaaS and PaaS to run their applications quickly adopted Kubernetes instead. Today, only a few years later, Kubernetes is key to many companies and runs applications with literally billions of users. Kubernetes has become the de facto standard for deploying and running cloud native applications. We’ll give an overview of what Kubernetes is today and share our experiences from using Kubernetes in an e-commerce and an IoT application. The future of Kubernetes could not look better. The Kubernetes ecosystem is growing, making it possible to provision professionally managed databases directly within the cluster, to run functions in a serverless fashion, and even to host the code, the build pipeline, and the application itself on Kubernetes. In the future, there might be only one Kubernetes to rule them all.
You have billions of events in your fact table, all waiting to be visualized. Enter Tableau… but wait: how can you ensure scalability and speed when your data lives in Amazon S3, Spark, Amazon Redshift, or Presto? In this talk, you’ll hear how Albert Wong and Srikanth Devidi at Netflix use Tableau on top of their big data stack. Albert and Srikanth also show how you can get the most out of a massive dataset using Tableau, and guide you through the problems you may encounter along the way. Session sponsored by Tableau, an AWS Competency Partner.
This document discusses the transition to modern enterprise applications using containers, microservices, and big data technologies. It outlines how the Datacenter Operating System (DC/OS) provides a platform for building, running, and managing modern apps at scale. DC/OS abstracts infrastructure and provides platform services to simplify developing and operating distributed apps across a datacenter. It allows organizations to innovate faster by accelerating development and deployment of new services.
In this presentation, Steven Laan, Product Owner and Advanced Real-Time Analytics Dev Engineer at ING Group, talks about the Why, What, and How of real-time transaction forecasting. Topics include: the visual end product, the architecture landscape, the actor system solution, and a bit about the ING Way of Working.
This document discusses strategies for migrating monolithic applications to the cloud using the strangler pattern. It begins with an overview of the strangler pattern, which involves gradually building a new system around the edges of an existing monolith. It then shows how to implement the pattern on AWS: host the existing application, add facades with API Gateway, detect hot spots with X-Ray, replace hot spots with Lambda functions, and iteratively strangle more of the monolith over time until it is retired. The document emphasizes that this incremental approach allows migrating applications at lower cost and risk compared to full rewrites.
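As a sketch of the "replace hot spots with Lambda" step, here is a minimal, hypothetical handler: the idea is that API Gateway routes only the strangled path to this function while all other routes still reach the monolith. The path and fields are illustrative, not from the document.

```python
# Minimal sketch of a strangler-pattern Lambda: API Gateway routes only the
# strangled path (e.g. GET /orders/{id}) here; every other route still hits
# the legacy monolith behind the same facade. All names are illustrative.
import json

def handler(event, context):
    order_id = event["pathParameters"]["id"]
    # New implementation of the former monolith hot spot, e.g. a DynamoDB
    # lookup replacing the monolith's database call (omitted in this sketch).
    order = {"id": order_id, "status": "shipped"}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(order),
    }
```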
Using Kafka to stream data into TigerGraph, a distributed graph database, is a common pattern in our customers’ data architectures. In the TigerGraph database, the Kafka Connect framework was used to build the native S3 data loader. In TigerGraph Cloud, we will be building native integrations with many data sources such as Azure Blob Storage and Google Cloud Storage, using Kafka as an integrated component of the Cloud Portal. In this session, we will discuss both architectures: 1. the built-in Kafka Connect framework within the TigerGraph database; 2. using a Kafka cluster for cloud-native integration with other popular data sources. Demos will be provided for both data streaming processes.
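To illustrate how such a loader plugs into the Kafka Connect framework, here is a hedged sketch that registers a connector through Connect's standard REST API, using Confluent's S3 sink connector class as a stand-in; TigerGraph's native loader has its own connector class and settings.

```python
# Sketch: register a connector via Kafka Connect's REST API. The Confluent
# S3 sink class is a stand-in; TigerGraph's loader config will differ.
import requests

CONNECT = "http://localhost:8083"  # assumes a local Connect worker

config = {
    "name": "s3-example",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "events",
        "s3.bucket.name": "example-bucket",  # placeholder
        "s3.region": "us-east-1",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "flush.size": "1000",
        "tasks.max": "2",
    },
}

resp = requests.post(f"{CONNECT}/connectors", json=config)
resp.raise_for_status()
print(resp.json()["name"], "created")
```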
One of the key metrics to monitor when working with Apache Kafka, as a data pipeline or a streaming platform, is consumer group lag. Lag is the delta between the last produced message and the last committed message of a partition; in other words, it indicates how far behind your application is in processing up-to-date information. For a long time, we used our own service to keep track of these metrics, collect them, and visualize them. But this didn’t scale well: it required many manual operations and redeployments, and, most importantly, its output was expressed in absolute numbers (e.g., your lag is 30K messages), which tells a human being very little. We understood that we had to find a more suitable solution, one that would give us better visibility and allow us to measure lag in a time-based format that we all understand. In this talk, I’m going to go over the core concepts of Kafka offsets and lag, and explain why lag matters and is an important KPI to measure. I’ll also talk about the research we did to find the right tool, what the options in the market were at the time, and why we eventually chose LinkedIn’s Burrow. Finally, I’ll take a closer look at Burrow: its building blocks, how we build and deploy it, how we monitor better with it, and the most important improvement of all, how we transformed its output from numbers into time-based metrics.
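As a rough sketch of the concepts above (not Burrow's implementation), the snippet below computes a partition's absolute lag and then approximates a time-based lag from the timestamp of the last committed message; the broker, group, and topic names are placeholders.

```python
# Sketch: absolute lag vs. time-based lag for one partition. Broker, group,
# and topic names are placeholders; this is not how Burrow does it.
import time
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'my-app',
    'enable.auto.commit': False,
})

tp = TopicPartition('events', 0)
committed = consumer.committed([tp], timeout=10)[0].offset
_, high = consumer.get_watermark_offsets(tp, timeout=10)

print(f"absolute lag: {high - committed} messages")

# Approximate time-based lag: fetch the last committed message and compare
# its timestamp with the wall clock.
consumer.assign([TopicPartition('events', 0, max(committed - 1, 0))])
msg = consumer.poll(timeout=10)
if msg is not None and not msg.error():
    _, ts_ms = msg.timestamp()
    print(f"time lag: ~{time.time() - ts_ms / 1000:.1f} seconds")
consumer.close()
```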
Apache Kafka is the de facto standard for real-time event streaming, but what do you do if you want to perform user-facing, ad-hoc, real-time analytics too? That's a hard problem. Apache Pinot solves it, and the two together are like chocolate and peanut butter, peaches and cream, and Steve Rogers and Peggy Carter. Come to this talk for an introduction to Pinot and an overview of how the Pinot Kafka connector works. Hear about the challenges unique to a user-facing, real-time analytics system, and how Pinot and Kafka work harmoniously to solve them. Witness an action-packed demo showing just how easy it is to go from events to blazing-fast analytics, and how to use powerful features of both systems to do this at scale.
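To give a sense of the query side, here is a minimal sketch that sends SQL to a Pinot broker's /query/sql endpoint; the table and column names are hypothetical.

```python
# Minimal sketch: run an analytical SQL query against a Pinot broker.
# Broker address and table/column names are hypothetical.
import requests

BROKER = "http://localhost:8099"  # default Pinot broker port

sql = """
    SELECT country, COUNT(*) AS views
    FROM pageviews
    WHERE ts > ago('PT1H')
    GROUP BY country
    ORDER BY views DESC
    LIMIT 10
"""

resp = requests.post(f"{BROKER}/query/sql", json={"sql": sql})
resp.raise_for_status()
for row in resp.json()["resultTable"]["rows"]:
    print(row)
```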
This document summarizes a presentation about managing Kafka clusters at scale. It discusses how AppsFlyer migrated from a monolithic Kafka deployment to multiple clusters for different teams. It then outlines challenges faced like traffic surges and mixed Kafka protocol versions. Solutions discussed include improving infrastructure, adding visibility tools, creating automation and APIs for management, and implementing sleep-driven design principles to reduce developer fatigue. The presentation concludes by discussing future goals like auto-scaling clusters.
Understanding Apache Kafka® Latency at Scale, by Pere Urbon Bayes, Solutions Architect at Confluent. Meetup link: https://www.meetup.com/Mexico-Kafka/events/282390919/
Slides from the Chicago AWS user group on May 5th, 2016. Asaf Yigal, Co-Founder and VP Product at Logz.io, presented on using Elasticsearch, Logstash, and Kibana in Amazon Web Services. "Setting up the increasingly-popular open-source ELK Stack (Elasticsearch, Logstash, and Kibana) on AWS might seem like an easy task, but we have gone through several iterations in our architecture and have made some mistakes in our deployments that have turned out to be common in the industry. In this talk, we will go through what we did and explain what worked and what failed -- and why. We will also provide a complete blueprint of how to set up ELK for production on AWS." ~ @asafyigal
Simon Aubury gave a presentation on using ksqlDB for various enterprise workloads. He discussed four use cases: 1) streaming ETL to analyze web traffic data, 2) data enrichment to identify customers impacted by a storm, 3) measurement and audit to verify new system loads, and 4) data transformation to quickly fix data issues. For each use case, he described how to develop pipelines and applications in ksqlDB to address the business needs in a scalable and failure-resistant manner. Overall, he advocated for understanding when ksqlDB is appropriate to use and planning systems accordingly.
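As a hedged sketch of the data-enrichment pattern in use case 2, the statement below joins an event stream against a customer table in ksqlDB; the stream, table, and column names are hypothetical, not the ones from Simon's talk.

```python
# Hedged sketch of the enrichment pattern: join outage events against a
# customer table to find impacted customers. Assumes an outage_events
# stream and a customers table were declared earlier; names are hypothetical.
import requests

statement = """
    CREATE STREAM impacted_customers AS
      SELECT o.event_id, c.customer_id, c.email, o.region
      FROM outage_events o
      JOIN customers c ON o.customer_id = c.customer_id
      EMIT CHANGES;
"""

resp = requests.post("http://localhost:8088/ksql",
                     json={"ksql": statement, "streamsProperties": {}})
resp.raise_for_status()
```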
This document compares different approaches for performing zero-downtime upgrades of applications hosted on Microsoft Azure: Web Deploy, VIP swap, load-balanced endpoints, and Traffic Manager. Web Deploy allows automatic updates of web roles for minor changes but requires an RDP connection. VIP swap exchanges the virtual IPs of the staging and production environments, allowing upgrades to be tested on staging and switched over quickly. Load-balanced endpoints provide easy scaling but require manual upgrades and running multiple versions simultaneously. Traffic Manager uses DNS for isolated testing and fast redirection between environments, but incurs additional costs.
This is a story about what happens when a distributed system becomes a big part of a small team's infrastructure. That distributed system was Kafka, and the team size was one engineer. I will discuss my failures along my journey of deploying Kafka at scale with very little prior distributed systems experience. In this presentation, we will discuss how insights into organizational culture, engineering, and metrics created tailwinds and headwinds. This presentation takes a tactical approach to conquering a complex system with an understaffed team while your business is growing fast. I will discuss how the use case and resilience requirements for our Kafka cluster changed as the user base grew from 100K users to over 6 million.
See what's new in #Serverless and #Data at GCP. Our guest, Guillaume Blaquiere, Stack Overflow contributor and #GCP Developer Expert from France, covered the best #GoogleCloudNext announcements, demoed in practice how to benefit from #BigQuery Remote Functions, and answered many questions. The meetup recording, with a TOC for easy navigation, is at https://youtu.be/AuZZTwHIcdY P.S. For more interactive lectures like this, go to http://youtube.serverlesstoronto.org/ or sign up for our upcoming live events at https://www.meetup.com/Serverless-Toronto/events/
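For a taste of BigQuery Remote Functions, here is a minimal sketch of the Cloud Function side: BigQuery POSTs a JSON body with a "calls" array and expects a "replies" array of the same length. The function name and logic are hypothetical.

```python
# Minimal sketch of a BigQuery Remote Function endpoint. BigQuery sends
# {"calls": [[arg1, ...], ...]} and expects {"replies": [...]} with one
# reply per call. Function name and logic are hypothetical.
import json
import functions_framework

@functions_framework.http
def double_it(request):
    calls = request.get_json()["calls"]
    replies = [row[0] * 2 for row in calls]  # one reply per input row
    return json.dumps({"replies": replies})
```

On the BigQuery side, such a function is then declared with CREATE FUNCTION ... REMOTE WITH CONNECTION ... OPTIONS (endpoint = '...'), pointing at the deployed function's URL.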
Open Source Summit 2018, Vancouver (Canada): talk by Josef Adersberger (@adersberger, CTO at QAware), Michael Frank (Software Architect at QAware), and Robert Bichler (IT Project Manager at Allianz Germany). Abstract: Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs, and higher elasticity and resiliency in production. Kubernetes is the place to be for cloud-native apps. But what do you do if you have no shiny new cloud-native apps, just a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can! We’re facing the challenge of migrating hundreds of JEE legacy applications of a German blue-chip company onto a Kubernetes cluster within one year. The talk covers the lessons we've learned: the best practices and pitfalls we've discovered along the way.
Implement a Universal Data Distribution Architecture to Manage All Streaming Data (Cloudera Partner SkillUp). Tim Spann, Principal Developer Advocate in Data in Motion at Cloudera (tspann@cloudera.com), on using Apache NiFi, Apache Kafka, and Apache Flink in a hybrid environment with Cloudera DataFlow, Cloudera Streams Messaging Manager, and Cloudera SQL Stream Builder.
This presentation is geared toward enterprise architects and senior IT leaders looking to drive more value from their data by learning about cloud data lake management. As businesses focus on leveraging big data to drive digital transformation, technology leaders are struggling to keep pace with the high volume of data arriving at high speed and with rapidly evolving technologies. What's needed is an approach that helps you turn petabytes into profit. Cloud data lakes and cloud data warehouses have emerged as a popular architectural pattern to support next-generation analytics. Informatica's comprehensive AI-driven cloud data lake management solution natively ingests, streams, integrates, cleanses, governs, protects, and processes big data workloads in multi-cloud environments.
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
FINRA’s Data Lake unlocks the value in its data to accelerate analytics and machine learning at scale. FINRA's Technology group has changed its customers' relationship with data by creating a Managed Data Lake that enables discovery on petabytes of capital markets data while saving time and money over traditional analytics solutions. FINRA’s Managed Data Lake includes a centralized data catalog and separates storage from compute, allowing users to query petabytes of data in seconds. Learn how FINRA uses Spot Instances and services such as Amazon S3, Amazon EMR, Amazon Redshift, and AWS Lambda to provide the 'right tool for the right job' at each step in the data processing pipeline, all while meeting FINRA’s security and compliance responsibilities as a financial regulator.
This document provides an overview of the Confluent streaming platform and Apache Kafka. It discusses how streaming platforms can be used to publish, subscribe to, and process streams of data in real time. It also highlights the challenges of traditional architectures and how the Confluent platform addresses them by allowing data to be ingested from many sources and processed using stream processing APIs. The document also summarizes key components of the Confluent platform: Kafka Connect for streaming data between systems, the Schema Registry for ensuring compatibility, and Control Center for monitoring the platform.
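To make the Schema Registry's role concrete, here is a minimal sketch that produces Avro records with the confluent-kafka Python client, registering the record schema so that compatibility can be enforced; the topic, schema, and addresses are placeholders.

```python
# Minimal sketch: produce Avro records with schemas managed by Schema
# Registry. Topic name, schema, and addresses are placeholders.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{
  "type": "record", "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

sr_client = SchemaRegistryClient({'url': 'http://localhost:8081'})
serializer = AvroSerializer(sr_client, schema_str)

producer = Producer({'bootstrap.servers': 'localhost:9092'})
value = serializer({'order_id': '42', 'amount': 99.5},
                   SerializationContext('orders', MessageField.VALUE))
producer.produce('orders', value=value)
producer.flush()
```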
This document discusses strategies for modernizing applications and moving workloads to Kubernetes and container platforms like Pivotal Container Service (PKS). It recommends identifying candidate applications using buckets based on factors like programming language, dependencies, and access to source code. It outlines assessing applications' business value and technical quality using Gartner's TIME methodology to prioritize efforts. The document provides an overview of PKS and how it can provide benefits like increased speed, stability, scalability and cost savings. It recommends starting projects by pushing a few applications to production on PKS to measure ROI metrics.