Session Recording on YouTube
https://www.youtube.com/watch?v=uWPZQ_HMy10
- Session Description
Do you find yourself bombarded with buzzwords and overwhelmed by the rapid emergence of new technologies? "Stream Processing" is a tech buzzword that has been around for some time but is still unfamiliar to many. Join this session to discover its potential in software systems. I will share insights from Apache Flink, Apache Beam, Google Dataflow, and my experiences at Bol.com (the biggest e-commerce platform in the Netherlands) as we cover:
- Stream Processing overview: main concepts and features
- Apache Beam vs. Spring Boot comparison
- Key Considerations for Using Stream Processing
- Learning strategies to navigate this evolving landscape.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The document discusses modern data architectures. It presents conceptual models for data ingestion, storage, processing, and insights/actions. It compares traditional vs modern architectures. The modern architecture uses a data lake for storage and allows for on-demand analysis. It provides an example of how this could be implemented on Microsoft Azure using services like Azure Data Lake Storage, Azure Databricks, and Azure Data Warehouse. It also outlines common data management functions such as data governance, architecture, development, operations, and security.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how data architecture is a key component of an overall enterprise architecture for enhanced business value and success.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
This document discusses activating data governance using a data catalog. It compares active vs passive data governance, with active embedding governance into people's work through a catalog. The catalog plays a key role by allowing stewards to document definition, production, and usage of data in a centralized place. For governance to be effective, metadata from various sources must be consolidated and maintained in the catalog.
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
This document provides an agenda and overview for the "Big MDM Part 2" meetup event. The agenda includes presentations on using graph databases for master data management (MDM) and relationship management. Speakers from Caserta Concepts, Neo Technology, and Pitney Bowes will discuss graph databases, MDM use cases, and modeling and managing data with graph databases. The meetup is sponsored by Caserta Concepts and hosted by Neo Technology. It will include networking, five presentations on graph databases and MDM topics, and a Q&A session.
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Not all workloads allow cloud computing. Low latency, cybersecurity, and cost-efficiency require a suitable combination of edge computing and cloud integration.
This session explores architectures and design patterns for software and hardware considerations to deploy hybrid data streaming with Apache Kafka anywhere. A live demo shows data synchronization from the edge to the public cloud across continents with Kafka on Hivecell and Confluent Cloud.
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Watch this talk here: https://www.confluent.io/online-talks/bridge-to-cloud-apache-kafka-migrate-gcp
Most companies start their cloud journey with a new use case or a new application. Sometimes these applications can run independently in the cloud, but oftentimes they need data from the on-premises data center. Existing applications will slowly migrate, but will need a strategy and the technology to enable a multi-year migration.
In this session, we will share how companies around the world are using Confluent Cloud, a fully managed Apache Kafka® service, to migrate to Google Cloud Platform. By implementing a central-pipeline architecture using Apache Kafka to sync on-prem and cloud deployments, companies can accelerate migration times and reduce costs.
Register now to learn:
- How to take the first step in migrating to GCP
- How to reliably sync your on-premises applications using a persistent bridge to cloud
- How Confluent Cloud can make this daunting task simple, reliable and performant
This document discusses change data capture (CDC) and its components. CDC is an approach that identifies, captures, and delivers changes made to enterprise data sources. It feeds these changes into a central data stream that can be combined with other data sources in real-time. The document outlines Kafka Connect, Debezium, Schema Registry, and Apache Avro which are key parts of the CDC architecture. It also discusses future steps like supporting additional databases and improving deployment, as well as open issues around performance and compatibility with certain databases.
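The idea of feeding captured changes into a stream that a consumer then applies can be sketched in a few lines. This is a minimal illustration, not the architecture the document describes: the event shape is modeled loosely on Debezium's change-event envelope (`op`, `before`, `after`), while the in-memory `Replica`, the `id` key, and the sample events are assumptions made up for the example. No Kafka, Schema Registry, or Avro is involved here.

```python
from typing import Any, Dict, Iterable

# Hypothetical in-memory replica of one table, keyed by primary key.
Replica = Dict[int, Dict[str, Any]]


def apply_change(replica: Replica, event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event to the replica in place."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: the "after" image is the new row state
        row = event["after"]
        replica[row[key]] = row
    elif op == "d":
        # delete: only the "before" image identifies the removed row
        replica.pop(event["before"][key], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def replay(events: Iterable[Dict[str, Any]]) -> Replica:
    """Rebuild the table state by applying events in stream order."""
    replica: Replica = {}
    for event in events:
        apply_change(replica, event)
    return replica


# Illustrative events only; a real stream would come from a Kafka topic.
EVENTS = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
     "after": {"id": 1, "email": "a2@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
]

if __name__ == "__main__":
    print(replay(EVENTS))
```

Because each event carries the full row image, the consumer needs no access to the source database; ordering per key is the property it relies on.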
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Microservices became the new black in enterprise architectures. APIs provide functions to other applications or end users. Even if your architecture uses another pattern than microservices, like SOA (Service-Oriented Architecture) or Client-Server communication, APIs are used between the different applications and end users.
Apache Kafka plays a key role in modern microservice architectures to build open, scalable, flexible and decoupled real time applications. API Management complements Kafka by providing a way to implement and govern the full life cycle of the APIs.
This session explores how event streaming with Apache Kafka and API Management (including API Gateway and Service Mesh technologies) complement and compete with each other depending on the use case and point of view of the project team. The session concludes exploring the vision of event streaming APIs instead of RPC calls.
Understand how event streaming with Kafka and Confluent complements tools and frameworks such as Kong, Mulesoft, Apigee, Envoy, Istio, Linkerd, Software AG, TIBCO Mashery, IBM, Axway, etc.
A Streaming API Data Exchange provides streaming replication between business units and companies. API Management with REST/HTTP is not appropriate for streaming data.
Nubank is the leading fintech in Latin America. Using bleeding-edge technology, design, and data, the company aims to fight complexity and empower people to take control of their finances. We are disrupting an outdated and bureaucratic system by building a simple, safe and 100% digital environment.
In order to succeed, we need to constantly make better decisions at the speed of insight, and that’s what we aim for when building Nubank’s Data Platform. In this talk we want to explore and share the guiding principles and how we created an automated, scalable, declarative and self-service platform that has more than 200 contributors, mostly non-technical, who build 8 thousand distinct datasets, ingesting data from 800 databases and leveraging Apache Spark’s expressiveness and scalability.
The topics we want to explore are:
– Making data-ingestion a no-brainer when creating new services
– Reducing the cycle time to deploy new Datasets and Machine Learning models to production
– Closing the loop and leveraging knowledge processed in the analytical environment to make decisions in production
– Providing the perfect level of abstraction to users
You will get from this talk:
– Our love for ‘The Log’ and how we use it to decouple databases from their schemas and distribute the work of keeping schemas up to date across the entire team.
– How we made data ingestion so simple using Kafka Streams that teams stopped using databases for analytical data.
– The huge benefits of relying on the DataFrame API to create datasets, which made it possible to have end-to-end tests verifying that the 8000 datasets work without even running a Spark job, and much more.
– The importance of creating the right amount of abstractions and restrictions to have the power to optimize.
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward San Francisco 2022.
Being in the payments space, Stripe requires strict correctness and freshness guarantees. We rely on Flink as the natural solution for delivering on this in support of our Change Data Capture (CDC) infrastructure. We heavily rely on CDC as a tool for capturing data change streams from our databases without critically impacting database reliability, scalability, and maintainability. Data derived from these streams is used broadly across the business and powers many of our critical financial reporting systems totalling over $640 Billion in payment volume annually. We use many components of Flink’s flexible DataStream API to perform aggregations and abstract away the complexities of stream processing from our downstreams. In this talk, we’ll walk through our experience from the very beginning to what we have in production today. We’ll share stories around the technical details and trade-offs we encountered along the way.
by Jeff Chao
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.
Data Architecture Best Practices for Advanced Analytics
Many organizations are immature when it comes to data and analytics use. The answer lies in delivering a greater level of insight from data, straight to the point of need.
There are so many Data Architecture best practices today, accumulated from years of practice. In this webinar, William will look at some Data Architecture best practices that he believes have emerged in the past two years and are not worked into many enterprise data programs yet. These are keepers and will be required to move towards, by one means or another, so it’s best to mindfully work them into the environment.
Andreas Grabner maintains that most performance and scalability problems don’t need a large or long running performance test or the expertise of a performance engineering guru. Don’t let anybody tell you that performance is too hard to practice because it actually is not. You can take the initiative and find these often serious defects. Andreas analyzed and spotted the performance and scalability issues in more than 200 applications last year. He shares his performance testing approaches and explores the top problem patterns that you can learn to spot in your apps. By looking at key metrics found in log files and performance monitoring data, you will learn to identify most problems with a single functional test and a simple five-user load test. The problem patterns Andreas explains are applicable to any type of technology and platform. Try out your new skills in your current testing project and take the first step toward becoming a performance diagnostic hero.
This session takes an in-depth look at:
- Trends in stream processing
- How streaming SQL has become a standard
- The advantages of Streaming SQL
- Ease of development with streaming SQL: Graphical and Streaming SQL query editors
- Business value of streaming SQL and its related tools: Domain-specific UIs
- Scalable deployment of streaming SQL: Distributed processing
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and not such a challenge anymore. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the data streams. Products for doing event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the past few years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Flink, and Kafka Streams, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations for Stream Processing, discuss the core properties a Stream Processing platform should provide and highlight what differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
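To make "analytics right after you consume the data" concrete, here is a minimal sketch of one common stream-processing building block: counting events per key in fixed-size (tumbling) event-time windows, with the state updated as each event is consumed rather than after the stream is stored. It is plain Python, not the API of Storm, Spark Streaming, Flink, or Kafka Streams. The `(timestamp, key)` event shape, the sensor names, and the 10-second window are assumptions chosen for illustration, and late or out-of-order handling is deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (event-time timestamp in seconds, key)


def tumbling_counts(events: Iterable[Event], window_s: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key) using event-time tumbling windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        # Each timestamp belongs to exactly one window: [start, start + window_s)
        window_start = ts - (ts % window_s)
        counts[(window_start, key)] += 1  # state is updated per event
    return dict(counts)


# Illustrative input only; a real job would read from a broker such as Kafka.
SAMPLE = [
    (0, "sensor-a"), (3, "sensor-a"), (9, "sensor-b"),
    (10, "sensor-a"), (14, "sensor-b"), (19, "sensor-b"),
]

if __name__ == "__main__":
    for (start, key), n in sorted(tumbling_counts(SAMPLE, 10).items()):
        print(f"window [{start}, {start + 10}) {key}: {n}")
```

A production engine adds what this sketch omits: distributed state, fault tolerance, and a policy for when a window is considered complete.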
Spring and Pivotal Application Service - SpringOne Tour - Boston
This document discusses Spring and Pivotal Application Service (PAS). It notes that PAS provides market-leading support for Spring technologies and an ecosystem of services for Spring applications. It covers why developers use Spring and PAS, how PAS supports Spring features like Boot, Security, and Cloud, and the services available on PAS like MySQL, RabbitMQ, and Redis. It concludes with next steps around contacting an account team, trying hosted PAS software, and signing up for roadmap calls.
Top Java Performance Problems and Metrics To Check in Your Pipeline
Why is performance important? What are the most common reasons applications don't scale and perform well? Which technical metrics should you look at? How can you check them automatically in the pipeline?
Presentation on complete Datasmith warehousing solutions offering, including Voice technology, middleware solutions, WMS (Warehouse Management System) and mobile store delivery application.
How fluentd fits into the modern software landscape
The document discusses using Fluentd to manage logs. It provides an overview of Fluentd, including how it can aggregate and route logs from multiple sources to various outputs like Elasticsearch. It also discusses approaches to scaling Fluentd in distributed environments like Kubernetes, including using sidecars. Real-world challenges with log management are addressed, such as the need to consolidate logs from many distributed services and support multiple analytics tools.
Running in the Cloud - First Belgian Azure project
The document discusses how ChronoRace, a company that provides timing services for sports events, migrated their infrastructure to Windows Azure to handle unpredictable traffic bursts during large events. Key aspects covered include identifying current infrastructure limitations, migrating the VS2003 website and SQL database to Azure, implementing auto-scaling functionality, and addressing issues with video streaming and PDF generation. The migration allowed ChronoRace to scale their infrastructure as needed for events while reducing monthly costs compared to their previous setup.
Running in the Cloud - First Belgian Azure project
The document discusses how ChronoRace, a company that provides timing services for sports events, migrated their infrastructure to Windows Azure to handle unpredictable traffic bursts during large events. Key points covered include identifying pitfalls of their current on-premise solution, migrating their website and database to Azure, implementing auto-scaling to dynamically scale resources during events, and testing the Azure-based solution at an upcoming large event. The migration overall was successful in addressing ChronoRace's needs, though one component requiring registry access could not be migrated and remains on-premise.
StreamSets Data Collector is an open source data integration tool that can ingest data from various sources in both batch and streaming modes. It uses a record-oriented approach to data processing which avoids issues caused by combinatorial explosion. Pipelines can be developed visually using an IDE interface, allowing non-technical users to build integrations. StreamSets originated from ex-Cloudera and Informatica employees and focuses on continuous open source development.
Presenter: Kenn Knowles, Software Engineer, Google & Apache Beam (incubating) PPMC member
Apache Beam (incubating) is a programming model and library for unified batch & streaming big data processing. This talk will cover the Beam programming model broadly, including its origin story and vision for the future. We will dig into how Beam separates concerns for authors of streaming data processing pipelines, isolating what you want to compute from where your data is distributed in time and when you want to produce output. Time permitting, we might dive deeper into what goes into building a Beam runner, for example atop Apache Apex.
In this meetup, Arik Lerner, LivePerson team lead of Java Automation, Performance & Resilience, will talk about how we measure our services through End2End testing, which has become one of the most critical monitoring tools at LP.
Over 200K test runs per day provide statistics and insights into problems as they happen.
Arik will go through the different topics and stages of the journey and share details that led to the current results.
Topics on the menu: The Awakens of the End2End Insights
• How we measure our services using synthetic user experience
• Measuring through analytics & insights
• How we collect our data
• How we debug our services. Hint: video recording, HAR (HTTP Archive), Kibana, dashboard analytics & insights
• Future: correlation of app logs with End2End data
• Our tools: Selenium, Jenkins, and cutting-edge technologies such as Kafka & ELK (Elasticsearch, Logstash, and Kibana)
In this meetup, Arik will host Ali AbuAli, NOC team leader, who will talk about e2e usage in his day-to-day work.
Responsibilities of Fleet Managers and How TrackoBit Can Assist.pdf
What do fleet managers do? What are their duties, responsibilities, and challenges? And what makes a fleet manager effective and successful? This blog answers all these questions.
Are you wondering how to migrate to the Cloud? At the ITB session, we addressed the challenge of managing multiple ColdFusion licenses and AWS EC2 instances. Discover how you can consolidate with just one EC2 instance capable of running over 50 apps using CommandBox ColdFusion. This solution supports both ColdFusion flavors and includes cb-websites, a GoLang binary for managing CommandBox websites.
Discover the Power of ONEMONITAR: The Ultimate Mobile Spy App for Android Dev...
Unlock the full potential of mobile monitoring with ONEMONITAR. Our advanced and discreet app offers a comprehensive suite of features, including hidden call recording, real-time GPS tracking, message monitoring, and much more.
Perfect for parents, employers, and anyone needing a reliable solution, ONEMONITAR ensures you stay informed and in control. Explore the key features of ONEMONITAR and see why it’s the trusted choice for Android device monitoring.
Share this infographic to spread the word about the ultimate mobile spy app!
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Unlock the full potential of your data by effortlessly migrating from PostgreSQL to Snowflake, the leading cloud data warehouse. This comprehensive guide presents an easy-to-follow 8-step process using Estuary Flow, an open-source data operations platform designed to simplify data pipelines.
Discover how to seamlessly transfer your PostgreSQL data to Snowflake, leveraging Estuary Flow's intuitive interface and powerful real-time replication capabilities. Harness the power of both platforms to create a robust data ecosystem that drives business intelligence, analytics, and data-driven decision-making.
Key Takeaways:
1. Effortless Migration: Learn how to migrate your PostgreSQL data to Snowflake in 8 simple steps, even with limited technical expertise.
2. Real-Time Insights: Achieve near-instantaneous data syncing for up-to-the-minute analytics and reporting.
3. Cost-Effective Solution: Lower your total cost of ownership (TCO) with Estuary Flow's efficient and scalable architecture.
4. Seamless Integration: Combine the strengths of PostgreSQL's transactional power with Snowflake's cloud-native scalability and data warehousing features.
Don't miss out on this opportunity to unlock the full potential of your data. Read & Download this comprehensive guide now and embark on a seamless data journey from PostgreSQL to Snowflake with Estuary Flow!
Try it Free: https://dashboard.estuary.dev/register
Cultural Shifts: Embracing DevOps for Organizational Transformation
Mindfire Solutions specializes in DevOps services, facilitating digital transformation through streamlined software development and operational efficiency. Their expertise enhances collaboration, accelerates delivery cycles, and ensures scalability using cloud-native technologies. Mindfire Solutions empowers businesses to innovate rapidly and maintain competitive advantage in dynamic market landscapes.
This document discusses challenges with centralized data architectures and proposes a data mesh approach. It outlines 4 challenges: 1) centralized teams fail to scale sources and consumers, 2) point-to-point data sharing is difficult to decouple, 3) bridging operational and analytical systems is complex, and 4) legacy data stacks rely on outdated paradigms. The document then proposes a data mesh architecture with domain data as products and an operational data platform to address these challenges by decentralizing control and improving data sharing, discovery, and governance.
Delta Lake brings reliability, performance, and security to data lakes. It provides ACID transactions, schema enforcement, and unified handling of batch and streaming data to make data lakes more reliable. Delta Lake also features lightning fast query performance through its optimized Delta Engine. It enables security and compliance at scale through access controls and versioning of data. Delta Lake further offers an open approach and avoids vendor lock-in by using open formats like Parquet that can integrate with various ecosystems.
[Pcamp19] - Escalando o uso de dados no Nubank (Scaling the Use of Data at Nubank) - André Tavares | Nubank (Product Camp Brasil)
Nubank uses its data platform to power automated decisions and data-driven products. The platform sources data from various systems and refines it into datasets using data tools. These datasets then power credit decisions, customer support automations, and other systems. The platform also supports internal users through business intelligence tools. As data usage grows, the platform must scale to handle larger volumes, more systems/users, and new use cases. Product managers play a key role in ensuring the success of both consumer-facing products and internal platforms and tools.
Data Lakehouse Symposium | Day 1 | Part 2Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The document discusses modern data architectures. It presents conceptual models for data ingestion, storage, processing, and insights/actions. It compares traditional vs modern architectures. The modern architecture uses a data lake for storage and allows for on-demand analysis. It provides an example of how this could be implemented on Microsoft Azure using services like Azure Data Lake Storage, Azure Data Bricks, and Azure Data Warehouse. It also outlines common data management functions such as data governance, architecture, development, operations, and security.
Why And When Should We Consider Stream Processing In Our Solutions (Teqnation 2023)
2. Agenda
What is Stream Processing?
Frameworks & Platforms
Basic Concepts & Patterns
Demo Time
Benefits & Drawbacks + Considerations
Use Cases For Different Industries
How to start?
3. This Talk is For
Software Developers
Tech Leads / Software Architects
Data Engineers / Data Scientists / AI Engineers
Product Owners / Product Managers / Business Analysts
4. $ whoami
I’m Soroosh Khodami
Full-Stack Developer at Bol.com & Code Nomads
Working with Stream Processing at Scale at Bol.com
Software Architecture Enthusiast
@SorooshKh linkedin.com/in/sorooshkhodami/
Slides & Code Repository Link Will Be Shared At The End
9. Stream (Data) Processing
Stream processing is a big data technique that focuses on continuously reading data, processing the data individually or joining it with related data sets in real-time or near real-time, and then sending the output to other applications, data stores, or systems.
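To make the definition concrete, here is a minimal Python sketch of the read, process, and send loop. It is only an illustration: the function names, the `sensor_id` field, and the in-memory list used as a sink are assumptions, and a real pipeline would read from a message broker rather than a list.

```python
from typing import Dict, Iterable, Iterator, List


def read_events(source: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for a continuous source (hypothetical; normally a broker or socket)."""
    for event in source:
        yield event


def enrich(events: Iterable[dict], locations: Dict[str, str]) -> Iterator[dict]:
    """Process each event individually by joining it with a related data set."""
    for event in events:
        location = locations.get(event["sensor_id"], "unknown")
        yield {**event, "location": location}


def run_pipeline(source: Iterable[dict], locations: Dict[str, str], sink: List[dict]) -> None:
    """Send every processed record to the output (here a plain list)."""
    for record in enrich(read_events(source), locations):
        sink.append(record)


sample = [{"sensor_id": "s1", "temp": 21}, {"sensor_id": "s2", "temp": 25}]
output: List[dict] = []
run_pipeline(sample, {"s1": "field-a"}, output)
print(output)
```

Because each stage is a generator, records flow through one at a time instead of being collected first, which is the property that separates this style from batch processing.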
18. Bounded Stream / Unbounded Stream
[Figure: timelines running from Past through Now to Future. One shows an Unbounded Stream; the others show Bounded Stream #1 and Bounded Stream #2, each with a marked Start and End.]
19. Event Time & Processing Time
[Figure: the same six events (1 Login, 2 Search, 3 View, 4 View, 5 View, 6 Play) drawn against two time axes numbered 1 to 7, one labelled Event Time and one labelled Processing Time.]
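The difference between the two clocks can be sketched in a few lines of Python. The event names come from the slide; the arrival order and the timestamps are hypothetical, chosen so that two events reach the pipeline late.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    name: str
    event_time: int       # when the action happened at the source
    processing_time: int  # when the pipeline actually received it


# Hypothetical arrival order: "Search" and the last "View" were delayed in transit.
arrived: List[Event] = [
    Event("Login", 1, 1),
    Event("View", 3, 3),
    Event("Search", 2, 4),
    Event("View", 4, 5),
    Event("Play", 6, 6),
    Event("View", 5, 7),
]


def by_event_time(events: List[Event]) -> List[Event]:
    """Restore the order in which the actions really happened."""
    return sorted(events, key=lambda e: e.event_time)


arrival_names = [e.name for e in arrived]
ordered_names = [e.name for e in by_event_time(arrived)]
print(arrival_names)
print(ordered_names)
```

Sorting a finished list is only possible for bounded data. On an unbounded stream, a framework has to decide how long to wait for late events before it emits a result.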
20. Delivery Guarantees
At Most Once: Messages can be lost, but never duplicated (Fire & Forget)
At Least Once: Messages are never lost, but can be duplicated
Exactly Once: Messages are delivered & processed exactly once
Learn More (Important)
Streaming Concepts - Exactly Once Fault Tolerance Guarantees - youtube.com/watch?v=9pRsewtSPkQ
Rundown of Flink's Checkpoints - youtube.com/watch?v=hoLeQjoGBkQ
Understanding exactly-once processing and windowing in streaming pipelines - youtube.com/watch?v=DraQGkARegE
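One common way to get an exactly-once effect on top of at-least-once delivery is to make processing idempotent. The Python sketch below shows only that idea; it is not how any particular framework implements its guarantee. The message ids, the tuple layout, and the in-memory `seen_ids` set are assumptions, and a real system would have to persist that set.

```python
from typing import Dict, Iterable, Set, Tuple

Message = Tuple[str, str, int]  # (message_id, location, value): hypothetical layout


def process_at_least_once(
    deliveries: Iterable[Message],
    totals: Dict[str, int],
    seen_ids: Set[str],
) -> Dict[str, int]:
    """Apply each message once, even if the transport redelivers it."""
    for message_id, location, value in deliveries:
        if message_id in seen_ids:
            continue  # duplicate delivery: skip so the total is not counted twice
        seen_ids.add(message_id)
        totals[location] = totals.get(location, 0) + value
    return totals


# "m1" is delivered twice, as can happen after a retry.
deliveries = [
    ("m1", "field-a", 5),
    ("m2", "field-a", 3),
    ("m1", "field-a", 5),
    ("m3", "field-b", 7),
]
totals = process_at_least_once(deliveries, {}, set())
print(totals)
```

Without the `seen_ids` check, the redelivered message would push the `field-a` total to 13 instead of 8.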
21. IoT Farm
Context
1000+ sensors
Multiple sensors per location
Unreliable internet connection
Large amount of continuous sensor data
Requirements
Aggregated Sensors Data Per Location
Correct Order Of Data
No Duplicates
25. Windowing
• Divides an unbounded, continuous data stream into smaller, finite segments.
• Allows operations and calculations to be performed on manageable chunks of data.
• It's not feasible to load or keep an entire stream in memory.
• Useful for analyzing data over specific time periods or fixed numbers of events.
[Figure: a stream of numeric values on a time axis, with one Window of Data highlighted and aggregated as Sum: 19, Count: 5.]
Learn More
Basics of Windowing - https://www.youtube.com/watch?v=oJ-LueBvOcM&t=1s
Advanced Windowing Concepts - https://www.youtube.com/watch?v=MuFA6CSti6M
26. Time Based Windows
Tumbling/Fixed Window
No overlaps between window elements
[Figure: a stream of numeric values on a time axis, cut into consecutive 5-second windows. Results per window: Sum: 11, Count: 4; Sum: 19, Count: 5; Sum: 5, Count: 2.]
Size Based Windows
[Figure: the stream 5, 2, 3, 1, 4, 7, 2, 4, 4, 2, 6, 1 cut into windows of four elements. Results per window: Sum: 11, Count: 4; Sum: 17, Count: 4; Sum: 13, Count: 4.]
Time & Size Based Windows
[Figure: the same kind of stream cut into windows that close after 5 seconds or four elements. Results per window: Sum: 11, Count: 4; Sum: 17, Count: 4; Sum: 7, Count: 3.]
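The size-based and tumbling windows from this slide can be reproduced with a short Python sketch. The values are the ones shown on the slide; the timestamps in `timed` are hypothetical, chosen so that the 5-second windows give the same sums as the figure. The function names are illustrative and do not belong to any framework.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def count_windows(values: List[int], size: int) -> List[List[int]]:
    """Size-based windows: consecutive groups of `size` elements (last one may be shorter)."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def tumbling_windows(events: List[Tuple[int, int]], width_seconds: int) -> Dict[int, List[int]]:
    """Time-based tumbling windows: each (timestamp, value) falls into exactly one window."""
    windows: Dict[int, List[int]] = defaultdict(list)
    for timestamp, value in events:
        window_start = (timestamp // width_seconds) * width_seconds
        windows[window_start].append(value)
    return dict(sorted(windows.items()))


def summarize(values: List[int]) -> Dict[str, int]:
    return {"sum": sum(values), "count": len(values)}


stream = [5, 2, 3, 1, 4, 7, 2, 4, 4, 2, 6, 1]
size_based = [summarize(w) for w in count_windows(stream, 4)]

# Hypothetical (second, value) pairs for the tumbling-window figure.
timed = [(0, 5), (1, 2), (3, 3), (4, 1), (5, 4), (6, 7),
         (7, 2), (8, 4), (9, 2), (11, 4), (13, 1)]
time_based = {start: summarize(vals) for start, vals in tumbling_windows(timed, 5).items()}

print(size_based)
print(time_based)
```

Both functions see the full list up front, which keeps the sketch short. A streaming engine does the same assignment incrementally and emits each window once it decides the window is complete.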
27. Sliding Window
Time Based Windows: last 10 seconds, every 5 seconds, with overlaps between windows.
[Diagram: Success, WARN and Error log events on a time axis, covered by overlapping windows #1, #2, #3, ..., #N, #N+1]
Window #1: Success: 4, Warn: 0, Error: 0
Window #2: Success: 3, Warn: 0, Error: 1
Window #3: Success: 1, Warn: 2, Error: 1
...
Later window: Success: 0, Warn: 0, Error: 4
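A "last 10 seconds, every 5 seconds" window means one event can be counted in more than one window. The following plain-Kotlin sketch shows that assignment; it is an illustration under stated assumptions, not a framework API. LogEvent and slidingCounts are hypothetical names, timestamps are whole seconds, a window covers [start, start + sizeSec), and only windows starting at or after second 0 are reported.

```kotlin
// Hypothetical log event: a level such as "Success", "WARN" or "Error" and a timestamp in seconds.
data class LogEvent(val timestampSec: Long, val level: String)

// Sliding windows of `sizeSec` seconds that start every `slideSec` seconds.
// Returns, per window start time, the number of events per level.
fun slidingCounts(events: List<LogEvent>, sizeSec: Long, slideSec: Long): Map<Long, Map<String, Int>> {
    val result = sortedMapOf<Long, MutableMap<String, Int>>()
    for (event in events) {
        // Latest window start that is <= the event time.
        var start = event.timestampSec - (event.timestampSec % slideSec)
        // Walk back through every earlier window that still covers the event.
        while (start > event.timestampSec - sizeSec) {
            // Assumption for the sketch: windows that would start before second 0 are skipped.
            if (start >= 0) {
                val counts = result.getOrPut(start) { mutableMapOf() }
                counts[event.level] = (counts[event.level] ?: 0) + 1
            }
            start -= slideSec
        }
    }
    return result
}
```

With sizeSec = 10 and slideSec = 5, an event at second 6 is counted in the window starting at 0 and in the window starting at 5, which is the overlap the slide describes.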
28. Session Window
[Diagram: per-user playback events (Play, Heartbeat, Seek, Pause) on a time axis. User #1: Window #1 and Window #2, with a 10 sec gap. User #2: Window #1 and Window #2, with a 20 sec gap.]
Close the window based on gap duration = 10 sec.
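Closing a window on a gap can be sketched in plain Kotlin as follows. This is a simplified batch-style illustration, not the Beam or Flink session window implementation. PlaybackEvent and sessionWindows are hypothetical names, timestamps are in seconds, and a gap equal to or longer than gapSec is assumed to close the session.

```kotlin
// Hypothetical playback event: user id, timestamp in seconds, and an action name.
data class PlaybackEvent(val userId: String, val timestampSec: Long, val action: String)

// Groups each user's events into sessions. A session closes when the gap between two
// consecutive events of the same user is at least `gapSec` seconds.
fun sessionWindows(events: List<PlaybackEvent>, gapSec: Long): Map<String, List<List<PlaybackEvent>>> =
    events.groupBy { it.userId }.mapValues { (_, userEvents) ->
        val sessions = mutableListOf<MutableList<PlaybackEvent>>()
        for (event in userEvents.sortedBy { it.timestampSec }) {
            val current = sessions.lastOrNull()
            if (current == null || event.timestampSec - current.last().timestampSec >= gapSec) {
                // First event of the user, or the gap was reached: open a new session.
                sessions.add(mutableListOf(event))
            } else {
                // Still within the gap: extend the current session.
                current.add(event)
            }
        }
        sessions.map { it.toList() }
    }
```

The grouping by userId comes first because sessions are tracked per key: one user's pause must not close another user's session.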
31. Joining Streams & Enrichment Pattern
Learn More
Stream Join in Flink: from Discrete to Continuous - Xingcan Cui https://www.youtube.com/watch?v=3YVRluJUKIw
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf - https://www.youtube.com/watch?v=cJS18iKLUIY
[Diagram: a Temperature Sensor Stream and a Moisture Sensor Stream are windowed and then combined with a Window Inner Join or a Window Cross Join (CoGroup)]
Inner Join example
Input: Device-2, Temp: 28
Input: Device-2, Moisture: 876
Output: Device-2, Moisture: 876, Temp: 28
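A window inner join pairs elements of two streams only when they share a key and fall into the same window. The plain-Kotlin sketch below illustrates that rule for the temperature and moisture example; it is not the Flink or Beam join API. TempReading, MoistureReading, EnrichedReading and windowInnerJoin are hypothetical names, and tumbling windows of windowSec seconds are assumed.

```kotlin
// Hypothetical sensor readings; both carry the device id that is used as the join key.
data class TempReading(val deviceId: String, val timestampSec: Long, val temp: Int)
data class MoistureReading(val deviceId: String, val timestampSec: Long, val moisture: Int)
data class EnrichedReading(val deviceId: String, val temp: Int, val moisture: Int)

// Window inner join: readings are paired only when they have the same device id
// AND fall into the same tumbling window of `windowSec` seconds.
fun windowInnerJoin(
    temps: List<TempReading>,
    moistures: List<MoistureReading>,
    windowSec: Long
): List<EnrichedReading> {
    // Index one side by (deviceId, window index).
    val moistureByKey = moistures.groupBy { it.deviceId to (it.timestampSec / windowSec) }
    return temps.flatMap { t ->
        // Readings without a partner in the same window produce no output (inner join).
        moistureByKey[t.deviceId to (t.timestampSec / windowSec)].orEmpty()
            .map { m -> EnrichedReading(t.deviceId, t.temp, m.moisture) }
    }
}
```

A temperature reading whose matching moisture reading arrives in a later window is dropped here, which is one reason window size matters for join completeness.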
32. States & Stateful Stream Processing
Learn More
Introduction to Stateful Stream Processing with Apache Flink - Robert Metzger https://www.youtube.com/watch?v=DkNeyCW-eH0
Webinar: Deep Dive on Apache Flink State - Seth Wiesman - https://www.youtube.com/watch?v=9GF8Hwqzwnk
[Diagram: streams flowing through stateless operators and stateful operators; each stateful operator is attached to a State]
33. States & Stateful Stream Processing
Brute Force Login Monitoring
Pipeline: Read (Login Attempts) -> Windowing (Last 15 Minutes) -> Group By IP -> Count -> Filter Above Threshold -> Enrich With Previous Breach and Update Last Breach -> Sink (Security Alerts)
State: Last Threshold Breach : Nullable
Learn More
Introduction to Stateful Stream Processing with Apache Flink - Robert Metzger https://www.youtube.com/watch?v=DkNeyCW-eH0
Webinar: Deep Dive on Apache Flink State - Seth Wiesman - https://www.youtube.com/watch?v=9GF8Hwqzwnk
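The stateful part of this pipeline is the "last threshold breach" that survives from one window to the next. A minimal plain-Kotlin sketch of that idea follows. It is not the speaker's Beam code: LoginAttempt, Alert and BruteForceMonitor are hypothetical names, state lives in an in-memory map, and windowing is assumed to have happened before processWindow is called.

```kotlin
// Hypothetical failed-login event.
data class LoginAttempt(val ip: String, val timestampMin: Long)

// Alert emitted when an IP crosses the threshold; previousBreachMin is null on the first breach.
data class Alert(val ip: String, val attempts: Int, val previousBreachMin: Long?)

// Keeps per-IP state (the time of the last threshold breach) across windows.
class BruteForceMonitor(private val threshold: Int) {
    private val lastBreachByIp = mutableMapOf<String, Long>()

    // Processes the attempts of ONE window (for example the last 15 minutes),
    // closed at `windowEndMin`, and returns the alerts for that window.
    fun processWindow(attempts: List<LoginAttempt>, windowEndMin: Long): List<Alert> =
        attempts
            .groupingBy { it.ip }              // Group By IP
            .eachCount()                       // Count
            .filterValues { it > threshold }   // Filter Above Threshold
            .map { (ip, count) ->
                // Enrich with the previous breach, then update the state.
                val alert = Alert(ip, count, lastBreachByIp[ip])
                lastBreachByIp[ip] = windowEndMin
                alert
            }
}
```

The first alert for an IP carries a null previous breach, matching the "Nullable" state on the slide; the second alert for the same IP carries the end time of the earlier window.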
34. Group By Key / KeyBy [4Geeks]
[Diagram: a mixed stream of Play, Heartbeat and Seek events is regrouped twice: Group By Action puts events with the same action together, and Group By Customer puts each customer's events together]
Learn More
Apache Flink Specifying Keys https://medium.com/big-data-processing/apache-flink-specifying-keys-81b3b651469
Branching & merging PCollections with Apache Beam - https://youtu.be/RYD40js20a4
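Keying has two sides: a logical grouping and a physical routing of each key to one partition. The plain-Kotlin sketch below illustrates both. It is not the Flink keyBy or Beam GroupByKey API; PlayerEvent, keyBy and partitionFor are hypothetical names, and hash-based routing is an assumption about how a runner might place keys.

```kotlin
// Hypothetical playback event.
data class PlayerEvent(val customerId: String, val action: String)

// Key-by as a routing decision: the same key always maps to the same partition, so one worker
// sees all events for that key. On a cluster, this routing is what triggers a network shuffle.
fun partitionFor(key: String, partitions: Int): Int = Math.floorMod(key.hashCode(), partitions)

// Key-by as a grouping: the same stream can be grouped by different keys.
fun <K> keyBy(events: List<PlayerEvent>, keySelector: (PlayerEvent) -> K): Map<K, List<PlayerEvent>> =
    events.groupBy(keySelector)
```

Calling keyBy(events) { it.action } reproduces "Group By Action", and keyBy(events) { it.customerId } reproduces "Group By Customer" on the same input.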
40. Order Enrichment With Customer Data [4Geeks]
Apache Beam + Dataflow vs Spring Boot
Customers Events (CDC) + Orders Events -> Enrich Order Data -> Enriched Orders With Customer Data
Code Repository & Slides
@SorooshKh
41. Insights
Order Enrichment Test Results
Apache Beam + Dataflow
• 1 Dataflow worker with default spec
• 120k messages processed in 3 minutes (~700 msg/second)
• Higher costs for keeping the job running
Spring Boot
• Tested on minimum Kubernetes hardware on GCP
• 120k messages processed in 5 minutes (~400 msg/second)
• Lower costs for keeping the job running
Note: The insights above are not derived from a fully accurate benchmark.
42. Order Enrichment With Customer Data [4Geeks]
Spring Boot + Redis
Customer CDC -> Read -> Store Customer in Redis
Orders -> Read -> Get Customer Information from Redis -> Enrich Order With Customer Data -> Sink (EnrichedOrder)
43. Order Enrichment With Customer Data [4Geeks]
Apache Beam + Dataflow
Customer CDC -> Read -> KeyBy CustomerID -> CoGroupByKey
Orders -> Read -> KeyBy CustomerID -> CoGroupByKey
CoGroupByKey -> EnrichOrderWithCustomerData (State: Customer, Update Customer in State) -> Sink (EnrichedOrder)
Example
Customer(123) -> (123, Customer(123))
Order(1005, CustomerId=123) -> (123, Order(1005, CustomerId=123))
Output: OrderWithCustomerData
- Order
- Customer
Learn More
Stream Join in Flink: from Discrete to Continuous - Xingcan Cui https://www.youtube.com/watch?v=3YVRluJUKIw
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf - https://www.youtube.com/watch?v=cJS18iKLUIY
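The keyed, stateful enrichment on this slide can be approximated in plain Kotlin by merging both inputs into one keyed sequence and keeping the latest customer per key. This is a sketch of the idea, not the speaker's Beam pipeline and not the CoGroupByKey API. All type and function names are hypothetical, and it assumes an order that arrives before its customer is simply dropped, whereas a real job might buffer it.

```kotlin
// Hypothetical event types mirroring the slide: customer change events (CDC) and orders.
data class Customer(val id: Int, val name: String)
data class Order(val id: Int, val customerId: Int)
data class OrderWithCustomerData(val order: Order, val customer: Customer)

// Both inputs are keyed by customer id and merged into one sequence of keyed events.
sealed class KeyedEvent { abstract val key: Int }
data class CustomerChanged(override val key: Int, val customer: Customer) : KeyedEvent()
data class OrderPlaced(override val key: Int, val order: Order) : KeyedEvent()

// Keeps the latest customer per key as state and enriches each order with it.
fun enrichOrders(events: List<KeyedEvent>): List<OrderWithCustomerData> {
    val customerState = mutableMapOf<Int, Customer>()
    val enriched = mutableListOf<OrderWithCustomerData>()
    for (event in events) {
        when (event) {
            is CustomerChanged -> {
                // Update customer in state.
                customerState[event.key] = event.customer
            }
            is OrderPlaced -> {
                // Assumption: orders for a customer not yet seen are dropped.
                val customer = customerState[event.key]
                if (customer != null) enriched.add(OrderWithCustomerData(event.order, customer))
            }
        }
    }
    return enriched
}
```

Compared with the Spring Boot + Redis variant, the lookup table here is state owned by the operator itself rather than an external store.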
44. Why Should We Consider It
Benefits, Drawbacks & Considerations
45. Benefits & Drawbacks
Stream Processing Frameworks
Benefits
• Fast & high-throughput
• Easy to scale
• Exactly-once processing / fault tolerant
• Customizable
• Advanced features at scale: windowing, watermarks, stateful functions, and more
Drawbacks
✖ Complexity
✖ Implementation & maintenance
✖ Testing & debugging are challenging
✖ Changing data pipelines is hard
✖ Error handling is not simple
✖ Data consistency is not easy
46. Stream Data Integration vs Stream Analytics
Learn More
Stream Processing – Concepts and Frameworks (Guido Schmutz, Switzerland)
https://www.youtube.com/watch?v=vFshGQ2ndeg | https://www.slideshare.net/gschmutz/introduction-to-stream-processing-132881199
Stream Data Integration (Stream ETL): Reading Input, Map, Filter, Simple Enrich
Stream Analytics: Stateful Processing, Pattern Matching, Complex Calculations / Aggregations
47. Considerations
Learn More ( Important )
Apache Flink Worst Practices - Konstantin Knauf - https://www.youtube.com/watch?v=F7HQd3KX2TQ
Learning Curve: Stream Data Integration 1 – 2 weeks; Stream Analytics 2 – 3 months
Project Timeline: 3 – 4 engineers, 4 – 6 months from 0 to stability
Hard to Find Developers
Limited Docs/Resources
Community Support
Costs: cloud providers help a bit
51. When should we consider it in our solutions?
Case: Stream Data Integration
Context / Conditions
• Events / second < 1K
• Stream processing experience: No
• Business queries change frequently
• Time to market: very tight
• 3 – 4 mid-senior developers
Learn More
Apache Flink Worst Practices - Konstantin Knauf https://www.youtube.com/watch?v=F7HQd3KX2TQ
Note: The cases incorporated within this presentation are designed to demonstrate the reasoning process.
52. When should we consider it in our solutions?
Learn More
Apache Flink Worst Practices - Konstantin Knauf https://www.youtube.com/watch?v=F7HQd3KX2TQ
Context / Conditions
Case: Stream Analytics
• Events / second > 10K
• Stream processing experience: No
• Business queries are clear and do not change frequently
• Real-time / near-real-time insights are crucial: Yes
• 3 – 4 mid-senior developers
Note: The cases incorporated within this presentation are designed to demonstrate the reasoning process.
55. Video Platforms
Use cases
• Playback Analytics
• Content Provider Shares
• Pay Per Minute
• Fraud Detection
• Personalized Recommendation
Learn More
Massive Scale Data Processing at Netflix using Flink - Snehal Nagmote & Pallavi Phadnis youtube.com/watch?v=lC0d3gAPXaI
Custom, Complex Windows at Scale using Apache Flink - Matt Zimmer (Netflix) youtube.com/watch?v=XUvqnsWm8yo
SF 2017: Monal Daxini - Stream Processing with Flink at Netflix youtube.com/watch?v=sPB8w-YXX1s
Real-time Processing with Flink for Machine Learning at Netflix - Elliot Chow youtube.com/watch?v=o4C7TDneH00
56. Gaming Industry
Use cases
Learn More
Kafka and Big Data Streaming Use Cases in the Gaming Industry
https://www.confluent.io/online-talks/kafka-and-big-data-streaming-use-cases-in-the-gaming-industry/
Let's Play Flink – Fun with Streaming in a Gaming Company
https://www.youtube.com/watch?v=8BNKEmt47UM
• Game Telemetry Analytics
• Rewards (In-Game)
• Live In-Game Changes (NPC, Quests, ..)
• IoT Integration
• Loyalty Service
• Anti-Cheat
• Chat Service Monitoring
• Match Making
• Payment Fraud Detection
• In-Game Recommendation
• Advertisement
• AI Training
• Payment
57. Application Analytics
Use cases
Learn More
Implementing Google Analytics: A Case Study - Making Sense of Stream Processing by Martin Kleppmann
https://www.oreilly.com/library/view/making-sense-of/9781492042563/ch01.html
Martin Kleppmann — Event Sourcing and Stream Processing at Scale https://www.youtube.com/watch?v=avi-TZI9t2I
Singles Day 2018: Data in a Flink of an eye https://www.ververica.com/blog/singles-day-2018-data-in-a-flink-of-an-eye
58. Internet Of Things
Use cases
• Fleet management / GPS Tracking
• Anomaly detection
• Smart home automation
• Energy management
• Environmental monitoring
• Predictive maintenance
• Self-Driving Cars
Learn More
7 Reasons to use Apache Flink for your IoT Project
https://www.youtube.com/watch?v=Q0LBTmT4W9o
59. Telecommunication
Use cases
• Billing
• Network Optimization
• Security
• Fraud Detection
Learn More
Maciej Próchniak - Stream processing in telco - case study based on Apache Flink & TouK Nussknacker @ Devoxx Poland
https://www.youtube.com/watch?v=WLfEB__fM-4
60. Financial Systems
Use cases
• Fraud detection
• Algorithmic trading
• Risk management
• Real-time portfolio analysis
• Customer analytics
• Regulatory compliance
• Profit & Loss Insights
Learn More
Real Time Fraud Detection with Stateful Functions https://www.youtube.com/watch?v=RxDlksbsdQ0
Fast Data at ING - Martijn Visser & Bas Geerdink (ING) https://www.youtube.com/watch?v=e-_6gijUGAw
Stream ING Models – Real time model deployment of ML Capabilities https://www.youtube.com/watch?v=Do7C4UJyWCM
62. How to start learning?
[1] Google Cloud Apache Beam - Debi Cabrera: https://youtu.be/65lmwL7rSy4
[2] Apache Beam Step By Step - Atul Raina: https://youtube.com/playlist?list=PL8bzd7vku-WhVHzJgmXoCxx3aB4PxTQLP
[3] Beam Summit & Flink Forward: https://beamsummit.org/ and https://www.flink-forward.org/
[4] Official Documentation: https://beam.apache.org/documentation/ and https://nightlies.apache.org/flink/flink-docs-stable/
IMPORTANT NOTE
Creating a stream processing service isn't as straightforward as crafting CRUD APIs. Relying solely on Google, development tools, Stack Overflow, and copy-pasting won't get you far. It's crucial to dedicate ample time to thoroughly learn and understand the underlying concepts.
63. Slides & Code Repository
Any questions?
Send me a message on Twitter or LinkedIn
Thanks for your attention!
@SorooshKh linkedin.com/in/sorooshkhodami/
Please Rate This Session
And Share Your Feedback
Editor's Notes
What is Stream Processing?
Why Should We Learn It?
Developer by day, furniture assembler by night. I learned that using the right tool is the most important part of assembling.
Question 1: Who has heard about these technologies a lot? Question 2: Who has used these technologies in production? Every day that we wake up, we hear about some new Apache technology.
Okay, not for me. I'm not a fan of complex definitions. Let's get to a simple definition.
Reading data from multiple sources
Processing the data itself (the payload)
individually or joined with other data
Sending it out to another system
Event processing is a technique that focuses on listening for specific events or patterns of events within a system, enabling decision-making and triggering actions based on the information contained in the events.
Services communicate with events
We need to chunk the data to make it feasible to process
Bounded stream example: processing last month's train check-in/check-out records for analysis purposes
1 minute: You are watching Netflix on an airplane or in the subway. Your actions will be synced afterward.
We have three types of guarantees: no guarantee, at-least-once delivery, and exactly-once delivery. Flink -> Checkpointing
Don't forget to check Learn More
OK, wait. Hold your horses. You gave a lot of definitions, so what is the use case?
1 minute. We cannot carry two watermelons with one hand. We need to chunk the data to make it feasible to process. OK, right, we should divide. But how are we going to divide the data?
It's very similar to a shuttle, isn't it?
Let’s imagine that we are receiving request logs
Watching a video in the subway or during a flight
Phone Call
How can stream processing do this? Session Window is based on Group By Key.
1 minute: There is too much that we need to learn, so we make it easier with examples!
How can we do it in our current applications, without stream processing frameworks?
Sometimes we need to store some data and later look back at it, similar to what we used to do with Redis or a database.
Key By is the most common transformation. It partitions the data stream, similar to GROUP BY in SQL. Sometimes we need to group some of the data together.
Sometimes it may cause a network shuffle that partitions the stream across different nodes.
5 minutes
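// Kotlin + Apache Beam fragment; `p` is the Pipeline, and the helper transforms are not shown in these notes.
import org.apache.beam.sdk.transforms.GroupByKey
// Read the failed login messages.
val failedLogins = p.apply("Read PubSub Messages", readFromPubSubSubscription())
// Window the messages, key them by IP address, and count the attempts per IP.
val ipCounts = failedLogins
    .apply("Window", failedLoginWindowingStrategy())
    .apply("Map to KV <IP,MSG>", mapToKVIPAddr())
    .apply("Group by Key IP-Addr", GroupByKey.create())
    .apply("Count per IP", countNumberOfAttempts())
// Keep only the IPs above the threshold and enrich them with earlier breaches.
val alerts = ipCounts
    .apply("Filter by Threshold", isCountOfAttempAboveThresholdFilter())
    .apply("Enrich with Old Breaches Last Month", enrichWithOldBreachesLastMonth())
// Publish the alerts.
alerts.apply("Write Alerts to PubSub", publishToPubSubTopic())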
Stream processing applications are not easy, especially once you start to have stateful functions.
Complexity
• Handling out-of-order events, windowing, and state management
• Increased complexity compared to batch processing
Implementation and Maintenance
• Expertise required in distributed systems, fault tolerance, and specific stream processing frameworks
• Maintenance effort for business logic and data flow changes
Testing and Debugging
• Complex testing scenarios and simulation of various events and failures
• Difficulties in debugging due to the real-time and distributed nature of processing
Error Handling
• Managing errors and edge cases can be challenging
• Recovery mechanisms and failure scenarios require careful consideration
Data Consistency
• Ensuring exactly-once processing and data consistency can be challenging
• Requires robust handling of distributed systems and failures
Learning Curve and Project Timeline
• 2-3 months for a mid-level developer to become proficient
• 4-6 months for a project to reach stability from the start
Resource Intensiveness
• Real-time processing may consume more resources than batch processing
• Cloud services can help mitigate infrastructure costs
In short, Stream Data Integration is Map, Transform, Filter, Enrich. Stream Analytics also uses States, Windowing, State Management, and Event Patterns.
Learning Curve
• Stream Data Integration: 1 – 2 weeks
• Stream Analytics: 2 – 3 months
For a non-trivial project, expect 2-4 months from project initiation to reach stability.
It's not easy to find developers with extensive stream processing experience.
For most stream processing frameworks, there is little step-by-step documentation and there are few Stack Overflow questions with working answers. You need to connect the dots yourself.
Decent community support is available, but it is not as extensive as for Spring or other popular frameworks.
Stream processing can be resource-intensive (cloud services help us here).
Case Stream Data Integration (Map, Filter, Basic Enrichment; real-time ETL): You are not getting much out of using stream processing frameworks. You can achieve almost the same results with other tools, with the possibility to scale up.
Case Stream Analytics: You should start investing in your stream processing solution and building a team, with the help of professional consultants to lead, facilitate, and boost the process. In the meantime, you can use other available tools (like BigQuery or monitoring tools) to support part of your business requirements.
Anomaly detection: Stream processing can help identify unusual patterns or behaviors in IoT device data, enabling early detection of potential issues or failures. For example, it can be used to monitor sensor data from industrial equipment or vehicles to detect anomalies that may indicate a malfunction or maintenance need.
Smart home automation: In a smart home environment, stream processing can be used to analyze data from various sensors and devices to trigger automated actions, such as adjusting lighting or temperature based on occupancy, time of day, or user preferences.
Fleet management: Stream processing can analyze data from GPS trackers, vehicle sensors, and other devices in real-time to optimize fleet operations. This may include route planning, vehicle maintenance scheduling, fuel efficiency analysis, or driver behavior monitoring.
Environmental monitoring: IoT devices can be deployed to monitor various environmental parameters, such as air quality, water levels, or temperature. Stream processing can be used to analyze this data in real-time, enabling rapid response to environmental changes or potential hazards.
Energy management: Stream processing can be used to analyze energy consumption data from smart meters, IoT devices, and sensors in real-time, helping to optimize energy usage and reduce costs. This can be applied to smart grids, microgrids, or individual buildings.
Predictive maintenance: By analyzing IoT sensor data in real-time, stream processing can help predict when a machine or equipment may require maintenance or is likely to fail. This allows for proactive maintenance scheduling, reducing downtime and increasing operational efficiency.