If your business is heavily dependent on the Internet, you may be facing an unprecedented volume of network traffic analytics data. How to make the most of that data is the challenge. This presentation from Kentik VP of Product and former EMA analyst Jim Frey explores the evolving need, the architecture, and key use cases for BGP and NetFlow analysis based on scale-out cloud computing and Big Data technologies.
The process of streaming real-time data from a wide variety of machine data sources and entities can be complex and unwieldy. Using an agent-based approach, Informatica has developed a new technique and an open-access product that make this process far more user-friendly and efficient, even across multiple environments such as Hadoop, Cassandra, Storm, Amazon Kinesis and Complex Event Processing.
Failure is inevitable in any distributed system, but anticipating failures and building systems that recover from them instantaneously makes a system highly resilient. At Capital One we process billions of events every day, and we leverage cloud, microservices, streaming and machine learning technologies to solve customer problems and provide the best customer experience. In this session I will talk about the highly resilient streaming architecture that supports processing billions of events every day, along with some of the strategies and best practices for building highly available and fault-tolerant systems using Kafka and cloud environments.
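As a rough illustration of one such best practice, the sketch below shows a Kafka producer configured for durability with idempotence, acks=all and a generous retry budget; the broker addresses and topic name are placeholders, not Capital One's actual setup.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ResilientProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092"); // placeholder brokers
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Durability settings commonly used for fault tolerance:
        props.put(ProducerConfig.ACKS_CONFIG, "all");                   // wait for all in-sync replicas
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");    // avoid duplicates on retry
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000"); // overall retry budget

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "customer-events" is a hypothetical topic used for illustration only.
            producer.send(new ProducerRecord<>("customer-events", "key-1", "{\"event\":\"example\"}"));
        }
    }
}
```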
(Benny Lee + Christopher Arthur, Commonwealth Bank of Australia) Kafka Summit SF 2018. Commonwealth Bank of Australia (CBA) is Australia's largest bank, with over 15 million customers, 50,000 employees and over USD 700 billion in assets. We started the journey two years ago to transform our existing enterprise architecture into an event-driven architecture. Since then, Kafka has become a mission-critical platform in the bank and is the core component of our event-driven architecture strategy. In this talk, we will walk you through how we stood up the initial Kafka clusters, the challenges we encountered (both technical and organisational) and how we overcame them. We will also take a deep dive into one of our use cases for Kafka (with Kafka Streams and connectors) in the new real-time payment system introduced in Australia early this year. We will discuss why we think Kafka was the perfect solution for this use case, and the lessons learned. Key takeaways: lessons learned from our experiences that we think other companies can benefit from, and our use cases for Kafka, with a particular focus on the New Payments Platform (NPP) initiative in Australia.
Undertaking a digital journey starts with clearly articulating the success factors for the entire journey, and our experience in the field has shown this to be an Achilles heel for most CXOs across Fortune 500 organizations. Our findings were corroborated by a McKinsey study reporting that only 15% of organizations are able to calculate the ROI of a digital initiative. In this talk we will discuss demonstrated examples from multi-billion-dollar businesses of proven methodologies for measuring the value of a digital enterprise. The panel will share experiences and provide actionable advice for immediate next steps around the following:
• Successful metrics for measuring the value of Digital / IoT / AI / machine learning engagements
• How 'Digital Traction Metrics' can provide actionable insights even before the financial metrics have been reported
• Best-in-class organizational constructs and futuristic employee engagement methods to facilitate the digital revolution
Panelists for this session include:
• Christian Bilien - Head of Global Data at Societe Generale
• Pierre Alexandre Pautrat - Head of Big Data at BPCE/Natixis
• Ronny Fehling - VP, Airbus
• Juergen Urbanski - Silicon Valley Data Science
• Abhas Ricky - EMEA Lead, Innovation & Strategy, Hortonworks
Time series data is everywhere -- connected IoT devices, application monitoring and observability platforms, and more. What makes time series data streams challenging is that they often carry orders of magnitude more data than other workloads, with millions of data points per second being quite common. Given its ability to ingest high volumes of data, Kafka is a natural part of any data architecture handling large volumes of time series telemetry, specifically as an intermediate buffer before that data is persisted in InfluxDB for processing, analysis, and use in other applications. In this session, we will show you how you can stream time series data to your IoT application using Kafka topics and InfluxDB, drawing upon deployments at Hulu and Wayfair that each ingest 1 million metrics per second. Once this session is complete, you'll be able to connect a Kafka topic to an InfluxDB instance as the beginning of your own time series data pipeline.
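As a minimal sketch of the last hop of such a pipeline, assuming the InfluxDB 2.x Java client and hypothetical topic, bucket and token names (not the Hulu or Wayfair deployments), a consumer that reads readings from Kafka and writes them to InfluxDB might look like this:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.client.WriteApiBlocking;
import com.influxdb.client.domain.WritePrecision;
import com.influxdb.client.write.Point;

public class KafkaToInfluxSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "influx-writer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             InfluxDBClient influx = InfluxDBClientFactory.create(
                     "http://localhost:8086", "my-token".toCharArray(), "my-org", "metrics")) {
            consumer.subscribe(List.of("sensor-readings")); // hypothetical topic
            WriteApiBlocking writeApi = influx.getWriteApiBlocking();
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Assumes the record value is a plain numeric reading keyed by sensor id.
                    Point point = Point.measurement("sensor")
                            .addTag("sensorId", record.key())
                            .addField("value", Double.parseDouble(record.value()))
                            .time(Instant.now(), WritePrecision.MS);
                    writeApi.writePoint(point);
                }
            }
        }
    }
}
```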
At Gloo.us, we face the challenge of providing platform data to heterogeneous applications in a way that eliminates access contention, avoids high-latency ETL, and ensures consistency for many teams. We are solving this problem by adopting Data Mesh principles and leveraging Kafka, Kafka Connect, and Kafka Streams to build an event-driven architecture that connects applications to the data they need. A domain-driven design keeps the boundaries between specialized process domains and singularly focused data domains clear, distinct, and disciplined. Applying the principles of a Data Mesh, process domains assume the responsibility of transforming, enriching, or aggregating data rather than relying on these changes at the source of truth -- the data domains. Architecturally, we've broken centralized big data lakes into smaller data stores that can be consumed into storage managed by process domains. This session covers how we're applying Kafka tools to enable our data mesh architecture: how we interpret and apply the data mesh paradigm, the role of Kafka as the backbone for a mesh of connectivity, the role of Kafka Connect in producing and consuming data events, and the use of KSQL to perform minor transformations for consumers.
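As a rough sketch of the kind of lightweight transformation a process domain might apply with Kafka Streams (the topic names and the enrichment step are hypothetical, not Gloo's actual domains):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ProcessDomainTransformSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "profile-enricher");  // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read raw events published by a data domain and reshape them for this process domain.
        KStream<String, String> raw = builder.stream("user-profile-events");  // hypothetical source topic
        raw.mapValues(value -> value.toUpperCase())                           // stand-in for real enrichment
           .to("process-domain.user-profiles");                              // hypothetical derived topic

        new KafkaStreams(builder.build(), props).start();
    }
}
```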
Kurt Schneider [Discover Financial] | How Discover Modernizes Observability with InfluxDB Cloud | InfluxDays Virtual Experience NA 2020
This document discusses Apache Flink for IoT event-time stream processing. It begins by introducing streaming architectures and Flink. It then discusses how IoT data has important properties like continuous data production and event timestamps that require event-time based processing. Examples are provided of companies like King and Bouygues Telecom using Flink for billions of events per day with challenges like out-of-order data and flexible windowing. Event-time processing in Flink is able to handle these challenges through features like watermarks.
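A minimal sketch of event-time processing with watermarks in Flink's Java API, using hypothetical (deviceId, timestamp) readings rather than any of the deployments mentioned above, might look like this:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical (deviceId, epochMillis) readings standing in for a real IoT source.
        DataStream<Tuple2<String, Long>> readings = env.fromElements(
                Tuple2.of("device-1", 1_000L),
                Tuple2.of("device-2", 1_500L),
                Tuple2.of("device-1", 900L)); // arrives out of order

        readings
            // Watermarks tolerate up to 5 seconds of out-of-order arrival.
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((reading, ts) -> reading.f1))
            .keyBy(reading -> reading.f0)
            // Windows close based on the event time carried in the data, not arrival time.
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .sum(1)
            .print();

        env.execute("event-time-sketch");
    }
}
```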
Apache Hudi is a data lake platform that provides streaming primitives (upserts, deletes, change streams) on top of data lake storage. Hudi powers very large data lakes at Uber, Robinhood and other companies, and comes pre-installed on four major cloud platforms. Hudi supports exactly-once, near-real-time data ingestion from Apache Kafka to cloud storage and is typically used in place of an S3/HDFS sink connector to gain transactions and mutability. While this approach is scalable and battle-tested, it can only ingest data in mini-batches, leading to lower data freshness. In this talk, we introduce a Kafka Connect sink connector for Apache Hudi, which writes data straight into Hudi's log format, making the data immediately queryable, while Hudi's table services (indexing, compaction, clustering) work behind the scenes to further reorganize the data for better query performance.
(Dmitry Milman + Ankur Kaneria, Express Scripts) Kafka Summit SF 2018. Building cloud-based microservices can be a challenge when the system of record is a relational database residing on an on-premise mainframe. The challenge lies in the ability to efficiently and cost-effectively access the ever-increasing amount of data. Express Scripts is reimagining its data architecture to bring a best-in-class user experience and provide the foundation for next-generation applications. This talk will showcase how Kafka plays a key role in Express Scripts' transformation from mainframe to a microservice-based ecosystem, ensuring data integrity between the two worlds. It will discuss how change data capture (CDC) is leveraged to stream data changes to Kafka, allowing us to build a low-latency data sync pipeline. We will describe how we achieve transactional consistency by collapsing all events that belong together onto a single topic, yet retain the ability to scale out to meet real-time SLAs and low-latency requirements by means of partitions. We will share our Kafka Streams configuration for handling the data transformation workload, and discuss our overall Kafka cluster footprint, configuration and security measures. Express Scripts Holding Company is an American Fortune 100 company; as of 2018 it is the 25th-largest company in the U.S. and one of the largest pharmacy benefit management organizations in the country. Customers rely on 24/7 access to our services and need the ability to interact with our systems in real time via various channels such as web and mobile. Sharing our mainframe-to-microservices migration journey, our experiences and lessons learned should be beneficial to other companies venturing down a similar path.
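As a hedged sketch of the re-keying idea described above, implemented here with Kafka Streams and hypothetical topic names (not Express Scripts' actual pipeline), related CDC events can be keyed by a business identifier so they land on the same partition and stay ordered:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class CdcRekeySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-collapser");      // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // More stream threads scale throughput across partitions while keeping per-key ordering.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> changes = builder.stream("mainframe-cdc-changes"); // hypothetical CDC topic
        changes
            // Re-key by a business identifier so all related change events share a partition
            // and are therefore consumed in order by downstream microservices.
            .selectKey((oldKey, value) -> extractMemberId(value))
            .to("member-events", Produced.with(Serdes.String(), Serdes.String()));  // hypothetical topic

        new KafkaStreams(builder.build(), props).start();
    }

    // Stand-in for parsing the CDC payload; a real implementation would decode the change record.
    private static String extractMemberId(String cdcPayload) {
        return cdcPayload.split(",")[0];
    }
}
```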
For many industries, the need to group related events together based on a period of activity or inactivity is key. Advertising businesses and content producers are just two examples of domains where session windows can be used to better understand user behavior. While such sessionization has been possible in Apache Kafka up to this point, implementing it has been rather complex and required leveraging low-level APIs. In the most recent release of Kafka, however, new capabilities have been added that make session windows much easier to implement. In this online talk, we'll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
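A minimal Kafka Streams sketch of a session window, counting events per user session with a 30-minute inactivity gap (the topic names are hypothetical), might look like this:

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.SessionWindows;

public class SessionWindowSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sessionizer");        // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> views = builder.stream("page-views");         // hypothetical clickstream topic
        views.groupByKey()
             // Events from the same user separated by less than 30 minutes of
             // inactivity are merged into one session; a longer gap starts a new one.
             .windowedBy(SessionWindows.with(Duration.ofMinutes(30)))
             .count()
             // Flatten the windowed key into a plain string so the default serdes apply downstream.
             .toStream((windowedKey, count) -> windowedKey.key() + "@" + windowedKey.window().start())
             .mapValues(count -> Long.toString(count))
             .to("session-counts");                                           // hypothetical output topic

        new KafkaStreams(builder.build(), props).start();
    }
}
```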
Booz Allen is at the forefront of cyber innovation and sometimes that means applying AI in an on-prem environment because of data sensitivity.
Hyperconverged infrastructure functions by combining storage, networking and computing into a single system.
This document discusses microservices and provides an overview of common microservice concepts. It begins with discussing problems with monolithic architectures and then covers topics like service registration with Eureka, load balancing with Ribbon, edge services with Zuul, and failure management with Hystrix. Both pros and cons of the microservices approach are presented. The document concludes with an example demo of a microservices architecture using Spring Cloud and a request for any questions.
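A compressed sketch of how those pieces fit together in a Spring Cloud service, with hypothetical service and endpoint names: the application registers with Eureka, a load-balanced RestTemplate lets Ribbon resolve logical service names, and a Hystrix command supplies a fallback when the downstream call fails.

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;

@SpringBootApplication
@EnableDiscoveryClient   // registers this service with Eureka
@EnableCircuitBreaker    // enables Hystrix command processing
public class GreetingServiceApplication {

    public static void main(String[] args) {
        SpringApplication.run(GreetingServiceApplication.class, args);
    }

    @Bean
    @LoadBalanced  // Ribbon resolves the logical service name to a registered instance
    RestTemplate restTemplate() {
        return new RestTemplate();
    }

    @Service
    static class GreetingClient {
        @Autowired RestTemplate restTemplate;

        @HystrixCommand(fallbackMethod = "fallbackGreeting")  // circuit breaker with fallback
        public String greet() {
            // "user-service" is a hypothetical Eureka service id, not a hostname.
            return restTemplate.getForObject("http://user-service/greeting", String.class);
        }

        String fallbackGreeting() {
            return "Hello from cache";
        }
    }
}
```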
Data volumes continue to grow, demanding new, more scalable solutions for low-latency data processing. Previously, the default approach to deploying such systems was to throw a ton of hardware at the problem. That is no longer necessary: newer technologies are efficient enough to handle extreme workloads on smaller, more manageable clusters. Processing billions of events per second on Kafka can now be done with a modest investment in compute resources. In this session, you will learn how to architect and build data processing applications that scale linearly and combine streaming data and reference data (data in motion and data at rest) with machine learning. We will take you through the end-to-end framework and an example application built on the Hazelcast Platform, an open source engine designed for ultra-fast performance. We will also show how you can use SQL to further explore the operational data in the solution, including querying Kafka topics and key-value data in the in-memory data store. Attendees will also get access to the GitHub sample application shown.
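As a hedged sketch of such an application, assuming the Hazelcast Jet pipeline API and a hypothetical Kafka topic (not the exact sample from the session's GitHub repository):

```java
import java.util.Map;
import java.util.Properties;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.kafka.KafkaSources;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;

public class KafkaIngestSketch {
    public static void main(String[] args) {
        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092");   // placeholder broker
        kafkaProps.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaProps.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaProps.setProperty("auto.offset.reset", "earliest");

        Pipeline pipeline = Pipeline.create();
        pipeline.readFrom(KafkaSources.<String, String>kafka(kafkaProps, "trades")) // hypothetical topic
                .withoutTimestamps()
                // Stand-in for enrichment against reference data held in the in-memory store.
                .map(Map.Entry::getValue)
                .writeTo(Sinks.logger());

        HazelcastInstance hz = Hazelcast.bootstrappedInstance();
        hz.getJet().newJob(pipeline);
    }
}
```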
The Ohio Department of Transportation has adopted Confluent as the event-driven enabler of DriveOhio, a modern Intelligent Transportation System. DriveOhio digitally links sensors, cameras, speed monitoring equipment, and smart highway assets in real time to dynamically adjust the surface road network and maximize safety and efficiency for travelers. Over the past 24 months the team has increased the number and types of devices within the DriveOhio environment, while also working with its vendors to adopt Kafka and participate more fully in data sharing.
Meetup presentation for the Kafka meetup in NYC put on by @allthingshadoop. The presentation covers Apache NiFi (incubating).
Nokia is looking to transform its business for the future by regaining leadership in the smartphone market, maintaining leadership in mobile phones, and sustaining its position as a leading mobile products company. It will partner with Microsoft to build a new ecosystem for smartphones and maintain volume and value leadership. Nokia will also focus on bringing the web and apps to new price points, invest in future disruptions like MeeGo, and develop its location and commerce business including building a structured data platform and advanced analytics capabilities using big data.
The Data Innovation Lab will contribute to Schiphol's strategic goals through efficiency gains (cost savings) and new revenue models by capitalizing on the value of data. We would like to walk you through how we will work and what we will do.
This document provides an overview of orchestration and learning analytics research by Luis P. Prieto. It discusses orchestration as coordinating supportive interventions across learning activities. Orchestration research has modeled teacher practices through observational studies and eye-tracking. Learning analytics aims to aid educators by analyzing teaching and learning processes. The combination of orchestration and learning analytics is called teaching analytics. Prieto envisions applying this research at the CEITER research center through developing tools to support evidence-based teacher practices and orchestration-aware learning designs. Challenges include ensuring trust, privacy, added value and adoption at scale.
This document summarizes the beginnings of technology from prehistory to the modern era. Some of the earliest important inventions mentioned include the wheel, the telegraph, and the mobile phone. It also describes the origins of the computer, including the first generation based on vacuum tubes, and Thomas Edison's invention of the electric light bulb.