This document provides an overview and summary of Apache Pulsar, a distributed streaming and messaging platform. It discusses Pulsar's benefits like data durability, scalability, geo-replication and multi-tenancy. It outlines key use cases like message queuing and data streaming. The document also summarizes Pulsar's architecture, subscription modes, connectors, and integration with other technologies like Apache Flink, Apache NiFi and MQTT. It highlights real-world customer implementations and provides demos of ingesting IoT data via Pulsar.
Apache Pulsar was developed to address several shortcomings of existing messaging systems, including geo-replication, message durability, and lower message latency. We will implement a multi-currency quoting application that feeds pricing information to a crypto-currency trading platform that is deployed around the globe. Given the volatility of crypto-currency prices, sub-second message latency is critical to traders. Equally important is ensuring consistent quotes are available in all geographical locations, i.e., the price of Bitcoin shown to a trader in the USA should be the same as that shown to a trader in Hong Kong. We will highlight the advantages of Apache Pulsar over traditional messaging systems and show how its low latency and replication across multiple geographies make it ideally suited for globally distributed, real-time applications.
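As a minimal sketch of the quoting pipeline described above (the service URL, topic name, and payload fields are illustrative assumptions, not details from the talk), a Pulsar producer in Python could look like this; keying messages by currency pair preserves per-pair ordering:

```python
import json
import time


def make_quote(pair: str, price: float) -> bytes:
    """Serialize a currency quote as JSON bytes for a Pulsar message payload."""
    return json.dumps({"pair": pair, "price": price,
                       "ts_ms": int(time.time() * 1000)}).encode("utf-8")


def publish_quote(service_url: str, topic: str, pair: str, price: float) -> None:
    """Send one quote to Pulsar (requires a running broker).

    In a geo-replicated deployment the topic's namespace would be configured
    to replicate across clusters, so every region sees the same quotes.
    """
    import pulsar  # pip install pulsar-client
    client = pulsar.Client(service_url)
    producer = client.create_producer(topic)
    # Keying by currency pair keeps per-pair ordering across partitions.
    producer.send(make_quote(pair, price), partition_key=pair)
    client.close()
```

A caller would invoke something like `publish_quote("pulsar://localhost:6650", "persistent://public/default/crypto-quotes", "BTC-USD", 64250.10)`.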
StreamNative FLiP into ScyllaDB - Scylla Summit 2022. Utilizing Apache Pulsar with Apache NiFi, Apache Flink, Apache Spark and ScyllaDB for fast IoT applications with MQTT and beyond.
This document provides an overview and summary of Apache Pulsar with MQTT for edge computing. It discusses how Pulsar is an open-source, cloud-native distributed messaging and streaming platform that supports MQTT and other protocols. It also summarizes Pulsar's key capabilities like data durability, scalability, geo-replication, and unified messaging model. The document includes diagrams showcasing Pulsar's publish-subscribe model and different subscription modes. It demonstrates how Pulsar can be used with edge devices via protocols like MQTT and how streams of data from edge can be processed using connectors, functions and SQL.
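To make the edge-device flow concrete, here is a small sketch of publishing a sensor reading over MQTT to a broker such as Pulsar with the MQTT-on-Pulsar (MoP) protocol handler enabled (the host, topic, and payload fields are illustrative assumptions; port 1883 is the conventional MQTT port):

```python
import json


def sensor_payload(device_id: str, temp_c: float) -> str:
    """Encode an edge sensor reading as the JSON string sent over MQTT."""
    return json.dumps({"device": device_id, "temp_c": temp_c})


def publish_reading(host: str, device_id: str, temp_c: float) -> None:
    """Publish one reading via MQTT (requires a reachable MQTT listener)."""
    import paho.mqtt.client as mqtt  # pip install paho-mqtt
    client = mqtt.Client()
    client.connect(host, 1883)
    # QoS 1 asks the broker to acknowledge delivery at least once.
    client.publish("sensors/room1", sensor_payload(device_id, temp_c), qos=1)
    client.disconnect()
```

Once the readings land in a Pulsar topic, they can be consumed downstream by connectors, functions, and SQL as the document describes.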
Cloud Lunch and Learn: Real-Time Streaming in Azure. Apache Pulsar is an open-source, cloud-native distributed messaging and streaming platform.
Biography Tim Spann is a Principal DataFlow Field Engineer at Cloudera where he works with Apache NiFi, MiniFi, Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. Talk Real-Time Streaming in Any and All Clouds, Hybrid and Beyond Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at scale as events arrive. Tools: Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, DJL.ai, Apache MXNet. References: https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html Source Code: https://github.com/tspannhw/MmFLaNK FLiP Stack StreamNative
ApacheCon 2021 Apache Deep Learning 302 Tuesday 18:00 UTC Apache Deep Learning 302 Timothy Spann This talk will discuss and show examples of using Apache Hadoop, Apache Kudu, Apache Flink, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications. This is the follow up to previous talks on Apache Deep Learning 101 and 201 and 301 at ApacheCon, Dataworks Summit, Strata and other events. As part of this talk, the presenter will walk through using Apache MXNet Pre-Built Models, integrating new open source Deep Learning libraries with Python and Java, as well as running real-time AI streams from edge devices to servers utilizing Apache NiFi and Apache NiFi - MiNiFi. This talk is geared towards Data Engineers interested in the basics of architecting Deep Learning pipelines with open source Apache tools in a Big Data environment. The presenter will also walk through source code examples available in github and run the code live on Apache NiFi and Apache Flink clusters. Tim Spann is a Developer Advocate @ StreamNative where he works with Apache NiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. 
* https://github.com/tspannhw/ApacheDeepLearning302/ * https://github.com/tspannhw/nifi-djl-processor * https://github.com/tspannhw/nifi-djlsentimentanalysis-processor * https://github.com/tspannhw/nifi-djlqa-processor * https://www.linkedin.com/pulse/2021-schedule-tim-spann/
This document discusses isolation in Apache Pulsar. It introduces the presenters as experts in distributed systems and the Pulsar open source project. It then outlines ways to isolate resources in Pulsar like brokers, bookies, and clusters to separate namespaces and tenants. The key methods covered are namespace isolation policies, failure domains, anti-affinity groups, and bookie affinity groups. It provides examples of how these are configured and allows scaling resources up and down independently per namespace. Finally, it invites questions and provides contact details.
FLiP Into Trino. Flink, Pulsar, Trino, Pulsar SQL (Trino/Presto). Remember the days when you could wait until your batch data load was done and then you could run some simple queries or build stale dashboards? Those days are over; today you need instant analytics as the data is streaming in real-time. You need universal analytics where that data is. I will show you how to do this utilizing the latest cloud native open source tools. In this talk we will utilize Trino, Apache Pulsar, Pulsar SQL and Apache Flink to instantly analyze data from IoT, sensors, transportation systems, logs, REST endpoints, XML, images, PDFs, documents, text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach how to use Pulsar SQL to run analytics on live data. Tim Spann Developer Advocate StreamNative David Kjerrumgaard Developer Advocate StreamNative https://www.starburst.io/info/trinosummit/ https://github.com/tspannhw/FLiP-Into-Trino/blob/main/README.md https://github.com/tspannhw/StreamingAnalyticsUsingFlinkSQL/tree/main/src/main/java select * from pulsar."public/default"."weather"; Apache Pulsar plus Trino = fast analytics at scale
Fluentd is an open source log collector that allows flexible collection and routing of log data. It uses JSON format for log messages and supports many input and output plugins. Fluentd can collect logs from files, network services, and applications before routing them to storage and analysis services like MongoDB, HDFS, and Treasure Data. The open source project has grown a large community contributing over 100 plugins to make log collection and processing easier.
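A minimal sketch of the kind of Fluentd pipeline described, tailing a JSON log file and routing it to MongoDB (the paths, tag, and MongoDB destination are illustrative assumptions; the `mongo` output requires the fluent-plugin-mongo plugin):

```
<source>
  @type tail
  path /var/log/app/access.log
  pos_file /var/log/fluentd/access.log.pos
  tag app.access
  <parse>
    @type json
  </parse>
</source>

<match app.**>
  @type mongo
  host localhost
  port 27017
  database logs
  collection access
</match>
```

The `tag` assigned by the source is what the `<match>` pattern routes on, so one collector can fan the same streams out to multiple storage and analysis backends.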
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Kafka, and Flink Timothy Spann Twitter - @PaasDev // Blog: www.datainmotion.dev Frequent speaker at major conferences and events. Principal DataFlow Field Engineer for streaming around Apache NiFi, NiFi Registry, MiNiFi, Kafka, Kafka Connect, Kafka Streams, Flink, Flink SQL, SMM, SRM, SR and EFM. Previously at E&Y, HPE, Pivotal & Hortonworks Question #1 What is the most difficult part of an Edge Flow? Gateway Agent Edge Data Collection Processing Data https://github.com/tspannhw/DemoJam2021 https://github.com/tspannhw/CloudDemo2021
(VIRTUAL) Hail Hydrate! From Stream to Lake Using Open Source - Timothy J Spann, StreamNative https://osselc21.sched.com/event/lAPi?iframe=no A cloud data lake that is empty is not useful to anyone. How can you quickly, scalably and reliably fill your cloud data lake with diverse sources of data you already have and new ones you never imagined you needed? Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, logs, REST, XML, images, PDFs, documents, text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero. https://osselc21.sched.com/event/lAPi/virtual-hail-hydrate-from-stream-to-lake-using-open-source-timothy-j-spann-streamnative
Using Apache Spark with IBM SPSS Modeler with Dr. Steve Poulin. An introduction to Apache Spark and its relevant integration with IBM SPSS Modeler. Why integrate? What type of benefits? A high-level review of the integration process, with advice on which enhanced features to pay attention to and common pitfalls to avoid.
#phillyopensource Introduction talk for data engineers on deep learning with Apache MXNet, Apache NiFi, Apache Hive, Apache Hadoop, Apache Spark, Python and other tools.
https://adtmag.com/webcasts/2021/12/influxdata-february-10.aspx?tc=page0 FLiP Stack (Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark) with InfluxDB for Edge AI and IoT workloads at scale Tim Spann Developer Advocate StreamNative datainmotion.dev
Real-time stock processing with Apache NiFi, Apache Flink and Apache Kafka, with Kafka Connect apps, SMM, NiFi Registry, Schema Registry, Kafka topics, Flink SQL and NiFi.
DevFest UK & Ireland 2022: Using Apache NiFi with Apache Pulsar for fast data on-ramp. As the Pulsar community grows, more and more connectors will be added. To enhance the availability of sources and sinks and to make use of the greater Apache Streaming community, joining forces between Apache NiFi and Apache Pulsar is a perfect fit. Apache NiFi also adds the benefits of ELT, ETL, data crunching, transformation, validation and batch data processing. Once data is ready to be an event, NiFi can launch it into Pulsar at light speed. I will walk through how to get started, some use cases and demos and answer questions. https://www.devfest-uki.com/schedule https://linktr.ee/tspannhw
This document summarizes Tim Spann's presentation on codeless pipelines with Apache Pulsar and Apache Flink. The presentation discusses how StreamNative's platform uses Pulsar and Flink to enable end-to-end streaming data pipelines without code. It provides an overview of Pulsar's capabilities for messaging, stream processing, and integration with other Apache projects like Kafka, NiFi and Flink. Examples are given of ingesting IoT data into Pulsar and running real-time analytics on the data using Flink SQL.
https://github.com/tspannhw/SpeakerProfile/tree/main/2022/talks Fast Streaming into ClickHouse with Apache Pulsar https://github.com/tspannhw/FLiPC-FastStreamingIntoClickhouseWithApachePulsar https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/events/285271332/ Fast Streaming into ClickHouse with Apache Pulsar - Meetup 2022 StreamNative - Apache Pulsar - Stream to Altinity Cloud - ClickHouse May the 4th Be With You! 04-May-2022 ClickHouse Meetup

CREATE TABLE iotjetsonjson_local (
  uuid String, camera String, ipaddress String, networktime String,
  top1pct String, top1 String, cputemp String, gputemp String,
  gputempf String, cputempf String, runtime String, host String,
  filename String, host_name String, macaddress String, te String,
  systemtime String, cpu String, diskusage String, memory String,
  imageinput String
) ENGINE = MergeTree()
PARTITION BY uuid
ORDER BY (uuid);

CREATE TABLE iotjetsonjson ON CLUSTER '{cluster}' AS iotjetsonjson_local
ENGINE = Distributed('{cluster}', default, iotjetsonjson_local, rand());

SELECT uuid, top1pct, top1, gputempf, cputempf
FROM iotjetsonjson
WHERE toFloat32OrZero(top1pct) > 40
ORDER BY toFloat32OrZero(top1pct) DESC, systemtime DESC;

SELECT uuid, systemtime, networktime, te, top1pct, top1, cputempf, gputempf, cpu, diskusage, memory, filename
FROM iotjetsonjson
ORDER BY systemtime DESC;

SELECT top1, max(toFloat32OrZero(top1pct)), max(gputempf), max(cputempf)
FROM iotjetsonjson
GROUP BY top1;

SELECT top1, max(toFloat32OrZero(top1pct)) AS maxTop1, max(gputempf), max(cputempf)
FROM iotjetsonjson
GROUP BY top1
ORDER BY maxTop1;

Tim Spann Developer Advocate StreamNative
Data insights and data-driven strategies create the competitive differentiators companies thrive off today. The need for unified messaging and streaming has never been more apparent. Pulsar started with the goal of building a global, geo-replicated infrastructure to serve Yahoo!’s messaging needs. With the increased need to process both business events (such as payment request, billing request) and operational events (such as log data, click events, etc), the team at Yahoo! set out to build a true unified infrastructure platform to handle all in-motion data. That technology became Apache Pulsar. In this talk, Matteo Merli and Sijie Guo will dive into the landscape of unified messaging and streaming, how Pulsar helps companies achieve this vision, and what the future of Pulsar will look like.
Microservices, events, containers, and orchestrators are dominating our vernacular today. As operations teams adapt to support these technologies in production, cloud-native platforms like Pivotal Cloud Foundry and Kubernetes have quickly risen to serve as force multipliers of automation, productivity and value. Apache Kafka® is providing developers a critically important component as they build and modernize applications to cloud-native architecture. This talk will explore: • Why cloud-native platforms and why run Apache Kafka on Kubernetes? • What kind of workloads are best suited for this combination? • Tips to determine the path forward for legacy monoliths in your application portfolio • Demo: Running Apache Kafka as a Streaming Platform on Kubernetes
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and Kafka Apache NiFi, Apache Flink, Apache Kafka Timothy Spann Principal Developer Advocate Cloudera Data in Motion https://budapestdata.hu/2023/en/speakers/timothy-spann/ Timothy Spann Principal Developer Advocate Cloudera (US) LinkedIn · GitHub · datainmotion.dev June 8 · Online · English talk Building Modern Data Streaming Apps with NiFi, Flink and Kafka In my session, I will show you some best practices I have discovered over the last 7 years in building data streaming applications including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Kafka. From there we build streaming ETL with Apache Flink SQL. We will stream data into Apache Iceberg. We use the best streaming tools for the current applications with FLaNK. flankstack.dev BIO Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
Timothy Spann: Apache Pulsar for ML Data Science Online Camp 2023 Winter Website: https://dscamp.org Youtube: https://www.youtube.com/channel/UCeHtPZ_ZLZ-nHFMUCXY81RQ FB: https://www.facebook.com/people/Data-Science-Camp/100064240830422/
Princeton Dec 2022 Meetup: NiFi + Flink + Pulsar Streaming Data Platform for cloud-native event-driven applications https://github.com/tspannhw/pulsar-csp-ce/blob/main/weather.md https://github.com/tspannhw/create-nifi-pulsar-flink-apps https://medium.com/@tspann/using-apache-pulsar-with-cloudera-sql-builder-apache-flink-b518aa9eadff https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289674210/ For non-locals, we will broadcast live via YouTube. Sign up and we will send out the link. Location: TigerLabs in Princeton on the 2nd floor; walk up and the door will be open. The same one we were using for the old Future of Data - Princeton events, 2016-2019. Parking at the school is free. Street parking nearby is free. There are meters on some streets, and a few blocks away is a paid parking garage. We are joining forces with our friends Cloudera again on a FLiPN amazing journey into Real-Time Streaming Applications with Apache Flink, Apache NiFi, and Apache Pulsar. Discover how to stream data to and from your data lake or data mart using Apache Pulsar™ and Apache NiFi®. Learn how these cloud-native, scalable open-source projects built for streaming data pipelines work together to enable you to quickly build applications with minimal coding. |WHAT THE SESSION WILL COVER| Apache NiFi Apache Pulsar Apache Flink Flink SQL We will show you how to build apps, so download beforehand to Docker, K8s, your laptop, or the cloud.
Cloudera CSP Setup Getting Started with Cloudera Stream Processing Community Edition You may download CSP-CE here: Cloudera Stream Processing Community Edition The Cloudera CDP User's page: CDP Resources Page https://youtu.be/s80sz3NWwHo https://docs.cloudera.com/csp-ce/latest/index.html https://www.cloudera.com/downloads/cdf/csp-community-edition.html Apache Pulsar https://pulsar.apache.org/docs/getting-started-standalone/ or https://streamnative.io/free-cloud/ Cloudera + Pulsar https://community.cloudera.com/t5/Cloudera-Stream-Processing-Forum/Using-Apache-Pulsar-with-SQL-Stream-Builder/m-p/349917 https://community.cloudera.com/t5/Community-Articles/Using-Apache-NiFi-with-Apache-Pulsar-for-Streaming/ta-p/337891 |AGENDA| 6:00 - 6:30 PM EST: Food, Drink, and Networking!!! 6:30 - 7:15 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate 7:15 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer 8:00 - 8:30 PM EST: Round Table on Real-Time Streaming, Q&A |ABOUT THE SPEAKERS| John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data. Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming.
This document provides an overview of Apache Kafka including its main components, architecture, and ecosystem. It describes how LinkedIn used Kafka to solve their data pipeline problem by decoupling systems and allowing for horizontal scaling. The key elements of Kafka are producers that publish data to topics, the Kafka cluster that stores streams of records in a distributed, replicated commit log, and consumers that subscribe to topics. Kafka Connect and the Schema Registry are also introduced as part of the Kafka ecosystem.
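The producer-topic-consumer flow described above can be sketched in a few lines of Python (the bootstrap address, topic name, and record fields are illustrative assumptions; the example uses the kafka-python client):

```python
import json


def order_record(order_id: str, amount: float) -> bytes:
    """Serialize an event as JSON bytes for a Kafka record value."""
    return json.dumps({"order_id": order_id, "amount": amount}).encode("utf-8")


def publish_order(bootstrap: str, order_id: str, amount: float) -> None:
    """Publish one record to a hypothetical 'orders' topic (needs a broker)."""
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    # The key determines the partition, so records for one order stay ordered
    # within the replicated commit log.
    producer.send("orders", value=order_record(order_id, amount),
                  key=order_id.encode("utf-8"))
    producer.flush()
    producer.close()
```

Consumers subscribing to the same topic would then read these records at their own pace from the distributed log.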
During the Confluent Streaming event in Paris, Florent Ramière, Technical Account Manager at Confluent, goes beyond brokers, introducing a whole new ecosystem with Kafka Streams, KSQL, Kafka Connect, REST Proxy, Schema Registry, MirrorMaker, etc.
Apache Kafka is the data streaming broker most used by companies. It can easily manage millions of messages and is the base of many architectures built on events, micro-services, orchestration, ... and now cloud environments. OpenShift is the most widely adopted Platform as a Service (PaaS). It is based on Kubernetes and helps companies easily deploy any kind of workload in a cloud environment. Thanks to many of its features, it is the base for many architectures built on stateless applications as new Cloud Native Applications. Strimzi is an open source community that implements a set of Kubernetes Operators to help you manage and deploy Apache Kafka brokers in OpenShift environments. These slides will introduce Strimzi as a new component on OpenShift to manage your Apache Kafka clusters. Slides used at OpenShift Meetup Spain: - https://www.meetup.com/es-ES/openshift_spain/events/261284764/
Using FLaNK with InfluxDB for EdgeAI IoT at Scale Timothy from StreamNative takes you on a hands-on deep-dive on using Pulsar, Apache NiFi + Edge Flow Manager + MiniFi Agents with Apache MXNet, OpenVino, TensorFlow Lite, and other Deep Learning Libraries on the actual edge devices including Raspberry Pi with Movidius 2, Google Coral TPU and NVidia Jetson Nano. The team runs deep learning models on the edge devices, sends images, and captures real-time GPS and sensor data. Their low-coding IoT applications provide easy edge routing, transformation, data acquisition and alerting before they decide what data to stream real-time to their data space. These edge applications classify images and sensor readings in real time at the edge and then send Deep Learning results to Flink SQL and Apache NiFi for transformation, parsing, enrichment, querying, filtering and merging data to InfluxDB.
Using FLiP with InfluxDB for EdgeAI IoT at Scale. Apache Pulsar, InfluxDB, Apache Flink, StreamNative, Apache Spark, Apache NiFi, FLiP(N) stack.
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0 There is always the fear that a speaker cannot make it, so since I was the MC for the ecosystem track, I put together a backup talk just in case. Here it is, never seen or presented.
Timothy will introduce Apache Pulsar, an open-source distributed messaging and streaming platform. He will discuss how to build real-time applications using Pulsar with various libraries, schemas, languages, frameworks and tools. The presentation will cover what Pulsar is, its functions and components, how it compares to other technologies like Apache Kafka, its advantages, and how to integrate it with tools like Apache Flink, Apache Spark, Apache NiFi and more. A demo and Q&A will follow.
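To ground the introduction, here is a sketch of the consuming side in Python with the pulsar-client library (the service URL, topic, and subscription name are illustrative assumptions; a Shared subscription lets multiple consumers split the work):

```python
import json


def decode_message(data: bytes) -> dict:
    """Decode a JSON message payload received from Pulsar."""
    return json.loads(data.decode("utf-8"))


def consume_one(service_url: str, topic: str, subscription: str) -> dict:
    """Receive and acknowledge one message (requires a running broker)."""
    import pulsar  # pip install pulsar-client
    client = pulsar.Client(service_url)
    consumer = client.subscribe(topic,
                                subscription_name=subscription,
                                consumer_type=pulsar.ConsumerType.Shared)
    msg = consumer.receive(timeout_millis=10000)
    consumer.acknowledge(msg)  # removes the message from the backlog
    client.close()
    return decode_message(msg.data())
```

Swapping `ConsumerType.Shared` for `Exclusive`, `Failover`, or `KeyShared` is how Pulsar's different subscription modes are selected from the client.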
This document discusses how Apache Pulsar can be used as a unified messaging platform from edge to multi-cloud environments. It provides an overview of Pulsar's key features such as durability, scalability, geo-replication, and functions. It also compares Pulsar to Apache Kafka and outlines Pulsar's architecture including tenants, namespaces, topics, and message formats. Additionally, it demonstrates how Pulsar can be used with various protocols and frameworks like Kafka, MQTT, AMQP, NiFi, and Flink.
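The tenant/namespace/topic hierarchy mentioned above is visible in Pulsar's topic naming scheme itself, which a small helper makes explicit:

```python
def pulsar_topic(tenant: str, namespace: str, topic: str,
                 persistent: bool = True) -> str:
    """Build a fully qualified Pulsar topic name.

    Pulsar topics follow {persistent|non-persistent}://tenant/namespace/topic,
    so multi-tenancy is encoded directly in the name: policies set on the
    tenant or namespace apply to every topic underneath them.
    """
    scheme = "persistent" if persistent else "non-persistent"
    return f"{scheme}://{tenant}/{namespace}/{topic}"
```

For example, `pulsar_topic("public", "default", "weather")` yields `persistent://public/default/weather`, the default tenant and namespace used in many of the demos referenced in this document.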
Agenda: - Cloud Native vs. SaaS / Serverless Kafka - The Emergence of Kubernetes - Kafka on K8s Deployment Challenges - Confluent Operator as Kafka Operator - Q&A Confluent Operator enables you to: Provisioning, management and operations of Confluent Platform (including ZooKeeper, Apache Kafka, Kafka Connect, KSQL, Schema Registry, REST Proxy, Control Center) Deployment on any Kubernetes Platform (Vanilla K8s, OpenShift, Rancher, Mesosphere, Cloud Foundry, Amazon EKS, Azure AKS, Google GKE, etc.) Automate provisioning of Kafka pods in minutes Monitor SLAs through Confluent Control Center or Prometheus Scale Kafka elastically, handle fail-over & Automate rolling updates Automate security configuration Built on our first hand knowledge of running Confluent at scale Fully supported for production usage
Modern IT and application environments are increasingly complex, transitioning to cloud, and large in scale. The managed resources, services and applications in these environments generate tremendous data that needs to be observed, consumed and analyzed in real time (or later) by management tools to create insights and to drive operational actions and decisions. In this talk, Srikanth Natarajan will share Micro Focus’ adoption story of Pulsar, including the experience in consuming from and contributing to Apache Pulsar, the lessons learned, and the help that Micro Focus received from a development support partner in their Pulsar journey.
Speakers: Ravi Dubey, Senior Manager, Software Engineering, Capital One + Jeff Sharpe, Software Engineer, Capital One Capital One supports interactions with real-time streaming transactional data using Apache Kafka®. Kafka helps deliver information to internal operation teams and bank tellers to assist with assessing risk and protect customers in a myriad of ways. Inside the bank, Kafka allows Capital One to build a real-time system that takes advantage of modern data and cloud technologies without exposing customers to unnecessary data breaches, or violating privacy regulations. These examples demonstrate how a streaming platform enables Capital One to act on their visions faster and in a more scalable way through the Kafka solution, helping establish Capital One as an innovator in the banking space. Join us for this online talk on lessons learned, best practices and technical patterns of Capital One’s deployment of Apache Kafka. -Find out how Kafka delivers on a 5-second service-level agreement (SLA) for inside branch tellers. -Learn how to combine and host data in-memory and prevent personally identifiable information (PII) violations of in-flight transactions. -Understand how Capital One manages Kafka Docker containers using Kubernetes. Watch the recording: https://videos.confluent.io/watch/6e6ukQNnmASwkf9Gkdhh69?.
The document discusses OpenStack and Fibre Channel storage. It provides an overview of OpenStack, including its goals of being an open platform with broad support and empowering users. It describes core OpenStack technologies like Compute, Object Storage, and Block Storage. It outlines the history and current state of Fibre Channel support in OpenStack, including the Fibre Channel Zone Manager that automates zoning. It diagrams the high-level architecture and components involved in provisioning Fibre Channel volumes to virtual machines from OpenStack.
OSSNA Building Modern Data Streaming Apps https://ossna2023.sched.com/event/1Jt05/virtual-building-modern-data-streaming-apps-with-open-source-timothy-spann-streamnative Timothy Spann Cloudera Principal Developer Advocate Data in Motion In my session, I will show you some best practices I have discovered over the last seven years in building data streaming applications, including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize all the best features. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We make continuous queries against our topics with Flink SQL. We will stream data into various open-source data stores, including Apache Iceberg, Apache Pinot, and others. We use the best streaming tools for the current applications with the open source stack - FLiPN. https://www.flipn.app/ Updates: This will be in-person with live coding based on feedback from the crowd. This will also include new data stores, new sources, and data relevant to and from the Vancouver area. This will also include updates to the platforms and inclusion of Apache Iceberg, Apache Pinot and some other new tech. https://github.com/tspannhw/SpeakerProfile Tim Spann is a Principal Developer Advocate for Cloudera. He works with Apache Kafka, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. 
He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. Timothy J Spann Cloudera Principal Developer Advocate Hightstown, NJ Website: https://datainmotion.dev/
Microservices, events, containers, and orchestrators are dominating our vernacular today. As operations teams adapt to support these technologies in production, cloud-native platforms like Cloud Foundry and Kubernetes have quickly risen to serve as force multipliers of automation, productivity and value. Kafka is providing developers a critically important component as they build and modernize applications to cloud-native architecture. This talk will explore: • Why cloud-native platforms and why run Kafka on Kubernetes? • What kind of workloads are best suited for this combination? • Tips to determine the path forward for legacy monoliths in your application portfolio • Running Kafka as a Streaming Platform on Container Orchestration
Tech Talk: Unstructured Data and Vector Databases Speaker: Tim Spann (Zilliz) Abstract: In this session, I will discuss unstructured data and the world of vector databases, and we will see how they differ from traditional databases, in which cases you need one, and in which you probably don't. I will also go over similarity search, where vectors come from, and an example of a vector database architecture, wrapping up with an overview of Milvus. Introduction Unstructured data, vector databases, traditional databases, similarity search Vectors Where, What, How, Why Vectors? We'll cover a Vector Database Architecture Introducing Milvus What drives Milvus' emergence as the most widely adopted vector database Hi Unstructured Data Friends! I hope this video had all the unstructured data processing, AI and Vector Database demos you needed for now. If not, there's a ton more linked below. My source code is available here https://github.com/tspannhw/ Let me know in the comments if you liked what you saw, how I can improve and what I should show next. Thanks, hope to see you soon at a Meetup in Princeton, Philadelphia, New York City or here in the Youtube Matrix. Get Milvused! https://milvus.io/ Read my Newsletter every week! 
https://github.com/tspannhw/FLiPStackWeekly/blob/main/141-10June2024.md
For more cool unstructured data, AI and vector database videos, check out the Milvus vector database videos here: https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups: https://www.meetup.com/unstructured-data-meetup-new-york/ https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7 https://www.meetup.com/pro/unstructureddata/ https://zilliz.com/community/unstructured-data-meetup https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
Events: https://www.meetup.com/unstructured-data-meetup-new-york/events/301383476/?slug=unstructured-data-meetup-new-york&eventId=301383476 https://www.aicamp.ai/event/eventdetails/W2024062014
Mehul Shah - Startup Grind Princeton, 18 June 2024 - AI Advancement. Infinity Services Inc. - Artificial Intelligence Development Services. www.infinity-services.com
Startup Grind Princeton, June 18, 2024 - GenAI Event
06-18-2024 - Princeton Meetup - Introduction to Milvus
tim.spann@zilliz.com https://www.linkedin.com/in/timothyspann/ https://x.com/paasdev https://github.com/tspannhw https://github.com/milvus-io/milvus
Get Milvused! https://milvus.io/
Read my newsletter every week! https://github.com/tspannhw/FLiPStackWeekly/blob/main/142-17June2024.md
For more cool unstructured data, AI and vector database videos, check out the Milvus vector database videos here: https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups: https://www.meetup.com/unstructured-data-meetup-new-york/ https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7 https://www.meetup.com/pro/unstructureddata/ https://zilliz.com/community/unstructured-data-meetup https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
Expand LLMs' knowledge by incorporating external data sources into your AI applications.
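That last idea, expanding an LLM's knowledge with external data, is the retrieval-augmented generation (RAG) pattern. A minimal sketch follows, with a toy keyword retriever standing in for a real Milvus vector search and the actual chat-completion call left out:

```python
# Minimal sketch of retrieval-augmented generation (RAG): retrieve the
# most relevant external documents, then build a prompt that grounds the
# LLM's answer in them. The keyword retriever below is a toy stand-in
# for a Milvus vector search; the chat-completion call itself is omitted.
import re

def tokens(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question (toy retriever)."""
    q = tokens(question)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents):
    """Stuff the retrieved context into the prompt sent to the LLM."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Milvus is an open-source vector database for unstructured data.",
    "Apache NiFi moves and transforms data between systems.",
]
print(build_prompt("What is Milvus?", docs))
```

In a production pipeline, the retriever would embed the question with the same model used for the documents and query Milvus for the nearest vectors instead of matching keywords.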
06-12-2024 - Budapest Data Forum - Building Real-Time Pipelines with FLaNK AIM
by Timothy Spann, Principal Developer Advocate
https://budapestdata.hu/2024/en/ https://budapestml.hu/2024/en/
tim.spann@zilliz.com https://www.linkedin.com/in/timothyspann/ https://x.com/paasdev https://github.com/tspannhw https://www.youtube.com/@flank-stack
milvus, vector database, gen ai, generative ai, deep learning, machine learning, apache nifi, apache pulsar, apache kafka, apache flink
Codeless Generative AI Pipelines (GenAI with Milvus) https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience. Timothy Spann https://www.youtube.com/@FLaNK-Stack https://medium.com/@tspann https://www.datainmotion.dev/ milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI. A round-table discussion of vector databases, unstructured data, AI, big data, real-time, robots and Milvus. A lively discussion with the NJ Gen AI Meetup lead, Prasad, and Procure.FYI's co-founder.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, maintainers of Milvus.