This document provides an overview of a presentation on optimizing TensorFlow models for high performance and production with GPUs. The presentation covers optimizing both TensorFlow model training and model serving. For model training, topics include using GPUs with TensorFlow, feeding and debugging models, distributed training, and optimizing with the XLA compiler. For model serving, topics include post-processing, TensorFlow Serving, and Ahead-of-Time (AOT) compilation. The code and materials from the presentation are available in an open source GitHub repository.
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Chris Fregly, Founder @ PipelineAI, will walk you through a real-world, complete end-to-end Pipeline-optimization example. We highlight hyper-parameters - and model pipeline phases - that have never been exposed until now.
While most hyperparameter optimizers stop at the training phase (e.g., learning rate, tree depth, EC2 instance type, etc.), we extend model validation and tuning into a new post-training optimization phase, including 8-bit reduced-precision weight quantization and neural network layer fusing - among many other framework- and hardware-specific optimizations.
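The abstract names 8-bit reduced-precision weight quantization but not the scheme PipelineAI applies. As an illustration only, here is a minimal sketch of one common scheme, symmetric per-tensor linear quantization to int8; the function names are hypothetical and the scheme is an assumption, not the talk's method.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto the int range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One float scale per tensor; fall back to 1.0 so an all-zero tensor doesn't divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest quantization step and clamp to the representable int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights; each is within scale / 2 of the original."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 0.33]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(codes, scale)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Storing one signed byte per weight instead of a 4-byte float32 cuts weight storage roughly 4x, at the cost of a rounding error of at most half a quantization step per weight.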
Next, we introduce hyperparameters at the prediction phase including request-batch sizing and chipset (CPU v. GPU v. TPU).
Lastly, we determine a PipelineAI Efficiency Score of our overall Pipeline including Cost, Accuracy, and Time. We show techniques to maximize this PipelineAI Efficiency Score using our massive PipelineDB along with the Pipeline-wide hyper-parameter tuning techniques mentioned in this talk.
Bio
Chris Fregly is Founder and Applied AI Engineer at PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco.
He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production with Kubernetes and GPUs."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017
http://pipeline.io
Title
PipelineAI Distributed Spark ML + Tensorflow AI + GPU Workshop
*A GPU-based cloud instance will be provided to each attendee as part of this event
Highlights
We will each build an end-to-end, continuous TensorFlow AI model training and deployment pipeline on our own GPU-based cloud instance.
At the end, we will combine our cloud instances to create the LARGEST Distributed TensorFlow AI Training and Serving Cluster in the WORLD!
Pre-requisites
Just a modern browser, internet connection, and a good night's sleep! We'll provide the rest.
Agenda
Spark ML
TensorFlow AI
Storing and Serving Models with HDFS
Trade-offs of CPU vs. *GPU, Scale Up vs. Scale Out
CUDA + cuDNN GPU Development Overview
TensorFlow Model Checkpointing, Saving, Exporting, and Importing
Distributed TensorFlow AI Model Training (Distributed TensorFlow)
TensorFlow's Accelerated Linear Algebra Framework (XLA)
TensorFlow's Just-in-Time (JIT) Compiler, Ahead of Time (AOT) Compiler
Centralized Logging and Visualizing of Distributed TensorFlow Training (TensorBoard)
Distributed TensorFlow AI Model Serving/Predicting (TensorFlow Serving)
Centralized Logging and Metrics Collection (Prometheus, Grafana)
Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
High-Performance and Fault-Tolerant Micro-services (NetflixOSS)
Bio
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
Github Repo
https://github.com/fluxcapacitor/pipeline
Video
https://youtu.be/oNf3I1fVmg8
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
Pipeline.AI is a platform for deploying and optimizing machine learning models at scale. It allows users to package models with their runtime dependencies, perform load testing and optimizations, deploy models to production safely using techniques like canary deployments, and monitor models both offline and online. The platform aims to enable live, continuous model training directly in production environments.
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, Chris will demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment. This talk is 100% demo-based with open source tools and completely reproducible through Docker on your own GPU cluster.
https://github.com/fluxcapacitor/pipeline/gpu.ml
http://pipeline.io
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs @ Strata London, May 24 2017
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs - Advanced Spark and TensorFlow Meetup May 23 2017 @ Hotels.com London
We'll discuss how to deploy TensorFlow, Spark, and Scikit-learn models on GPUs with Kubernetes across multiple cloud providers, including AWS, Google, and Azure - as well as on-premise.
In addition, we'll discuss how to optimize TensorFlow models for high-performance inference using the latest TensorFlow XLA (Accelerated Linear Algebra) framework including the JIT and AOT Compilers.
Github Repo (100% Open Source!)
https://github.com/fluxcapacitor/pipeline
http://pipeline.io
Speaker: Umayah Abdennabi
Agenda
* Intro Grammarly (Umayah Abdennabi, 5 mins)
* Meetup Updates and Announcements (Chris, 5 mins)
* Custom Functions in Spark SQL (30 mins)
Speaker: Umayah Abdennabi
Spark comes with a rich Expression library that can be extended to make custom expressions. We will look into custom expressions and why you would want to use them.
* TF 2.0 + Keras (30 mins)
Speaker: Francesco Mosconi
TensorFlow 2.0 was announced at the March TF Dev Summit, and it brings many changes and upgrades. The most significant change is the inclusion of Keras as the default model-building API. In this talk, we'll review the main changes introduced in TF 2.0 and highlight the differences between open source Keras and tf.keras.
* SQuAD Deep-Dive: Question & Answer with Context (45 mins)
Speaker: Brett Koonce (https://quarkworks.co)
SQuAD (Stanford Question Answering Dataset) is an NLP challenge built around answering questions by reading Wikipedia articles, designed to be a real-world machine learning benchmark. We will look at several different ways to tackle the SQuAD problem, building up to state-of-the-art approaches in terms of time, complexity, and accuracy.
https://rajpurkar.github.io/SQuAD-explorer/
https://dawn.cs.stanford.edu/benchmark/#squad
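The agenda doesn't say how answers are scored. SQuAD results are usually reported as exact match (EM) and token-level F1; the sketch below follows the answer normalization used by the public SQuAD v1.1 evaluation script as I understand it (lowercasing, stripping punctuation and articles, collapsing whitespace), with function names of my own choosing.

```python
import collections
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection: a token counts as many times as it appears in both answers.
    overlap = sum((collections.Counter(pred_tokens) & collections.Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list) -> float:
    """A question may have several reference answers; score against the closest one."""
    return max(metric(prediction, g) for g in golds)


print(exact_match("The Eiffel Tower", "Eiffel Tower"))
print(f1_score("Eiffel Tower in Paris", "Eiffel Tower"))
```

Reported numbers average these per-question scores over the whole dataset; treat this as an approximation of the official script, not a replacement for it.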
Food and drinks will be provided. The event will be held at Grammarly's office at One Embarcadero Center on the 9th floor. When you arrive at One Embarcadero, take the escalator to the second floor where you will find the lobby and elevators to the office suites. Come on up to the 9th floor (no need to check in at security), and ring the Grammarly doorbell.
This document provides an overview of Apache Submarine, an open source unified machine learning platform. It discusses requirements for machine learning in production, including reusable experimentation and model management. It introduces Submarine's architecture and components like the Submarine service, workbench, and runtime connectors. Demos are provided of the Mini Submarine, Zeppelin integration, and Submarine Workbench. Current status and future plans are outlined, and several community use cases are mentioned.
Apache Hadoop 3.x State of the Union and Upgrade Guidance - Strata 2019 NY
The document discusses Apache Hadoop 3.x updates and provides guidance for upgrading to Hadoop 3. It covers community updates, features in YARN, Submarine, HDFS, and Ozone. Release plans are outlined for Hadoop, Submarine, and upgrades from Hadoop 2 to 3. Express upgrades are recommended over rolling upgrades for the major version change. The session concludes that Hadoop 3 is an eagerly awaited release with many successful production uses, and that now is a good time to upgrade for those who have not yet done so.
Performance Benchmarking of Clouds: Evaluating OpenStack
Pradeep Kumar Surisetty presented on performance benchmarking of clouds and evaluating OpenStack. He discussed key cloud characteristics like elasticity and scalability. He then covered various performance measuring tools like Rally, Browbeat, Perfkit Benchmarker, and the SPEC Cloud IaaS 2016 benchmark. He also discussed performance monitoring tools like Ceilometer, Collectd/Graphite/Grafana, and Ganglia. Finally, he provided some tuning tips for hardware, instances, over-subscription, local storage, NUMA nodes, disk pinning, and deployment timings.
High Performance Network Programming on the JVM - OSCON 2012
This document summarizes a talk on high performance network programming on the JVM. The talk discusses choosing between synchronous and asynchronous I/O, with examples of when each approach is best. It also covers how to optimize synchronous I/O on the JVM to maximize throughput. The document provides benchmarks comparing the performance of a simple synchronous memcache client versus an asynchronous one.
Now that you have your apps running on K8s, are you wondering how to get the response time that you need? Tuning applications to get the performance that you need can be challenging. When you have to tune a number of microservices in Kubernetes to fix a response time or throughput issue, it can get really overwhelming. This talk looks at some common performance issues, ways to solve them, and, more importantly, the tools that can help you. We will also look specifically at Kruize, which helps not only right-size your containers but also optimize their runtimes.
One-click Hadoop Cluster Deployment on OpenPOWER Systems
This document describes how to deploy Hadoop clusters on OpenPOWER systems using OpenStack and the Sahara plugin in 3 steps: 1) Set up OpenStack with Sahara on OpenPOWER servers, 2) Create PowerPC images and node group templates in Sahara, 3) Launch and test a Hadoop cluster from the Sahara dashboard. The deployment was tested on IBM S822L servers running PowerKVM, with a 500 GB TeraSort completing in 7000 seconds on 2 data nodes and 1 name node. Upstream contributions were also made to OpenStack to support PowerPC.
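For a rough sense of scale, assuming decimal units and counting only the 500 GB of input (TeraSort also shuffles and writes the data, so actual disk and network traffic is higher):

```latex
\text{throughput} \approx \frac{500\ \text{GB}}{7000\ \text{s}} \approx 0.071\ \text{GB/s} \approx 71\ \text{MB/s},
\qquad
\frac{71\ \text{MB/s}}{2\ \text{data nodes}} \approx 36\ \text{MB/s per data node}
```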
Andi Smith provides an overview of setting up an automated workflow for front-end development using Grunt or Gulp. They discuss choosing a task runner, common tasks for setup like concatenation and minification, tasks for development like autoprefixing and live reloading, and tasks for build like image optimization and compression. The presentation emphasizes setting up a workflow that focuses on speeding up the development process and only including necessary tasks.
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
Tuning your EC2 web server will help you to improve application server throughput and cost-efficiency as well as reduce request latency. In this session we will walk through tactics to identify bottlenecks using tools such as CloudWatch in order to drive the appropriate allocation of EC2 and EBS resources. In addition, we will also be reviewing some performance optimizations and best practices for popular web servers such as Nginx and Apache in order to take advantage of the latest EC2 capabilities.
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...
Apache Cassandra makes it possible to execute millions of operations per second in a scalable fashion. Harnessing the power of C* leaves many developers pondering the following:
- Is my data model appropriate, or will it end up with wide partitions that cause heap pressure and other issues?
- How do I tune my connection pool configuration? What are the optimal settings for my environment?
- What is my C* cluster's capacity in IOPS at a given 95th- and 99th-percentile latency?
- How do I perf-test my data access layer?
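NDBench's own reporting isn't described here. A minimal sketch of the arithmetic behind the capacity question, assuming you already have per-request latency samples from load-test runs at several request rates; both helpers are hypothetical, and nearest-rank is only one of several percentile definitions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def max_ops_within_slo(runs: List[Tuple[int, Sequence[float]]],
                       p95_target_ms: float,
                       p99_target_ms: float) -> Optional[int]:
    """Given load-test runs as (ops_per_second, latency_samples_ms), return the highest
    rate whose p95 and p99 both meet their targets, or None if no run does."""
    passing = [ops for ops, lat in runs
               if percentile(lat, 95) <= p95_target_ms and percentile(lat, 99) <= p99_target_ms]
    return max(passing) if passing else None
```

In practice each run would come from driving the cluster at a fixed rate; the capacity is then the highest rate that still meets both latency targets.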
In this talk, Vinay Chella, Cloud Data Architect @ Netflix, will share the open source tools, techniques, and platform (NDBench) that Netflix uses to perf-test its C* fleet by simulating millions of operations per second.
About the Speaker
Vinay Chella, Cloud Data Architect, Netflix Inc.
Vinay Chella is a Cloud Data Architect at Netflix with a deep understanding of Cassandra and relational databases. As an engineer and architect, he works extensively on data modeling, performance tuning, and guiding best practices for various persistence stores, helping teams at Netflix build next-generation data access layers.
DevoxxUK: Optimizing Application Performance on Kubernetes
Now that you have your apps running on K8s, are you wondering how to get the response time that you need? Tuning a polyglot set of microservices to get the performance that you need can be challenging in Kubernetes. The key to overcoming this is observability. Luckily, there are a number of tools, such as Prometheus, that can provide all the metrics you need, but here is the catch: there is so much data that it is difficult to make sense of it all. This is where hyperparameter tuning can come to the rescue and help build the right models.
This talk covers best practices that will help attendees:
1. Understand and avoid common performance-related problems.
2. Learn how observability tools can help identify performance issues.
3. Take a closer look at Kruize Autotune, an open source autonomous performance tuning tool for Kubernetes, and where it can help.
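The abstract pitches hyperparameter tuning without detailing the search. As a baseline sketch only (not Kruize Autotune's algorithm, which this text doesn't describe), random search over a small discrete space of runtime settings looks like this; the parameter names, ranges, and the latency model in fake_p95_latency_ms are invented for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]


def random_search(objective: Callable[[Config], float],
                  space: Dict[str, List[int]],
                  trials: int,
                  seed: int = 0) -> Tuple[Optional[Config], float]:
    """Sample configurations uniformly from a discrete space; keep the one with the lowest score."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    best_cfg: Optional[Config] = None
    best_score = float("inf")
    for _ in range(trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def fake_p95_latency_ms(cfg: Config) -> float:
    """Made-up objective standing in for: deploy with cfg, drive load, read p95 from metrics."""
    # Invented model: more CPU lowers latency; a heap far from 512 MB adds overhead.
    return 100.0 * 2000 / cfg["cpu_millicores"] + abs(cfg["heap_mb"] - 512) / 10


space = {"cpu_millicores": [500, 1000, 2000], "heap_mb": [256, 512, 1024]}
print(random_search(fake_p95_latency_ms, space, trials=50))
```

A real tuner would replace the fake objective with an actual measurement loop and would usually weigh cost alongside latency; strategies such as Bayesian optimization typically need fewer trials than random search.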
This is an introduction to Polyaxon and why I use it.
Polyaxon enables me to leverage Kubernetes to achieve the following objectives:
- Make the lead time of experiments as short as possible.
- Make the financial cost of training models as low as possible.
- Make the experiments reproducible.
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/227622666/
Title: Spark on Kubernetes
Abstract: Engineers across several organizations are working on support for Kubernetes as a cluster scheduler backend within Spark. While designing this, we have encountered several challenges in translating Spark to use idiomatic Kubernetes constructs natively. This talk is about our high level design decisions and the current state of our work.
Speaker:
Anirudh Ramanathan is a software engineer on the Kubernetes team at Google. His focus is on running stateful and batch workloads. Previously, he worked on GGC (Google Global Cache) and, prior to that, on the infrastructure team at NVIDIA.
This document contains the slides from a talk given by Konrad Malawski on the "Tao/Zen of Programming" using Akka. Some of the key points discussed include:
- Actors are meant to work together and each actor should focus on a single responsibility. Having only one actor limits its capabilities.
- Actors should be structured in a hierarchy with parent-child relationships to allow for supervision. Actors should also be named meaningfully based on their purpose.
- Blocking operations can starve other actors by monopolizing shared resources. Blocking code needs to be isolated on dedicated dispatchers.
- Messages should be processed asynchronously using for/flatMap instead of awaiting futures, to avoid blocking.
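The advice to isolate blocking code is Akka-specific, but the idea carries over. As a rough Python analog (not Akka's API), the sketch below runs a blocking call on a dedicated ThreadPoolExecutor via asyncio's run_in_executor, so the event loop, playing the role of the default dispatcher, keeps serving other tasks; blocking_lookup and the pool size are made up for illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Dedicated pool for blocking calls, playing the role of a separate Akka dispatcher.
BLOCKING_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="blocking-io")


def blocking_lookup(key: str) -> str:
    """Hypothetical stand-in for a blocking call such as a JDBC query or synchronous HTTP request."""
    time.sleep(0.1)
    return key.upper()


async def handle(key: str) -> str:
    loop = asyncio.get_running_loop()
    # Run the blocking call on the dedicated pool so the event loop stays responsive.
    return await loop.run_in_executor(BLOCKING_POOL, blocking_lookup, key)


async def heartbeat(ticks: List[float]) -> None:
    """Non-blocking work that keeps ticking while the lookups are in flight."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def demo() -> Tuple[List[str], List[float]]:
    ticks: List[float] = []
    lookups, _ = await asyncio.gather(
        asyncio.gather(*(handle(k) for k in ["a", "b", "c"])),
        heartbeat(ticks),
    )
    return lookups, ticks


if __name__ == "__main__":
    results, ticks = asyncio.run(demo())
    print(results, len(ticks))
```

Calling time.sleep directly inside handle would instead freeze the heartbeat for the full duration of each lookup, which is the starvation the talk warns about.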
This document summarizes advanced Akka features presented by Martin Kanters and Johan Janssen. It covers local and remote actors, scheduling, clustering, routing, cluster singletons, sharding, persistence, Akka HTTP, and finite state machines. The presentation introduces these features and provides examples to illustrate how they can be used with Akka.
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac...
Akka is the platform for the next generation of event-driven, scalable, and fault-tolerant architectures on the JVM.
We believe that writing correct concurrent, fault-tolerant and scalable applications is too hard. Most of the time it's because we are using the wrong tools and the wrong level of abstraction.
Akka is here to change that.
Using the Actor Model together with Software Transactional Memory, we raise the abstraction level and provide a better platform to build correct concurrent and scalable applications.
For fault tolerance, we adopt the "Let it crash" / "Embrace failure" model, which has been used with great success in the telecom industry to build applications that self-heal: systems that never stop.
Actors also provide the abstraction for transparent distribution and the basis for truly scalable and fault-tolerant applications.
Akka is Open Source and available under the Apache 2 License.
The document introduces Akka, an open-source toolkit for building distributed, concurrent applications on the JVM. It provides a programming model called the actor model that makes it easier to build scalable and fault-tolerant systems. Actors process messages asynchronously and avoid shared state, providing a simpler approach to concurrency than traditional threads and locks. Akka allows actors to be distributed across a network, enabling applications to scale out elastically.
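Akka's actual API is Scala/Java. Purely to illustrate the model described above, here is a minimal Python sketch of an actor with private state and a FIFO mailbox processed by a single thread; CounterActor and its message names are hypothetical, and real actor systems multiplex many actors over a shared thread pool rather than giving each one its own thread.

```python
import queue
import threading
from typing import Any, Tuple


class CounterActor:
    """Minimal actor: private state, a FIFO mailbox, and one thread that handles messages in order."""

    def __init__(self) -> None:
        self._count = 0  # only ever touched by the actor's own thread, so no lock is needed
        self._mailbox: "queue.Queue[Tuple[str, Any]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, msg: str, payload: Any = None) -> None:
        """Fire-and-forget send: enqueue the message and return immediately."""
        self._mailbox.put((msg, payload))

    def _run(self) -> None:
        while True:
            msg, payload = self._mailbox.get()
            if msg == "increment":
                self._count += payload
            elif msg == "get":
                payload.put(self._count)  # reply through a queue supplied by the sender
            elif msg == "stop":
                return


if __name__ == "__main__":
    actor = CounterActor()
    senders = [threading.Thread(target=lambda: [actor.tell("increment", 1) for _ in range(1000)])
               for _ in range(4)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()
    reply: "queue.Queue[int]" = queue.Queue()
    actor.tell("get", reply)  # FIFO mailbox: handled after every increment already enqueued
    print(reply.get(timeout=5))
    actor.tell("stop")
```

Four threads update the counter concurrently without any shared-memory locking in user code, because only the actor's thread ever reads or writes _count.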
This document provides an overview of Konrad Malawski's presentation on reactive stream processing with Akka Streams. The presentation covers Reactive Streams concepts like back pressure, the Reactive Streams specification and protocol, and how Akka Streams implements reactive stream processing using concepts like linear flows, flow graphs, and integration with Akka actors. It also discusses future plans for Akka Streams including API stabilization, improved testability, and potential features like visualizing flow graphs and distributing computation graphs.
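Back pressure means a slow consumer limits how fast a producer can emit. Reactive Streams signals this with explicit demand (a subscriber requests n elements), which this sketch does not model; a bounded buffer whose put() suspends when full is a simpler way to show the same effect. The function names and buffer size are illustrative assumptions, not Akka Streams' API.

```python
import asyncio
from typing import List, Optional, Tuple


async def producer(q: "asyncio.Queue[Optional[int]]", n: int, peak: List[int]) -> None:
    for i in range(n):
        # put() suspends while the buffer is full, so a slow consumer throttles the producer.
        await q.put(i)
        peak[0] = max(peak[0], q.qsize())
    await q.put(None)  # end-of-stream marker


async def consumer(q: "asyncio.Queue[Optional[int]]", out: List[int]) -> None:
    while True:
        item = await q.get()
        if item is None:
            return
        await asyncio.sleep(0.001)  # simulate slow downstream work
        out.append(item)


async def run_pipeline(n: int = 50, buffer_size: int = 8) -> Tuple[List[int], int]:
    q: "asyncio.Queue[Optional[int]]" = asyncio.Queue(maxsize=buffer_size)
    out: List[int] = []
    peak = [0]
    await asyncio.gather(producer(q, n, peak), consumer(q, out))
    return out, peak[0]


if __name__ == "__main__":
    items, peak_buffered = asyncio.run(run_pipeline())
    print(len(items), peak_buffered)
```

However fast the producer is, memory use stays bounded by buffer_size, which is the property back pressure exists to guarantee.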
Real-Time Analytics with Apache Kafka and Apache Spark
A presentation-cum-workshop on real-time analytics with Apache Kafka and Apache Spark. Apache Kafka is a distributed publish-subscribe messaging system, while Spark Streaming brings Spark's language-integrated API to stream processing, making it quick and easy to write streaming applications. It supports both Java and Scala. In this workshop, we are going to explore Apache Kafka, ZooKeeper, and Spark with a web clickstream example using Spark Streaming. A clickstream is the recording of the parts of the screen a computer user clicks on while web browsing.
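Spark Streaming processes a stream as a series of small batches. Leaving Kafka and Spark out entirely, the per-window aggregation a clickstream job typically computes can be sketched in plain Python; the event tuple layout and the window size are assumptions for illustration, not the workshop's schema.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Click = Tuple[float, str, str]  # (timestamp_seconds, user_id, page): hypothetical event layout


def windowed_page_counts(clicks: Iterable[Click], window_s: float) -> Dict[int, Counter]:
    """Assign each click to a fixed, non-overlapping (tumbling) window and count clicks per page."""
    windows: Dict[int, Counter] = defaultdict(Counter)
    for ts, _user, page in clicks:
        window_id = int(ts // window_s)  # window k covers [k * window_s, (k + 1) * window_s)
        windows[window_id][page] += 1
    return dict(windows)


clicks = [(0.5, "u1", "/home"), (1.2, "u2", "/home"), (2.1, "u1", "/cart"), (2.9, "u3", "/home")]
for window_id, counts in sorted(windowed_page_counts(clicks, window_s=2.0).items()):
    print(window_id, dict(counts))
```

In a real pipeline the clicks would arrive from a Kafka topic and the counts would be emitted as each window closes, rather than computed over an in-memory list.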
Everyone in the Scala world is using or looking into using Akka for low-latency, scalable, distributed or concurrent systems. I'd like to share my story of developing and productionizing multiple Akka apps, including low-latency ingestion and real-time processing systems, and Spark-based applications.
When does one use actors vs futures?
Can we use Akka with, or in place of, Storm?
How did we set up instrumentation and monitoring in production?
How does one use VisualVM to debug Akka apps in production?
What happens if the mailbox gets full?
What is our Akka stack like?
I will share best practices for building Akka and Scala apps, pitfalls and things we'd like to avoid, and a vision of where we would like to go for ideal Akka monitoring, instrumentation, and debugging facilities. Plus backpressure and at-least-once processing.
numPYNQ is a hardware library that offers an accelerated version of NumPy core functions to be used transparently from data science applications. It implements these functions on an FPGA to provide better performance, energy efficiency, and flexibility compared to GPUs. Experimental results show speedups for tasks like matrix multiplication and cross-correlation. The library uses runtime input analysis and adaptation to optimize implementations. It has potential in the growing big data market, and the team plans partnerships and a freemium business model to commercialize numPYNQ.
In-Memory Computing Essentials for Architects and Engineers
Slides from the IMC Essentials workshop.
The workshop covers fundamental capabilities of in-memory computing platforms that boost high-load applications and services, and bring existing IT architecture to the next level by storing and processing a massive amount of data both in RAM and, optionally, on disk.
The capabilities and benefits of such platforms will be demonstrated with the usage of Apache Ignite, which is the in-memory computing platform that is durable, strongly consistent, and highly available with powerful SQL, key-value and processing APIs.
Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017)
The Linux Security and Isolation APIs have become the basis of some of the most useful features server-side, providing the isolation required for efficient containers. However, these APIs also form the basis of the Chromium Sandbox on Linux, and we will study them in that context.
This presentation goes more in depth on some key points from the NDC (2017) presentation.
Docker networking allows containers to communicate in several ways. Containers can communicate using Docker's default bridge (docker0), by binding container ports to the host's ports, or by using the host's network stack directly. More advanced options include linking containers to share information, using overlay networks with technologies like Open vSwitch, or running containers across multiple hosts with tunnels. The document provides examples of setting up different Docker networking configurations and discusses which methods suit different communication requirements between containers, hosts, and external networks.
Graduating To Go - A Jumpstart into the Go Programming Language
This workshop jumps through a lot of what is covered in the Go Tour. The exercises are new and more closely match the class content, and some topics (like testing and APIs) are not covered in the Go Tour.
What in the World is Going on at The Linux Foundation?
The Linux Foundation has over 500 corporate members involved in over 70 member-sponsored projects. In 2016, the Linux Foundation convened over 20,000 people from 85 countries and over 4000 companies at 150 events around the world. Over 800,000 students from 215 countries have enrolled in Linux Foundation training programs. Who is driving this growth? Why do companies invest valuable resources in collaborative development? What have we learned along the way?
This document provides a summary of a presentation on using lock-free algorithms to scale shared mutable state on the JVM. It begins with an introduction to the speaker and discusses why shared mutable state is needed for big data and real-time processing. It then uses a toy problem, implementing a concurrent stack, to demonstrate the challenges of synchronization and contention. The presentation introduces the use of atomic references and compare-and-set operations to implement lock-free push and pop operations on the concurrent stack in a non-blocking manner, improving scalability.
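The summary describes atomic references and compare-and-set on the JVM. The sketch below keeps only the algorithm's shape, a Treiber-style stack whose push and pop loop until a compare-and-set on the head succeeds. Python has no hardware CAS, so the AtomicReference class here is a hypothetical stand-in that uses a small internal lock; read it as an illustration of the retry-loop logic, not as a lock-free or fast implementation.

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


class AtomicReference:
    """Stand-in for java.util.concurrent.atomic.AtomicReference. On the JVM, compare-and-set
    maps to a hardware CAS; Python has none, so a tiny lock emulates it here."""

    def __init__(self, value: Any = None) -> None:
        self._value = value
        self._lock = threading.Lock()

    def get(self) -> Any:
        return self._value

    def compare_and_set(self, expected: Any, new: Any) -> bool:
        with self._lock:
            if self._value is expected:  # reference (identity) comparison, as on the JVM
                self._value = new
                return True
            return False


@dataclass(frozen=True)
class _Node:
    value: Any
    next_node: Optional["_Node"]


class TreiberStack:
    """Stack whose push/pop retry until their compare-and-set on the head succeeds."""

    def __init__(self) -> None:
        self._head = AtomicReference(None)

    def push(self, value: Any) -> None:
        while True:
            old = self._head.get()
            if self._head.compare_and_set(old, _Node(value, old)):
                return  # no other thread changed head between our read and our CAS

    def pop(self) -> Optional[Any]:
        while True:
            old = self._head.get()
            if old is None:
                return None  # empty stack
            if self._head.compare_and_set(old, old.next_node):
                return old.value
```

Because every push allocates a fresh node and a popping thread still holds a reference to the node it read, a garbage-collected runtime avoids the classic ABA problem for this structure.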
Communication hardware refers to electric devices and systems for transferring data or information from one place to another. Examples include modems, cables, fax modems, routers, and wireless technologies like infrared, Bluetooth, and Wi-Fi. The document provides details on each type of communication hardware, including what they are and how they function. It also includes multiple choice questions to test understanding of the different hardware.
[若渴計畫] Challenges and Solutions of Windows Remote Shellcode
This document discusses challenges and solutions related to Windows remote shellcode. It outlines challenges posed by antivirus software, EMET, firewalls, and IDS/IPS systems. It then describes various techniques for bypassing these protections, such as encryption, obfuscation, non-standard programming languages, and the use of tools like Meterpreter and Veil Framework payloads. Specific bypass techniques covered include DLL injection, process hollowing, reflective loading, and the use of techniques like one-way shells and HTTP stagers.
Dive deep into an actual enterprise Linux migration by walking through the planning and execution of the process as seen by our customers. Our enterprise architects will break down the key migration steps to explain the available options and decisions made, and will demonstrate actions on a live system. This episode gives you a representative migration experience before you actually migrate, illustrating side-by-side comparisons between Red Hat Enterprise Linux and CentOS, steps to consider for the operating system, and steps to consider for common application stacks and packages.
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Abstract:
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I'll demonstrate how to optimize, profile, and deploy TensorFlow Models - and the TensorFlow Runtime - in a GPU-based production environment.
This talk is 100% demo based with open source tools and completely reproducible through Docker on your own GPU cluster.
Bio:
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Pipeline.AI was also the recent winner of the O'Reilly Media AI Startup Showcase at the AI conference.
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
High Performance Distributed TensorFlow with GPUs and Kubernetes
In this deck from the Stanford HPC Conference, Chris Fregly from PipelineAI presents: High Performance Distributed TensorFlow with GPUs and Kubernetes.
"Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements with TensorFlow, Kubernetes, OpenFaaS, GPUs, and PipelineAI.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment. This is the holy grail of data science - rapid and safe experiments of ML / AI models directly in production. Following the famous Netflix Culture that encourages "Freedom and Responsibility", I use this talk to demonstrate how Data Scientists can use PipelineAI to safely deploy their ML / AI pipelines into production using live data. Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong. Learn to be fast and strong by attending this talk!"
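The talk doesn't specify how PipelineAI splits traffic between the live model and a canary. One common approach, sketched here under that assumption, hashes a request or user ID into buckets so that a fixed percentage of traffic goes to the canary; route() and the bucket count are hypothetical.

```python
import hashlib


def route(request_id: str, canary_percent: float) -> str:
    """Send about canary_percent% of traffic to the canary model, deterministically per ID.

    Hashing the ID keeps a given caller on the same variant across retries,
    unlike drawing a fresh random number for every request.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999 (0.01% granularity)
    return "canary" if bucket < canary_percent * 100 else "stable"


if __name__ == "__main__":
    ids = [f"user-{i}" for i in range(10_000)]
    share = sum(route(i, 10) == "canary" for i in ids) / len(ids)
    print(f"canary share at 10%: {share:.3f}")
```

Because a bucket stays in the canary once its number is below the threshold, ramping canary_percent up (say 1, 5, 25, 100) only ever adds callers to the canary and never bounces existing ones back.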
Watch the video: https://youtu.be/k4qAKQHakNg
Learn more: https://pipeline.ai/ and http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I'll demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment.
This talk contains many Spark ML and TensorFlow AI demos using PipelineIO's 100% Open Source Community Edition. All code and Docker images are available to reproduce on your own CPU- or GPU-based cluster.
* Bio *
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Video Series High Performance TensorFlow in Production.
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member of the IBM Spark Technology Center in San Francisco.
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
Slides from the TensorFlow meetup hosted on October 9th at the ML6 offices in Ghent. Join our Meetup group for updates and future sessions: https://www.meetup.com/TensorFlow-Belgium/
Miguel Zuniga presented on managing and scaling Puppet. The presentation covered using a Puppet master with a web cluster for scaling, adding caching to reduce load, using source control with Puppet, multi-datacenter configurations, masterless Puppet in the cloud, and future directions including search capabilities and dynamic configurations. Zuniga took questions at the end.
Miguel Zuniga presented on managing and scaling Puppet. The presentation covered Puppet and the Puppetmaster model, scaling Puppet with a web cluster, using caching to reduce load, integrating Puppet with source control management, multi-datacenter configurations, masterless Puppet in the cloud, and future directions for Puppet. Zuniga concluded by taking questions from the audience.
In this deck from the 2018 Swiss HPC Conference, Axel Koehler from NVIDIA presents: The Convergence of HPC and Deep Learning.
"The intersection of AI and HPC is extending the reach of science and accelerating the pace of scientific innovation like never before. The technology originally developed for HPC has enabled deep learning, and deep learning is enabling many usages in science. Deep learning is also helping deliver real-time results with models that used to take days or months to simulate. The presentation will give an overview of the latest hardware and software developments for HPC and Deep Learning from NVIDIA and will show some examples of how Deep Learning can be combined with traditional large-scale simulations."
Watch the video: https://wp.me/p3RLHQ-ijM
Learn more: http://nvidia.com and http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Docker containers are everywhere today: in our Google/Office 365 mailboxes, our web applications, our access to medical appointments, airplanes, and many more places.
They are everywhere but not always easy to grasp, and yet they have much more in common with our daily jobs than it seems.
During this webinar, I will present these famous Docker containers as seen by a chef and a car mechanic, and you will see that they have a lot in common.
Build, train, and deploy Machine Learning models at scale (May 2018)
The document discusses Amazon SageMaker, a fully managed service that allows users to build, train and deploy machine learning models at scale. It provides pre-built algorithms and frameworks, managed hosting, one-click deployment and hyperparameter tuning capabilities. It also supports bringing your own custom algorithms by allowing users to run their own Docker containers. The document highlights how SageMaker simplifies and automates ML workflows and provides examples of customers using it at scale for image and data analysis.
The document discusses moving a Tomcat cluster to the cloud. It describes how Tomcat uses multicast for session replication in a cluster, but this does not work in the cloud. The solution presented uses the Kubernetes API to discover cluster nodes instead of multicast, allowing session replication to function in OpenShift. The architecture includes a DynamicMembershipService that refreshes the node list from a KubernetesMemberProvider accessing the Kubernetes API. This allows a Tomcat cluster to run in OpenShift with external session replication.
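The discovery idea described above can be sketched in a few lines. This is a hypothetical Python illustration, not Tomcat's Java implementation: `provider` stands in for the Kubernetes API lookup (for example, listing the pods of the Tomcat deployment), and `refresh()` shows how periodically polling that list yields join and leave events without multicast.

```python
from typing import Callable, Iterable, Set, Tuple


class DynamicMembershipService:
    """Hypothetical sketch: keep cluster membership in sync with a provider.

    `provider` is assumed to return the addresses of the currently running
    nodes (in the talk, a KubernetesMemberProvider querying the Kubernetes
    API plays this role instead of multicast discovery).
    """

    def __init__(self, provider: Callable[[], Iterable[str]]) -> None:
        self.provider = provider
        self.members: Set[str] = set()

    def refresh(self) -> Tuple[Set[str], Set[str]]:
        """Poll the provider once and report which nodes joined and which left."""
        current = set(self.provider())
        joined = current - self.members
        left = self.members - current
        self.members = current
        return joined, left


# Usage with a fake provider simulating pods scaling from 2 to 3, then one pod leaving.
snapshots = iter([["10.0.0.1", "10.0.0.2"],
                  ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                  ["10.0.0.2", "10.0.0.3"]])
service = DynamicMembershipService(lambda: next(snapshots))
for _ in range(3):
    print(service.refresh())
```

In a real cluster, `refresh()` would run on a timer, and the joined/left sets would drive session-replication peers.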
In this deck from the NVIDIA GPU Technology Conference, Axel Koehler presents: Inside the Volta GPU Architecture and CUDA 9.
"The presentation will give an overview of the new NVIDIA Volta GPU architecture and the latest CUDA 9 release. The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and graphics. Volta features a new Streaming Multiprocessor (SM) architecture and includes enhanced features like NVLink 2 and the Multi-Process Service (MPS) that deliver major improvements in performance, energy efficiency, and ease of programmability. New features like Independent Thread Scheduling and the Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance. CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest CUDA 9 release."
Watch the video: https://wp.me/p3RLHQ-iB7
Learn more: https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Talk given at the London AICamp meetup on 13 July 2023. It's an introduction to building open-source ChatGPT-like chatbots and some of the considerations to keep in mind while training/tuning them using Airflow.
The document discusses HTTP and web architectures. It begins with introductions from Nicolas Martignole and Quentin Adam. It then provides an overview of HTTP/1.x, including that it is a text-based specification, uses simple requests and responses over TCP connections, and defines verbs like GET, POST, PUT, and DELETE. It discusses caching techniques using the Expires, Pragma, and Cache-Control headers. It also covers ETags for cache validation and content negotiation for serving multiple representations of resources.
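To make ETag-based cache validation concrete, here is a minimal Python sketch of a server-side conditional GET. It assumes the ETag is a truncated SHA-256 hash of the body and uses an arbitrary `max-age` of 60 seconds. Neither detail comes from the talk. `If-None-Match` is compared weakly, so a `W/` prefix is ignored.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (here, a truncated content hash)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque_tag(tag: str) -> str:
    """Strip whitespace and any weak-validator prefix W/ for weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, returning 304 Not Modified when the client's cached copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}  # max-age value is illustrative
    if if_none_match:
        candidates = {_opaque_tag(t) for t in if_none_match.split(",")}
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # client's copy is still valid: send no body
    return 200, headers, body


status, headers, _ = conditional_get(b"hello", None)
print(status, headers["ETag"])
print(conditional_get(b"hello", headers["ETag"])[0])  # 304
```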
This document discusses cloud native development and DevOps using OpenShift Container Platform. It begins by defining cloud native as involving both application architecture and the development, deployment and management processes used. It then discusses how containers evolve application delivery and how container platforms are part of the DevOps tool kit. The document outlines the path to DevOps, emphasizing culture, automation and using the right platform. It also notes that DevOps and containers often go hand in hand, with many DevOps adopters using containers. The document then discusses various capabilities of OpenShift and how it supports cloud native development.
Deep dive into KFServing: a serverless model inference platform built on top of Knative and Istio. It is part of the Kubeflow project and deployed in production across organizations.
In this session I will use a simple HTTP benchmark to compare the performance of the Linux kernel networking stack with userspace networking powered by DPDK (kernel-bypass).
It is said that kernel-bypass technologies avoid the kernel because it is "slow", but in reality, a lot of the performance advantages that they bring just come from enforcing certain constraints.
As it turns out, many of these constraints can be enforced without bypassing the kernel. If the system is tuned just right, one can achieve performance that approaches kernel-bypass speeds while still benefiting from the kernel's battle-tested compatibility and rich ecosystem of tools.
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Understanding the dynamics of GPU utilization and workloads in containerized systems is critical to creating efficient software systems. We create a set of dashboards to monitor and evaluate GPU performance in the context of TensorFlow. We monitor performance in real time to gain insight into GPU load, GPU memory, and temperature metrics in a GPU-enabled Kubernetes system. Visualizing TensorFlow training job metrics in real time using Prometheus allows us to tune and optimize GPU usage. Also, because TensorFlow jobs can have both GPU and CPU implementations, it is useful to view detailed real-time performance data from each implementation and choose the best one. To illustrate our system, we will show a live demo gathering and visualizing GPU metrics on a GPU-enabled Kubernetes cluster with Prometheus and Grafana.
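For reference, Prometheus scrapes metrics in a plain-text exposition format. The sketch below renders per-GPU gauges in that format by hand. The metric names (`gpu_utilization_percent`, `gpu_memory_used_bytes`) and the `gpu` label are hypothetical, and in practice you would typically use an official Prometheus client library or an existing GPU exporter rather than formatting the text yourself.

```python
from typing import Dict, Tuple

# metric name -> (help text, {gpu index: current value})
Samples = Dict[str, Tuple[str, Dict[int, float]]]


def render_gpu_metrics(samples: Samples) -> str:
    """Render per-GPU gauge samples in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, per_gpu) in samples.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        for gpu, value in sorted(per_gpu.items()):
            lines.append(f'{name}{{gpu="{gpu}"}} {value}')  # one sample line per GPU
    return "\n".join(lines) + "\n"


print(render_gpu_metrics({
    "gpu_utilization_percent": ("Current GPU utilization.", {0: 87.0, 1: 12.5}),
    "gpu_memory_used_bytes": ("GPU memory in use.", {0: 10_737_418_240.0}),
}))
```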
This document discusses Amazon Web Services (AWS) products and services for building end-to-end machine learning and data strategies. It covers topics such as ML infrastructure, governance, data preparation, model training, deployment, and education. Specific services mentioned include Amazon SageMaker, AWS Lake Formation, Amazon Redshift, Amazon EMR, AWS Glue, and AWS services for hardware acceleration like AWS Trainium and AWS Graviton.
Chris Fregly (Principal Solution Architect, AI and Machine Learning at AWS) will give a brief presentation on the various ways to scale pandas workloads with Modin and Ray on AWS. He will then answer questions from the audience and from the moderator, Alejandro Herrera of Ponder.
Chris Fregly is a Principal Solution Architect for AI and Machine Learning at Amazon Web Services (AWS) based in San Francisco, California. He is the organizer of the Global Data Science on AWS meetup. He is co-author of the O'Reilly Book, "Data Science on AWS."
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup
RSVP Webinar: https://www.eventbrite.com/e/webinarkubeflow-tensorflow-tfx-pytorch-gpu-spark-ml-amazonsagemaker-tickets-45852865154
Talk #0: Introductions and Meetup Announcements By Chris Fregly and Antje Barth
Talk #1: Ray Overview, Ray AI Runtime on AWS using Amazon SageMaker, EC2, EMR, EKS by Chris Fregly, Principal Specialist Solution Architect, AI and Machine Learning @ AWS
Talk #2: Deep-dive Blueprints for Amazon Elastic Kubernetes Service (EKS) including Ray and Spark by Apoorva Kulkarni, Sr. Specialist Solution Architect, Containers and Kubernetes @ AWS
Zoom link: https://us02web.zoom.us/j/82308186562
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Smokey and the Multi-Armed Bandit Featuring BERT Reynolds Updated
The document discusses using multi-armed bandit tests to compare natural language models. It describes training BERT models with TensorFlow and PyTorch, and training a multi-armed bandit model with Vowpal Wabbit for reinforcement learning. It then demonstrates testing the BERT models with the bandit model and scaling multi-armed bandits on AWS.
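As a rough illustration of the bandit idea, here is an epsilon-greedy bandit. This is not the Vowpal Wabbit setup described above. The bandit routes most traffic to the model with the best observed reward while still exploring the others. The success rates below are made-up stand-ins for, say, the click-through rates of two BERT variants.

```python
import random
from typing import List, Tuple


def epsilon_greedy(true_rates: List[float], rounds: int = 2000,
                   epsilon: float = 0.1, seed: int = 42) -> Tuple[List[int], List[float]]:
    """Simulate routing `rounds` requests across models with hidden success rates."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms     # requests served by each model
    values = [0.0] * n_arms  # running mean reward observed for each model
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                          # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])    # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0  # simulated feedback
        pulls[arm] += 1
        values[arm] += (reward - values[arm]) / pulls[arm]       # incremental mean
    return pulls, values


pulls, values = epsilon_greedy([0.3, 0.7])
print(pulls, [round(v, 2) for v in values])
```

Unlike a fixed 50/50 A/B split, the bandit shifts traffic toward the better model while the test is still running, which limits the cost of serving the weaker one.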
Amazon re:Invent 2020 Recap: AI and Machine Learning
Video here: https://youtu.be/YSXe02Y5pHM
NEW RELEASE! Build, Automate, Manage, and Scale ML Workflows with the NEW Amazon SageMaker Pipelines by Hallie Crosby Weishahn.
Description of Talk and Demo
AWS recently announced Amazon SageMaker Pipelines (https://aws.amazon.com/sagemaker/pipelines/), the first purpose-built, easy-to-use Continuous Integration and Continuous Delivery (CI/CD) service for machine learning.
SageMaker Pipelines has three main components which improve the operational resilience and reproducibility of your workflows: 1) pipelines, 2) model registry, and 3) projects.
In this talk and demo, Hallie will walk us through the new Amazon SageMaker Pipelines feature including MLOps support.
Date/Time
9-10am US Pacific Time (Third Monday of Every Month)
RSVP: https://www.eventbrite.com/e/1-hr-free-workshop-pipelineai-gpu-tpu-spark-ml-tensorflow-ai-kubernetes-kafka-scikit-tickets-45852865154
Meetup: https://www.meetup.com/Data-Science-on-AWS/
Zoom: https://zoom.us/j/690414331
Webinar ID: 690 414 331
Phone: +1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
Related Links
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
O'Reilly Book: https://datascienceonaws.com
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Support: https://support.pipeline.ai
Monthly Workshop: https://www.eventbrite.com/e/full-day-workshop-kubeflow-gpu-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-tickets-63362929227
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
The document discusses Amazon SageMaker Model Monitor and Debugger for monitoring machine learning models in production. SageMaker Model Monitor collects prediction data from endpoints, creates a baseline, and runs scheduled monitoring jobs to detect deviations from the baseline. It generates reports and metrics in CloudWatch. SageMaker Debugger helps debug training issues by capturing debug data with no code changes and providing real-time alerts and visualizations in Studio. Both services help detect model degradation and take corrective actions like retraining.
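The baseline-then-compare idea can be sketched with the standard library. This is a deliberately simplified stand-in, not Model Monitor's actual statistics. It records the mean and standard deviation of one feature at training time. It then flags drift when the mean of a window of live values moves more than `threshold` standard errors away from the baseline mean. The threshold of 3 is an arbitrary choice.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def make_baseline(training_values: Sequence[float]) -> Dict[str, float]:
    """Capture summary statistics of one feature from the training data."""
    return {"mean": statistics.fmean(training_values),
            "stdev": statistics.stdev(training_values)}


def check_drift(baseline: Dict[str, float], live_values: Sequence[float],
                threshold: float = 3.0) -> Tuple[bool, float]:
    """Return (drifted?, z), where z is the shift of the live mean in standard errors."""
    live_mean = statistics.fmean(live_values)
    standard_error = baseline["stdev"] / math.sqrt(len(live_values))
    z = abs(live_mean - baseline["mean"]) / standard_error
    return z > threshold, z


baseline = make_baseline([10, 11, 9, 10, 12, 8, 10, 11, 9, 10])
print(check_drift(baseline, [10, 10.5, 9.5, 10]))  # (False, 0.0): live mean matches
print(check_drift(baseline, [14, 15, 16, 15]))     # drift flagged: mean shifted upward
```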
Quantum Computing with Amazon Braket
In this talk, I describe some fundamental principles of quantum computing, including qubits, superposition, and entanglement. I also demonstrate how to perform secure quantum computing tasks across many Quantum Processing Units (QPUs) using Amazon Braket, IAM, and S3.
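Superposition and entanglement can be shown with a tiny state-vector simulation. This NumPy sketch does not use the Amazon Braket SDK. It builds the two-qubit Bell state produced by a Hadamard gate followed by a CNOT, then samples measurements from it.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard gate
I = np.eye(2, dtype=complex)
# CNOT with qubit 0 as control and qubit 1 as target; basis order |00>, |01>, |10>, |11>
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

state = np.kron(ket0, ket0)    # start in |00>
state = np.kron(H, I) @ state  # superposition: (|00> + |10>) / sqrt(2)
state = CNOT @ state           # entanglement: (|00> + |11>) / sqrt(2)

probs = np.abs(state) ** 2
print(np.round(probs, 3))      # only |00> and |11> have nonzero probability

# Sample 1,000 measurements: the two qubits always agree (00 or 11).
rng = np.random.default_rng(seed=0)
shots = rng.choice(["00", "01", "10", "11"], size=1000, p=probs)
labels, counts = np.unique(shots, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```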
AI and Machine Learning, Quantum Computing, Amazon Braket, QPU
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
In this talk, we present tips and best practices for scaling a large workshop to thousands of simultaneous attendees, both online and in person. While our workshop focuses on AI and machine learning on AWS, we generalize our learnings to any domain or specialization.
The document provides an overview of announcements from Amazon Web Services' annual re:Invent conference in December 2019. Key details include:
- The conference had 65,000 attendees and 3,000 sessions.
- Announcements covered improving the developer experience, compute, storage, AI/ML, databases/analytics, networking, security, and extending AWS beyond regions.
- New services and features were announced for Lambda, API Gateway, Step Functions, EventBridge, Amplify, SageMaker, EC2, EKS, EBS, S3, Rekognition, Lex, Translate, Transcribe, Comprehend, Personalize, Forecast, Fraud Detector, and more.
This document provides an overview and agenda for a workshop on end-to-end machine learning pipelines using TFX, Kubeflow, Airflow and MLflow. The agenda covers setting up an environment with Kubernetes, using TensorFlow Extended (TFX) components to build pipelines, ML pipelines with Airflow and Kubeflow, hyperparameter tuning with Kubeflow, and deploying notebooks with Kubernetes. Hands-on exercises are also provided to explore key areas like TensorFlow Data Validation, TensorFlow Transform, TensorFlow Model Analysis and Airflow ML pipelines.
Title
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU
Video
https://youtu.be/vaB4IM6ySD0
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow.
Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most widely used pipeline orchestration framework in machine learning.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Set Up ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Reproduce Model Training with TFX Metadata Store and Pachyderm
12. Deploy the Model to Production with TensorFlow Serving and Istio
13. Save and Download your Workspace
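Step 5 rests on one idea: infer a schema from the training data, then flag anomalies in new data. That idea can be sketched in plain Python. This is not the TensorFlow Data Validation API, and the min/max range check is an assumption added for illustration. The feature names `fare` and `payment` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def infer_schema(rows: List[Row]) -> Dict[str, Dict[str, Any]]:
    """Infer each feature's type and observed range/domain from training rows."""
    schema: Dict[str, Dict[str, Any]] = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            schema[name] = {"type": "numeric", "min": min(values), "max": max(values)}
        else:
            schema[name] = {"type": "categorical", "domain": set(values)}
    return schema


def validate(schema: Dict[str, Dict[str, Any]], rows: List[Row]) -> List[Tuple[int, str, str]]:
    """Return (row index, feature, problem) for every anomaly found in `rows`."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, spec in schema.items():
            if name not in row:
                anomalies.append((i, name, "missing feature"))
            elif spec["type"] == "numeric":
                value = row[name]
                if not isinstance(value, (int, float)):
                    anomalies.append((i, name, "unexpected type"))
                elif not spec["min"] <= value <= spec["max"]:
                    anomalies.append((i, name, "out of range"))
            elif row[name] not in spec["domain"]:
                anomalies.append((i, name, "unexpected value"))
    return anomalies


train = [{"fare": 10.0, "payment": "cash"}, {"fare": 25.5, "payment": "card"}]
serving = [{"fare": 12.0, "payment": "card"},
           {"fare": 300.0, "payment": "crypto"},
           {"payment": "cash"}]
print(validate(infer_schema(train), serving))
```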
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
Related Links
1. PipelineAI Home: https://pipeline.ai
2. PipelineAI Community Edition: http://community.pipeline.ai
3. PipelineAI GitHub: https://github.com/PipelineAI/pipeline
4. Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
5. YouTube Videos: https://youtube.pipeline.ai
6. SlideShare Presentations: https://slideshare.pipeline.ai
7. Slack Support: https://joinslack.pipeline.ai
8. Web Support and Knowledge Base: https://support.pipeline.ai
9. Email Support: support@pipeline.ai
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...
Traditional machine learning pipelines end with lifeless models sitting on disk in the research lab. These traditional models are typically trained on stale, offline, historical batch data. Static models and stale data are not sufficient to power today's modern, AI-first enterprises that require continuous model training, continuous model optimizations, and lightning-fast model experiments directly in production. Through a series of open source, hands-on demos and exercises, we will use PipelineAI to breathe life into these models using 4 new techniques that we’ve pioneered:
* Continuous Validation (V)
* Continuous Optimizing (O)
* Continuous Training (T)
* Continuous Explainability (E).
The Continuous "VOTE" techniques have proven to maximize pipeline efficiency, minimize pipeline costs, and increase pipeline insight at every stage, from continuous model training (offline) to live model serving (online).
Attendees will learn to create continuous machine learning pipelines in production with PipelineAI, TensorFlow, and Kafka.
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...
Perform Online Predictions using Slack
A/B and multi-armed bandit model comparison
Train Online Models with Kafka Streams
Create new models quickly
Deploy to production safely
Mirror traffic to validate online performance
Any Framework, Any Hardware, Any Cloud
Dashboard to manage the lifecycle of models from local development to live production
Generates optimized runtimes for the models
Custom targeting rules, shadow mode, and percentage-based rollouts to safely test features in live production
Continuous model training, model validation, and pipeline optimization
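Percentage-based rollouts and shadow mode are commonly implemented by hashing a stable request attribute into buckets. The sketch below is one possible way such routing might work. It is not PipelineAI's implementation. `champion`, `candidate`, and the salt string are hypothetical names.

```python
import hashlib
from typing import Optional, Tuple


def rollout_bucket(user_id: str, salt: str = "candidate-rollout") -> int:
    """Map a user deterministically to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def route(user_id: str, candidate_percent: int, shadow: bool = False) -> Tuple[str, Optional[str]]:
    """Return (model that answers the request, model that only receives a mirrored copy)."""
    if shadow:
        # Shadow mode: the champion answers; the candidate sees mirrored traffic only.
        return "champion", "candidate"
    if rollout_bucket(user_id) < candidate_percent:
        return "candidate", None
    return "champion", None


users = [f"user-{i}" for i in range(10_000)]
share = sum(route(u, 20)[0] == "candidate" for u in users) / len(users)
print(f"candidate share at a 20% rollout: {share:.3f}")
```

Because the bucket depends only on the user ID and salt, raising the percentage moves users from champion to candidate and never back. Each user therefore sees a consistent model during a gradual rollout.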
https://youtu.be/zpkH9oiIovU
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/258276286/
Related Links
PipelineAI Home: https://pipeline.ai
PipelineAI Community Edition: https://community.pipeline.ai
PipelineAI GitHub: https://github.com/PipelineAI/pipeline
PipelineAI Quick Start: https://quickstart.pipeline.ai
Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
YouTube Videos: https://youtube.pipeline.ai
SlideShare Presentations: https://slideshare.pipeline.ai
Slack Support: https://joinslack.pipeline.ai
Web Support and Knowledge Base: https://support.pipeline.ai
Email Support: help@pipeline.ai
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
This document discusses distributed deep learning on the MapR Converged Data Platform. It provides an overview of MapR's enterprise big data journey and capabilities for distributed deep learning. It describes using containers and Kubernetes for deep learning model development and deployment, with NVIDIA GPUs for computation. It presents architectures and patterns for separating or collocating MapR and GPU clusters. Finally, it previews demos of parameter server/workers and real-time face detection using streams.
An LLM-powered contract compliance application that, for the first time, combines an advanced RAG method (Self-RAG) with a knowledge graph.
It achieves the highest accuracy recorded so far for contract compliance in the oil and gas industry.
How we implemented "Exactly Once" semantics in our database ...
Distributed systems are hard. High-performance distributed systems, even more so. Network latencies, unacknowledged messages, server restarts, hardware failures, software bugs, problematic releases, timeouts... there are plenty of reasons why it is very hard to know whether a message you sent was received and processed correctly at its destination. So, to be safe, you send the message again... and again... and cross your fingers that the system on the other side tolerates duplicates.
QuestDB is an open source database designed for high performance. We wanted to make sure we could offer "exactly once" guarantees by deduplicating messages at ingestion time. In this talk, I explain how we designed and implemented the DEDUP keyword in QuestDB. It enables deduplication as well as upserts on real-time data while adding only 8% processing overhead, even on streams with millions of inserts per second.
I will also explain our parallel, multithreaded write-ahead log (WAL) architecture. And of course, I will show all of this with demos, so you can see how it works in practice.
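The deduplication-on-ingest idea can be illustrated with a small Python sketch. It does not reproduce QuestDB's implementation or syntax. It assumes each row carries a timestamp and a symbol and treats those columns as hypothetical upsert keys. Because rows with the same key replace each other, resending a batch becomes harmless.

```python
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, object]


def ingest(table: Dict[Tuple, Row], rows: Iterable[Row], upsert_keys: Sequence[str]) -> Dict[Tuple, Row]:
    """Insert rows, deduplicating on `upsert_keys`.

    A row whose key already exists replaces the stored row (an upsert), so
    re-sending an identical batch after a timeout leaves the table unchanged.
    """
    for row in rows:
        key = tuple(row[k] for k in upsert_keys)
        table[key] = row
    return table


keys = ["ts", "symbol"]  # hypothetical upsert key columns
table: Dict[Tuple, Row] = {}
batch = [{"ts": 1, "symbol": "BTC", "price": 100.0},
         {"ts": 1, "symbol": "ETH", "price": 10.0}]
ingest(table, batch, keys)
ingest(table, batch, keys)                                           # retry: no duplicates
ingest(table, [{"ts": 1, "symbol": "BTC", "price": 101.0}], keys)   # upsert
print(len(table), table[(1, "BTC")]["price"])                        # 2 101.0
```

A real database must also handle out-of-order arrival and parallel WAL writers, which this single-threaded sketch ignores.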
AIRLINE_SATISFACTION_Data Science Solution on Azure
Airline Satisfaction Project using Azure
This presentation was created as a foundation for understanding and comparing data science/machine learning solutions built in Python notebooks locally and on the Azure cloud, as part of course DP-100 (Designing and Implementing a Data Science Solution on Azure).
### Data Description and Analysis Summary for Presentation
#### 1. **Importing Libraries**
Libraries used:
- `pandas`, `numpy`: Data manipulation
- `matplotlib`, `seaborn`: Data visualization
- `scikit-learn`: Machine learning utilities
- `statsmodels`, `pmdarima`: Statistical modeling
- `keras`: Deep learning models
#### 2. **Loading and Exploring the Dataset**
**Dataset Overview:**
- **Source:** CSV file (`mumbai-monthly-rains.csv`)
- **Columns:**
  - `Year`: The year of the recorded data.
  - `Jan` to `Dec`: Monthly rainfall data.
  - `Total`: Total annual rainfall.
**Initial Data Checks:**
- Displayed first few rows.
- Summary statistics (mean, standard deviation, min, max).
- Checked for missing values.
- Verified data types.
**Visualizations:**
- **Annual Rainfall Time Series:** Trends in annual rainfall over the years.
- **Monthly Rainfall Over Years:** Patterns and variations in monthly rainfall.
- **Yearly Total Rainfall Distribution:** Distribution and frequency of annual rainfall.
- **Box Plots for Monthly Data:** Spread and outliers in monthly rainfall.
- **Correlation Matrix of Monthly Rainfall:** Relationships between different months' rainfall.
#### 3. **Data Transformation**
**Steps:**
- Ensured the `Year` column is of integer type.
- Created a datetime index.
- Converted monthly data to a time series format.
- Created lag features to capture past values.
- Generated rolling statistics (mean, standard deviation) for different window sizes.
- Added seasonal indicators (dummy variables for months).
- Dropped rows with NaN values.
**Result:**
- Transformed dataset with additional features ready for time series analysis.
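The transformation steps above can be sketched with pandas. This is an illustration under stated assumptions, not the notebook's code. The lag orders (1, 2, 12), window sizes (3, 12), and column names are hypothetical choices. The rolling statistics are computed on previous months only, so a row's features never include its own target value.

```python
import numpy as np
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def wide_to_monthly(wide: pd.DataFrame) -> pd.Series:
    """Turn one row per year (Year, Jan..Dec) into a monthly series with a DatetimeIndex."""
    long = wide.melt(id_vars="Year", value_vars=MONTHS,
                     var_name="Month", value_name="rainfall")
    long["date"] = pd.to_datetime(long["Year"].astype(int).astype(str) + "-" + long["Month"],
                                  format="%Y-%b")
    return long.set_index("date")["rainfall"].sort_index()


def add_features(series: pd.Series, lags=(1, 2, 12), windows=(3, 12)) -> pd.DataFrame:
    """Add lag, rolling-statistic, and month-dummy features, then drop incomplete rows."""
    df = pd.DataFrame({"rainfall": series})
    for lag in lags:
        df[f"lag_{lag}"] = df["rainfall"].shift(lag)
    past = df["rainfall"].shift(1)  # rolling windows see only earlier months (no leakage)
    for w in windows:
        df[f"roll_mean_{w}"] = past.rolling(w).mean()
        df[f"roll_std_{w}"] = past.rolling(w).std()
    dummies = pd.get_dummies(df.index.month, prefix="month")  # seasonal indicators
    dummies.index = df.index
    return pd.concat([df, dummies], axis=1).dropna()


# Synthetic example: three years of made-up rainfall values 0, 1, ..., 35.
wide = pd.DataFrame(np.arange(36).reshape(3, 12), columns=MONTHS)
wide.insert(0, "Year", [2000, 2001, 2002])
features = add_features(wide_to_monthly(wide))
print(features.shape)
```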
#### 4. **Data Splitting**
**Procedure:**
- Split the data into features (`X`) and target (`y`).
- Further split into training (80%) and testing (20%) sets without shuffling to preserve time series order.
**Result:**
- Training set: `(X_train, y_train)`
- Testing set: `(X_test, y_test)`
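A chronological split simply cuts the data at a position instead of sampling rows at random. Below is a minimal sketch with made-up arrays. `sklearn.model_selection.train_test_split(X, y, test_size=0.2, shuffle=False)` gives a comparable result.

```python
import numpy as np


def chronological_split(X, y, train_fraction=0.8):
    """Split arrays into train/test sets by position, keeping time order (no shuffling)."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


X = np.arange(20).reshape(10, 2)  # 10 time steps, 2 features (made-up data)
y = np.arange(10)
X_train, X_test, y_train, y_test = chronological_split(X, y)
print(y_train, y_test)
```

Keeping order matters because a shuffled split would let the model train on months that come after the ones it is tested on.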
#### 5. **Automated Hyperparameter Tuning**
**Tool Used:** `pmdarima`
- Automatically selected the best parameters for the SARIMA model.
- Evaluated using metrics such as AIC and BIC.
**Output:**
- Best SARIMA model parameters and statistical summary.
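For reference, the two information criteria mentioned above are defined as follows. Here $k$ is the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood of the fitted model. Lower values are preferred. BIC penalizes extra parameters more heavily than AIC once $\ln n > 2$, that is, for $n \ge 8$.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```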
#### 6. **SARIMA Model**
**Steps:**
- Fit the SARIMA model using the training data.
- Evaluated on both training and testing sets using MAE and RMSE.
**Output:**
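- **Train MAE:** Mean absolute error on the training data (the average error size, in the same units as the rainfall; lower is better).
- **Test MAE:** Mean absolute error on the unseen test data.
- **Train RMSE:** Root mean squared error on the training data; it penalizes large errors more heavily than MAE.
- **Test RMSE:** Root mean squared error on the test data.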
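Both metrics are short enough to write out directly. The NumPy sketch below uses made-up values and shows how RMSE weighs the single large miss more than MAE does.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error: the average size of the errors, in the target's units."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: squares errors first, so large misses weigh more."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


actual = [3.0, 5.0, 2.0]      # made-up rainfall values
predicted = [2.0, 5.0, 4.0]
print(mae(actual, predicted))   # 1.0
print(rmse(actual, predicted))  # about 1.291
```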
#### 7. **LSTM Model**
**Preparation:**
- Reshaped data for LSTM input.
- Converted data to `float32`.
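Recurrent layers such as Keras' LSTM expect 3-D input shaped (samples, timesteps, features). The sketch below assumes each row of lag/rolling features is treated as a single timestep. Feeding a sequence of past months as multiple timesteps is an equally common alternative.

```python
import numpy as np


def to_lstm_input(X_2d) -> np.ndarray:
    """Reshape a (samples, features) table into (samples, 1, features) as float32."""
    X = np.asarray(X_2d, dtype=np.float32)
    return X.reshape((X.shape[0], 1, X.shape[1]))  # one timestep per sample


X_lstm = to_lstm_input(np.arange(15).reshape(5, 3))
print(X_lstm.shape, X_lstm.dtype)  # (5, 1, 3) float32
```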
**Model Building and Training:**
- Built an LSTM model with one LSTM layer and one Dense layer.
- Trained the model on the training data.
**Evaluation:**
- Evaluated on both training and testing sets using MAE and RMSE.
**Output:**
- **Train MAE:** Accuracy on training data.
- **T
Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ... (Chris Fregly)
This document discusses optimizing and profiling TensorFlow models for training and inference on GPUs. It covers optimizing training using GPUs, data pipelines, the XLA JIT compiler, and distributed training. For inference, it discusses optimizing using the XLA AOT compiler, graph transformation tools, and TensorFlow Serving. The talk compares optimization techniques in production settings.
High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ... (Chris Fregly)
http://pipeline.ai
Title
PipelineAI Distributed Spark ML + TensorFlow AI + GPU Workshop
*A GPU-based cloud instance will be provided to each attendee as part of this event
Highlights
We will each build an end-to-end, continuous TensorFlow AI model training and deployment pipeline on our own GPU-based cloud instance.
At the end, we will combine our cloud instances to create the LARGEST Distributed TensorFlow AI Training and Serving Cluster in the WORLD!
Agenda
Spark ML
TensorFlow AI
Storing and Serving Models with HDFS
Trade-offs of CPU vs. *GPU, Scale Up vs. Scale Out
CUDA + cuDNN GPU Development Overview
TensorFlow Model Checkpointing, Saving, Exporting, and Importing
Distributed TensorFlow AI Model Training (Distributed TensorFlow)
Centralized Logging and Visualizing of Distributed TensorFlow Training (TensorBoard)
Distributed TensorFlow AI Model Serving/Predicting (TensorFlow Serving)
Centralized Logging and Metrics Collection (Prometheus, Grafana)
Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
High-Performance and Fault-Tolerant Microservices using Request Batching and Circuit Breakers (NetflixOSS)
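The circuit-breaker part of the last agenda item can be sketched in Python. This is a minimal, single-threaded illustration of the pattern that NetflixOSS Hystrix popularized, not Hystrix's API. The thresholds, the fallback value, and the failing `flaky_model_server` are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal circuit breaker.

    Closed: calls pass through. After `max_failures` consecutive failures the
    circuit opens and calls fail fast to `fallback`. Once `reset_timeout`
    seconds have passed, a trial call is let through (half-open): success
    closes the circuit, failure re-opens it.
    """

    def __init__(self, max_failures: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            return fallback()  # open: do not touch the struggling backend
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


def flaky_model_server() -> str:
    raise ConnectionError("model server unavailable")


breaker = CircuitBreaker(max_failures=2, reset_timeout=10.0)
for _ in range(3):  # the third call fails fast without reaching the server
    print(breaker.call(flaky_model_server, fallback=lambda: "cached prediction"))
```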
Github Repo
https://github.com/fluxcapacitor/pipeline
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,... (Chris Fregly)
Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements from Kubernetes, Istio, and TensorFlow.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment.
This is the holy grail of data science - rapid and safe experiments of ML / AI models directly in production.
Following the Successful Netflix Culture that I lived and breathed (https://www.slideshare.net/reed2001/culture-1798664/2-Netflix_CultureFreedom_Responsibility2), I give Data Scientists the Freedom and Responsibility to extend their ML / AI pipelines and experiments safely into production.
Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong.
Learn to be fast and strong by attending this talk.
Bio:
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
http://pipeline.ai
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017 (Chris Fregly)
http://pipeline.io
Title
PipelineAI Distributed Spark ML + TensorFlow AI + GPU Workshop
*A GPU-based cloud instance will be provided to each attendee as part of this event
Highlights
We will each build an end-to-end, continuous TensorFlow AI model training and deployment pipeline on our own GPU-based cloud instance.
At the end, we will combine our cloud instances to create the LARGEST Distributed TensorFlow AI Training and Serving Cluster in the WORLD!
Pre-requisites
Just a modern browser, internet connection, and a good night's sleep! We'll provide the rest.
Agenda
Spark ML
TensorFlow AI
Storing and Serving Models with HDFS
Trade-offs of CPU vs. *GPU, Scale Up vs. Scale Out
CUDA + cuDNN GPU Development Overview
TensorFlow Model Checkpointing, Saving, Exporting, and Importing
Distributed TensorFlow AI Model Training (Distributed TensorFlow)
TensorFlow's Accelerated Linear Algebra Framework (XLA)
TensorFlow's Just-in-Time (JIT) Compiler, Ahead of Time (AOT) Compiler
Centralized Logging and Visualizing of Distributed TensorFlow Training (TensorBoard)
Distributed TensorFlow AI Model Serving/Predicting (TensorFlow Serving)
Centralized Logging and Metrics Collection (Prometheus, Grafana)
Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
High-Performance and Fault-Tolerant Micro-services (NetflixOSS)
Bio
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
Github Repo
https://github.com/fluxcapacitor/pipeline
Video
https://youtu.be/oNf3I1fVmg8
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S... (Chris Fregly)
Pipeline.AI is a platform for deploying and optimizing machine learning models at scale. It allows users to package models with their runtime dependencies, perform load testing and optimizations, deploy models to production safely using techniques like canary deployments, and monitor models both offline and online. The platform aims to enable live, continuous model training directly in production environments.
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc... (Chris Fregly)
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, Chris will demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment. This talk is 100% demo-based with open source tools and completely reproducible through Docker on your own GPU cluster.
https://github.com/fluxcapacitor/pipeline/gpu.ml
http://pipeline.io
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs (Chris Fregly)
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs @ Strata London, May 24 2017
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs - Advanced Spark and TensorFlow Meetup May 23 2017 @ Hotels.com London
We'll discuss how to deploy TensorFlow, Spark, and Scikit-learn models on GPUs with Kubernetes across multiple cloud providers, including AWS, Google, and Azure, as well as on-premise.
In addition, we'll discuss how to optimize TensorFlow models for high-performance inference using the latest TensorFlow XLA (Accelerated Linear Algebra) framework including the JIT and AOT Compilers.
Github Repo (100% Open Source!)
https://github.com/fluxcapacitor/pipeline
http://pipeline.io
Speaker: Umayah Abdennabi
Agenda
* Intro Grammarly (Umayah Abdennabi, 5 mins)
* Meetup Updates and Announcements (Chris, 5 mins)
* Custom Functions in Spark SQL (30 mins)
Speaker: Umayah Abdennabi
Spark comes with a rich Expression library that can be extended to make custom expressions. We will look into custom expressions and why you would want to use them.
* TF 2.0 + Keras (30 mins)
Speaker: Francesco Mosconi
TensorFlow 2.0 was announced at the March TF Dev Summit, and it brings many changes and upgrades. The most significant change is the inclusion of Keras as the default model-building API. In this talk, we'll review the main changes introduced in TF 2.0 and highlight the differences between open source Keras and tf.keras.
* SQuAD Deep-Dive: Question & Answer with Context (45 mins)
Speaker: Brett Koonce (https://quarkworks.co)
SQuAD (Stanford Question Answering Dataset) is an NLP challenge based around answering questions by reading Wikipedia articles, designed to be a real-world machine learning benchmark. We will look at several different ways to tackle the SQuAD problem, building up to state-of-the-art approaches in terms of time, complexity, and accuracy.
https://rajpurkar.github.io/SQuAD-explorer/
https://dawn.cs.stanford.edu/benchmark/#squad
Food and drinks will be provided. The event will be held at Grammarly's office at One Embarcadero Center on the 9th floor. When you arrive at One Embarcadero, take the escalator to the second floor where you will find the lobby and elevators to the office suites. Come on up to the 9th floor (no need to check in at security), and ring the Grammarly doorbell.
Apache Submarine: Unified Machine Learning Platform (Wangda Tan)
This document provides an overview of Apache Submarine, an open source unified machine learning platform. It discusses requirements for machine learning in production, including reusable experimentation and model management. It introduces Submarine's architecture and components like the Submarine service, workbench, and runtime connectors. Demos are provided of the Mini Submarine, Zeppelin integration, and Submarine Workbench. Current status and future plans are outlined, and several community use cases are mentioned.
Apache Hadoop 3.x State of the Union and Upgrade Guidance - Strata 2019 NY (Wangda Tan)
The document discusses Apache Hadoop 3.x updates and provides guidance for upgrading to Hadoop 3. It covers community updates, features in YARN, Submarine, HDFS, and Ozone. Release plans are outlined for Hadoop, Submarine, and upgrades from Hadoop 2 to 3. Express upgrades are recommended over rolling upgrades for the major version change. The session summarizes that Hadoop 3 is an eagerly awaited release with many successful production uses, and that now is a good time for those not yet upgraded.
Performance Benchmarking of Clouds: Evaluating OpenStack (Pradeep Kumar)
Pradeep Kumar Surisetty presented on performance benchmarking of clouds and evaluating OpenStack. He discussed key cloud characteristics like elasticity and scalability. He then covered various performance measuring tools like Rally, Browbeat, Perfkit Benchmarker, and the SPEC Cloud IaaS 2016 benchmark. He also discussed performance monitoring tools like Ceilometer, Collectd/Graphite/Grafana, and Ganglia. Finally, he provided some tuning tips for hardware, instances, over-subscription, local storage, NUMA nodes, disk pinning, and deployment timings.
High Performance Network Programming on the JVM, OSCON 2012 (Erik Onnen)
This document summarizes a talk on high performance network programming on the JVM. The talk discusses choosing between synchronous and asynchronous I/O, with examples of when each approach is best. It also covers how to optimize synchronous I/O on the JVM to maximize throughput. The document provides benchmarks comparing the performance of a simple synchronous memcache client versus an asynchronous one.
Now that you have your apps running on K8s, are you wondering how to get the response time you need? Tuning applications to get the performance you need can be challenging. When you have to tune a number of microservices in Kubernetes to fix a response time or throughput issue, it can get really overwhelming. This talk looks at some common performance issues, ways to solve them, and, more importantly, the tools that can help you. We will also look specifically at Kruize, which helps not only to right-size your containers but also to optimize the runtimes.
One-click Hadoop Cluster Deployment on OpenPOWER Systems (Pradeep Kumar)
This document describes how to deploy Hadoop clusters on OpenPOWER systems using OpenStack and the Sahara plugin in 3 steps: 1) Setup OpenStack with Sahara on OpenPOWER servers, 2) Create PowerPC images and node group templates in Sahara, 3) Launch and test a Hadoop cluster from the Sahara dashboard. The deployment was tested on IBM S822L servers running PowerKVM with a 500GB Terasort completing in 7000 seconds on 2 data nodes and 1 name node. Upstream contributions were also made to OpenStack to support PowerPC.
Quest for the Perfect Workflow for McrFRED (Andi Smith)
Andi Smith provides an overview of setting up an automated workflow for front-end development using Grunt or Gulp. They discuss choosing a task runner, common tasks for setup like concatenation and minification, tasks for development like autoprefixing and live reloading, and tasks for build like image optimization and compression. The presentation emphasizes setting up a workflow that focuses on speeding up the development process and only including necessary tasks.
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014 (Amazon Web Services)
Tuning your EC2 web server will help you to improve application server throughput and cost-efficiency as well as reduce request latency. In this session we will walk through tactics to identify bottlenecks using tools such as CloudWatch in order to drive the appropriate allocation of EC2 and EBS resources. In addition, we will also be reviewing some performance optimizations and best practices for popular web servers such as Nginx and Apache in order to take advantage of the latest EC2 capabilities.
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra... (DataStax)
Apache Cassandra makes it possible to execute millions of operations per second in a scalable fashion. Harnessing the power of C* leaves many developers pondering the following:
- Is my data model appropriate and not going to end up as wide partition(s) causing heap pressure and other issues?
- How do I tune my connection pool configuration? What are the optimal settings for my environment?
- What is my C* cluster capacity in terms of IOPS for given 95th and 99th percentile latencies?
- How do I perf-test my data access layer?
In this talk, Vinay Chella, Cloud Data Architect @ Netflix, will share the open source tools, techniques, and platform (NDBench) that Netflix uses to perf-test its C* fleet by simulating millions of operations per second.
About the Speaker
Vinay Chella, Cloud Data Architect, Netflix Inc.
Vinay Chella is a Cloud Data Architect at Netflix with a deep understanding of Cassandra and relational databases. As an engineer and architect, he works extensively on data modeling, performance tuning, and guiding best practices for various persistence stores, helping teams at Netflix build next-generation data access layers.
DevoxxUK: Optimizing Application Performance on Kubernetes (Dinakar Guniguntala)
Now that you have your apps running on K8s, are you wondering how to get the response time you need? Tuning a polyglot set of microservices in Kubernetes to get the performance you need can be challenging. The key to overcoming this is observability. Luckily, a number of tools such as Prometheus can provide all the metrics you need, but here is the catch: there is so much data that it is difficult to make sense of it all. This is where hyperparameter tuning can come to the rescue to help build the right models.
This talk covers best practices that will help attendees:
1. Understand and avoid common performance-related problems.
2. Learn how observability tools can help identify performance issues.
3. Take a closer look at Kruize Autotune, an open source autonomous performance tuning tool for Kubernetes, and where it can help.
This is an introduction to Polyaxon and why I use it.
Polyaxon enables me to leverage Kubernetes to achieve these objectives:
- Make the lead time of experiments as short as possible.
- Make the financial cost to train models as cheap as possible.
- Make the experiments reproducible.
Spark on Kubernetes - Advanced Spark and TensorFlow Meetup - Jan 19 2017 - An... (Chris Fregly)
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/227622666/
Title: Spark on Kubernetes
Abstract: Engineers across several organizations are working on support for Kubernetes as a cluster scheduler backend within Spark. While designing this, we have encountered several challenges in translating Spark to use idiomatic Kubernetes constructs natively. This talk is about our high level design decisions and the current state of our work.
Speaker:
Anirudh Ramanathan is a software engineer on the Kubernetes team at Google. His focus is on running stateful and batch workloads. Previously, he worked on GGC (Google Global Cache) and, prior to that, on the infrastructure team at NVIDIA.
This document contains the slides from a talk given by Konrad Malawski on the "Tao/Zen of Programming" using Akka. Some of the key points discussed include:
- Actors are meant to work together and each actor should focus on a single responsibility. Having only one actor limits its capabilities.
- Actors should be structured in a hierarchy with parent-child relationships to allow for supervision. Actors should also be named meaningfully based on their purpose.
- Blocking operations can starve other actors by monopolizing shared resources. Blocking code needs to be isolated on dedicated dispatchers.
- Messages should be processed asynchronously using for/flatMap instead of awaiting futures, to avoid blocking.
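The last two points can be sketched outside Akka with plain `java.util.concurrent`. This is a minimal illustration under stated assumptions, not Akka's API: a dedicated fixed-size pool stands in for a separate dispatcher, `blockingLookup` is a hypothetical blocking call simulated with `Thread.sleep`, and `thenCompose` plays the role of `flatMap`, so no thread sits waiting on a future with `get()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class NonBlockingComposition {
    // Dedicated pool for blocking work, analogous to a separate dispatcher:
    // blocking calls run here instead of on the threads that process messages.
    // Daemon threads, so the pool never keeps the JVM alive on its own.
    static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable, "blocking-io");
        thread.setDaemon(true);
        return thread;
    });

    // Hypothetical blocking call (e.g. a JDBC query), simulated with a short sleep.
    static String blockingLookup(int userId) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "user-" + userId;
    }

    // Wrap the blocking call so callers receive a future immediately.
    static CompletableFuture<String> fetchUser(int userId) {
        return CompletableFuture.supplyAsync(() -> blockingLookup(userId), BLOCKING_POOL);
    }

    // Chain two asynchronous lookups with thenCompose (the flatMap analogue)
    // rather than blocking on the first result with get() or join().
    static CompletableFuture<String> pairOf(int first, int second) {
        return fetchUser(first)
                .thenCompose(a -> fetchUser(second).thenApply(b -> a + " & " + b));
    }
}
```

`NonBlockingComposition.pairOf(1, 2)` returns at once; only code at the very edge of the program, such as a test or `main`, should call `join()` on the result.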
This document summarizes advanced Akka features presented by Martin Kanters and Johan Janssen. It covers local and remote actors, scheduling, clustering, routing, cluster singletons, sharding, persistence, Akka HTTP, and finite state machines. The presentation introduces these features and provides examples to illustrate how they can be used with Akka.
Akka: Simpler Scalability, Fault-Tolerance, Concurrency & Remoting through Ac... (Jonas Bonér)
Akka is the platform for the next generation of event-driven, scalable, and fault-tolerant architectures on the JVM.
We believe that writing correct concurrent, fault-tolerant and scalable applications is too hard. Most of the time it's because we are using the wrong tools and the wrong level of abstraction.
Akka is here to change that.
Using the Actor Model together with Software Transactional Memory, we raise the abstraction level and provide a better platform for building correct concurrent and scalable applications.
For fault tolerance we adopt the "Let it crash" / "Embrace failure" model, which has been used with great success in the telecom industry to build applications that self-heal: systems that never stop.
Actors also provide the abstraction for transparent distribution and the basis for truly scalable and fault-tolerant applications.
Akka is Open Source and available under the Apache 2 License.
The document introduces Akka, an open-source toolkit for building distributed, concurrent applications on the JVM. It provides a programming model called the actor model that makes it easier to build scalable and fault-tolerant systems. Actors process messages asynchronously and avoid shared state, providing a simpler approach to concurrency than traditional threads and locks. Akka allows actors to be distributed across a network, enabling applications to scale out elastically.
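The core idea, private state touched only by the actor while messages are processed one at a time from a mailbox, can be shown in a few lines of plain Java. `CounterActor`, `tell`, and `ask` are hypothetical names for this sketch; it uses a single-threaded executor as the mailbox and omits everything Akka adds on top, such as supervision, distribution, and typed protocols.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CounterActor {
    // Private state: only ever read or written on the mailbox thread, so no locks are needed.
    private long count = 0;

    // A single-threaded executor acts as the mailbox: messages run one at a time, in FIFO order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "counter-actor");
        thread.setDaemon(true);
        return thread;
    });

    // Fire-and-forget message: enqueue the update and return immediately.
    void tell(long delta) {
        mailbox.execute(() -> count += delta);
    }

    // Request/reply message: the current count is delivered asynchronously as a future.
    CompletableFuture<Long> ask() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    // Stop accepting new messages; messages already queued are still processed.
    void stop() {
        mailbox.shutdown();
    }
}
```

Because the counter is confined to one thread, concurrent senders never race on it; they only contend on the mailbox queue.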
This document provides an overview of Konrad Malawski's presentation on reactive stream processing with Akka Streams. The presentation covers Reactive Streams concepts like back pressure, the Reactive Streams specification and protocol, and how Akka Streams implements reactive stream processing using concepts like linear flows, flow graphs, and integration with Akka actors. It also discusses future plans for Akka Streams including API stabilization, improved testability, and potential features like visualizing flow graphs and distributing computation graphs.
Real-Time Analytics with Apache Kafka and Apache Spark (Rahul Jain)
A presentation-cum-workshop on real-time analytics with Apache Kafka and Apache Spark. Apache Kafka is a distributed publish-subscribe messaging system, while Spark Streaming brings Spark's language-integrated API to stream processing, making it quick and easy to write streaming applications. It supports both Java and Scala. In this workshop we are going to explore Apache Kafka, ZooKeeper, and Spark with a web clickstream example using Spark Streaming. A clickstream is the recording of the parts of the screen a computer user clicks on while web browsing.
Everyone in the Scala world is using or looking into using Akka for low-latency, scalable, distributed or concurrent systems. I'd like to share my story of developing and productionizing multiple Akka apps, including low-latency ingestion and real-time processing systems, and Spark-based applications.
When does one use actors vs futures?
Can we use Akka with, or in place of, Storm?
How did we set up instrumentation and monitoring in production?
How does one use VisualVM to debug Akka apps in production?
What happens if the mailbox gets full?
What is our Akka stack like?
I will share best practices for building Akka and Scala apps, pitfalls and things we'd like to avoid, and a vision of where we would like to go for ideal Akka monitoring, instrumentation, and debugging facilities. Plus backpressure and at-least-once processing.
numPYNQ is a hardware library that offers an accelerated version of NumPy core functions to be used transparently from data science applications. It implements these functions on an FPGA to provide better performance, energy efficiency, and flexibility compared to GPUs. Experimental results show speedups for tasks like matrix multiplication and cross-correlation. The library uses runtime input analysis and adaptation to optimize implementations. It has potential in the growing big data market, and the team plans partnerships and a freemium business model to commercialize numPYNQ.
In-Memory Computing Essentials for Architects and Engineers (Denis Magda)
Slides of IMC Essentials workshop.
The workshop covers fundamental capabilities of in-memory computing platforms that boost high-load applications and services, and bring existing IT architecture to the next level by storing and processing a massive amount of data both in RAM and, optionally, on disk.
The capabilities and benefits of such platforms will be demonstrated with the usage of Apache Ignite, which is the in-memory computing platform that is durable, strongly consistent, and highly available with powerful SQL, key-value and processing APIs.
Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017) (Patricia Aas)
The Linux Security and Isolation APIs have become the basis of some of the most useful features server-side, providing the isolation required for efficient containers. However, these APIs also form the basis of the Chromium Sandbox on Linux, and we will study them in that context.
This presentation goes more in depth on some key points from the NDC (2017) presentation.
Docker networking allows containers to communicate in several ways. Containers can communicate using Docker's default bridge (docker0), by binding container ports to the host's ports, or by using the host's network stack directly. More advanced options include linking containers to share information, using overlay networks with technologies like Open vSwitch, or running containers across multiple hosts with tunnels. The document provides examples of setting up different Docker networking configurations and discusses which methods suit different communication requirements between containers, hosts, and external networks.
Graduating To Go - A Jumpstart into the Go Programming Language (Kaylyn Gibilterra)
This workshop jumps through much of what is covered in the Go Tour. The exercises are new and match the class content more closely, and some pieces (like testing and APIs) are not covered in the Go Tour.
The Linux Foundation has over 500 corporate members involved in over 70 member-sponsored projects. In 2016, the Linux Foundation convened over 20,000 people from 85 countries and over 4000 companies at 150 events around the world. Over 800,000 students from 215 countries have enrolled in Linux Foundation training programs. Who is driving this growth? Why do companies invest valuable resources in collaborative development? What have we learned along the way?
Scale Up with Lock-Free Algorithms @ JavaOne (Roman Elizarov)
This document provides a summary of a presentation on using lock-free algorithms to scale shared mutable state on the JVM. It begins with an introduction to the speaker and discusses why shared mutable state is needed for big data and real-time processing. It then uses a toy problem of implementing a concurrent stack to demonstrate the challenges of synchronization and contention. The presentation introduces the use of atomic references and compare-and-set operations to implement lock-free push and pop operations on the concurrent stack in a non-blocking manner, improving scalability.
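The lock-free push and pop described above correspond to the classic Treiber stack. The sketch below is an illustration, not the presenter's code: the class name `LockFreeStack` is hypothetical, and it relies on Java's garbage collector to sidestep the ABA problem, since each push allocates a fresh node and a node cannot be recycled while any thread still holds a reference to it.

```java
import java.util.concurrent.atomic.AtomicReference;

class LockFreeStack<T> {
    // Immutable node: once published, its fields never change.
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    // The only shared mutable state: a reference to the current top node.
    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T value) {
        while (true) {
            Node<T> current = top.get();
            Node<T> node = new Node<>(value, current);
            // Succeeds only if no other thread changed top since we read it.
            if (top.compareAndSet(current, node)) {
                return;
            }
            // CAS failed because of contention: re-read top and retry, never blocking.
        }
    }

    // Returns the most recently pushed value, or null if the stack is empty.
    T pop() {
        while (true) {
            Node<T> current = top.get();
            if (current == null) {
                return null;
            }
            if (top.compareAndSet(current, current.next)) {
                return current.value;
            }
        }
    }
}
```

Under contention, a thread whose `compareAndSet` fails simply retries; some thread always makes progress, which is what makes the structure lock-free rather than merely thread-safe.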
Communication hardware refers to electric devices and systems for transferring data or information from one place to another. Examples include modems, cables, fax modems, routers, and wireless technologies like infrared, Bluetooth, and Wi-Fi. The document provides details on each type of communication hardware, including what they are and how they function. It also includes multiple choice questions to test understanding of the different hardware.
[若渴計畫] Challenges and Solutions of Windows Remote Shellcode (Aj MaChInE)
This document discusses challenges and solutions related to Windows remote shellcode. It outlines challenges posed by antivirus software, EMET, firewalls, and IDS/IPS systems. It then describes various techniques for bypassing these protections, such as encryption, obfuscation, non-standard programming languages, and the use of tools like Meterpreter and Veil Framework payloads. Specific bypass techniques covered include DLL injection, process hollowing, reflective loading, and the use of techniques like one-way shells and HTTP stagers.
Dive deep into an actual enterprise Linux migration by walking through the planning and execution of the process as seen by our customers. Our enterprise architects will break down the key migration steps to explain the available options, decisions made, and demonstrate actions on a live system. This episode gives you a representative migration experience before you actually migrate, illustrating: side-by-side comparisons between Red Hat Enterprise Linux and CentOS; steps to consider for the operating system; and steps to consider for common application stacks and packages.
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI (Data Con LA)
Abstract:
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow Models - and the TensorFlow Runtime - in a GPU-based production environment.
This talk is 100% demo based with open source tools and completely reproducible through Docker on your own GPU cluster.
Bio:
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Pipeline.AI was also the recent winner of the O'Reilly Media AI Startup Showcase at the AI conference.
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
High Performance Distributed TensorFlow with GPUs and Kubernetes (inside-BigData.com)
In this deck from the Stanford HPC Conference, Chris Fregly from PipelineAI presents: High Performance Distributed TensorFlow with GPUs and Kubernetes.
"Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements with TensorFlow, Kubernetes, OpenFaaS, GPUs, and PipelineAI.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment. This is the holy grail of data science - rapid and safe experiments of ML / AI models directly in production. Following the famous Netflix Culture that encourages "Freedom and Responsibility", I use this talk to demonstrate how Data Scientists can use PipelineAI to safely deploy their ML / AI pipelines into production using live data. Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong. Learn to be fast and strong by attending this talk!"
Watch the video: https://youtu.be/k4qAKQHakNg
Learn more: https://pipeline.ai/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ... (DataWorks Summit)
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment.
This talk contains many Spark ML and TensorFlow AI demos using PipelineIO's 100% Open Source Community Edition. All code and Docker images are available to reproduce on your own CPU- or GPU-based cluster.
* Bio *
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Video Series High Performance TensorFlow in Production.
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member of the IBM Spark Technology Center in San Francisco.
TensorFlow meetup: Keras - PyTorch - TensorFlow.js (Stijn Decubber)
Slides from the TensorFlow meetup hosted on October 9th at the ML6 offices in Ghent. Join our Meetup group for updates and future sessions: https://www.meetup.com/TensorFlow-Belgium/
Managing and Scaling Puppet - PuppetConf 2014 (Puppet)
Miguel Zuniga presented on managing and scaling Puppet. The presentation covered using a Puppet master with a web cluster for scaling, adding caching to reduce load, using source control with Puppet, multi-datacenter configurations, masterless Puppet in the cloud, and future directions including search capabilities and dynamic configurations. Zuniga took questions at the end.
Managing and Scaling Puppet - PuppetConf 2014 (Miguel Zuniga)
Miguel Zuniga presented on managing and scaling Puppet. The presentation covered Puppet and the Puppetmaster model, scaling Puppet with a web cluster, using caching to reduce load, integrating Puppet with source control management, multi-datacenter configurations, masterless Puppet in the cloud, and future directions for Puppet. Zuniga concluded by taking questions from the audience.
In this deck from the 2018 Swiss HPC Conference, Axel Koehler from NVIDIA presents: The Convergence of HPC and Deep Learning.
"The intersection of AI and HPC is extending the reach of science and accelerating the pace of scientific innovation like never before. The technology originally developed for HPC has enabled deep learning, and deep learning is enabling many usages in science. Deep learning is also helping deliver real-time results with models that used to take days or months to simulate. The presentation will give an overview about the latest hard- and software developments for HPC and Deep Learning from NVIDIA and will show some examples that Deep Learning can be combined with traditional large scale simulations."
Watch the video: https://wp.me/p3RLHQ-ijM
Learn more: http://nvidia.com
and
http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Containers Explained for a Cook and a Mechanic (Rachid Zarouali)
Docker containers are everywhere today: in our Google/Office 365 mailboxes, our web applications, medical appointment booking, airplanes, and much more.
They are everywhere but not always easy to grasp, and yet they have much more in common with our daily jobs than it seems.
During this webinar, I will present these famous Docker containers as seen by a chef and a car mechanic, and you will see that they have a lot in common.
Build, train, and deploy Machine Learning models at scale (May 2018) (Julien SIMON)
The document discusses Amazon SageMaker, a fully managed service that allows users to build, train and deploy machine learning models at scale. It provides pre-built algorithms and frameworks, managed hosting, one-click deployment and hyperparameter tuning capabilities. It also supports bringing your own custom algorithms by allowing users to run their own Docker containers. The document highlights how SageMaker simplifies and automates ML workflows and provides examples of customers using it at scale for image and data analysis.
The document discusses moving a Tomcat cluster to the cloud. It describes how Tomcat uses multicast for session replication in a cluster, but this does not work in the cloud. The solution presented uses the Kubernetes API to discover cluster nodes instead of multicast, allowing session replication to function in OpenShift. The architecture includes a DynamicMembershipService that refreshes the node list from a KubernetesMemberProvider accessing the Kubernetes API. This allows a Tomcat cluster to run in OpenShift with external session replication.
In this deck from the NVIDIA GPU Technology Conference, Axel Koehler presents: Inside the Volta GPU Architecture and CUDA 9.
"The presentation will give an overview of the new NVIDIA Volta GPU architecture and the latest CUDA 9 release. The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and Graphics. Volta features a new Streaming Multiprocessor (SM) architecture and includes enhanced features like NVLINK2 and the Multi-Process Service (MPS) that deliver major improvements in performance, energy efficiency, and ease of programmability. New features like Independent Thread Scheduling and the Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance. CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest CUDA 9 release."
Watch the video: https://wp.me/p3RLHQ-iB7
Learn more: https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Talk given at the London AICamp meetup on 13 July 2023. It's an introduction to building open-source, ChatGPT-like chatbots and some of the considerations to keep in mind while training/tuning them using Airflow.
The document discusses HTTP and web architectures. It begins with introductions from Nicolas Martignole and Quentin Adam. It then provides an overview of HTTP/1, noting that it is a text-based protocol that uses simple requests and responses over TCP connections and defines verbs like GET, POST, PUT, and DELETE. It discusses caching techniques using the Expires, Pragma, and Cache-Control headers. It also covers ETags for cache validation and content negotiation for serving multiple representations of resources.
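As a hedged illustration of ETag-based cache validation, the sketch below derives a strong ETag from the response body and decides between `200 OK` and `304 Not Modified` for a conditional `GET` carrying `If-None-Match`. The class and method names are hypothetical, hashing the body with SHA-256 is just one possible way to mint ETags, and the header parsing is deliberately simplified (it splits on commas and does not cover every corner of the header grammar).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ETagValidator {
    // Strong ETag derived from the representation bytes: first 8 bytes of SHA-256, hex-encoded and quoted.
    static String etagFor(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return "\"" + hex + "\"";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Status for a GET of an existing resource: 304 if If-None-Match matches the current ETag, else 200.
    static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return 200; // unconditional request: send the full body
        }
        String header = ifNoneMatch.trim();
        if (header.equals("*")) {
            return 304; // "*" matches any current representation
        }
        for (String candidate : header.split(",")) {
            if (opaque(candidate.trim()).equals(opaque(currentEtag))) {
                return 304;
            }
        }
        return 200;
    }

    // If-None-Match uses weak comparison, so a leading W/ prefix is ignored.
    private static String opaque(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```

A server would send this value in the `ETag` response header on the first request; a client revalidating its cached copy echoes it back in `If-None-Match`, and a `304` lets the client reuse that copy without downloading the body again.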
Cloud Native Applications on OpenShift (Serhat Dirik)
This document discusses cloud native development and DevOps using OpenShift Container Platform. It begins by defining cloud native as involving both application architecture and the development, deployment and management processes used. It then discusses how containers evolve application delivery and how container platforms are part of the DevOps tool kit. The document outlines the path to DevOps, emphasizing culture, automation and using the right platform. It also notes that DevOps and containers often go hand in hand, with many DevOps adopters using containers. The document then discusses various capabilities of OpenShift and how it supports cloud native development.
KFServing - Serverless Model Inferencing (Animesh Singh)
Deep dive into KFServing, a serverless model inferencing platform built on top of Knative and Istio. It is part of the Kubeflow project and is deployed in production across organizations.
Linux Kernel vs DPDK: HTTP Performance Showdown (ScyllaDB)
In this session I will use a simple HTTP benchmark to compare the performance of the Linux kernel networking stack with userspace networking powered by DPDK (kernel-bypass).
It is said that kernel-bypass technologies avoid the kernel because it is "slow", but in reality, a lot of the performance advantages that they bring just come from enforcing certain constraints.
As it turns out, many of these constraints can be enforced without bypassing the kernel. If the system is tuned just right, one can achieve performance that approaches kernel-bypass speeds while still benefiting from the kernel's battle-tested compatibility and rich ecosystem of tools.
Monitoring of GPU Usage with TensorFlow Models Using Prometheus (Databricks)
Understanding the dynamics of GPU utilization and workloads in containerized systems is critical to creating efficient software systems. We create a set of dashboards to monitor and evaluate GPU performance in the context of TensorFlow. We monitor performance in real time to gain insight into GPU load, GPU memory, and temperature metrics in a GPU-enabled Kubernetes system. Visualizing TensorFlow training job metrics in real time using Prometheus allows us to tune and optimize GPU usage. Also, because TensorFlow jobs can have both GPU and CPU implementations, it is useful to view detailed real-time performance data from each implementation and choose the best one. To illustrate our system, we will show a live demo gathering and visualizing GPU metrics on a GPU-enabled Kubernetes cluster with Prometheus and Grafana.
AWS re:Invent 2022 Recap: AI/ML and Data (Chris Fregly)
This document discusses Amazon Web Services (AWS) products and services for building end-to-end machine learning and data strategies. It covers topics such as ML infrastructure, governance, data preparation, model training, deployment, and education. Specific services mentioned include Amazon SageMaker, AWS Lake Formation, Amazon Redshift, Amazon EMR, AWS Glue, and AWS services for hardware acceleration like AWS Trainium and AWS Graviton.
Pandas on AWS - Let Me Count the Ways (Chris Fregly)
Chris Fregly (Principal Solution Architect, AI and machine learning at AWS) will give a brief presentation on the various ways to perform scalable Pandas, Modin, and Ray on AWS. He will then answer questions from the audience and moderator, Alejandro Herrera (whatever he is) at Ponder.
Chris Fregly is a Principal Solution Architect for AI and Machine Learning at Amazon Web Services (AWS) based in San Francisco, California. He is the organizer of the Global Data Science on AWS meetup. He is co-author of the O'Reilly Book, "Data Science on AWS."
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup - Chris Fregly
RSVP Webinar: https://www.eventbrite.com/e/webinarkubeflow-tensorflow-tfx-pytorch-gpu-spark-ml-amazonsagemaker-tickets-45852865154
Talk #0: Introductions and Meetup Announcements By Chris Fregly and Antje Barth
Talk #1: Ray Overview, Ray AI Runtime on AWS using Amazon SageMaker, EC2, EMR, EKS by Chris Fregly, Principal Specialist Solution Architect, AI and Machine Learning @ AWS
Talk #2: Deep-dive Blueprints for Amazon Elastic Kubernetes Service (EKS) including Ray and Spark by Apoorva Kulkarni, Sr. Specialist Solution Architect, Containers and Kubernetes @ AWS
Zoom link: https://us02web.zoom.us/j/82308186562
Smokey and the Multi-Armed Bandit Featuring BERT Reynolds Updated - Chris Fregly
The document discusses using multi-armed bandit tests to compare natural language models. It describes training BERT models with TensorFlow and PyTorch, and training a multi-armed bandit model with Vowpal Wabbit for reinforcement learning. It then demonstrates testing the BERT models with the bandit model and scaling multi-armed bandits on AWS.
Amazon reInvent 2020 Recap: AI and Machine Learning - Chris Fregly
Video here: https://youtu.be/YSXe02Y5pHM
NEW RELEASE! Build, Automate, Manage, and Scale ML Workflows with the NEW Amazon SageMaker Pipelines by Hallie Crosby Weishahn.
Description of Talk and Demo
AWS recently announced Amazon SageMaker Pipelines (https://aws.amazon.com/sagemaker/pipelines/), the first purpose-built, easy-to-use Continuous Integration and Continuous Delivery (CI/CD) service for machine learning.
SageMaker Pipelines has three main components which improve the operational resilience and reproducibility of your workflows: 1) pipelines, 2) model registry, and 3) projects.
In this talk and demo, Hallie will walk us through the new Amazon SageMaker Pipelines feature including MLOps support.
Date/Time
9-10am US Pacific Time (Third Monday of Every Month)
RSVP: https://www.eventbrite.com/e/1-hr-free-workshop-pipelineai-gpu-tpu-spark-ml-tensorflow-ai-kubernetes-kafka-scikit-tickets-45852865154
Meetup:
https://www.meetup.com/Data-Science-on-AWS/
Zoom:
https://zoom.us/j/690414331
Webinar ID: 690 414 331
Phone:
+1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
Related Links
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
O'Reilly Book: https://datascienceonaws.com
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Support: https://support.pipeline.ai
Monthly Workshop: https://www.eventbrite.com/e/full-day-workshop-kubeflow-gpu-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-tickets-63362929227
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod... - Chris Fregly
The document discusses Amazon SageMaker Model Monitor and Debugger for monitoring machine learning models in production. SageMaker Model Monitor collects prediction data from endpoints, creates a baseline, and runs scheduled monitoring jobs to detect deviations from the baseline. It generates reports and metrics in CloudWatch. SageMaker Debugger helps debug training issues by capturing debug data with no code changes and providing real-time alerts and visualizations in Studio. Both services help detect model degradation and take corrective actions like retraining.
Quantum Computing with Amazon Braket
In this talk, I describe some fundamental principles of quantum computing, including qubits, superposition, and entanglement. I will demonstrate how to perform secure quantum computing tasks across many Quantum Processing Units (QPUs) using Amazon Braket, IAM, and S3.
AI and Machine Learning, Quantum Computing, Amazon Braket, QPU
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person - Chris Fregly
In this talk, we present tips and best practices for scaling a large workshop to thousands of simultaneous attendees, both online and in-person. While our workshop is focused on AI and machine learning on AWS, we generalize our learnings to any domain or specialization.
The document provides an overview of announcements from Amazon Web Services' annual re:Invent conference in December 2019. Key details include:
- The conference had 65,000 attendees and 3,000 sessions.
- Announcements covered improving the developer experience, compute, storage, AI/ML, databases/analytics, networking, security, and extending AWS beyond regions.
- New services and features were announced for Lambda, API Gateway, Step Functions, EventBridge, Amplify, SageMaker, EC2, EKS, EBS, S3, Rekognition, Lex, Translate, Transcribe, Comprehend, Personalize, Forecast, Fraud Detector, and more.
This document provides an overview and agenda for a workshop on end-to-end machine learning pipelines using TFX, Kubeflow, Airflow and MLflow. The agenda covers setting up an environment with Kubernetes, using TensorFlow Extended (TFX) components to build pipelines, ML pipelines with Airflow and Kubeflow, hyperparameter tuning with Kubeflow, and deploying notebooks with Kubernetes. Hands-on exercises are also provided to explore key areas like TensorFlow Data Validation, TensorFlow Transform, TensorFlow Model Analysis and Airflow ML pipelines.
Title
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU
Video
https://youtu.be/vaB4IM6ySD0
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow.
Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most widely used pipeline orchestration framework in machine learning.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Set Up ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Reproduce Model Training with TFX Metadata Store and Pachyderm
12. Deploy the Model to Production with TensorFlow Serving and Istio
13. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
Related Links
1. PipelineAI Home: https://pipeline.ai
2. PipelineAI Community Edition: http://community.pipeline.ai
3. PipelineAI GitHub: https://github.com/PipelineAI/pipeline
4. Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
5. YouTube Videos: https://youtube.pipeline.ai
6. SlideShare Presentations: https://slideshare.pipeline.ai
7. Slack Support: https://joinslack.pipeline.ai
8. Web Support and Knowledge Base: https://support.pipeline.ai
9. Email Support: support@pipeline.ai
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -... - Chris Fregly
Traditional machine learning pipelines end with life-less models sitting on disk in the research lab. These traditional models are typically trained on stale, offline, historical batch data. Static models and stale data are not sufficient to power today's modern, AI-first Enterprises that require continuous model training, continuous model optimizations, and lightning-fast model experiments directly in production. Through a series of open source, hands-on demos and exercises, we will use PipelineAI to breathe life into these models using 4 new techniques that we’ve pioneered:
* Continuous Validation (V)
* Continuous Optimizing (O)
* Continuous Training (T)
* Continuous Explainability (E)
The Continuous "VOTE" techniques have proven to maximize pipeline efficiency, minimize pipeline costs, and increase pipeline insight at every stage, from continuous model training (offline) to live model serving (online).
Attendees will learn to create continuous machine learning pipelines in production with PipelineAI, TensorFlow, and Kafka.
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer... - Chris Fregly
Perform Online Predictions using Slack
A/B and multi-armed bandit model compare
Train Online Models with Kafka Streams
Create new models quickly
Deploy to production safely
Mirror traffic to validate online performance
Any Framework, Any Hardware, Any Cloud
Dashboard to manage the lifecycle of models from local development to live production
Generates optimized runtimes for the models
Custom targeting rules, shadow mode, and percentage-based rollouts to safely test features in live production
Continuous model training, model validation, and pipeline optimization
https://youtu.be/zpkH9oiIovU
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/258276286/
Related Links
PipelineAI Home: https://pipeline.ai
PipelineAI Community Edition: https://community.pipeline.ai
PipelineAI GitHub: https://github.com/PipelineAI/pipeline
PipelineAI Quick Start: https://quickstart.pipeline.ai
Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
YouTube Videos: https://youtube.pipeline.ai
SlideShare Presentations: https://slideshare.pipeline.ai
Slack Support:
https://joinslack.pipeline.ai
Web Support and Knowledge Base: https://support.pipeline.ai
Email Support: help@pipeline.ai
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern... - Chris Fregly
This document discusses distributed deep learning on the MapR Converged Data Platform. It provides an overview of MapR's enterprise big data journey and capabilities for distributed deep learning. It describes using containers and Kubernetes for deep learning model development and deployment, with NVIDIA GPUs for computation. It presents architectures and patterns for separating or collocating MapR and GPU clusters. Finally, it previews demos of parameter server/workers and real-time face detection using streams.
An LLM-powered contract compliance application that, for the first time, combines the advanced RAG method Self-RAG with a Knowledge Graph.
It provides the highest accuracy for contract compliance recorded so far in the Oil and Gas industry.
How We Implemented "Exactly Once" Semantics in Our Database ... - javier ramirez
Distributed systems are hard. High-performance distributed systems, even more so. Network latencies, unacknowledged messages, server restarts, hardware failures, software bugs, problematic releases, timeouts... there are plenty of reasons why it is very hard to know whether a message you sent was received and processed correctly at its destination. So, to be safe, you send the message again... and again... and cross your fingers that the system on the other side tolerates duplicates.
QuestDB is an open source database designed for high performance. We wanted to make sure we could offer "exactly once" guarantees by deduplicating messages at ingestion time. In this talk, I explain how we designed and implemented the DEDUP keyword in QuestDB, which enables deduplication as well as Upserts on real-time data, adding only 8% processing overhead, even on streams with millions of inserts per second.
I will also explain our parallel, multithreaded write-ahead log (WAL) architecture. And of course, I will show all of this with demos, so you can see how it works in practice.
Airline Satisfaction Project using Azure
This presentation is created as a foundation of understanding and comparing data science/machine learning solutions made in Python notebooks locally and on Azure cloud, as a part of Course DP-100 - Designing and Implementing a Data Science Solution on Azure.
### Data Description and Analysis Summary for Presentation
#### 1. **Importing Libraries**
Libraries used:
- `pandas`, `numpy`: Data manipulation
- `matplotlib`, `seaborn`: Data visualization
- `scikit-learn`: Machine learning utilities
- `statsmodels`, `pmdarima`: Statistical modeling
- `keras`: Deep learning models
#### 2. **Loading and Exploring the Dataset**
**Dataset Overview:**
- **Source:** CSV file (`mumbai-monthly-rains.csv`)
- **Columns:**
- `Year`: The year of the recorded data.
- `Jan` to `Dec`: Monthly rainfall data.
- `Total`: Total annual rainfall.
**Initial Data Checks:**
- Displayed first few rows.
- Summary statistics (mean, standard deviation, min, max).
- Checked for missing values.
- Verified data types.
**Visualizations:**
- **Annual Rainfall Time Series:** Trends in annual rainfall over the years.
- **Monthly Rainfall Over Years:** Patterns and variations in monthly rainfall.
- **Yearly Total Rainfall Distribution:** Distribution and frequency of annual rainfall.
- **Box Plots for Monthly Data:** Spread and outliers in monthly rainfall.
- **Correlation Matrix of Monthly Rainfall:** Relationships between different months' rainfall.
#### 3. **Data Transformation**
**Steps:**
- Ensured 'Year' column is of integer type.
- Created a datetime index.
- Converted monthly data to a time series format.
- Created lag features to capture past values.
- Generated rolling statistics (mean, standard deviation) for different window sizes.
- Added seasonal indicators (dummy variables for months).
- Dropped rows with NaN values.
**Result:**
- Transformed dataset with additional features ready for time series analysis.
#### 4. **Data Splitting**
**Procedure:**
- Split the data into features (`X`) and target (`y`).
- Further split into training (80%) and testing (20%) sets without shuffling to preserve time series order.
**Result:**
- Training set: `(X_train, y_train)`
- Testing set: `(X_test, y_test)`
#### 5. **Automated Hyperparameter Tuning**
**Tool Used:** `pmdarima`
- Automatically selected the best parameters for the SARIMA model.
- Evaluated using metrics such as AIC and BIC.
**Output:**
- Best SARIMA model parameters and statistical summary.
#### 6. **SARIMA Model**
**Steps:**
- Fit the SARIMA model using the training data.
- Evaluated on both training and testing sets using MAE and RMSE.
**Output:**
- **Train MAE:** Mean absolute error on the training data.
- **Test MAE:** Mean absolute error on unseen (test) data.
- **Train RMSE:** Root mean squared error on the training data; penalizes large errors more heavily than MAE.
- **Test RMSE:** Root mean squared error on the test data.
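The difference between the two metrics shows up on a toy example: MAE averages the absolute errors, while RMSE squares them first, so a single large miss raises RMSE much more. A minimal plain-Python sketch, not the notebook's code; the rainfall values are made up for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: the average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: squares errors first, so large misses dominate."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical monthly rainfall (mm) and forecasts, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 30.0, 50.0]  # one large miss in the last month

print(mae(actual, forecast), rmse(actual, forecast))
```

Here the errors are 2, 2, 0, and 10 mm: MAE is 3.5 mm, while RMSE is sqrt(27), about 5.2 mm, driven mostly by the one 10 mm miss.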
#### 7. **LSTM Model**
**Preparation:**
- Reshaped data for LSTM input.
- Converted data to `float32`.
**Model Building and Training:**
- Built an LSTM model with one LSTM layer and one Dense layer.
- Trained the model on the training data.
**Evaluation:**
- Evaluated on both training and testing sets using MAE and RMSE.
**Output:**
- **Train MAE:** Mean absolute error on the training data.
- **Test MAE:** Mean absolute error on the test data.
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 2017
1. HIGH PERFORMANCE TENSORFLOW IN PRODUCTION WITH GPUS
CHRIS FREGLY, FOUNDER @PIPELINE.AI
BIG DATA SPAIN, MADRID - NOV 15, 2017
I LOVE THIS CONFERENCE!!
2. INTRODUCTIONS: ME
§ Chris Fregly, Founder & Engineer @ PipelineAI
§ Formerly Netflix and Databricks
§ Advanced Spark and TensorFlow Meetup
Please Join Our 50,000+ Global Members!!
Contact Me
chris@pipeline.ai
@cfregly
Global Locations
* San Francisco
* Chicago
* Austin
* Washington DC
* Dusseldorf
* London
3. INTRODUCTIONS: YOU
§ Software Engineer, Data Scientist, Data Engineer, Data Analyst
§ Interested in Optimizing and Deploying TF Models to Production
§ Nice to Have a Working Knowledge of TensorFlow (Not Required)
4. CONTENT BREAKDOWN
50% Model Training Optimizations (GPU, Ingestion Pipeline, JIT)
§ Boring & Batch
§ Offline in Research Lab
§ No Real-Time Serving Skills
§ Pipeline Stops at Training
§ No Feedback with Runtime
§ Small Number of Data Scientists
§ 10’s Training Jobs / Day
50% Model Serving Optimizations (Post-Processing, TF Serving, AOT)
§ Exciting & Real-Time!!
§ Online in Live Production
§ Unique Real-Time Serving Skills
§ Pipeline Extends into Production
§ Continuous Feedback with Training
§ Large Number of App Devs & Users
§ 1,000,000’s Predictions / Sec
5. AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
6. 100% OPEN SOURCE CODE
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
7. HANDS-ON EXERCISES
§ Combo of Jupyter Notebooks and Command Line
§ Command Line through Jupyter Terminal
§ Some Exercises Based on Experimental Features
You May See Errors. Stay Calm. You Will Be OK!!
9. AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
10. PACKAGE MODEL + RUNTIME AS ONE
§ Package Model + Runtime into Immutable Docker Image
§ Same Environment: Local, Dev, and Prod
§ No Dependency Surprises in Production
§ Deploy and Tune Model + Runtime Together
Package Model Server C Locally:
pipeline predict-server-build --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="c" \
  --model-path=./models/tensorflow/mnist/
Push Image C to Docker Registry:
pipeline predict-server-push --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="c"
11. TUNE MODEL + RUNTIME TOGETHER
§ Try Different Model Hyper-Parameters + Runtime Configs
§ Even Different Runtimes: TF Serving, TensorRT
§ Auto-Quantize Model Weights + Activations
§ Auto-Fuse Neural Network Layers Together
§ Generate Native CPU + GPU Code
Start Model Server C Locally:
pipeline predict-server-start --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="c"
12. LOAD TEST MODEL + RUNTIME LOCALLY
§ Perform Mini-Load Test on Local Model Server
§ Provides Immediate Feedback on Prediction Performance
§ Relative Performance Compared to Other Variations
§ No Need to Deploy to Test or Prod for Prediction Metrics
§ See Where Time is Being Spent During Prediction
Load Test Model Server C Locally:
pipeline predict --model-server-url=http://localhost:6969 \
  --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="c" \
  --test-request-concurrency=1000
13. RUNTIME OPTION: NVIDIA TENSOR-RT
§ Post-Training Model Optimizations
§ Specific to Nvidia GPU
§ Similar to TF Graph Transform Tool
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
14. AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
15. DEPLOY MODELS SAFELY TO PROD
§ Deploy from Jupyter Notebook in 1-Click
§ Deploy to 1-2% Split or Shadowed Traffic
§ Tear-Down or Rollback Quickly
§ Use Command Line Interface (CLI)
Start Model Cluster B in Prod:
pipeline predict-cluster-start --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="b" \
  --traffic-split="0.02"
Start Model Cluster C in Prod:
pipeline predict-cluster-start --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="c" \
  --traffic-split="0.01"
Start Model Cluster A in Prod:
pipeline predict-cluster-start --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="a" \
  --traffic-split="0.97"
Implementation Details…
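The three traffic-split values above (0.97, 0.02, 0.01) amount to weighted random routing of each request. A minimal sketch, assuming a hypothetical TRAFFIC_SPLIT table and route() helper; this is not PipelineAI's router, only an illustration of the idea.

```python
import random
from collections import Counter

# Hypothetical routing table mirroring the --traffic-split values above.
TRAFFIC_SPLIT = {"a": 0.97, "b": 0.02, "c": 0.01}

def route(rng: random.Random, split=TRAFFIC_SPLIT) -> str:
    """Pick the model tag that serves one request, weighted by its traffic split."""
    tags = list(split)
    return rng.choices(tags, weights=[split[t] for t in tags], k=1)[0]

rng = random.Random(42)
counts = Counter(route(rng) for _ in range(100_000))
print({tag: counts[tag] / 100_000 for tag in TRAFFIC_SPLIT})
```

Over many requests the observed shares approach 97% / 2% / 1%, so a bad model "b" or "c" only ever sees a small slice of production traffic.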
16. DEPLOY EXPERIMENTS SAFELY TO PROD
§ Create Experiments Directly from Jupyter or Command Line
§ Deploy Experiment
§ Use the CLI or Drag n' Drop
pipeline experiment-add --experiment-name=my_experiment \
  --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="a" \
  --traffic-split="97%"
pipeline experiment-add --experiment-name=my_experiment \
  --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="b" \
  --traffic-split="2%"
pipeline experiment-add --experiment-name=my_experiment \
  --model-type=tensorflow \
  --model-name=mnist \
  --model-tag="c" \
  --traffic-split="1%"
1-Click: Start the Experiment with 20% of Production Traffic Shadowed
pipeline experiment-start --experiment-name=my_experiment \
  --traffic-shadow="20%"
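Shadowing differs from splitting: the production model still answers every request, and a copy of about 20% of requests is also sent to the experiment, whose responses are logged but not returned to users. A minimal sketch of the sampling decision, assuming a hypothetical should_shadow() helper keyed on a request id; hashing the id (rather than calling random()) makes retries of the same request shadow consistently.

```python
import hashlib

def should_shadow(request_id: str, shadow_percent: int = 20) -> bool:
    """Deterministically mirror roughly shadow_percent% of requests to the experiment."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100        # stable bucket in [0, 100)
    return bucket < shadow_percent

ids = [f"req-{i}" for i in range(10_000)]
shadowed = sum(should_shadow(r) for r in ids)
print(shadowed / len(ids))  # close to 0.20
```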
17. AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
18. COMPARE MODELS OFFLINE & ONLINE
§ Offline, Batch Metrics
§ Validation Accuracy
§ Training Accuracy
§ CPU/GPU Utilization
§ Live Prediction Values
§ Compare Model Precision
§ Online, Real-Time Metrics
§ Response Time & Throughput
§ Cost Per Prediction
21. CONTINUOUS MODEL TRAINING
§ Identify and Fix Borderline Predictions (~50-50% Confidence)
§ Fix Along Class Boundaries
§ Retrain on New Labeled Data
§ Game-ify Labeling Process
§ Enables Crowd Sourcing
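One simple way to find the borderline (~50-50%) predictions described above is to rank examples by the gap between their top two class probabilities. A minimal sketch; the borderline() helper, the margin threshold, and the example ids are hypothetical.

```python
def borderline(predictions, margin=0.1):
    """Return ids whose top-two class probabilities are within `margin` of each other.

    predictions: mapping of example id -> list of class probabilities.
    These examples sit near a class boundary, so they are good candidates
    to send for (re-)labeling before retraining.
    """
    picked = []
    for example_id, probs in predictions.items():
        top, second = sorted(probs, reverse=True)[:2]
        if top - second <= margin:
            picked.append(example_id)
    return picked

preds = {
    "img-1": [0.98, 0.02],        # confident
    "img-2": [0.52, 0.48],        # borderline
    "img-3": [0.10, 0.45, 0.45],  # tie between two classes
}
print(borderline(preds))
```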
22. AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
23. SHIFT TRAFFIC TO MAX(REVENUE)
§ Shift Traffic to Winning Model using AI Bandit Algorithms
Implementation Details…
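The slides don't say which bandit algorithm is used; epsilon-greedy is one of the simplest explore/exploit strategies and is shown here only as a sketch. The EpsilonGreedy class, the simulated conversion rates, and the 0/1 reward are assumptions for illustration.

```python
import random

class EpsilonGreedy:
    """Send most traffic to the model with the best observed average reward,
    while still exploring a random model for an `epsilon` fraction of requests."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in arms}
        self.total_reward = {arm: 0.0 for arm in arms}

    def mean(self, arm):
        # Untried models look infinitely good, so each gets tried at least once.
        if self.pulls[arm] == 0:
            return float("inf")
        return self.total_reward[arm] / self.pulls[arm]

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))  # explore
        return max(self.pulls, key=self.mean)         # exploit

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.total_reward[arm] += reward

# Hypothetical per-model conversion rates, unknown to the bandit.
true_rate = {"a": 0.02, "b": 0.10, "c": 0.01}
bandit = EpsilonGreedy(true_rate, epsilon=0.1, seed=1)
env = random.Random(2)
for _ in range(20_000):
    arm = bandit.choose()
    bandit.update(arm, 1.0 if env.random() < true_rate[arm] else 0.0)
print(bandit.pulls)  # most traffic ends up on "b"
```

To shift traffic toward min(cloud cost) instead, the same loop could use the negative cost per prediction as the reward.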
24. SHIFT TRAFFIC TO MIN(CLOUD CO$T)
§ Across Clouds & On-Premise
§ Real-Time Cost Per Prediction
§ Bandit-based Explore/Exploit
25. AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
26. AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
28. SETUP ENVIRONMENT
§ Step 1: Browse to the following:
http://allocator.community.pipeline.ai/allocate
§ Step 2: Browse to the following:
http://<ip-address>
§ Step 3: Browse around.
I will provide a Jupyter Username/Password soon.
Need Help?
Use the Chat!
30. LET’S EXPLORE OUR ENVIRONMENT
§ Navigate to the following notebook:
01_Explore_Environment
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
32. BREAK
§ Please 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
Need Help?
Use the Chat!
33. SETTING UP TENSORFLOW WITH GPUS
§ Very Painful!
§ Especially inside Docker
§ Use nvidia-docker
§ Especially on Kubernetes!
§ Use Kubernetes 1.8+
§ http://pipeline.ai for GitHub + DockerHub Links
35. GPU HALF-PRECISION SUPPORT
§ FP32 is “Full Precision”, FP16 is “Half Precision”
§ Supported by Pascal P100 (2016) and Volta V100 (2017)
§ Two(2) FP16’s in Each FP32 GPU Core for 2x Throughput!
§ Half-Precision is OK for Approximate Deep Learning Use Cases
§ You Can Set TF_FP16_MATMUL_USE_FP32_COMPUTE=0 on GPUs with Compute Capability (CC) 5.3+
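To see what "approximate" means in practice, Python's struct module can round-trip a value through IEEE 754 half precision (format code "e"). FP16 keeps a 10-bit mantissa (about 3 significant decimal digits) and its largest finite value is 65504, so fine detail and large integers are lost. A minimal sketch; the helper names are hypothetical.

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (FP16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision (FP32)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 0.1 keeps about 8 significant digits in FP32 but only about 3 in FP16.
print(to_fp32(0.1), to_fp16(0.1))
# 2049 is exactly representable in FP32, but FP16 rounds it to 2048.
print(to_fp32(2049.0), to_fp16(2049.0))
```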
36. VOLTA V100 (2017) VS. PASCAL P100 (2016)
§ 84 Streaming Multiprocessors (SM’s)
§ 5,376 GPU Cores
§ 672 Tensor Cores (Similar to Google TPU)
§ Mixed FP16/FP32 Precision
§ Matrix Dims Should be Multiples of 8
§ More Shared Memory
§ New L0 Instruction Cache
§ Faster L1 Data Cache
§ V100 vs. P100 Performance
§ 12x Training, 6x Inference
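For the "Matrix Dims Should be Multiples of 8" guideline, one common approach is to pad layer sizes (or, for example, vocabulary sizes) up to the next multiple of 8 with unused units. A minimal sketch; round_up_to_multiple() is a hypothetical helper.

```python
def round_up_to_multiple(n: int, multiple: int = 8) -> int:
    """Smallest value >= n that is a multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple

# Hypothetical layer sizes; the padded sizes keep matmul dims Tensor Core friendly.
for size in (1000, 1024, 4097):
    print(size, "->", round_up_to_multiple(size))
```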
37. FP32 VS. FP16 ON AWS GPU INSTANCES
§ p3 Volta V100: FP16 Half Precision 87.2 T ops/second, FP32 Full Precision 15.4 T ops/second
§ g3 Tesla M60: FP16 Half Precision 4.1 T ops/second, FP32 Full Precision 4.0 T ops/second
§ p2 Tesla K80: FP16 Half Precision 1.6 T ops/second, FP32 Full Precision 3.3 T ops/second
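Dividing the FP16 rate by the FP32 rate shows how uneven the benefit is: roughly 5.7x on the p3 V100, essentially none on the g3 M60, and on the p2 K80 FP16 runs at under half the FP32 rate. A small sketch of that arithmetic, using only the numbers on the slide.

```python
# Peak throughput (T ops/second) as listed on the slide.
PEAK_TOPS = {
    "p3 Volta V100": {"fp16": 87.2, "fp32": 15.4},
    "g3 Tesla M60": {"fp16": 4.1, "fp32": 4.0},
    "p2 Tesla K80": {"fp16": 1.6, "fp32": 3.3},
}

# FP16-over-FP32 ratio: > 1 means half precision is faster on that GPU.
speedup = {gpu: tops["fp16"] / tops["fp32"] for gpu, tops in PEAK_TOPS.items()}
for gpu, ratio in speedup.items():
    print(f"{gpu}: {ratio:.2f}x")
```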
38. WHAT ABOUT GOOGLE CLOUD GPUS?
§ Currently Supports the Following:
§ Tesla K80
§ Pascal P100
§ TPUs
§ Attach GPUs to CPU Instances
§ Similar to AWS Elastic GPU, except less confusing
39. V100 AND CUDA 9
§ Independent Thread Scheduling - Finally!!
§ Similar to CPU fine-grained thread synchronization semantics
§ Allows GPU to yield execution of any thread
§ Still Optimized for SIMT (Single Instruction, Multiple Threads)
§ SIMT units automatically scheduled together
§ Explicit Synchronization
(Figure: P100 vs. V100)
40. GPU CUDA PROGRAMMING
§ Barbaric, But Fun Barbaric
§ Must Know Hardware Very Well
§ Hardware Changes are Painful
§ Use the Profilers & Debuggers
41. CUDA STREAMS
§ Asynchronous I/O Transfer
§ Overlap Compute and I/O
§ Keeps GPUs Saturated
§ Fundamental to Queue Framework in TensorFlow
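Why overlapping compute and I/O matters can be seen with an idealized two-stage pipeline model. A sketch, assuming fixed per-batch transfer and compute times and that transfers and kernels can run concurrently (as CUDA streams allow); the function names and the numbers in the usage example are hypothetical.

```python
def sequential_ms(num_batches: int, io_ms: float, compute_ms: float) -> float:
    """Copy a batch, compute on it, then copy the next: nothing overlaps."""
    return num_batches * (io_ms + compute_ms)

def overlapped_ms(num_batches: int, io_ms: float, compute_ms: float) -> float:
    """Two-stage pipeline: while batch i is computed, batch i+1 is transferred."""
    if num_batches == 0:
        return 0.0
    # First transfer, then one slowest-stage interval per remaining batch, then the last compute.
    return io_ms + (num_batches - 1) * max(io_ms, compute_ms) + compute_ms

print(sequential_ms(100, 2.0, 3.0), overlapped_ms(100, 2.0, 3.0))
```

With 2 ms of transfer and 3 ms of compute per batch, 100 batches take 500 ms back to back but 302 ms when overlapped, and the GPU sits idle only during the very first transfer.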
42. LET’S SEE WHAT THIS THING CAN DO!
§ Navigate to the following notebook:
01a_Explore_GPU
01b_Explore_Numba
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
43. AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
44. TRAINING TERMINOLOGY
§ Tensors: N-Dimensional Arrays
§ e.g. Scalar, Vector, Matrix
§ Operations: MatMul, Add, SummaryLog,…
§ Graph: Graph of Operations (DAG)
§ Session: Runs a Graph on Devices
§ Feeds: Feed Inputs into Placeholder
§ Fetches: Fetch Output from Operation
§ Variables: What We Learn Through Training
§ aka “Weights”, “Parameters”
§ Devices: Hardware Device (GPU, CPU, TPU, ...)
(Figure: the User Feeds Inputs and Fetches Outputs; TensorFlow Performs Operations, Flows Tensors, and Trains Variables)
with tf.device("/gpu:0"): ...  # one device per scope, e.g. "/cpu:0" or "/gpu:15"
46. TENSORFLOW GRAPH EXECUTION
§ Lazy Execution by Default
§ Similar to Spark
§ Eager Execution Now Supported (TensorFlow 1.4)
§ Similar to PyTorch
§ "Linearize" Execution to Minimize RAM Usage
§ Useful on Single GPU with Limited RAM
47. TENSORFLOW MODEL
§ MetaGraph
§ Combines GraphDef and Metadata
§ GraphDef
§ Architecture of your model (nodes, edges)
§ Metadata
§ Asset: Accompanying assets to your model
§ SignatureDef: Maps external : internal tensors
§ Variables
§ Stored separately during training (checkpoint)
§ Allows training to continue from any checkpoint
§ Variables are “frozen” into Constants when preparing for inference
(Figure: a MetaGraph containing a GraphDef in which x and W feed a mul op whose output is added to b; Metadata with Assets, SignatureDef, Tags, and Version; and Variables "W": 0.328 and "b": -1.407)
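Lazy (graph) execution and the feeds/fetches terminology can be illustrated with a toy dataflow graph in plain Python. This is a sketch, not how TensorFlow is implemented; the Node class and run() function are hypothetical, and the graph mirrors the slide's example (mul of W and x, then add b) with W = 0.328 and b = -1.407.

```python
import operator

class Node:
    """One operation in a tiny lazy dataflow graph (a toy, not TensorFlow)."""

    def __init__(self, name, op=None, inputs=(), value=None):
        self.name = name      # unique node name
        self.op = op          # function applied to the input values, or None
        self.inputs = inputs  # upstream nodes
        self.value = value    # stored value for variables/constants

def run(fetch, feeds):
    """Evaluate `fetch` only when asked, similar in spirit to Session.run(fetches, feed_dict)."""
    if fetch.name in feeds:              # placeholder fed by the user
        return feeds[fetch.name]
    if fetch.op is None:
        if fetch.value is None:
            raise KeyError(f"placeholder {fetch.name!r} was not fed")
        return fetch.value               # variable holding a learned value
    return fetch.op(*(run(node, feeds) for node in fetch.inputs))

# Building the graph computes nothing yet.
x = Node("x")                            # placeholder
W = Node("W", value=0.328)               # variable values from the slide
b = Node("b", value=-1.407)
mul = Node("mul", operator.mul, (W, x))
add = Node("add", operator.add, (mul, b))

# Execution happens only here: feed x, fetch add (W * x + b).
print(run(add, {"x": 2.0}))
```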
48. BATCH NORMALIZATION (2015)
§ Each Mini-Batch May Have Wildly Different Distributions
§ Normalize per Batch (and Layer)
§ Faster Training, Learns Quicker
§ Final Model is More Accurate
§ TensorFlow is already on 2nd Generation Batch Algorithm
§ First-Class Support for Fusing Batch Norm Layers
§ Final mean + variance Are Folded Into Our Graph Later
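-- (Almost) Always Use Batch Normalization! --
import tensorflow as tf
# a_prev and W: previous layer's activations and this layer's weights.
# num_channels: width of this layer (hypothetical name for the slide's depth/channels).
z = tf.matmul(a_prev, W)
a = tf.nn.relu(z)
a_mean, a_var = tf.nn.moments(a, axes=[0])    # per-feature statistics over the mini-batch
scale = tf.Variable(tf.ones([num_channels]))  # learned gamma
beta = tf.Variable(tf.zeros([num_channels]))  # learned shift
bn = tf.nn.batch_normalization(a, a_mean, a_var,
                               offset=beta, scale=scale,
                               variance_epsilon=0.001)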
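Folding works because, at inference time, batch norm with frozen statistics is just a per-channel affine transform, so it can be absorbed into the preceding linear layer. A minimal per-channel sketch in plain Python, assuming batch norm directly follows the linear op (unlike the snippet above, which normalizes after the ReLU, where folding into the matmul is not possible); the helper names and all values are hypothetical.

```python
import math

def batch_norm(z, mean, var, beta, gamma, eps=1e-3):
    """Inference-time batch norm for one channel: normalize, then scale and shift."""
    return gamma * (z - mean) / math.sqrt(var + eps) + beta

def fold_batch_norm(w, b, mean, var, beta, gamma, eps=1e-3):
    """Fold frozen batch-norm statistics into the preceding layer's weight and bias."""
    factor = gamma / math.sqrt(var + eps)
    return w * factor, (b - mean) * factor + beta

# Hypothetical per-channel values for a single-input linear unit z = w*x + b.
w, b = 0.5, 0.1
mean, var, beta, gamma = 0.3, 4.0, 0.2, 1.5
w_f, b_f = fold_batch_norm(w, b, mean, var, beta, gamma)

x = 2.0
bn_out = batch_norm(w * x + b, mean, var, beta, gamma)  # two ops at inference
folded_out = w_f * x + b_f                              # one op after folding
print(bn_out, folded_out)
```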
49. DROPOUT (2014)
§ Training Technique
§ Prevents Overfitting
§ Helps Avoid Local Minima
§ Inherent Ensembling Technique
§ Creates and Combines Different Neural Architectures
§ Expressed as Probability Percentage (e.g. 50%)
§ TensorFlow Uses "Inverted" Dropout: Surviving Activations Are Boosted by 1/keep_prob During Training, So No Rescaling Is Needed During Validation & Prediction
(Figure: the same network with 0% dropout and with 50% dropout applied during the training phase)
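In TensorFlow 1.x, tf.nn.dropout(x, keep_prob) keeps each element with probability keep_prob and scales kept elements by 1/keep_prob, so the expected activation is unchanged and inference needs no rescaling. A minimal plain-Python sketch of that behavior; the dropout() helper is hypothetical.

```python
import random

def dropout(activations, keep_prob, training, rng):
    """Inverted dropout: during training, zero each unit with probability 1 - keep_prob
    and scale survivors by 1/keep_prob; at inference, pass activations through."""
    if not training:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
acts = [1.0] * 100_000
train_out = dropout(acts, keep_prob=0.5, training=True, rng=rng)
print(sum(train_out) / len(train_out))  # close to 1.0: expected activation preserved
print(dropout([1.0, 2.0], 0.5, training=False, rng=rng))  # unchanged at inference
```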
52. FEED TENSORFLOW TRAINING PIPELINE
§ Training is Almost Always Limited by Ingestion Pipeline
§ THE Number One Problem We See Today
§ Scaling GPUs Up / Out Doesn’t Help
§ GPUs are Heavily Under-Utilized
(Figure: Tesla K80 vs. Volta V100)
53. DON’T USE FEED_DICT!!
§ feed_dict Requires Python <-> C++ Serialization
§ Not Optimized for Production Ingestion Pipelines
§ Retrieves Next Batch After Current Batch is Done
§ Single-Threaded, Synchronous
§ CPUs/GPUs Not Fully Utilized!
§ Use Queue or Dataset APIs
§ Queues are old and complex
sess.run(train_step, feed_dict={…})
54. DETECT UNDERUTILIZED CPUS, GPUS
§ Instrument training code to generate “timelines”
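§ Analyze with Google Web Tracing Framework (WTF)
§ Monitor CPU with top, GPU with nvidia-smi
http://google.github.io/tracing-framework/
import tensorflow as tf
from tensorflow.python.client import timeline
# sess and train_step come from your training code.
# Ask TensorFlow to record per-op timing for one training step.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(train_step, options=run_options, run_metadata=run_metadata)
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as trace_file:
    trace_file.write(trace.generate_chrome_trace_format(show_memory=True))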
55. QUEUES
§ More than traditional Queue
§ Uses CUDA Streams
§ Perform I/O, pre-processing, cropping, shuffling, …
§ Pull from HDFS, S3, Google Storage, Kafka, ...
§ Combine many small files into large TFRecord files
§ Use CPUs to free GPUs for compute
§ Helps saturate CPUs and GPUs
56. QUEUE CAPACITY PLANNING
§ batch_size
§ # examples / batch (e.g. 64 JPEGs)
§ Limited by GPU RAM
§ num_processing_threads
§ CPU threads pull and pre-process batches of data
§ Limited by CPU Cores
§ queue_capacity
§ Limited by CPU RAM (e.g. 5 * batch_size)
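The batch_size / num_processing_threads / queue_capacity trade-off can be sketched with Python's threading and queue modules. This is not TensorFlow's QueueRunner; prefetching_batches() and fake_load() are hypothetical, and here capacity counts batches rather than examples. Python threads help most when loading is I/O-bound, since CPU-bound Python code is serialized by the GIL.

```python
import queue
import threading

def prefetching_batches(num_batches, load_batch, num_threads=4, capacity=8):
    """Yield batches loaded by background threads through a bounded queue."""
    q = queue.Queue(maxsize=capacity)  # bounded: caps memory used by prefetched batches
    next_index = iter(range(num_batches))
    index_lock = threading.Lock()
    done = object()                    # sentinel: one per worker when it runs out of work

    def worker():
        while True:
            with index_lock:
                i = next(next_index, None)
            if i is None:
                q.put(done)
                return
            q.put(load_batch(i))       # blocks while the queue is full

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()

    finished = 0
    while finished < num_threads:
        item = q.get()
        if item is done:
            finished += 1
        else:
            yield item

def fake_load(i):
    """Hypothetical loader: pretend to read and pre-process batch i."""
    return [i] * 4

batches = list(prefetching_batches(20, fake_load))
print(sorted(batch[0] for batch in batches))
```

Because several threads fill the queue, batches can arrive out of order, which is usually acceptable for training.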
57. DATASET API
§ tf.Tensor => tf.data.Dataset
§ Functional Transformations
§ Python Generator => tf.data.Dataset
import tensorflow as tf
dataset = tf.data.Dataset.from_tensors((features, labels))
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = tf.data.TextLineDataset(filenames)
# Each transformation returns a new Dataset, so reassign the result.
dataset = dataset.map(lambda x: tf.image.decode_jpeg(x))
dataset = dataset.repeat(NUM_EPOCHS)
dataset = dataset.batch(BATCH_SIZE)
def generator():
    while True:
        yield ...
dataset = tf.data.Dataset.from_generator(generator, tf.int32)
§ Dataset => One-Shot Iterator
§ Dataset => Initializable Iterator
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
while ...:
    sess.run(next_element)
iterator = dataset.make_initializable_iterator()
sess.run(iterator.initializer, feed_dict=PARAMS)
next_element = iterator.get_next()
while ...:
    sess.run(next_element)
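The order of functional transformations matters. A plain-Python sketch of map, repeat, and batch (hypothetical ds_* helpers, not the tf.data API) shows that repeating before batching lets a batch straddle an epoch boundary, and that the final batch may be partial, which is also tf.data's default behavior for batch().

```python
from itertools import islice

def ds_map(fn, source):
    """Apply fn to every element, lazily."""
    for x in source:
        yield fn(x)

def ds_repeat(make_source, count):
    """Re-create and re-iterate the source `count` times (one pass per epoch)."""
    for _ in range(count):
        yield from make_source()

def ds_batch(source, batch_size):
    """Group consecutive elements; the last batch may be smaller."""
    it = iter(source)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

data = [1, 2, 3, 4, 5]
pipeline = ds_batch(ds_repeat(lambda: ds_map(lambda x: x * 10, data), count=2), batch_size=4)
batches = list(pipeline)
print(batches)  # [[10, 20, 30, 40], [50, 10, 20, 30], [40, 50]]
```

Batching before repeating would instead give per-epoch batches ([[10, 20, 30, 40], [50]] twice), at the cost of a small batch at the end of every epoch.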
58. FUTURE OF DATASET API
§ Advanced, RL-based Device Placement Strategies
§ Automatic GPU Data Staging
§ More Functional Operators
59. LET’S FEED DATA WITH A QUEUE
§ Navigate to the following notebook:
02_Feed_Queue_HDFS
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
61. BREAK
§ Please 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
Need Help?
Use the Chat!
62. LET’S TRAIN A MODEL (CPU)
§ Navigate to the following notebook:
03_Train_Model_CPU
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
63. LET’S TRAIN A MODEL (GPU)
§ Navigate to the following notebook:
03a_Train_Model_GPU
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
64. TENSORFLOW DEBUGGER
§ Step through Operations
§ Inspect Inputs and Outputs
§ Wrap Session in Debug Session
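import tensorflow as tf
from tensorflow.python import debug as tf_debug
config = tf.ConfigProto()  # session options; defaults are fine for debugging
sess = tf.Session(config=config)
sess = tf_debug.LocalCLIDebugWrapperSession(sess)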
65. LET’S DEBUG A MODEL
§ Navigate to the following notebook:
04_Debug_Model
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
66. AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Train, Inspect, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
67. SINGLE NODE, MULTI-GPU TRAINING
§ cpu:0
§ By default, all CPUs
§ Requires extra config to target a CPU
§ gpu:0..n
§ Each GPU has a unique id
§ By default, TF places ops only on gpu:0 (the first GPU)
§ xla_cpu:0, xla_gpu:0..n
§ “JIT Compiler Device”
§ Hints TensorFlow to attempt JIT Compile
with tf.device("/cpu:0"):
    ...  # ops pinned to the CPU
with tf.device("/gpu:0"):
    ...  # ops pinned to the first GPU
with tf.device("/gpu:1"):
    ...  # ops pinned to the second GPU
68. DISTRIBUTED, MULTI-NODE TRAINING
§ TensorFlow Automatically Inserts Send and Receive Ops into Graph
§ Parameter Server Synchronously Aggregates Updates to Variables
§ Nodes with Multiple GPUs will Pre-Aggregate Before Sending to PS
(Diagram: a Single Node with multiple GPUs vs. Multiple Nodes, where each worker has one or more GPUs)
69. DATA PARALLEL VS MODEL PARALLEL
§ Data Parallel (“Between-Graph Replication”)
§ Send exact same model to each device
§ Each device operates on partition of data
§ e.g., Spark sends the same function to many workers
§ Each worker operates on its partition of data
§ Model Parallel (“In-Graph Replication”)
§ Send different partition of model to each device
§ Each device operates on all data
§ Difficult, but required for larger models with lower-memory GPUs
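A minimal pure-Python sketch of the data-parallel idea: every replica holds the same model (here a single weight `w` for y = w * x), computes a gradient on its own partition of the data, and the gradients are averaged before one shared update. The toy model, learning rate, and round-robin sharding are illustrative assumptions, not taken from the notebooks.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard(data: List[Sample], num_replicas: int) -> List[List[Sample]]:
    """Partition the data round-robin, one partition per replica (e.g. per GPU)."""
    return [data[i::num_replicas] for i in range(num_replicas)]


def local_gradient(w: float, part: List[Sample]) -> float:
    """d/dw of the mean squared error of y_hat = w * x over one partition."""
    return sum(2.0 * (w * x - y) * x for x, y in part) / len(part)


def data_parallel_step(w: float, data: List[Sample], num_replicas: int, lr: float) -> float:
    # Same model (w) on every replica, a different partition of the data on each.
    grads = [local_gradient(w, part) for part in shard(data, num_replicas)]
    # Average the replica gradients, then apply one shared update to the model.
    return w - lr * sum(grads) / len(grads)


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # generated with true w = 3
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, data, num_replicas=4, lr=0.01)
print(round(w, 4))  # 3.0
```

Because the four partitions are equal-sized, the averaged gradient equals the full-batch gradient; with uneven partitions you would weight each gradient by its partition size.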
70. SYNCHRONOUS VS. ASYNCHRONOUS
§ Synchronous
§ Nodes compute gradients
§ Nodes update Parameter Server (PS)
§ Nodes sync on PS for latest parameters
§ Asynchronous
§ Some nodes are slower to compute gradients
§ Nodes update PS without waiting for each other
§ Nodes may read stale parameters from PS
§ May not converge due to stale reads!
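To see why stale reads can hurt, here is a toy pure-Python simulation (an illustration, not TensorFlow's implementation). Worker i minimizes (w - TARGETS[i])², so the joint optimum is w = 3. In the synchronous variant every worker reads the latest w and the PS averages all gradients; in the asynchronous variant each gradient is computed on a parameter snapshot that is `staleness` updates old and applied immediately. The targets, learning rate, and staleness are arbitrary assumptions.

```python
from collections import deque
from typing import List

TARGETS = [2.0, 3.0, 4.0]  # worker i minimizes (w - TARGETS[i])^2; joint optimum is w = 3


def worker_grad(w: float, worker: int) -> float:
    return 2.0 * (w - TARGETS[worker])


def synchronous(steps: int, lr: float) -> List[float]:
    w, trajectory = 0.0, []
    for _ in range(steps):
        # Every worker reads the latest w; the PS waits for all gradients, then averages.
        grads = [worker_grad(w, i) for i in range(len(TARGETS))]
        w -= lr * sum(grads) / len(grads)
        trajectory.append(w)
    return trajectory


def asynchronous(steps: int, lr: float, staleness: int) -> List[float]:
    w, trajectory = 0.0, []
    versions = deque([w], maxlen=staleness + 1)  # the most recent parameter versions
    for step in range(steps):
        worker = step % len(TARGETS)
        stale_w = versions[0]  # this worker computed its gradient on an old snapshot
        w -= lr * worker_grad(stale_w, worker)  # the PS applies it at once, no waiting
        versions.append(w)
        trajectory.append(w)
    return trajectory


sync = synchronous(steps=50, lr=0.4)
async_ = asynchronous(steps=50, lr=0.4, staleness=3)
print(f"sync final w = {sync[-1]:.4f}")
print(f"async largest |w - 3| over last 10 steps = {max(abs(v - 3) for v in async_[-10:]):.1f}")
```

With the same learning rate, the synchronous run converges while the three-step-stale asynchronous run oscillates with growing amplitude, which is the "may not converge" case on the slide. A smaller learning rate tolerates more staleness.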
71. CHIEF WORKER
§ Chief Defaults to Worker Task 0
§ Task 0 is guaranteed to exist
§ Performs Maintenance Tasks
§ Writes log summaries
§ Instructs PS to checkpoint vars
§ Performs PS health checks
§ (Re-)Initialize variables at (re-)start of training
72. NODE AND PROCESS FAILURES
§ Checkpoint to Persistent Storage (HDFS, S3)
§ Use MonitoredTrainingSession and Hooks
§ Use a Good Cluster Orchestrator (e.g., Kubernetes, Mesos)
§ Understand Failure Modes and Recovery States
§ Stateless: Not Bad, Training Continues
§ Stateful: Bad, Training Must Stop
§ Dios Mio! Long Night Ahead…
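The checkpoint-and-resume pattern that MonitoredTrainingSession automates can be sketched in plain Python: persist the step counter and model state at a fixed interval, and on (re)start resume from the last checkpoint if one exists. The JSON file format, interval, and `train` helper are hypothetical; a real job would write to persistent storage such as HDFS or S3 rather than a local temp directory.

```python
import json
import os
import tempfile


def train(ckpt_path: str, total_steps: int, ckpt_every: int, crash_at: int = -1) -> dict:
    # Resume from the last checkpoint if a previous run left one behind.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "w": 0.0}
    while state["step"] < total_steps:
        if state["step"] == crash_at:
            raise RuntimeError(f"simulated node failure at step {state['step']}")
        state["w"] += 1.0  # stand-in for one training step
        state["step"] += 1
        if state["step"] % ckpt_every == 0:
            # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
            tmp = ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, ckpt_path)
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "model.ckpt.json")
try:
    train(ckpt, total_steps=100, ckpt_every=10, crash_at=37)
except RuntimeError as err:
    print(err)
final = train(ckpt, total_steps=100, ckpt_every=10)  # restart resumes at step 30
print(final)  # {'step': 100, 'w': 100.0}
```

Steps 31 to 37 are recomputed after the restart, so the checkpoint interval trades I/O overhead against the amount of work lost per failure.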
73. ESTIMATOR, EXPERIMENT API
§ Simplify Model Building
§ Provide Clear Path to Production
§ Enable Rapid Model Experiments
§ Provide Flexible Parameter Tuning
§ Enable Downstream Optimizing & Serving Infrastructure
§ Nudge Users to Best Practices Through Opinions
§ Provide Hooks/Callbacks to Override Opinions
§ Unified API for Local and Distributed TensorFlow
74. ESTIMATOR API
§ “Train-to-Serve” Design
§ Create a Custom Estimator or Use a Canned One
§ Hides Session, Graph, Layers, Iterative Loops (Train, Eval, Predict)
§ Hooks for All Phases of Model Training and Evaluation
§ Load Input: input_fn()
§ Train: model_fn() and train()
§ Evaluate: evaluate()
§ Save and Export: export_savedmodel()
§ Predict: predict() uses sess.run(), so predictions are slow!
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/customestimator/
75. CANNED ESTIMATORS
§ Commonly-Used Estimators
§ Pre-Tested and Pre-Tuned
§ DNNClassifier, TensorForestEstimator
§ Always Use Canned Estimators If Possible
§ Reduce Lines of Code, Complexity, and Bugs
§ Use FeatureColumns to Define & Create Features
(Chart: Custom vs. Canned Estimators @ Google, August 2017)
76. COMBINE ESTIMATOR + DATASET API
import tensorflow as tf
def input_fn():
    def generator():
        while True:
            yield ...
    # output_types must match what generator() yields; an (image, label) pair is assumed here.
    my_dataset = tf.data.Dataset.from_generator(generator, (tf.float32, tf.int32))
    # A one-shot iterator automatically initializes itself on first use.
    iterator = my_dataset.make_one_shot_iterator()
    # The return value of get_next() matches the dataset element type.
    images, labels = iterator.get_next()
    return images, labels
# The input_fn can be used as a regular Estimator input function.
estimator = tf.estimator.Estimator(…)
estimator.train(input_fn=input_fn, …)
77. FEATURECOLUMN ABSTRACTION
§ Used by Canned Estimator
§ Simplifies Input Ingestion
§ Declarative Way to Specify Model Training Inputs
§ Converts Sparse Features to Dense Tensors
§ Sparse Features: Query Keyword, URL, ProductID, …
§ Wide/Linear Models Use Feature-Crossing
§ Deep Models Use Embeddings
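The two sparse-to-dense paths can be illustrated in plain Python: a crossed feature for a wide/linear model hashes the combined values into a fixed number of buckets and one-hot encodes the result, while a deep model maps a bucket id to a dense embedding vector. This mimics the idea behind `categorical_column_with_hash_bucket`, `crossed_column`, and `embedding_column` but is not their implementation; the bucket count, embedding size, hash function, and random initialization are assumptions.

```python
import hashlib
import random
from typing import List


def hash_bucket(value: str, num_buckets: int) -> int:
    """Deterministically map an arbitrary string to one of num_buckets ids."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def crossed_bucket(values: List[str], num_buckets: int) -> int:
    """Feature cross: hash the *combination* of values into a single bucket id."""
    return hash_bucket("_X_".join(values), num_buckets)


def one_hot(bucket: int, num_buckets: int) -> List[float]:
    """Dense input for a wide/linear model: one slot per bucket."""
    vec = [0.0] * num_buckets
    vec[bucket] = 1.0
    return vec


class Embedding:
    """Dense input for a deep model: one vector per bucket (trainable in a real model)."""

    def __init__(self, num_buckets: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_buckets)]

    def lookup(self, bucket: int) -> List[float]:
        return self.table[bucket]


NUM_BUCKETS = 1000
query, product = "running shoes", "product_42"
wide = one_hot(crossed_bucket([query, product], NUM_BUCKETS), NUM_BUCKETS)
deep = Embedding(NUM_BUCKETS, dim=8).lookup(hash_bucket(product, NUM_BUCKETS))
print(sum(wide), len(deep))  # 1.0 8
```

The wide path gives each (query, product) combination its own weight slot, so the model can memorize specific pairs; the deep path shares an 8-dimensional vector per product so that similar products can generalize.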
78. SINGLE VS. MULTI-OBJECTIVES + HEADS
§ Single-Objective Estimator
§ Single classification prediction
§ Multi-Objective Estimator
§ Two (2) classification predictions
§ One (1) classification prediction + One (1) final layer
§ Multiple Heads Are Used to Ensemble Models
§ Treats neural network as a feature engineering step!
§ Supported by TensorFlow Serving
79. LAYERS API
§ Standalone Layer or Entire Sub-Graphs
§ Functions of Tensor Inputs & Outputs
§ Mix and Match with Operations
§ Assumes 1st Dimension is Batch Size
§ Handles One (1) to Many (*) Inputs
§ Metrics are Layers
§ Loss Metric (Per Mini-Batch)
§ Accuracy and MSE (Across Mini-Batches)
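The difference between a per-mini-batch metric and one accumulated across mini-batches is easiest to see in code. This pure-Python sketch keeps running totals, the same idea behind TensorFlow's streaming metrics (`tf.metrics.accuracy` returns a value op plus an update op backed by local variables), but the `StreamingAccuracy` class itself is hypothetical.

```python
from typing import List, Sequence


class StreamingAccuracy:
    """Accumulates correct/total counts across mini-batches."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, labels: Sequence[int], predictions: Sequence[int]) -> None:
        self.correct += sum(int(y == p) for y, p in zip(labels, predictions))
        self.total += len(labels)

    def result(self) -> float:
        return self.correct / self.total if self.total else 0.0


def batch_accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Per-mini-batch metric: reflects only the current batch."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


batches = [([1, 0, 1, 1], [1, 0, 0, 1]), ([0, 0], [1, 1])]
streaming = StreamingAccuracy()
per_batch: List[float] = []
for labels, preds in batches:
    per_batch.append(batch_accuracy(labels, preds))
    streaming.update(labels, preds)
print(per_batch, streaming.result())  # [0.75, 0.0] 0.5
```

Averaging the two per-batch values would give 0.375 rather than 0.5 because the batches have different sizes, which is why accuracy is accumulated across mini-batches.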
80. EXPERIMENT API
§ Easier-to-Use Distributed TensorFlow
§ Same API for Local and Distributed (*Theoretically)
§ Combines Estimator with input_fn()
§ Used for Training, Evaluation, & Hyper-Parameter Tuning
§ Distributed Training Defaults to Data-Parallel & Async
§ Cluster Configuration is Fixed at Start of Training Job
§ No Auto-Scaling Allowed!!
81. ESTIMATOR, EXPERIMENT CONFIGS
§ TF_CONFIG
§ Special environment variable for config
§ Defines ClusterSpec in JSON, incl. master, workers, PSs
§ Distributed mode: '{"environment": "cloud"}'
§ Local mode: '{"environment": "local", "task": {"type": "worker"}}'
§ RunConfig: Defines checkpoint interval and output directory
§ HParams: Hyper-parameter tuning parameters and ranges
§ learn_runner creates RunConfig before calling run() & tune()
§ schedule is set based on the task type in TF_CONFIG
TF_CONFIG='{
  "environment": "cloud",
  "cluster": {
    "master": ["worker0:2222"],
    "worker": ["worker1:2222"],
    "ps": ["ps0:2222"]
  },
  "task": {"type": "ps", "index": 0}
}'
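Each process reads `TF_CONFIG` to learn the cluster layout and its own role. A minimal sketch of that parsing with the standard `json` module; the example value mirrors the parameter-server config on this slide, the `describe_task` helper is hypothetical, and in practice RunConfig does this parsing for you.

```python
import json
import os
from typing import Dict, List

# Example value for illustration; a cluster manager sets a different TF_CONFIG per process.
os.environ["TF_CONFIG"] = json.dumps({
    "environment": "cloud",
    "cluster": {
        "master": ["worker0:2222"],
        "worker": ["worker1:2222"],
        "ps": ["ps0:2222"],
    },
    "task": {"type": "ps", "index": 0},
})


def describe_task() -> Dict[str, object]:
    """Parse TF_CONFIG and report this process's role and address."""
    config = json.loads(os.environ.get("TF_CONFIG", "{}"))
    cluster: Dict[str, List[str]] = config.get("cluster", {})
    task = config.get("task", {})
    task_type = task.get("type", "worker")
    task_index = int(task.get("index", 0))  # accept 0 or "0"
    hosts = cluster.get(task_type, [])
    return {
        "environment": config.get("environment", "local"),
        "task_type": task_type,
        "task_index": task_index,
        "address": hosts[task_index] if task_index < len(hosts) else None,
        "cluster_size": sum(len(h) for h in cluster.values()),
    }


print(describe_task())
# {'environment': 'cloud', 'task_type': 'ps', 'task_index': 0, 'address': 'ps0:2222', 'cluster_size': 3}
```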
83. SEPARATE TRAINING + EVALUATION
§ Separate Training and Evaluation Clusters
§ Evaluate Upon Checkpoint
§ Avoid Resource Contention
§ Let Training Continue in Parallel with Evaluation
(Diagram: Training Cluster, Evaluation Cluster, and Parameter Server Cluster)
84. LET’S TRAIN DISTRIBUTED TENSORFLOW
§ Navigate to the following notebook:
05_Train_Model_Distributed_CPU
or 05a_Train_Model_Distributed_GPU
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
87. AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Train, Inspect, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
88. XLA FRAMEWORK
§ XLA: “Accelerated Linear Algebra”
§ Reduce Reliance on Custom Operators
§ Improve Execution Speed
§ Improve Memory Usage
§ Reduce Mobile Footprint
§ Improve Portability
Helps TensorFlow Stay Flexible, Yet Still Performant
89. XLA HIGH LEVEL OPTIMIZER (HLO)
§ HLO: “High Level Optimizer”
§ Compiler Intermediate Representation (IR)
§ Independent of source and target language
§ XLA Step 1 Emits Target-Independent HLO
§ XLA Step 2 Emits Target-Dependent LLVM IR
§ LLVM Emits Native Code Specific to Target
§ Supports x86-64, ARM64 (CPU), and NVPTX (GPU)
90. JIT COMPILER
§ JIT: “Just-In-Time” Compiler
§ Built on XLA Framework
§ Reduce Memory Movement – Especially with GPUs
§ Reduce Overhead of Multiple Function Calls
§ Similar to Spark Operator Fusing in Spark 2.0
§ Unroll Loops, Fuse Operators, Fold Constants, …
§ Scopes: session, device, `with jit_scope():`
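Why operator fusion reduces memory movement can be shown with a pure-Python analogy (not XLA itself): the unfused version materializes a full intermediate list for each op, the way separate kernels write intermediate tensors to GPU memory, while the fused version computes `relu(x * scale + bias)` element by element in a single pass. The op sequence and the buffer counter are illustrative assumptions.

```python
from typing import List, Tuple


def unfused(x: List[float], scale: float, bias: float) -> Tuple[List[float], int]:
    """Three separate 'kernels'; each one writes a full intermediate buffer."""
    buffers_written = 0
    scaled = [v * scale for v in x]
    buffers_written += 1
    shifted = [v + bias for v in scaled]
    buffers_written += 1
    out = [max(v, 0.0) for v in shifted]
    buffers_written += 1
    return out, buffers_written


def fused(x: List[float], scale: float, bias: float) -> Tuple[List[float], int]:
    """One 'kernel': read each input once, write each output once, no intermediates."""
    out = [max(v * scale + bias, 0.0) for v in x]
    return out, 1


x = [-2.0, -0.5, 0.0, 1.5, 3.0]
a, n_unfused = unfused(x, scale=2.0, bias=1.0)
b, n_fused = fused(x, scale=2.0, bias=1.0)
print(a == b, n_unfused, n_fused)  # True 3 1
```

Both versions produce identical results; the fused one simply avoids writing and re-reading two intermediate buffers, which is where the savings come from on bandwidth-bound GPUs.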
91. VISUALIZING JIT COMPILER IN ACTION
(Figure: Before JIT vs. After JIT)
Google Web Tracing Framework:
http://google.github.io/tracing-framework/
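import tensorflow as tf
from tensorflow.python.client import timeline
# Ask TensorFlow to record a full trace for this step
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
# `fetches` is a placeholder for whatever op(s) or tensor(s) you want to trace
sess.run(fetches, options=run_options, run_metadata=run_metadata)
# Convert the recorded step stats to Chrome trace JSON (viewable, e.g., in chrome://tracing)
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as trace_file:
    trace_file.write(trace.generate_chrome_trace_format(show_memory=True))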
93. LET’S TRAIN WITH XLA CPU
§ Navigate to the following notebook:
06_Train_Model_XLA_CPU
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
94. LET’S TRAIN WITH XLA GPU
§ Navigate to the following notebook:
06a_Train_Model_XLA_GPU
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
95. AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
96. AGENDA
Part 2: Optimize TensorFlow Model Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
97. AOT COMPILER
§ Standalone, Ahead-Of-Time (AOT) Compiler
§ Built on XLA framework
§ tfcompile
§ Creates executable with minimal TensorFlow Runtime needed
§ Includes only dependencies needed by subgraph computation
§ Creates functions with feeds (inputs) and fetches (outputs)
§ Packaged as cc_library header and object files to link into your app
§ Commonly used for mobile device inference graph
§ Currently, only CPU x86-64 and ARM are supported (no GPU)
98. GRAPH TRANSFORM TOOL (GTT)
§ Post-Training Optimization to Prepare for Inference
§ Remove Training-only Ops (checkpoint, dropout, logs)
§ Remove Unreachable Nodes between Given feed -> fetch
§ Fuse Adjacent Operators to Improve Memory Bandwidth
§ Fold Final Batch Norm mean and variance into Variables
§ Round Weights/Variables to improve compression (e.g., 70%)
§ Quantize (FP32 -> INT8) to Speed Up Math Operations
101. AFTER STRIPPING UNUSED NODES
§ Optimizations
§ strip_unused_nodes
§ Results
§ Graph much simpler
§ File size much smaller
102. AFTER REMOVING UNUSED NODES
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ Results
§ Pesky nodes removed
§ File size a bit smaller
103. AFTER FOLDING CONSTANTS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ Results
§ Placeholders (feeds) -> Variables*
(*Why Variables and not Constants?)
104. AFTER FOLDING BATCH NORMS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ Results
§ Graph remains the same
§ File size approximately the same
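The arithmetic that makes batch-norm folding possible is that inference-time batch norm, with its moving mean and variance frozen, is an affine function. It can therefore be merged into the preceding layer: with s = gamma / sqrt(var + eps), the folded weights are s * w and the folded bias is s * (b - mean) + beta. A minimal pure-Python check for a single dense unit; the weights, statistics, and epsilon are made-up values, and this shows the idea rather than the Graph Transform Tool's code.

```python
import math
from typing import List, Tuple


def dense(w: List[float], b: float, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def batch_norm(z: float, mean: float, var: float, gamma: float, beta: float, eps: float) -> float:
    """Inference-time batch norm using the frozen moving mean and variance."""
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


def fold(w: List[float], b: float, mean: float, var: float,
         gamma: float, beta: float, eps: float) -> Tuple[List[float], float]:
    """Fold BN into the dense layer: w' = s*w, b' = s*(b - mean) + beta."""
    s = gamma / math.sqrt(var + eps)
    return [s * wi for wi in w], s * (b - mean) + beta


w, b = [0.5, -1.0, 2.0], 0.1
mean, var, gamma, beta, eps = 0.3, 4.0, 1.5, -0.2, 1e-3
x = [1.0, 2.0, 3.0]

original = batch_norm(dense(w, b, x), mean, var, gamma, beta, eps)
w_f, b_f = fold(w, b, mean, var, gamma, beta, eps)
folded = dense(w_f, b_f, x)
print(math.isclose(original, folded))  # True
```

After folding, the batch-norm op disappears from the inference graph while the weights keep the same shape, which is consistent with the file size staying roughly the same.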
105. AFTER QUANTIZING WEIGHTS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ quantize_weights
§ Results
§ Graph is same, file size is smaller, compute is faster
106. WEIGHT QUANTIZATION
§ FP16 and INT8 Are Smaller and Computationally Simpler
§ Weights/Variables are Constants
§ Easy to Linearly Quantize
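Because weights are constants after training, their range is known ahead of time and they can be mapped linearly onto 8-bit integers. A minimal sketch of min/max (affine) quantization to the integer range 0..255 and back; the scheme details and sample weights are assumptions for illustration, not the exact encoding `quantize_weights` emits.

```python
from typing import List, Tuple


def quantize(weights: List[float], num_bits: int = 8) -> Tuple[List[int], float, float]:
    """Map [min(weights), max(weights)] linearly onto the integers [0, 2**num_bits - 1]."""
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1  # 255 for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((w - lo) / scale) for w in weights], scale, lo


def dequantize(q: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate FP32 values from the integer codes."""
    return [lo + v * scale for v in q]


weights = [-0.75, -0.1, 0.0, 0.33, 1.2]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)                           # [0, 85, 98, 141, 255]
print(max_error <= scale / 2 + 1e-12)  # True
```

Each weight now needs 1 byte instead of 4 (FP32), plus two floats per tensor for the scale and minimum, and the rounding error is bounded by half a quantization step.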
107. LET’S OPTIMIZE FOR INFERENCE
§ Navigate to the following notebook:
07_Optimize_Model*
*Why just CPU version? Why not GPU?
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
109. ACTIVATION QUANTIZATION
§ Activations Not Known Ahead of Time
§ Depends on input, not easy to quantize
§ Requires Additional Calibration Step
§ Use a “representative” dataset
§ Per Neural Network Layer…
§ Collect histogram of activation values
§ Generate many quantized distributions with different saturation thresholds
§ Choose threshold to minimize…
KL_divergence(ref_distribution, quant_distribution)
§ Not Much Time or Data is Required (Minutes on Commodity Hardware)
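The per-layer calibration loop above can be sketched in pure Python. This is a simplified version, loosely following the KL-divergence procedure NVIDIA has described for TensorRT INT8 calibration: for each candidate threshold, outliers beyond it are clipped into the last kept bin to form the reference distribution P, the unclipped kept bins are coarsened to `num_levels` levels and expanded back to form Q, and the threshold with the smallest KL(P || Q) wins. The bin count, level count, and synthetic histogram are assumptions, and production calibrators differ in details such as smoothing.

```python
import math
from typing import List


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(P || Q) after normalizing both histograms to probability distributions."""
    p_total, q_total = sum(p), sum(q)
    kl = 0.0
    for p_count, q_count in zip(p, q):
        if p_count == 0:
            continue  # 0 * log(0 / q) contributes nothing
        if q_count == 0:
            return math.inf  # P has mass where Q has none
        p_prob, q_prob = p_count / p_total, q_count / q_total
        kl += p_prob * math.log(p_prob / q_prob)
    return kl


def quantize_histogram(bins: List[float], num_levels: int) -> List[float]:
    """Merge the bins into num_levels groups, then spread each group's total
    uniformly over that group's non-empty bins (expanding back to len(bins))."""
    n = len(bins)
    expanded = [0.0] * n
    for level in range(num_levels):
        start, end = level * n // num_levels, (level + 1) * n // num_levels
        nonzero = [i for i in range(start, end) if bins[i] > 0]
        if nonzero:
            share = sum(bins[start:end]) / len(nonzero)
            for i in nonzero:
                expanded[i] = share
    return expanded


def choose_threshold(hist: List[float], bin_width: float, num_levels: int = 128) -> float:
    """Return the saturation threshold (in activation units) with minimal KL divergence."""
    best_bins, best_kl = len(hist), math.inf
    for i in range(num_levels, len(hist) + 1):
        candidate = hist[:i]
        reference = list(candidate)
        reference[-1] += sum(hist[i:])  # P: clip the outliers into the last kept bin
        quantized = quantize_histogram(candidate, num_levels)  # Q: built without the outliers
        kl = kl_divergence(reference, quantized)
        if kl < best_kl:
            best_bins, best_kl = i, kl
    return best_bins * bin_width


# Synthetic histogram of |activation| values: a sharp peak plus a long, thin tail.
BIN_WIDTH = 0.01
hist = [1000.0 * math.exp(-k / 20.0) + 1.0 for k in range(512)]
threshold = choose_threshold(hist, BIN_WIDTH)
print(f"saturation threshold = {threshold:.2f}")
```

Activations above the chosen threshold saturate to the largest INT8 value; the search trades that clipping error against the coarser resolution a larger threshold would force on the common values.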
111. LET’S OPTIMIZE FOR INFERENCE
§ Navigate to the following notebook:
08_Optimize_Model_Activations
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
113. AGENDA
Part 2: Optimize TensorFlow Model Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
114. MODEL SERVING TERMINOLOGY
§ Inference
§ Only Forward Propagation through Network
§ Predict, Classify, Regress, …
§ Bundle
§ GraphDef, Variables, Metadata, …
§ Assets
§ e.g., Map of ClassificationID -> String
§ {9283: “penguin”, 9284: “bridge”}
§ Version
§ Every Model Has a Version Number (Integer)
§ Version Policy
§ e.g., Serve Only Latest (Highest), Serve Both Latest and Previous, …
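A version policy is a rule for which of the available integer versions to load. A minimal sketch of the two policies named on this slide; the `serve_latest` function is hypothetical, and TensorFlow Serving expresses this through its own model version policy configuration rather than Python.

```python
from typing import Iterable, List


def serve_latest(versions: Iterable[int], num_versions: int = 1) -> List[int]:
    """Keep only the highest num_versions version numbers, newest first."""
    return sorted(set(versions), reverse=True)[:num_versions]


available = [1, 2, 5, 3]
print(serve_latest(available))                  # [5]
print(serve_latest(available, num_versions=2))  # [5, 3]
```

"Serve only latest" is `num_versions=1`; "serve both latest and previous" is `num_versions=2`, which keeps the previous version loaded for rollback or A/B comparison.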
115. TENSORFLOW SERVING FEATURES
§ Supports Auto-Scaling
§ Custom Loaders beyond File-based
§ Tune for Low-latency or High-throughput
§ Serve Different Models/Versions in Same Process
§ Customize Model Types beyond HashMap and TensorFlow
§ Customize Version Policies for A/B and Bandit Tests
§ Support Request Draining for Graceful Model Updates
§ Enable Request Batching for Different Use Cases and HW
§ Supports Optimized Transport with gRPC and Protocol Buffers
116. PREDICTION SERVICE
§ Predict (Original, Generic)
§ Input: List of Tensor
§ Output: List of Tensor
§ Classify
§ Input: List of tf.Example (key, value) pairs
§ Output: List of (class_label: String, score: float)
§ Regress
§ Input: List of tf.Example (key, value) pairs
§ Output: List of (value: float)
118. MULTI-HEADED INFERENCE
§ Inputs Pass Through Model One Time
§ Model Returns Multiple Predictions:
1. Human-readable prediction (e.g., “penguin”, “church”, …)
2. Final layer of scores (float vector)
§ The Final Layer of Floats Passes to the Next Model in the Ensemble
§ Optimizes Bandwidth, CPU/GPU, Latency, Memory
§ Enables Complex Model Composing and Ensembling
119. BUILD YOUR OWN MODEL SERVER
§ Adapt gRPC (Google) <-> HTTP (REST of the World)
§ Perform Batch Inference vs. Request/Response
§ Handle Requests Asynchronously
§ Support Mobile, Embedded Inference
§ Customize Request Batching
§ Add Circuit Breakers, Fallbacks
§ Control Latency Requirements
§ Reduce Number of Moving Parts
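#include <memory>
#include <utility>
#include "tensorflow_serving/model_servers/server_core.h"
class MyTensorFlowModelServer {
 public:
  MyTensorFlowModelServer() {
    tensorflow::serving::ServerCore::Options options;
    // set options (model name, path, etc)
    TF_CHECK_OK(
        tensorflow::serving::ServerCore::Create(std::move(options), &core_));
  }
 private:
  std::unique_ptr<tensorflow::serving::ServerCore> core_;
};
Compile and Link with libtensorflow.so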
120. RUNTIME OPTION: NVIDIA TENSORRT
§ Post-Training Model Optimizations
§ Specific to NVIDIA GPUs
§ Similar to TF Graph Transform Tool
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
121. AGENDA
Part 2: Optimize TensorFlow Model Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
122. SAVED MODEL FORMAT
§ Navigate to the following notebook:
09_Deploy_Optimized_Model
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
123. AGENDA
Part 2: Optimize TensorFlow Model Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
124. REQUEST BATCH TUNING
§ max_batch_size
§ Enables throughput/latency tradeoff
§ Bounded by RAM
§ batch_timeout_micros
§ Defines batch time window, latency upper-bound
§ Bounded by RAM
§ num_batch_threads
§ Defines parallelism
§ Bounded by CPU cores
§ max_enqueued_batches
§ Defines queue upper bound, throttling
§ Bounded by RAM
Reaching either threshold (max_batch_size or batch_timeout_micros)
triggers a batch
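The interaction between max_batch_size and batch_timeout_micros can be simulated in a few lines: a batch closes as soon as it is full, or once its oldest queued request has waited past the timeout. This is a simplified single-queue model written for illustration; it ignores num_batch_threads and max_enqueued_batches, and the arrival times are made up.

```python
from typing import List


def form_batches(arrivals_us: List[int], max_batch_size: int, batch_timeout_us: int) -> List[List[int]]:
    """Group sorted request arrival times (microseconds) into batches."""
    batches: List[List[int]] = []
    current: List[int] = []
    for t in arrivals_us:
        # Close the open batch if its oldest request has already waited past the timeout.
        if current and t - current[0] > batch_timeout_us:
            batches.append(current)
            current = []
        current.append(t)
        # Close the batch as soon as it is full.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # flushed when the timeout eventually fires
    return batches


arrivals = [0, 100, 200, 300, 5000, 5100, 20000]
print(form_batches(arrivals, max_batch_size=3, batch_timeout_us=1000))
# [[0, 100, 200], [300], [5000, 5100], [20000]]
```

The request at t=300 ends up alone in a batch after waiting out the timeout, which illustrates the tradeoff: a larger timeout fills batches better (throughput) at the cost of added latency for requests that arrive during quiet periods.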
125. ADVANCED BATCHING & SERVING TIPS
§ Batch Just the GPU/TPU Portions of the Computation Graph
§ Batch Arbitrary Sub-Graphs using Batch / Unbatch Graph Ops
§ Distribute Large Models Into Shards Across TensorFlow Model Servers
§ Batch RNNs Used for Sequential and Time-Series Data
§ Find Best Batching Strategy For Your Data Through Experimentation
§ BasicBatchScheduler: Homogeneous requests (e.g., Regress or Classify)
§ SharedBatchScheduler: Mixed requests, multi-step, ensemble predict
§ StreamingBatchScheduler: Mixed CPU/GPU/IO-bound Workloads
§ Serve Only One (1) Model Inside One (1) TensorFlow Serving Process
§ Much Easier to Debug, Tune, Scale, and Manage Models in Production.
126. LET’S DEPLOY OPTIMIZED MODEL
§ Navigate to the following notebook:
10_Optimize_Model_Server
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
127. AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
128. THANK YOU!! QUESTIONS?
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
Contact Me
chris@pipeline.ai
@cfregly