HIGH PERFORMANCE TENSORFLOW IN
PRODUCTION WITH GPUS
CHRIS FREGLY, FOUNDER @PIPELINE.AI
BIG DATA SPAIN, MADRID - NOV 15, 2017
I LOVE THIS CONFERENCE!!
INTRODUCTIONS: ME
§ Chris Fregly, Founder & Engineer @ PipelineAI
§ Formerly Netflix and Databricks
§ Advanced Spark and TensorFlow Meetup
Please Join Our 50,000+ Global Members!!
Contact Me
chris@pipeline.ai
@cfregly
Global Locations
* San Francisco
* Chicago
* Austin
* Washington DC
* Dusseldorf
* London
INTRODUCTIONS: YOU
§ Software Engineer, Data Scientist, Data Engineer, Data Analyst
§ Interested in Optimizing and Deploying TF Models to Production
§ Nice to Have a Working Knowledge of TensorFlow (Not Required)
CONTENT BREAKDOWN
50% Model Training Optimizations (GPU, Ingestion Pipeline, JIT)
50% Model Serving Optimizations (Post-Processing, TF Serving, AOT)

Model Training (Offline)             Model Serving (Online)
Boring & Batch                       Exciting & Real-Time!!
Offline in Research Lab              Online in Live Production
No Real-Time Serving Skills          Unique Real-Time Serving Skills
Pipeline Stops at Training           Pipeline Extends into Production
No Feedback with Runtime             Continuous Feedback with Training
Small Number of Data Scientists      Large Number of App Devs & Users
10’s Training Jobs / Day             1,000,000’s Predictions / Sec
AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
100% OPEN SOURCE CODE
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
HANDS-ON EXERCISES
§ Combo of Jupyter Notebooks and Command Line
§ Command Line through Jupyter Terminal
§ Some Exercises Based on Experimental Features
You May See Errors. Stay Calm. You Will Be OK!!
PIPELINE.AI OVERVIEW
400,000 Docker Downloads
50,000 Users registered for
PipelineAI GA Release
2,000 GitHub Stars
15 Enterprise Beta Users
AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
PACKAGE MODEL + RUNTIME AS ONE
§ Package Model + Runtime into Immutable Docker Image
§ Same Environment: Local, Dev, and Prod
§ No Dependency Surprises in Production
§ Deploy and Tune Model + Runtime Together
pipeline predict-server-build --model-type=tensorflow \
                              --model-name=mnist \
                              --model-tag="c" \
                              --model-path=./models/tensorflow/mnist/

Package Model Server C Locally

pipeline predict-server-push --model-type=tensorflow \
                             --model-name=mnist \
                             --model-tag="c"

Push Image C To Docker Registry
TUNE MODEL + RUNTIME TOGETHER
§ Try Different Model Hyper-Parameters + Runtime Configs
§ Even Different Runtimes: TF Serving, TensorRT
§ Auto-Quantize Model Weights + Activations
§ Auto-Fuse Neural Network Layers Together
§ Generate Native CPU + GPU Code
pipeline predict-server-start --model-type=tensorflow \
                              --model-name=mnist \
                              --model-tag="c"

Start Model Server C Locally
LOAD TEST MODEL + RUNTIME LOCALLY
§ Perform Mini-Load Test on Local Model Server
§ Provides Immediate Feedback on Prediction Performance
§ Relative Performance Compared to Other Variations
§ No Need to Deploy to Test or Prod for Prediction Metrics
§ See Where Time is Being Spent During Prediction
pipeline predict --model-server-url=http://localhost:6969 \
                 --model-type=tensorflow \
                 --model-name=mnist \
                 --model-tag="c" \
                 --test-request-concurrency=1000

Load Test Model Server C Locally
RUNTIME OPTION: NVIDIA TENSOR-RT
§ Post-Training Model Optimizations
§ Specific to Nvidia GPU
§ Similar to TF Graph Transform Tool
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
DEPLOY MODELS SAFELY TO PROD
§ Deploy from Jupyter Notebook in 1-Click
§ Deploy to 1-2% Split or Shadowed Traffic
§ Tear-Down or Rollback Quickly
§ Use Command Line Interface (CLI)
pipeline predict-cluster-start --model-type=tensorflow \
                               --model-name=mnist \
                               --model-tag="b" \
                               --traffic-split="0.02"

Start Model Cluster B in Prod

pipeline predict-cluster-start --model-type=tensorflow \
                               --model-name=mnist \
                               --model-tag="c" \
                               --traffic-split="0.01"

Start Model Cluster C in Prod

pipeline predict-cluster-start --model-type=tensorflow \
                               --model-name=mnist \
                               --model-tag="a" \
                               --traffic-split="0.97"

Start Model Cluster A in Prod
Implementation Details…
DEPLOY EXPERIMENTS SAFELY TO PROD
§ Create Experiments Directly from Jupyter or Command Line
§ Deploy Experiment
CLI or Drag-n’-Drop:

pipeline experiment-add --experiment-name=my_experiment \
                        --model-type=tensorflow \
                        --model-name=mnist \
                        --model-tag="a" \
                        --traffic-split="97%"

pipeline experiment-add --experiment-name=my_experiment \
                        --model-type=tensorflow \
                        --model-name=mnist \
                        --model-tag="b" \
                        --traffic-split="2%"

pipeline experiment-add --experiment-name=my_experiment \
                        --model-type=tensorflow \
                        --model-name=mnist \
                        --model-tag="c" \
                        --traffic-split="1%"

pipeline experiment-start --experiment-name=my_experiment \
                          --traffic-shadow="20%"

1-Click: Start Experiment with 20% of Production Traffic Shadowed
AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
COMPARE MODELS OFFLINE & ONLINE
§ Offline, Batch Metrics
  § Validation Accuracy
  § Training Accuracy
  § CPU/GPU Utilization
  § Live Prediction Values
  § Compare Model Precision
§ Online, Real-Time Metrics
  § Response Time & Throughput
  § Cost Per Prediction
PREDICTION PROFILING AND TUNING
§ Pinpoint Performance Bottlenecks
§ Fine-Grained Prediction Metrics
§ Three (3) Logical Prediction Steps
1. transform_request()
2. predict()
3. transform_response()
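A minimal sketch of how these three steps might compose inside a model server; the function bodies and payload fields are hypothetical, not PipelineAI's actual implementation:

import json
import numpy as np

def transform_request(raw_body):
    # Deserialize the HTTP request body into model-ready features
    data = json.loads(raw_body)
    return np.array(data["image"], dtype=np.float32).reshape(1, 28, 28)

def predict(model_fn, features):
    # Invoke the underlying model (TF Serving stub, frozen graph, etc.)
    return model_fn(features)

def transform_response(prediction):
    # Serialize the raw prediction into an HTTP-friendly payload
    return json.dumps({"class": int(prediction.argmax(axis=-1)[0]),
                       "confidence": float(prediction.max(axis=-1)[0])})

Timing each step separately is what pinpoints whether a bottleneck lives in pre-processing, the model itself, or post-processing.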
VIEW REAL-TIME PREDICTION STREAM
§ Visually Compare Real-Time Predictions
[Dashboard: Prediction Inputs alongside Prediction Result + Confidence]
CONTINUOUS MODEL TRAINING
§ Identify and Fix Borderline Predictions (~50-50% Confidence)
§ Fix Along Class Boundaries
§ Retrain on New Labeled Data
§ Game-ify Labeling Process
§ Enables Crowd Sourcing
AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
SHIFT TRAFFIC TO MAX(REVENUE)
§ Shift Traffic to Winning Model using AI Bandit Algorithms
Implementation Details…
SHIFT TRAFFIC TO MIN(CLOUD CO$T)
§ Across Clouds & On-Premise
§ Real-Time Cost Per Prediction
§ Bandit-based Explore/Exploit
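PipelineAI's production bandit algorithm is not shown in the deck; as a rough illustration, here is a minimal epsilon-greedy explore/exploit sketch (model tags and scores are made up):

import random

def pick_model(scores, epsilon=0.1):
    # Epsilon-greedy bandit: mostly exploit the best-scoring model,
    # occasionally explore the others to keep reward estimates fresh
    if random.random() < epsilon:
        return random.choice(list(scores))   # explore
    return max(scores, key=scores.get)       # exploit

# Hypothetical running reward per model variant
# (e.g. revenue per prediction, or negative cost per prediction)
scores = {"a": 0.97, "b": 0.99, "c": 0.95}
route_to = pick_model(scores)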
AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
EVERYBODY GETS A GPU!
SETUP ENVIRONMENT
§ Step 1: Browse to the following:
http://allocator.community.pipeline.ai/allocate
§ Step 2: Browse to the following:
http://<ip-address>
§ Step 3: Browse around.
I will provide a Jupyter Username/Password soon.
Need Help?
Use the Chat!
VERIFY SETUP
http://<ip-address>
Any username,
Any password!
LET’S EXPLORE OUR ENVIRONMENT
§ Navigate to the following notebook:
01_Explore_Environment
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
PULSE CHECK
BREAK
§ Please 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
Need Help?
Use the Chat!
SETTING UP TENSORFLOW WITH GPUS
§ Very Painful!
§ Especially inside Docker
§ Use nvidia-docker
§ Especially on Kubernetes!
§ Use Kubernetes 1.8+
§ http://pipeline.ai for GitHub + DockerHub Links
TENSORFLOW + CUDA + NVIDIA GPU
GPU HALF-PRECISION SUPPORT
§ FP32 is “Full Precision”, FP16 is “Half Precision”
§ Supported by Pascal P100 (2016) and Volta V100 (2017)
§ Two (2) FP16’s in Each FP32 GPU Core for 2x Throughput!
§ Half-Precision is OK for Approximate Deep Learning Use Cases
You Can Set TF_FP16_MATMUL_USE_FP32_COMPUTE=0
on GPUs with Compute Capability (CC) 5.3+
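A minimal sketch of a half-precision matmul in TF 1.x (the shapes are arbitrary); the env var above controls whether the FP16 multiply accumulates in FP32:

import tensorflow as tf

# FP16 inputs: two packed half-precision values per FP32 core on P100/V100
a = tf.random_normal([1024, 1024], dtype=tf.float16)
b = tf.random_normal([1024, 1024], dtype=tf.float16)
c = tf.matmul(a, b)

with tf.Session() as sess:
    sess.run(c)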
VOLTA V100 (2017) VS. PASCAL P100 (2016)
§ 84 Streaming Multiprocessors (SM’s)
§ 5,376 GPU Cores
§ 672 Tensor Cores (ie. Google TPU)
§ Mixed FP16/FP32 Precision
§ Matrix Dims Should be Multiples of 8
§ More Shared Memory
§ New L0 Instruction Cache
§ Faster L1 Data Cache
§ V100 vs. P100 Performance
§ 12x Training, 6x Inference
FP32 VS. FP16 ON AWS GPU INSTANCES
Instance (GPU)     FP16 Half Precision    FP32 Full Precision
p3 (Volta V100)    87.2 T ops/second      15.4 T ops/second
g3 (Tesla M60)      4.1 T ops/second       4.0 T ops/second
p2 (Tesla K80)      1.6 T ops/second       3.3 T ops/second
WHAT ABOUT GOOGLE CLOUD GPUS?
§ Currently Supports the Following:
  § Tesla K80
  § Pascal P100
  § TPUs
§ Attach GPUs to CPU Instances
  § Similar to AWS Elastic GPU, except less confusing
V100 AND CUDA 9
§ Independent Thread Scheduling - Finally!!
§ Similar to CPU fine-grained thread synchronization semantics
§ Allows GPU to yield execution of any thread
§ Still Optimized for SIMT (Same Instruction Multiple Thread)
§ SIMT units automatically scheduled together
§ Explicit Synchronization
[Diagram: P100 vs. V100 thread scheduling]
GPU CUDA PROGRAMMING
§ Barbaric, But Fun
§ Must Know Hardware Very Well
§ Hardware Changes are Painful
§ Use the Profilers & Debuggers

Recommended for you

Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017)
Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017)Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017)
Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017)

The Linux Security and Isolation APIs have become the basis of some of the most useful features server-side, providing the isolation required for efficient containers. However, these APIs also form the basis of the Chromium Sandbox on Linux, and we will study them in that context. This presentation goes more in depth on some key points from the NDC (2017) presentation.

securitychromiumlinux
Docker Networking
Docker NetworkingDocker Networking
Docker Networking

Docker networking allows containers to communicate in several ways. Containers can communicate using Docker's default bridge (Docker0), by binding container ports to the host's ports, or using the host's network stack directly. More advanced options include linking containers to share information, using overlay networks with technologies like Open vSwitch, or running containers across multiple hosts with tunnels. The document provides examples of setting up different Docker networking configurations and discusses which methods suit different communication requirements between containers, hosts, and external networks.

docker networkingdocker0docker
Graduating To Go - A Jumpstart into the Go Programming Language
Graduating To Go - A Jumpstart into the Go Programming LanguageGraduating To Go - A Jumpstart into the Go Programming Language
Graduating To Go - A Jumpstart into the Go Programming Language

This workshop jumps through a lot of what is covered in the Go Tour. The exercises are new and match more along with the class content, and some pieces (like testing and APIs) are not covered in the Go Tour.

CUDA STREAMS
§ Asynchronous I/O Transfer
§ Overlap Compute and I/O
§ Keeps GPUs Saturated
§ Fundamental to Queue Framework in TensorFlow
LET’S SEE WHAT THIS THING CAN DO!
§ Navigate to the following notebook:
01a_Explore_GPU
01b_Explore_Numba
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
TRAINING TERMINOLOGY
§ Tensors: N-Dimensional Arrays
§ ie. Scalar, Vector, Matrix
§ Operations: MatMul, Add, SummaryLog,…
§ Graph: Graph of Operations (DAG)
§ Session: Contains Graph(s)
§ Feeds: Feed Inputs into Placeholder
§ Fetches: Fetch Output from Operation
§ Variables: What We Learn Through Training
§ aka “Weights”, “Parameters”
§ Devices: Hardware Device (GPU, CPU, TPU, ...)
[Diagram: User Feeds Inputs → TensorFlow Performs Operations, Flows
Tensors, and Trains Variables → User Fetches Outputs]

with tf.device("/cpu:0,/gpu:15"):
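A minimal example tying the terminology together (the variable name and values are illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None], name="x")  # fed by the user
W = tf.Variable(0.5, name="W")      # a Variable learned through training
y = tf.multiply(W, x, name="mul")   # an Operation in the Graph

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Feed inputs into the Placeholder, fetch output from the Operation
    output = sess.run(y, feed_dict={x: [1.0, 2.0, 3.0]})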

Recommended for you

What in the World is Going on at The Linux Foundation?
What in the World is Going on at The Linux Foundation?What in the World is Going on at The Linux Foundation?
What in the World is Going on at The Linux Foundation?

The Linux Foundation has over 500 corporate members involved in over 70 member-sponsored projects. In 2016, the Linux Foundation convened over 20,000 people from 85 countries and over 4000 companies at 150 events around the world. Over 800,000 students from 215 countries have enrolled in Linux Foundation training programs. Who is driving this growth? Why do companies invest valuable resources in collaborative development? What have we learned along the way?

black duck softwareblack duck flight 2017linux foundation
Scale Up with Lock-Free Algorithms @ JavaOne
Scale Up with Lock-Free Algorithms @ JavaOneScale Up with Lock-Free Algorithms @ JavaOne
Scale Up with Lock-Free Algorithms @ JavaOne

This document provides a summary of a presentation on using lock-free algorithms to scale shared mutable state on the JVM. It begins with an introduction to the speaker and discusses why shared mutable state is needed for big data and real-time processing. It then uses a toy problem of implementing a concurrent stack to demonstrate the challenges of synchronization and contention. The presentation introduces the use of atomic references and compare-and-set operations to implement lock-free push and pop operations on the concurrent stack in a non-blocking manner, improving scalability.

lock-freekotlinjava
Communication hardware
Communication hardwareCommunication hardware
Communication hardware

Communication hardware refers to electric devices and systems for transferring data or information from one place to another. Examples include modems, cables, fax modems, routers, and wireless technologies like infrared, Bluetooth, and Wi-Fi. The document provides details on each type of communication hardware, including what they are and how they function. It also includes multiple choice questions to test understanding of the different hardware.

TENSORFLOW SESSION
Session
  graph: GraphDef
  Variables: "W" : 0.328, "b" : -1.407
§ Variables are Randomly Initialized, then Periodically Checkpointed
§ GraphDef is Created During Training, then Frozen for Inference
TENSORFLOW GRAPH EXECUTION
§ Lazy Execution by Default
§ Similar to Spark
§ Eager Execution Now Supported (TensorFlow 1.4)
§ Similar to PyTorch
§ "Linearize” Execution to Minimize RAM Usage
§ Useful on Single GPU with Limited RAM
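As of TF 1.4, eager execution ships as a preview in tf.contrib.eager; a minimal sketch:

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()   # must be called once, at program startup

x = tf.constant([[2.0]])
print(tf.matmul(x, x))         # executes immediately, no Session needed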
TENSORFLOW MODEL
§ MetaGraph
§ Combines GraphDef and Metadata
§ GraphDef
§ Architecture of your model (nodes, edges)
§ Metadata
§ Asset: Accompanying assets to your model
§ SignatureDef: Maps external : internal tensors
§ Variables
§ Stored separately during training (checkpoint)
§ Allows training to continue from any checkpoint
§ Variables are “frozen” into Constants when preparing for inference
[Diagram: MetaGraph = GraphDef (x, W → mul → add ← b) + Metadata (Assets,
SignatureDef, Tags, Version) + Variables ("W" : 0.328, "b" : -1.407)]
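A minimal sketch of freezing Variables into Constants for inference; the output node name "output" is an assumption for illustration:

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Fold the current Variable values into the GraphDef as Const ops
    frozen_graph_def = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["output"])
    tf.train.write_graph(frozen_graph_def, ".", "frozen_model.pb",
                         as_text=False)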
BATCH NORMALIZATION (2015)
§ Each Mini-Batch May Have Wildly Different Distributions
§ Normalize per Batch (and Layer)
§ Faster Training, Learns Quicker
§ Final Model is More Accurate
§ TensorFlow is already on 2nd Generation Batch Algorithm
§ First-Class Support for Fusing Batch Norm Layers
§ Final mean + variance Are Folded Into Our Graph Later
-- (Almost) Always Use Batch Normalization! --
z = tf.matmul(a_prev, W)
a = tf.nn.relu(z)
a_mean, a_var = tf.nn.moments(a, [0])
scale = tf.Variable(tf.ones([depth/channels]))
beta = tf.Variable(tf.zeros([depth/channels]))
bn = tf.nn.batch_normalization(a, a_mean, a_var,
                               beta, scale, 0.001)
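The snippet above builds batch norm by hand from tf.nn.moments; the first-class fused support is exposed through the higher-level layer (a sketch, reusing a_prev from above):

# training=True uses per-batch statistics; training=False uses the
# folded moving mean + variance, matching the frozen inference graph
bn = tf.layers.batch_normalization(a_prev, fused=True, training=True)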
DROPOUT (2014)
§ Training Technique
§ Prevents Overfitting
§ Helps Avoid Local Minima
§ Inherent Ensembling Technique
§ Creates and Combines Different Neural Architectures
§ Expressed as Probability Percentage (ie. 50%)
§ Boost Other Weights During Validation & Prediction
[Diagram: Perform Dropout (Training Phase) vs. Boost for Dropout
(Validation & Prediction Phase); 0% Dropout vs. 50% Dropout]
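A minimal tf.nn.dropout sketch; note that TensorFlow implements "inverted" dropout, scaling the surviving units by 1/keep_prob at training time, so no separate boost is required at prediction time:

import tensorflow as tf

keep_prob = tf.placeholder(tf.float32)  # 0.5 for training, 1.0 for inference
a = tf.random_normal([64, 128])

# Zeroes each unit with probability (1 - keep_prob) and
# scales the survivors by 1/keep_prob
dropped = tf.nn.dropout(a, keep_prob=keep_prob)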
FOLLOW SOME TENSORFLOW EXPERTS
§ https://github.com/yaroslavvb/stuff
EXTEND EXISTING DATA PIPELINES
§ Data Processing
§ HDFS/Hadoop
§ Spark
§ Containers
§ Docker
§ Schedulers
§ Kubernetes
§ Mesos
<dependency>
<groupId>org.tensorflow</groupId>
<artifactId>tensorflow-hadoop</artifactId>
</dependency>
https://github.com/tensorflow/ecosystem
FEED TENSORFLOW TRAINING PIPELINE
§ Training is Almost Always Limited by Ingestion Pipeline
§ THE Number One Problem We See Today
§ Scaling GPUs Up / Out Doesn’t Help
§ GPUs are Heavily Under-Utilized
[nvidia-smi screenshots: Tesla K80 vs. Volta V100 under-utilization]
DON’T USE FEED_DICT!!
§ feed_dict Requires Python <-> C++ Serialization
§ Not Optimized for Production Ingestion Pipelines
§ Retrieves Next Batch After Current Batch is Done
§ Single-Threaded, Synchronous
§ CPUs/GPUs Not Fully Utilized!
§ Use Queue or Dataset APIs
§ Queues are old and complex
sess.run(train_step, feed_dict={…})
DETECT UNDERUTILIZED CPUS, GPUS
§ Instrument training code to generate “timelines”
§ Analyze with Google Web Tracing Framework (WTF)
§ Monitor CPU with top, GPU with nvidia-smi
http://google.github.io/tracing-framework/
from tensorflow.python.client import timeline

trace = timeline.Timeline(step_stats=run_metadata.step_stats)

with open('timeline.json', 'w') as trace_file:
    trace_file.write(
        trace.generate_chrome_trace_format(show_memory=True))
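The run_metadata above comes from a traced training step; a minimal sketch of capturing it (sess and train_step assumed from the training code):

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

# Populates run_metadata.step_stats consumed by the timeline above
sess.run(train_step, options=run_options, run_metadata=run_metadata)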
QUEUES
§ More than traditional Queue
§ Uses CUDA Streams
§ Perform I/O, pre-processing, cropping, shuffling, …
§ Pull from HDFS, S3, Google Storage, Kafka, ...
§ Combine many small files into large TFRecord files
§ Use CPUs to free GPUs for compute
§ Helps saturate CPUs and GPUs
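A minimal sketch of combining many small examples into one large TFRecord file; the in-memory examples list of (bytes, int) pairs is an assumption:

import tensorflow as tf

with tf.python_io.TFRecordWriter("train-00000-of-00001.tfrecord") as writer:
    for image_bytes, label in examples:  # hypothetical (bytes, int) pairs
        example = tf.train.Example(features=tf.train.Features(feature={
            "image": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[image_bytes])),
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[label]))}))
        writer.write(example.SerializeToString())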
QUEUE CAPACITY PLANNING
§ batch_size
  § # examples / batch (ie. 64 jpg)
  § Limited by GPU RAM
§ num_processing_threads
  § CPU threads pull and pre-process batches of data
  § Limited by CPU Cores
§ queue_capacity
  § Limited by CPU RAM (ie. 5 * batch_size)
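A sketch of these knobs in the classic queue-based API; the file names and feature schema are assumptions:

import tensorflow as tf

batch_size = 64
filename_queue = tf.train.string_input_producer(filenames)  # filenames assumed
reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)
parsed = tf.parse_single_example(serialized, features={
    "image": tf.FixedLenFeature([], tf.string),
    "label": tf.FixedLenFeature([], tf.int64)})
image = tf.decode_raw(parsed["image"], tf.uint8)
image.set_shape([28 * 28])

images, labels = tf.train.shuffle_batch(
    [image, parsed["label"]],
    batch_size=batch_size,          # limited by GPU RAM
    num_threads=4,                  # num_processing_threads (CPU cores)
    capacity=5 * batch_size,        # queue_capacity (CPU RAM)
    min_after_dequeue=2 * batch_size)

# At session time, tf.train.start_queue_runners(sess) launches the CPU
# threads that keep this queue (and therefore the GPUs) saturated.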
DATASET API
§ tf.Tensor => tf.data.Dataset
Dataset.from_tensors((features, labels))
Dataset.from_tensor_slices((features, labels))
TextLineDataset(filenames)

§ Functional Transformations
dataset.map(lambda x: tf.decode_jpeg(x))
dataset.repeat(NUM_EPOCHS)
dataset.batch(BATCH_SIZE)

§ Python Generator => tf.data.Dataset
def generator():
    while True:
        yield ...
dataset.from_generator(generator, tf.int32)

§ Dataset => One-Shot Iterator
iter = dataset.make_one_shot_iterator()
next_element = iter.get_next()
while …:
    sess.run(next_element)

§ Dataset => Initializable Iterator
iter = dataset.make_initializable_iterator()
sess.run(iter.initializer, feed_dict=PARAMS)
next_element = iter.get_next()
while …:
    sess.run(next_element)
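Putting the pieces together, a hypothetical end-to-end tf.data input pipeline over TFRecords (file name and feature schema assumed):

import tensorflow as tf

NUM_EPOCHS, BATCH_SIZE = 10, 64

def parse(serialized):
    parsed = tf.parse_single_example(serialized, features={
        "image": tf.FixedLenFeature([], tf.string),
        "label": tf.FixedLenFeature([], tf.int64)})
    image = tf.decode_raw(parsed["image"], tf.uint8)
    return image, parsed["label"]

dataset = (tf.data.TFRecordDataset(["train.tfrecord"])
           .map(parse)
           .shuffle(buffer_size=10000)
           .repeat(NUM_EPOCHS)
           .batch(BATCH_SIZE))

images, labels = dataset.make_one_shot_iterator().get_next()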
FUTURE OF DATASET API
§ Advanced, RL-based Device Placement Strategies
§ Automatic GPU Data Staging
§ More Functional Operators
LET’S FEED DATA WITH A QUEUE
§ Navigate to the following notebook:
02_Feed_Queue_HDFS
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
PULSE CHECK
BREAK
§ Please 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
Need Help?
Use the Chat!
LET’S TRAIN A MODEL (CPU)
§ Navigate to the following notebook:
03_Train_Model_CPU
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
LET’S TRAIN A MODEL (GPU)
§ Navigate to the following notebook:
03a_Train_Model_GPU
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
TENSORFLOW DEBUGGER
§ Step through Operations
§ Inspect Inputs and Outputs
§ Wrap Session in Debug Session
from tensorflow.python import debug as tf_debug

sess = tf.Session(config=config)
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
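A hedged usage sketch: register the stock has_inf_or_nan tensor filter so a debug run can jump straight to the first NaN/Inf (train_op is assumed to exist):

from tensorflow.python import debug as tf_debug

sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter('has_inf_or_nan', tf_debug.has_inf_or_nan)
sess.run(train_op)   # in the tfdbg CLI: run -f has_inf_or_nan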
LET’S DEBUG A MODEL
§ Navigate to the following notebook:
04_Debug_Model
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Train, Inspect, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
SINGLE NODE, MULTI-GPU TRAINING
§ cpu:0
§ By default, all CPUs
§ Requires extra config to target a specific CPU
§ gpu:0..n
§ Each GPU has a unique id
§ TF places ops on a single GPU (gpu:0) by default
§ xla_cpu:0, xla_gpu:0..n
§ “JIT Compiler Device”
§ Hints TensorFlow to attempt JIT Compile
with tf.device("/cpu:0"):
with tf.device("/gpu:0"):
with tf.device("/gpu:1"):
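A minimal multi-GPU sketch (hypothetical shapes): place one matmul per GPU, combine on the CPU, and log placement to verify where each op ran:

import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.random_normal([1024, 1024])
    b = tf.random_normal([1024, 1024])

partial = []
for i in range(2):                       # one tower per GPU
    with tf.device('/gpu:%d' % i):
        partial.append(tf.matmul(a, b))

with tf.device('/cpu:0'):
    total = tf.add_n(partial)

config = tf.ConfigProto(allow_soft_placement=True,   # fall back if a device is missing
                        log_device_placement=True)   # print each op's device
with tf.Session(config=config) as sess:
    sess.run(total)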
DISTRIBUTED, MULTI-NODE TRAINING
§ TensorFlow Automatically Inserts Send and Receive Ops into Graph
§ Parameter Server Synchronously Aggregates Updates to Variables
§ Nodes with Multiple GPUs will Pre-Aggregate Before Sending to PS
(Diagram: a single node with multiple GPUs (gpu0..gpu3) pre-aggregating locally, vs. multiple nodes (Worker0, Worker1, Worker2), each with multiple GPUs, sending aggregated updates to the Parameter Server)
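A minimal sketch of the cluster wiring (hypothetical host names): each task starts a tf.train.Server, and replica_device_setter pins variables to the PS while ops stay on the worker:

import tensorflow as tf

cluster = tf.train.ClusterSpec({
    'ps':     ['ps0:2222'],
    'worker': ['worker0:2222', 'worker1:2222'],
})
server = tf.train.Server(cluster, job_name='worker', task_index=0)

with tf.device(tf.train.replica_device_setter(
        worker_device='/job:worker/task:0', cluster=cluster)):
    w = tf.get_variable('w', shape=[10, 1])   # placed on the PS
    # ... build the rest of the model here; ops run on this worker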
DATA PARALLEL VS MODEL PARALLEL
§ Data Parallel (“Between-Graph Replication”)
§ Send exact same model to each device
§ Each device operates on partition of data
§ ie. Spark sends same function to many workers
§ Each worker operates on their partition of data
§ Model Parallel (“In-Graph Replication”)
§ Send different partition of model to each device
§ Each device operates on all data
§ Difficult, but required for larger models with lower-memory GPUs
SYNCHRONOUS VS. ASYNCHRONOUS
§ Synchronous
§ Nodes compute gradients
§ Nodes update Parameter Server (PS)
§ Nodes sync on PS for latest gradients
§ Asynchronous
§ Nodes compute gradients at their own pace (slow nodes lag behind)
§ Nodes update the PS independently, without waiting for each other
§ Nodes may read stale parameters from the PS
§ May not converge due to stale reads! (a synchronous alternative is sketched below)
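A minimal sketch of synchronous updates (loss and learning rate are assumed): wrap any optimizer in SyncReplicasOptimizer so gradients are aggregated across replicas before being applied:

opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
opt = tf.train.SyncReplicasOptimizer(opt,
                                     replicas_to_aggregate=3,   # wait for 3 workers
                                     total_num_replicas=3)
train_op = opt.minimize(loss)
# a full setup also installs opt.make_session_run_hook(is_chief)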
CHIEF WORKER
§ Chief Defaults to Worker Task 0
§ Task 0 is guaranteed to exist
§ Performs Maintenance Tasks
§ Writes log summaries
§ Instructs PS to checkpoint vars
§ Performs PS health checks
§ (Re-)Initialize variables at (re-)start of training
NODE AND PROCESS FAILURES
§ Checkpoint to Persistent Storage (HDFS, S3)
§ Use MonitoredTrainingSession and Hooks
§ Use a Good Cluster Orchestrator (ie. Kubernetes, Mesos)
§ Understand Failure Modes and Recovery States
§ Stateless, Not Bad: Training Continues
§ Stateful, Bad: Training Must Stop
§ Dios Mio! Long Night Ahead…
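A minimal recovery sketch (hypothetical paths; server, task_index, and train_op are assumed): MonitoredTrainingSession checkpoints to persistent storage and restores automatically after a restart:

hooks = [tf.train.StopAtStepHook(last_step=100000)]
with tf.train.MonitoredTrainingSession(
        master=server.target,
        is_chief=(task_index == 0),             # chief writes checkpoints/summaries
        checkpoint_dir='hdfs://namenode/ckpt',  # survives node failures
        hooks=hooks) as sess:
    while not sess.should_stop():
        sess.run(train_op)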
ESTIMATOR, EXPERIMENT API
§ Simplify Model Building
§ Provide Clear Path to Production
§ Enable Rapid Model Experiments
§ Provide Flexible Parameter Tuning
§ Enable Downstream Optimizing & Serving Infrastructure
§ Nudge Users to Best Practices Through Opinions
§ Provide Hooks/Callbacks to Override Opinions
§ Unified API for Local and Distributed TensorFlow
ESTIMATOR API
§ “Train-to-Serve” Design
§ Create Custom - or Use a Canned Estimator
§ Hides Session, Graph, Layers, Iterative Loops (Train, Eval, Predict)
§ Hooks for All Phases of Model Training and Evaluation
§ Load Input: input_fn()
§ Train: model_fn() and train()
§ Evaluate: evaluate()
§ Save and Export: export_savedmodel()
§ Predict: predict() (uses sess.run() => slow predictions!)
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/customestimator/
CANNED ESTIMATORS
§ Commonly-Used Estimators
§ Pre-Tested and Pre-Tuned
§ DNNClassifier, TensorForestEstimator
§ Always Use Canned Estimators If Possible
§ Reduce Lines of Code, Complexity, and Bugs
§ Use FeatureColumns to Define & Create Features
(Chart: Custom vs. Canned Estimator usage @ Google, August 2017)
COMBINE ESTIMATOR + DATASET API
def input_fn():
    def generator():
        while True:
            yield ...
    my_dataset = tf.data.Dataset.from_generator(generator, tf.int32)
    # A one-shot iterator automatically initializes itself on first use.
    iter = my_dataset.make_one_shot_iterator()
    # The return value of get_next() matches the dataset element type.
    images, labels = iter.get_next()
    return images, labels

# The input_fn can be used as a regular Estimator input function.
estimator = tf.estimator.Estimator(…)
estimator.train(input_fn=input_fn, …)
FEATURECOLUMN ABSTRACTION
§ Used by Canned Estimator
§ Simplifies Input Ingestion
§ Declarative Way to Specify Model Training Inputs
§ Converts Sparse Features to Dense Tensors
§ Sparse Features: Query Keyword, Url, ProductID,…
§ Wide/Linear Models Use Feature-Crossing
§ Deep Models Use Embeddings
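A minimal sketch of the pattern (hypothetical feature names): sparse features are crossed for the wide/linear side and embedded for the deep side:

price   = tf.feature_column.numeric_column('price')
keyword = tf.feature_column.categorical_column_with_hash_bucket(
              'query_keyword', hash_bucket_size=10000)
crossed = tf.feature_column.crossed_column(
              ['query_keyword', 'product_id'], hash_bucket_size=100000)
embedded = tf.feature_column.embedding_column(keyword, dimension=16)

estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=[crossed],          # wide: feature crosses
    dnn_feature_columns=[price, embedded],     # deep: dense + embeddings
    dnn_hidden_units=[64, 32])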
SINGLE VS. MULTI-OBJECTIVES + HEADS
§ Single-Objective Estimator
§ Single classification prediction
§ Multi-Objective Estimator
§ Two (2) classification predictions
§ One (1) classification prediction + One (1) final layer
§ Multiple Heads Are Used to Ensemble Models
§ Treats neural network as a feature engineering step!
§ Supported by TensorFlow Serving
LAYERS API
§ Standalone Layer or Entire Sub-Graphs
§ Functions of Tensor Inputs & Outputs
§ Mix and Match with Operations
§ Assumes 1st Dimension is Batch Size
§ Handles One (1) to Many (*) Inputs
§ Metrics are Layers
§ Loss Metric (Per Mini-Batch)
§ Accuracy and MSE (Across Mini-Batches)
EXPERIMENT API
§ Easier-to-Use Distributed TensorFlow
§ Same API for Local and Distributed (*Theoretically)
§ Combines Estimator with input_fn()
§ Used for Training, Evaluation, & Hyper-Parameter Tuning
§ Distributed Training Defaults to Data-Parallel & Async
§ Cluster Configuration is Fixed at Start of Training Job
§ No Auto-Scaling Allowed!!
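A minimal sketch using the contrib API of the time (estimator, train_input_fn, and eval_input_fn are assumed to exist):

from tensorflow.contrib.learn import Experiment, learn_runner

def experiment_fn(output_dir):
    return Experiment(estimator=estimator,
                      train_input_fn=train_input_fn,
                      eval_input_fn=eval_input_fn,
                      train_steps=10000)

# learn_runner reads TF_CONFIG, builds a RunConfig, and picks the
# schedule (train, evaluate, or run_std_server) from the task type.
learn_runner.run(experiment_fn, output_dir='/tmp/model')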
ESTIMATOR, EXPERIMENT CONFIGS
§ TF_CONFIG
§ Special environment variable for config
§ Defines ClusterSpec in JSON incl. master, workers, PS’s
§ Distributed mode: '{"environment":"cloud"}'
§ Local mode: '{"environment":"local", "task":{"type":"worker"}}'
§ RunConfig: Defines checkpoint interval, output directory, …
§ HParams: Hyper-parameter tuning parameters and ranges
§ learn_runner creates RunConfig before calling run() & tune()
§ schedule is set based on '{"task":{"type"}}'
TF_CONFIG='{
  "environment": "cloud",
  "cluster": {
    "master": ["worker0:2222"],
    "worker": ["worker1:2222"],
    "ps": ["ps0:2222"]
  },
  "task": {"type": "ps", "index": "0"}
}'
OPTIMIZER, ESTIMATOR API + TPU’S
run_config = tpu_config.RunConfig()
estimator = tpu_estimator.TPUEstimator(model_fn=model_fn,
                                       config=run_config)
estimator.train(input_fn=input_fn,
                steps=…)  # epochs are controlled inside input_fn (ie. dataset.repeat)

optimizer = tpu_optimizer.CrossShardOptimizer(
    tf.train.GradientDescentOptimizer(learning_rate=…))
train_op = optimizer.minimize(loss)
estimator_spec = tf.estimator.EstimatorSpec(train_op=train_op,
                                            loss=…)
SEPARATE TRAINING + EVALUATION
§ Separate Training and Evaluation Clusters
§ Evaluate Upon Checkpoint
§ Avoid Resource Contention
§ Let Training Continue in Parallel with Evaluation
(Diagram: a Training Cluster and a separate Evaluation Cluster, both backed by the Parameter Server Cluster)
LET’S TRAIN DISTRIBUTED TENSORFLOW
§ Navigate to the following notebook:
05_Train_Model_Distributed_CPU
or 05a_Train_Model_Distributed_GPU
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
PULSE CHECK
BREAK
§ Please 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
Need Help?
Use the Chat!
AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Train, Inspect, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
XLA FRAMEWORK
§ XLA: “Accelerated Linear Algebra”
§ Reduce Reliance on Custom Operators
§ Improve Execution Speed
§ Improve Memory Usage
§ Reduce Mobile Footprint
§ Improve Portability
Helps TensorFlow Stay Flexible, Yet Still Performant
XLA HIGH LEVEL OPTIMIZER (HLO)
§ HLO: “High Level Optimizer”
§ Compiler Intermediate Representation (IR)
§ Independent of source and target language
§ XLA Step 1 Emits Target-Independent HLO
§ XLA Step 2 Emits Target-Dependent LLVM
§ LLVM Emits Native Code Specific to Target
§ Supports x86-64, ARM64 (CPU), and NVPTX (GPU)
JIT COMPILER
§ JIT: “Just-In-Time” Compiler
§ Built on XLA Framework
§ Reduce Memory Movement – Especially with GPUs
§ Reduce Overhead of Multiple Function Calls
§ Similar to Spark Operator Fusing in Spark 2.0
§ Unroll Loops, Fuse Operators, Fold Constants, …
§ Scopes: session, device, `with jit_scope():`
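A minimal sketch of the three scopes (x and w are assumed; jit_scope lived in contrib at the time of these slides):

# 1) Session scope: JIT-compile everything it can
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = \
    tf.OptimizerOptions.ON_1
sess = tf.Session(config=config)

# 2) Explicit scope: mark specific ops as fusion candidates
from tensorflow.contrib.compiler import jit
with jit.experimental_jit_scope():
    y = tf.matmul(x, w)

# 3) Device scope: place ops directly on an XLA device
with tf.device('/device:XLA_GPU:0'):
    z = tf.matmul(x, w)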
VISUALIZING JIT COMPILER IN ACTION
(Screenshots: op timeline before JIT vs. after JIT)
Google Web Tracing Framework:
http://google.github.io/tracing-framework/
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(…, options=run_options,
         run_metadata=run_metadata)

from tensorflow.python.client import timeline
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as trace_file:
    trace_file.write(
        trace.generate_chrome_trace_format(show_memory=True))
VISUALIZING FUSING OPERATORS
pip install graphviz

dot -Tpng \
  /tmp/hlo_graph_1.w5LcGs.dot \
  -o hlo_graph_1.png
GraphViz:
http://www.graphviz.org
hlo_*.dot files generated by XLA
LET’S TRAIN WITH XLA CPU
§ Navigate to the following notebook:
06_Train_Model_XLA_CPU
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
LET’S TRAIN WITH XLA GPU
§ Navigate to the following notebook:
06a_Train_Model_XLA_GPU
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
AGENDA
Part 2: Optimize TensorFlow Model Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
AOT COMPILER
§ Standalone, Ahead-Of-Time (AOT) Compiler
§ Built on XLA framework
§ tfcompile
§ Creates executable with minimal TensorFlow Runtime needed
§ Includes only dependencies needed by subgraph computation
§ Creates functions with feeds (inputs) and fetches (outputs)
§ Packaged as cc_library header and object files to link into your app
§ Commonly used for mobile device inference graph
§ Currently, only CPU x86-64 and ARM are supported - no GPU
GRAPH TRANSFORM TOOL (GTT)
§ Post-Training Optimization to Prepare for Inference
§ Remove Training-only Ops (checkpoint, drop out, logs)
§ Remove Unreachable Nodes between Given feed -> fetch
§ Fuse Adjacent Operators to Improve Memory Bandwidth
§ Fold Final Batch Norm mean and variance into Variables
§ Round Weights/Variables to improve compression (ie. 70%)
§ Quantize (FP32 -> INT8) to Speed Up Math Operations
AFTER TRAINING, BEFORE OPTIMIZATION
(Diagram: the User feeds inputs and fetches outputs; TensorFlow performs operations, flows tensors, and trains variables)
POST-TRAINING GRAPH TRANSFORMS
transform_graph \
  --in_graph=tensorflow_inception_graph.pb \    ← Original Graph
  --out_graph=optimized_inception_graph.pb \    ← Transformed Graph
  --inputs='Mul' \                              ← Feed (Input)
  --outputs='softmax' \                         ← Fetch (Output)
  --transforms='                                ← List of Transforms
    strip_unused_nodes
    remove_nodes(op=Identity, op=CheckNumerics)
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms
    quantize_weights
    quantize_nodes'
AFTER STRIPPING UNUSED NODES
§ Optimizations
§ strip_unused_nodes
§ Results
§ Graph much simpler
§ File size much smaller
AFTER REMOVING UNUSED NODES
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ Results
§ Pesky nodes removed
§ File size a bit smaller
AFTER FOLDING CONSTANTS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ Results
§ Placeholders (feeds) -> Variables*
(*Why Variables and not Constants?)
AFTER FOLDING BATCH NORMS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ Results
§ Graph remains the same
§ File size approximately the same
AFTER QUANTIZING WEIGHTS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ quantize_weights
§ Results
§ Graph is same, file size is smaller, compute is faster
WEIGHT QUANTIZATION
§ FP16 and INT8 Are Smaller and Computationally Simpler
§ Weights/Variables are Constants
§ Easy to Linearly Quantize
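A minimal numpy sketch of linear weight quantization (an illustration of the idea, not the Graph Transform Tool's implementation): map [min, max] onto 256 levels and keep (min, scale) to dequantize:

import numpy as np

def quantize_weights(w):
    w_min, w_max = float(w.min()), float(w.max())
    scale = max((w_max - w_min) / 255.0, 1e-8)
    q = np.round((w - w_min) / scale).astype(np.uint8)  # 4x smaller than FP32
    return q, w_min, scale

def dequantize_weights(q, w_min, scale):
    return q.astype(np.float32) * scale + w_min

w = np.random.randn(3, 3).astype(np.float32)
q, w_min, scale = quantize_weights(w)
w_approx = dequantize_weights(q, w_min, scale)          # close to w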
LET’S OPTIMIZE FOR INFERENCE
§ Navigate to the following notebook:
07_Optimize_Model*
*Why just CPU version? Why not GPU?
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
BUT WAIT, THERE’S MORE!
ACTIVATION QUANTIZATION
§ Activations Not Known Ahead of Time
§ Depends on input, not easy to quantize
§ Requires Additional Calibration Step
§ Use a “representative” dataset
§ Per Neural Network Layer…
§ Collect histogram of activation values
§ Generate many quantized distributions with different saturation thresholds
§ Choose threshold to minimize…
KL_divergence(ref_distribution, quant_distribution)
§ Not Much Time or Data is Required (Minutes on Commodity Hardware)
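A loose numpy sketch of the calibration loop (illustration only; production calibrators such as TensorRT's are more careful, e.g. about zero bins). It assumes `activations` were collected for one layer from a representative dataset: for each candidate threshold, saturate the tail, simulate the quantized resolution, and keep the threshold with the smallest KL divergence:

import numpy as np

def kl_divergence(p, q):
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], 1e-12))))

def calibrate_threshold(activations, num_bins=2048, levels=128):
    hist, edges = np.histogram(np.abs(activations), bins=num_bins)
    best_t, best_kl = edges[-1], np.inf
    for i in range(levels, num_bins):
        ref = hist[:i].astype(np.float64)
        ref[-1] += hist[i:].sum()               # saturate outliers into last bin
        chunks = np.array_split(ref, levels)    # simulate quantized resolution
        quant = np.concatenate([np.full(len(c), c.mean()) for c in chunks])
        kl = kl_divergence(ref, quant)
        if kl < best_kl:
            best_kl, best_t = kl, edges[i]
    return best_t                               # threshold minimizing KL divergence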
AFTER ACTIVATION QUANTIZATION
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ quantize_weights
§ quantize_nodes (activations)
§ Results
§ Larger graph, needs calibration!
(Requires an additional freeze_requantization_ranges transform)
LET’S OPTIMIZE FOR INFERENCE
§ Navigate to the following notebook:
08_Optimize_Model_Activations
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
FREEZING MODEL FOR DEPLOYMENT
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ quantize_weights
§ quantize_nodes
§ freeze_graph
§ Results
§ Variables -> Constants
Finally!
We’re Ready to Deploy!!
AGENDA
Part 2: Optimize TensorFlow Model Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
MODEL SERVING TERMINOLOGY
§ Inference
§ Only Forward Propagation through Network
§ Predict, Classify, Regress, …
§ Bundle
§ GraphDef, Variables, Metadata, …
§ Assets
§ ie. Map of ClassificationID -> String
§ {9283: "penguin", 9284: "bridge"}
§ Version
§ Every Model Has a Version Number (Integer)
§ Version Policy
§ ie. Serve Only Latest (Highest), Serve Both Latest and Previous, …
TENSORFLOW SERVING FEATURES
§ Supports Auto-Scaling
§ Custom Loaders beyond File-based
§ Tune for Low-latency or High-throughput
§ Serve Diff Models/Versions in Same Process
§ Customize Models Types beyond HashMap and TensorFlow
§ Customize Version Policies for A/B and Bandit Tests
§ Support Request Draining for Graceful Model Updates
§ Enable Request Batching for Diff Use Cases and HW
§ Supports Optimized Transport with GRPC and Protocol Buffers
PREDICTION SERVICE
§ Predict (Original, Generic)
§ Input: List of Tensor
§ Output: List of Tensor
§ Classify
§ Input: List of tf.Example (key, value) pairs
§ Output: List of (class_label: String, score: float)
§ Regress
§ Input: List of tf.Example (key, value) pairs
§ Output: List of (label: String, score: float)
PREDICTION INPUTS + OUTPUTS
§ SignatureDef
§ Defines inputs and outputs
§ Maps external (logical) to internal (physical) tensor names
§ Allows internal (physical) tensor names to change
from tensorflow.python.saved_model import utils
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import signature_def_utils

graph = tf.get_default_graph()
x_observed = graph.get_tensor_by_name('x_observed:0')
y_pred = graph.get_tensor_by_name('add:0')

inputs_map = {'inputs': x_observed}
outputs_map = {'outputs': y_pred}

predict_signature = signature_def_utils.predict_signature_def(
    inputs=inputs_map, outputs=outputs_map)
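A minimal follow-on sketch (hypothetical export path; sess is assumed): save the graph, variables, and the signature above so TensorFlow Serving can load it as version 1:

from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import tag_constants

builder = saved_model_builder.SavedModelBuilder('/models/linear/1')
builder.add_meta_graph_and_variables(
    sess,
    tags=[tag_constants.SERVING],
    signature_def_map={'predict': predict_signature})
builder.save()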
MULTI-HEADED INFERENCE
§ Inputs Pass Through Model One Time
§ Model Returns Multiple Predictions:
1. Human-readable prediction (ie. “penguin”, “church”,…)
2. Final layer of scores (float vector)
§ Final Layer of floats Pass to the Next Model in Ensemble
§ Optimizes Bandwidth, CPU/GPU, Latency, Memory
§ Enables Complex Model Composing and Ensembling
BUILD YOUR OWN MODEL SERVER
§ Adapt GRPC (Google) <-> HTTP (REST of the World)
§ Perform Batch Inference vs. Request/Response
§ Handle Requests Asynchronously
§ Support Mobile, Embedded Inference
§ Customize Request Batching
§ Add Circuit Breakers, Fallbacks
§ Control Latency Requirements
§ Reduce Number of Moving Parts
#include "tensorflow_serving/model_servers/server_core.h"

class MyTensorFlowModelServer {
 public:
  void Init() {
    tensorflow::serving::ServerCore::Options options;
    // set options (model name, path, etc)
    std::unique_ptr<tensorflow::serving::ServerCore> core;
    TF_CHECK_OK(
        tensorflow::serving::ServerCore::Create(std::move(options), &core));
  }
};
Compile and Link with
libtensorflow.so
RUNTIME OPTION: NVIDIA TENSOR-RT
§ Post-Training Model Optimizations
§ Specific to Nvidia GPU
§ Similar to TF Graph Transform Tool
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
AGENDA
Part 2: Optimize TensorFlow Model Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
SAVED MODEL FORMAT
§ Navigate to the following notebook:
09_Deploy_Optimized_Model
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
AGENDA
Part 2: Optimize TensorFlow Model Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
REQUEST BATCH TUNING
§ max_batch_size
§ Enables throughput/latency tradeoff
§ Bounded by RAM
§ batch_timeout_micros
§ Defines batch time window, latency upper-bound
§ Bounded by RAM
§ num_batch_threads
§ Defines parallelism
§ Bounded by CPU cores
§ max_enqueued_batches
§ Defines queue upper bound, throttling
§ Bounded by RAM
Reaching either threshold
will trigger a batch
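A plausible tuning sketch (hypothetical values and paths): these knobs are set in a text-format protobuf file passed to TensorFlow Serving:

max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 64 }

and batching is enabled on the command line:

tensorflow_model_server --port=9000 \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --enable_batching=true \
  --batching_parameters_file=/config/batching_parameters.txt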
ADVANCED BATCHING & SERVING TIPS
§ Batch Just the GPU/TPU Portions of the Computation Graph
§ Batch Arbitrary Sub-Graphs using Batch / Unbatch Graph Ops
§ Distribute Large Models Into Shards Across TensorFlow Model Servers
§ Batch RNNs Used for Sequential and Time-Series Data
§ Find Best Batching Strategy For Your Data Through Experimentation
§ BasicBatchScheduler: Homogeneous requests (ie Regress or Classify)
§ SharedBatchScheduler: Mixed requests, multi-step, ensemble predict
§ StreamingBatchScheduler: Mixed CPU/GPU/IO-bound Workloads
§ Serve Only One (1) Model Inside One (1) TensorFlow Serving Process
§ Much Easier to Debug, Tune, Scale, and Manage Models in Production.
LET’S DEPLOY OPTIMIZED MODEL
§ Navigate to the following notebook:
10_Optimize_Model_Server
§ https://github.com/PipelineAI/pipeline/tree/master/
gpu.ml/notebooks
AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
THANK YOU!! QUESTIONS?
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
Contact Me
chris@pipeline.ai
@cfregly

AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
100% OPEN SOURCE CODE
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
HANDS-ON EXERCISES
§ Combo of Jupyter Notebooks and Command Line
§ Command Line through Jupyter Terminal
§ Some Exercises Based on Experimental Features
You May See Errors. Stay Calm. You Will Be OK!!
PIPELINE.AI OVERVIEW
§ 400,000 Docker Downloads
§ 50,000 Users registered for PipelineAI GA Release
§ 2,000 GitHub Stars
§ 15 Enterprise Beta Users
AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
PACKAGE MODEL + RUNTIME AS ONE
§ Package Model + Runtime into Immutable Docker Image
§ Same Environment: Local, Dev, and Prod
§ No Dependency Surprises in Production
§ Deploy and Tune Model + Runtime Together
Package Model Server C Locally:
pipeline predict-server-build --model-type=tensorflow --model-name=mnist --model-tag="c" --model-path=./models/tensorflow/mnist/
Push Image C to Docker Registry:
pipeline predict-server-push --model-type=tensorflow --model-name=mnist --model-tag="c"
TUNE MODEL + RUNTIME TOGETHER
§ Try Different Model Hyper-Parameters + Runtime Configs
§ Even Different Runtimes: TF Serving, TensorRT
§ Auto-Quantize Model Weights + Activations
§ Auto-Fuse Neural Network Layers Together
§ Generate Native CPU + GPU Code
Start Model Server C Locally:
pipeline predict-server-start --model-type=tensorflow --model-name=mnist --model-tag="c"
LOAD TEST MODEL + RUNTIME LOCALLY
§ Perform Mini-Load Test on Local Model Server
§ Provides Immediate Feedback on Prediction Performance
§ Relative Performance Compared to Other Variations
§ No Need to Deploy to Test or Prod for Prediction Metrics
§ See Where Time is Being Spent During Prediction
Load Test Model Server C Locally:
pipeline predict --model-server-url=http://localhost:6969 --model-type=tensorflow --model-name=mnist --model-tag="c" --test-request-concurrency=1000
RUNTIME OPTION: NVIDIA TENSOR-RT
§ Post-Training Model Optimizations
§ Specific to Nvidia GPU
§ Similar to TF Graph Transform Tool
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
DEPLOY MODELS SAFELY TO PROD
§ Deploy from Jupyter Notebook in 1-Click
§ Deploy to 1-2% Split or Shadowed Traffic
§ Tear-Down or Rollback Quickly
§ Use Command Line Interface (CLI)
Start Model Cluster B in Prod:
pipeline predict-cluster-start --model-type=tensorflow --model-name=mnist --model-tag="b" --traffic-split="0.02"
Start Model Cluster C in Prod:
pipeline predict-cluster-start --model-type=tensorflow --model-name=mnist --model-tag="c" --traffic-split="0.01"
Start Model Cluster A in Prod:
pipeline predict-cluster-start --model-type=tensorflow --model-name=mnist --model-tag="a" --traffic-split="0.97"
Implementation Details…
DEPLOY EXPERIMENTS SAFELY TO PROD
§ Create Experiments Directly from Jupyter or Command Line (CLI or Drag n’ Drop)
§ Start Experiment in 1-Click with 20% Shadowed Production Traffic
Add Experiment Models A, B, and C:
pipeline experiment-add --experiment-name=my_experiment --model-type=tensorflow --model-name=mnist --model-tag="a" --traffic-split="97%"
pipeline experiment-add --experiment-name=my_experiment --model-type=tensorflow --model-name=mnist --model-tag="b" --traffic-split="2%"
pipeline experiment-add --experiment-name=my_experiment --model-type=tensorflow --model-name=mnist --model-tag="c" --traffic-split="1%"
Start Experiment with 20% Shadowed Production Traffic:
pipeline experiment-start --experiment-name=my_experiment --traffic-shadow="20%"
AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
COMPARE MODELS OFFLINE & ONLINE
§ Offline, Batch Metrics
§ Validation Accuracy
§ Training Accuracy
§ CPU/GPU Utilization
§ Live Prediction Values
§ Compare Model Precision
§ Online, Real-Time Metrics
§ Response Time & Throughput
§ Cost Per Prediction
PREDICTION PROFILING AND TUNING
§ Pinpoint Performance Bottlenecks
§ Fine-Grained Prediction Metrics
§ Three (3) Logical Prediction Steps:
1. transform_request()
2. predict()
3. transform_response()
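
To make the three steps concrete, here is a minimal sketch of timing each logical step in a request path; only the three step names come from the slide, and the handler and stub transforms are hypothetical:

    import time

    def transform_request(raw):   # stub: request decoding would go here
        return raw

    def transform_response(raw):  # stub: response encoding would go here
        return raw

    def handle_prediction(raw_request, model):
        timings = {}

        start = time.time()
        features = transform_request(raw_request)      # 1. pre-process the request
        timings['transform_request'] = time.time() - start

        start = time.time()
        raw_prediction = model.predict(features)       # 2. run the model
        timings['predict'] = time.time() - start

        start = time.time()
        response = transform_response(raw_prediction)  # 3. post-process the result
        timings['transform_response'] = time.time() - start

        return response, timings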
VIEW REAL-TIME PREDICTION STREAM
§ Visually Compare Real-Time Predictions
(Dashboard: Prediction Inputs, Prediction Result + Confidence)
CONTINUOUS MODEL TRAINING
§ Identify and Fix Borderline Predictions (~50-50% Confidence)
§ Fix Along Class Boundaries
§ Retrain on New Labeled Data
§ Game-ify Labeling Process
§ Enables Crowd Sourcing
AGENDA
Part 0: Latest PipelineAI Research
§ Package, Deploy, and Tune Both Model + Runtime
§ Deploy Models and Experiments Safely to Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
SHIFT TRAFFIC TO MAX(REVENUE)
§ Shift Traffic to Winning Model using AI Bandit Algorithms
Implementation Details…
SHIFT TRAFFIC TO MIN(CLOUD CO$T)
§ Across Clouds & On-Premise
§ Real-Time Cost Per Prediction
§ Bandit-based Explore/Exploit
AGENDA
Part 0: Latest PipelineAI Research
Part 1: Optimize TensorFlow Model Training
Part 2: Optimize TensorFlow Model Serving
AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
SETUP ENVIRONMENT
§ Step 1: Browse to the following:
http://allocator.community.pipeline.ai/allocate
§ Step 2: Browse to the following:
http://<ip-address>
§ Step 3: Browse around. I will provide a Jupyter Username/Password soon.
Need Help? Use the Chat!
LET’S EXPLORE OUR ENVIRONMENT
§ Navigate to the following notebook:
01_Explore_Environment
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
BREAK
§ Please 🌟 this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline/tree/master/gpu.ml
Need Help? Use the Chat!
SETTING UP TENSORFLOW WITH GPUS
§ Very Painful!
§ Especially inside Docker
§ Use nvidia-docker
§ Especially on Kubernetes!
§ Use Kubernetes 1.8+
§ http://pipeline.ai for GitHub + DockerHub Links
TENSORFLOW + CUDA + NVIDIA GPU
GPU HALF-PRECISION SUPPORT
§ FP32 is “Full Precision”, FP16 is “Half Precision”
§ Supported by Pascal P100 (2016) and Volta V100 (2017)
§ Two (2) FP16’s in Each FP32 GPU Core for 2x Throughput!
§ Half-Precision is OK for Approximate Deep Learning Use Cases
You Can Set TF_FP16_MATMUL_USE_FP32_COMPUTE=0 on GPU w/ Compute Capability (CC) 5.3+
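
A tiny sketch of opting into half precision by casting to tf.float16; this is a minimal illustration of the idea, not a full mixed-precision recipe:

    import tensorflow as tf

    a = tf.random_normal([1024, 1024])
    b = tf.random_normal([1024, 1024])

    # Do the matmul in FP16, then cast back to FP32 so downstream
    # ops (losses, summaries) keep full precision.
    c = tf.cast(tf.matmul(tf.cast(a, tf.float16), tf.cast(b, tf.float16)),
                tf.float32)

    with tf.Session() as sess:
        print(sess.run(c))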
VOLTA V100 (2017) VS. PASCAL P100 (2016)
§ 84 Streaming Multiprocessors (SM’s)
§ 5,376 GPU Cores
§ 672 Tensor Cores (ie. Google TPU)
§ Mixed FP16/FP32 Precision
§ Matrix Dims Should be Multiples of 8
§ More Shared Memory
§ New L0 Instruction Cache
§ Faster L1 Data Cache
§ V100 vs. P100 Performance: 12x Training, 6x Inference
FP32 VS. FP16 ON AWS GPU INSTANCES
FP16 Half Precision:
  87.2 T ops/second for p3 Volta V100
  4.1 T ops/second for g3 Tesla M60
  1.6 T ops/second for p2 Tesla K80
FP32 Full Precision:
  15.4 T ops/second for p3 Volta V100
  4.0 T ops/second for g3 Tesla M60
  3.3 T ops/second for p2 Tesla K80
WHAT ABOUT GOOGLE CLOUD GPUS?
§ Currently Supports the Following: Tesla K80, Pascal P100, TPUs
§ Attach GPUs to CPU Instances
§ Similar to AWS Elastic GPU, except less confusing
V100 AND CUDA 9
§ Independent Thread Scheduling - Finally!!
§ Similar to CPU fine-grained thread synchronization semantics
§ Allows GPU to yield execution of any thread
§ Still Optimized for SIMT (Same Instruction Multiple Thread)
§ SIMT units automatically scheduled together
§ Explicit Synchronization
(Diagram: P100 vs. V100 thread scheduling)
GPU CUDA PROGRAMMING
§ Barbaric, But Fun
§ Must Know Hardware Very Well
§ Hardware Changes are Painful
§ Use the Profilers & Debuggers
CUDA STREAMS
§ Asynchronous I/O Transfer
§ Overlap Compute and I/O
§ Keeps GPUs Saturated
§ Fundamental to Queue Framework in TensorFlow
LET’S SEE WHAT THIS THING CAN DO!
§ Navigate to the following notebooks:
01a_Explore_GPU
01b_Explore_Numba
§ https://github.com/PipelineAI/pipeline/tree/master/gpu.ml/notebooks
AGENDA
Part 1: Optimize TensorFlow Model Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Model Training on a Cluster
§ Optimize Training with JIT XLA Compiler
TRAINING TERMINOLOGY
§ Tensors: N-Dimensional Arrays (ie. Scalar, Vector, Matrix)
§ Operations: MatMul, Add, SummaryLog, …
§ Graph: Graph of Operations (DAG)
§ Session: Contains Graph(s)
§ Feeds: Feed Inputs into Placeholder
§ Fetches: Fetch Output from Operation
§ Variables: What We Learn Through Training (aka “Weights”, “Parameters”)
§ Devices: Hardware Device (GPU, CPU, TPU, ...)
  ie. with tf.device("/cpu:0,/gpu:15"):
(Diagram: User Feeds Inputs and Fetches Outputs; TensorFlow Performs Operations, Flows Tensors, and Trains Variables)
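
A minimal sketch tying the terminology together: build a graph of operations, feed a placeholder, and fetch an output from a session:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, name='x')    # fed by the user
    W = tf.Variable(0.5, name='W')              # learned through training
    b = tf.Variable(-1.0, name='b')
    y = tf.add(tf.multiply(W, x), b, name='y')  # operations in the graph (DAG)

    with tf.Session() as sess:                  # session contains the graph
        sess.run(tf.global_variables_initializer())
        print(sess.run(y, feed_dict={x: 2.0}))  # feed x, fetch y => 0.0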
  • 45. TENSORFLOW SESSION Session graph: GraphDef Variables: “W” : 0.328 “b” : -1.407 Variables are Randomly Initialized, then Periodically Checkpointed GraphDef is Created During Training, then Frozen for Inference
  • 46. TENSORFLOW GRAPH EXECUTION § Lazy Execution by Default § Similar to Spark § Eager Execution Now Supported (TensorFlow 1.4) § Similar to PyTorch § “Linearize” Execution to Minimize RAM Usage § Useful on a Single GPU with Limited RAM
  • 47. TENSORFLOW MODEL § MetaGraph § Combines GraphDef and Metadata § GraphDef § Architecture of your model (nodes, edges) § Metadata § Asset: Accompanying assets to your model § SignatureDef: Maps external : internal tensors § Variables § Stored separately during training (checkpoint) § Allows training to continue from any checkpoint § Variables are “frozen” into Constants when preparing for inference [diagram: MetaGraph = GraphDef (x, W, mul, add, b) + Metadata (Assets, SignatureDef, Tags, Version) + Variables (“W”: 0.328, “b”: -1.407)]
  • 48. BATCH NORMALIZATION (2015) § Each Mini-Batch May Have Wildly Different Distributions § Normalize per Batch (and Layer) § Faster Training, Learns Quicker § Final Model is More Accurate § TensorFlow is already on its 2nd-Generation Batch Norm Algorithm § First-Class Support for Fusing Batch Norm Layers § Final mean + variance Are Folded Into Our Graph Later -- (Almost) Always Use Batch Normalization! --
    z = tf.matmul(a_prev, W)
    a = tf.nn.relu(z)
    a_mean, a_var = tf.nn.moments(a, [0])
    scale = tf.Variable(tf.ones([depth]))   # depth = number of channels
    beta = tf.Variable(tf.zeros([depth]))
    bn = tf.nn.batch_normalization(a, a_mean, a_var, beta, scale, 0.001)
  • 49. DROPOUT (2014) § Training Technique § Prevents Overfitting § Helps Avoid Local Minima § Inherent Ensembling Technique § Creates and Combines Different Neural Architectures § Expressed as Probability Percentage (ie. 50%) § Boost Other Weights During Validation & Prediction [diagram: 0% vs. 50% dropout; perform dropout in the training phase, boost in the validation & prediction phase]
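  A minimal sketch using tf.nn.dropout. Note that tf.nn.dropout implements the boost as “inverted” dropout: it scales the kept activations by 1/keep_prob during training, so validation and prediction simply use keep_prob=1.0 (layer sizes here are illustrative):
    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 784])
    keep_prob = tf.placeholder(tf.float32)          # 0.5 at training, 1.0 at inference
    hidden = tf.layers.dense(x, 256, activation=tf.nn.relu)
    hidden_drop = tf.nn.dropout(hidden, keep_prob)  # scales kept units by 1/keep_prob
    logits = tf.layers.dense(hidden_drop, 10)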
  • 50. FOLLOW SOME TENSORFLOW EXPERTS § https://github.com/yaroslavvb/stuff
  • 51. EXTEND EXISTING DATA PIPELINES § Data Processing § HDFS/Hadoop § Spark § Containers § Docker § Schedulers § Kubernetes § Mesos § https://github.com/tensorflow/ecosystem
    <dependency>
      <groupId>org.tensorflow</groupId>
      <artifactId>tensorflow-hadoop</artifactId>
    </dependency>
  • 52. FEED TENSORFLOW TRAINING PIPELINE § Training is Almost Always Limited by the Ingestion Pipeline § THE Number One Problem We See Today § Scaling GPUs Up / Out Doesn’t Help § GPUs are Heavily Under-Utilized [chart: GPU utilization, Tesla K80 vs. Volta V100]
  • 53. DON’T USE FEED_DICT!! § feed_dict Requires Python <-> C++ Serialization § Not Optimized for Production Ingestion Pipelines § Retrieves Next Batch Only After Current Batch is Done § Single-Threaded, Synchronous § CPUs/GPUs Not Fully Utilized! § Use Queue or Dataset APIs § Queues are old and complex
    sess.run(train_step, feed_dict={…})
  • 54. DETECT UNDERUTILIZED CPUS, GPUS § Instrument training code to generate “timelines” § Analyze with Google Web Tracing Framework (WTF): http://google.github.io/tracing-framework/ § Monitor CPU with top, GPU with nvidia-smi
    from tensorflow.python.client import timeline

    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(train_op, options=run_options, run_metadata=run_metadata)  # train_op = your training step

    trace = timeline.Timeline(step_stats=run_metadata.step_stats)
    with open('timeline.json', 'w') as trace_file:
        trace_file.write(trace.generate_chrome_trace_format(show_memory=True))
  • 55. QUEUES § More than a traditional Queue § Uses CUDA Streams § Perform I/O, pre-processing, cropping, shuffling, … § Pull from HDFS, S3, Google Storage, Kafka, ... § Combine many small files into large TFRecord files § Use CPUs to free GPUs for compute § Helps saturate CPUs and GPUs
  • 56. QUEUE CAPACITY PLANNING § batch_size § # examples / batch (ie. 64 jpg) § Limited by GPU RAM § num_processing_threads § CPU threads pull and pre-process batches of data § Limited by CPU Cores § queue_capacity § Limited by CPU RAM (ie. 5 * batch_size)
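  A minimal queue-based ingestion sketch tying these three knobs together (file names and the TFRecord feature spec are hypothetical):
    import tensorflow as tf

    batch_size = 64

    filename_queue = tf.train.string_input_producer(
        ['train-00000.tfrecord', 'train-00001.tfrecord'])  # hypothetical shards
    reader = tf.TFRecordReader()
    _, serialized = reader.read(filename_queue)
    features = tf.parse_single_example(serialized, {
        'image': tf.FixedLenFeature([784], tf.float32),    # hypothetical feature spec
        'label': tf.FixedLenFeature([], tf.int64),
    })

    images, labels = tf.train.shuffle_batch(
        [features['image'], features['label']],
        batch_size=batch_size,        # limited by GPU RAM
        num_threads=4,                # num_processing_threads, limited by CPU cores
        capacity=5 * batch_size,      # queue_capacity, limited by CPU RAM
        min_after_dequeue=batch_size)

    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        image_batch, label_batch = sess.run([images, labels])
        coord.request_stop()
        coord.join(threads)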
  • 57. DATASET API § tf.Tensor => tf.data.Dataset § Functional Transformations § Python Generator => tf.data.Dataset
    Dataset.from_tensors((features, labels))
    Dataset.from_tensor_slices((features, labels))
    TextLineDataset(filenames)

    dataset.map(lambda x: tf.decode_jpeg(x))
    dataset.repeat(NUM_EPOCHS)
    dataset.batch(BATCH_SIZE)

    def generator():
        while True:
            yield ...
    Dataset.from_generator(generator, tf.int32)
  § Dataset => One-Shot Iterator
    iter = dataset.make_one_shot_iterator()
    next_element = iter.get_next()
    while …: sess.run(next_element)
  § Dataset => Initializable Iterator
    iter = dataset.make_initializable_iterator()
    sess.run(iter.initializer, feed_dict=PARAMS)
    next_element = iter.get_next()
    while …: sess.run(next_element)
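  Putting the pieces together, a sketch of a complete TFRecord input pipeline with the Dataset API (the file name and feature names are hypothetical):
    import tensorflow as tf

    NUM_EPOCHS = 10
    BATCH_SIZE = 64

    def parse_fn(serialized):
        # Hypothetical feature spec; match it to how the TFRecords were written.
        features = tf.parse_single_example(serialized, {
            'image': tf.FixedLenFeature([784], tf.float32),
            'label': tf.FixedLenFeature([], tf.int64),
        })
        return features['image'], features['label']

    dataset = tf.data.TFRecordDataset(['train-00000.tfrecord'])  # hypothetical file
    dataset = dataset.map(parse_fn, num_parallel_calls=4)  # pre-process on CPU threads
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.repeat(NUM_EPOCHS)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(1)  # keep the next batch ready while the GPU computes

    iterator = dataset.make_one_shot_iterator()
    images, labels = iterator.get_next()

    with tf.Session() as sess:
        image_batch, label_batch = sess.run([images, labels])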
  • 58. FUTURE OF DATASET API § Advanced, RL-based Device Placement Strategies § Automatic GPU Data Staging § More Functional Operators
  • 59. LET’S FEED DATA WITH A QUEUE § Navigate to the following notebook: 02_Feed_Queue_HDFS § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 61. BREAK § Please 🌟 this GitHub Repo! § All slides, code, notebooks, and Docker images here: https://github.com/PipelineAI/pipeline/tree/master/gpu.ml Need Help? Use the Chat!
  • 62. LET’S TRAIN A MODEL (CPU) § Navigate to the following notebook: 03_Train_Model_CPU § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 63. LET’S TRAIN A MODEL (GPU) § Navigate to the following notebook: 03a_Train_Model_GPU § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 64. TENSORFLOW DEBUGGER § Step through Operations § Inspect Inputs and Outputs § Wrap Session in Debug Session
    from tensorflow.python import debug as tf_debug

    sess = tf.Session(config=config)
    sess = tf_debug.LocalCLIDebugWrapperSession(sess)
  • 65. LET’S DEBUG A MODEL § Navigate to the following notebook: 04_Debug_Model § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 66. AGENDA Part 1: Optimize TensorFlow Model Training § GPUs and TensorFlow § Train, Inspect, and Debug TensorFlow Models § TensorFlow Distributed Model Training on a Cluster § Optimize Training with JIT XLA Compiler
  • 67. SINGLE NODE, MULTI-GPU TRAINING § cpu:0 § By default, all CPUs § Requires extra config to target a CPU § gpu:0..n § Each GPU has a unique id § TF usually prefers a single GPU § xla_cpu:0, xla_gpu:0..n § “JIT Compiler Device” § Hints TensorFlow to attempt JIT Compile with tf.device(“/cpu:0”): with tf.device(“/gpu:0”): with tf.device(“/gpu:1”): GPU 0 GPU 1
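  A minimal single-node, multi-GPU sketch (assumes two GPUs are present; allow_soft_placement lets TensorFlow fall back if a device is missing):
    import tensorflow as tf

    partials = []
    for i, device in enumerate(['/gpu:0', '/gpu:1']):   # assumes two GPUs
        with tf.device(device):
            a = tf.random_normal([1000, 1000])
            b = tf.random_normal([1000, 1000])
            partials.append(tf.matmul(a, b))

    with tf.device('/cpu:0'):
        total = tf.add_n(partials)   # aggregate partial results on the CPU

    config = tf.ConfigProto(allow_soft_placement=True,  # fall back if device missing
                            log_device_placement=True)  # log where each op runs
    with tf.Session(config=config) as sess:
        sess.run(total)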
  • 68. DISTRIBUTED, MULTI-NODE TRAINING § TensorFlow Automatically Inserts Send and Receive Ops into Graph § Parameter Server Synchronously Aggregates Updates to Variables § Nodes with Multiple GPUs will Pre-Aggregate Before Sending to PS [diagram: single node vs. multiple nodes, each worker with one or more GPUs]
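  A minimal sketch of the cluster plumbing (hostnames and the task index are hypothetical; in practice each process reads its role from TF_CONFIG or flags):
    import tensorflow as tf

    # Hypothetical cluster: one parameter server (PS) and two workers.
    cluster = tf.train.ClusterSpec({
        'ps':     ['ps0:2222'],
        'worker': ['worker0:2222', 'worker1:2222'],
    })

    # Each process runs one task.
    server = tf.train.Server(cluster, job_name='worker', task_index=0)

    # replica_device_setter pins Variables to the PS and ops to this worker;
    # TensorFlow inserts the Send/Receive ops automatically.
    with tf.device(tf.train.replica_device_setter(
            worker_device='/job:worker/task:0', cluster=cluster)):
        W = tf.Variable(tf.zeros([784, 10]))  # lives on the parameter server
        # ... build the rest of the model here ...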
  • 69. DATA PARALLEL VS MODEL PARALLEL § Data Parallel (“Between-Graph Replication”) § Send exact same model to each device § Each device operates on partition of data § ie. Spark sends same function to many workers § Each worker operates on their partition of data § Model Parallel (“In-Graph Replication”) § Send different partition of model to each device § Each device operates on all data § Difficult, but required for larger models with lower-memory GPUs
  • 70. SYNCHRONOUS VS. ASYNCHRONOUS § Synchronous § Nodes compute gradients § Nodes update Parameter Server (PS) § Nodes sync on PS for latest gradients § Asynchronous § Some nodes delay in computing gradients § Nodes don’t update PS § Nodes get stale gradients from PS § May not converge due to stale reads!
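  A sketch of opting into synchronous updates with tf.train.SyncReplicasOptimizer (the two-worker sizing and the stand-in loss are assumptions for illustration):
    import tensorflow as tf

    W = tf.Variable(tf.random_normal([10, 10]))
    loss = tf.reduce_mean(tf.square(W))          # stand-in loss for illustration

    base_opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    opt = tf.train.SyncReplicasOptimizer(base_opt,
                                         replicas_to_aggregate=2,  # assumes 2 workers
                                         total_num_replicas=2)
    train_op = opt.minimize(loss, global_step=tf.train.get_or_create_global_step())
    # The chief worker also passes opt.make_session_run_hook(is_chief=True) to
    # MonitoredTrainingSession so the sync queues get initialized.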
  • 71. CHIEF WORKER § Chief Defaults to Worker Task 0 § Task 0 is guaranteed to exist § Performs Maintenance Tasks § Writes log summaries § Instructs PS to checkpoint vars § Performs PS health checks § (Re-)Initialize variables at (re-)start of training
  • 72. NODE AND PROCESS FAILURES § Checkpoint to Persistent Storage (HDFS, S3) § Use MonitoredTrainingSession and Hooks § Use a Good Cluster Orchestrator (ie. Kubernetes, Mesos) § Understand Failure Modes and Recovery States § Stateless, Not Bad: Training Continues § Stateful, Bad: Training Must Stop § Dios Mio! Long Night Ahead…
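  A minimal fault-tolerant training loop sketch with MonitoredTrainingSession (the HDFS checkpoint path and step counts are hypothetical):
    import tensorflow as tf

    global_step = tf.train.get_or_create_global_step()
    W = tf.Variable(tf.random_normal([10, 10]))
    loss = tf.reduce_mean(tf.square(W))          # stand-in loss for illustration
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=global_step)

    # checkpoint_dir should point at persistent storage (HDFS, S3).
    with tf.train.MonitoredTrainingSession(
            is_chief=True,                       # chief handles init + checkpoints
            checkpoint_dir='hdfs://namenode/checkpoints/model',  # hypothetical path
            save_checkpoint_secs=60,
            hooks=[tf.train.StopAtStepHook(last_step=10000)]) as sess:
        while not sess.should_stop():
            sess.run(train_op)                   # resumes from last checkpoint on restart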
  • 73. ESTIMATOR, EXPERIMENT API § Simplify Model Building § Provide Clear Path to Production § Enable Rapid Model Experiments § Provide Flexible Parameter Tuning § Enable Downstream Optimizing & Serving Infrastructure § Nudge Users to Best Practices Through Opinions § Provide Hooks/Callbacks to Override Opinions § Unified API for Local and Distributed TensorFlow
  • 74. ESTIMATOR API § “Train-to-Serve” Design § Create a Custom Estimator - or Use a Canned Estimator § Hides Session, Graph, Layers, Iterative Loops (Train, Eval, Predict) § Hooks for All Phases of Model Training and Evaluation § Load Input: input_fn() § Train: model_fn() and train() § Evaluate: evaluate() § Save and Export: export_savedmodel() § Predict: predict() (uses sess.run(), resulting in slow predictions) https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/customestimator/
  • 75. CANNED ESTIMATORS § Commonly-Used Estimators § Pre-Tested and Pre-Tuned § DNNClassifier, TensorForestEstimator § Always Use Canned Estimators If Possible § Reduce Lines of Code, Complexity, and Bugs § Use FeatureColumns to Define & Create Features [chart: Custom vs. Canned Estimator usage @ Google, August 2017]
  • 76. COMBINE ESTIMATOR + DATASET API
    def input_fn():
        def generator():
            while True:
                yield ...
        my_dataset = tf.data.Dataset.from_generator(generator, tf.int32)
        # A one-shot iterator automatically initializes itself on first use.
        iter = my_dataset.make_one_shot_iterator()
        # The return value of get_next() matches the dataset element type.
        images, labels = iter.get_next()
        return images, labels

    # The input_fn can be used as a regular Estimator input function.
    estimator = tf.estimator.Estimator(…)
    estimator.train(input_fn=input_fn, …)
  • 77. FEATURECOLUMN ABSTRACTION § Used by Canned Estimator § Simplifies Input Ingestion § Declarative Way to Specify Model Training Inputs § Converts Sparse Features to Dense Tensors § Sparse Features: Query Keyword, Url, ProductID,… § Wide/Linear Models Use Feature-Crossing § Deep Models Use Embeddings
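  A sketch of these FeatureColumn patterns (feature names, bucket sizes, and the wide-and-deep estimator wiring are illustrative assumptions):
    import tensorflow as tf

    # Hypothetical features for a wide-and-deep model.
    price = tf.feature_column.numeric_column('price')
    keyword = tf.feature_column.categorical_column_with_hash_bucket(
        'query_keyword', hash_bucket_size=10000)
    product_id = tf.feature_column.categorical_column_with_hash_bucket(
        'product_id', hash_bucket_size=100000)

    # Wide/linear models use feature crosses of sparse columns.
    keyword_x_product = tf.feature_column.crossed_column(
        [keyword, product_id], hash_bucket_size=1000000)

    # Deep models use embeddings to densify sparse columns.
    keyword_embedding = tf.feature_column.embedding_column(keyword, dimension=16)

    estimator = tf.estimator.DNNLinearCombinedClassifier(
        linear_feature_columns=[keyword_x_product],
        dnn_feature_columns=[price, keyword_embedding],
        dnn_hidden_units=[128, 64])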
  • 78. SINGLE VS. MULTI-OBJECTIVES + HEADS § Single-Objective Estimator § Single classification prediction § Multi-Objective Estimator § Two (2) classification predictions § One (1) classification prediction + One(1) final layer § Multiple Heads Are Used to Ensemble Models § Treats neural network as a feature engineering step! § Supported by TensorFlow Serving
  • 79. LAYERS API § Standalone Layer or Entire Sub-Graphs § Functions of Tensor Inputs & Outputs § Mix and Match with Operations § Assumes 1st Dimension is Batch Size § Handles One (1) to Many (*) Inputs § Metrics are Layers § Loss Metric (Per Mini-Batch) § Accuracy and MSE (Across Mini-Batches)
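  A minimal Layers API sketch (sizes are illustrative):
    import tensorflow as tf

    # Layers assume the 1st dimension is the batch size (None = any batch size).
    x = tf.placeholder(tf.float32, [None, 784])
    hidden = tf.layers.dense(x, 256, activation=tf.nn.relu)  # Tensor in, Tensor out
    logits = tf.layers.dense(hidden, 10)                     # layers compose freely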
  • 80. EXPERIMENT API § Easier-to-Use Distributed TensorFlow § Same API for Local and Distributed (*Theoretically) § Combines Estimator with input_fn() § Used for Training, Evaluation, & Hyper-Parameter Tuning § Distributed Training Defaults to Data-Parallel & Async § Cluster Configuration is Fixed at Start of Training Job § No Auto-Scaling Allowed!!
  • 81. ESTIMATOR, EXPERIMENT CONFIGS § TF_CONFIG § Special environment variable for config § Defines ClusterSpec in JSON incl. master, workers, PS’s § Distributed: ‘{“environment”:“cloud”}’ § Local: ‘{“environment”:“local”, “task”:{”type”:”worker”}}’ § RunConfig: Defines checkpoint interval, output directory, etc. § HParams: Hyper-parameter tuning parameters and ranges § learn_runner creates RunConfig before calling run() & tune() § schedule is set based on {”task”:{”type”}}
    TF_CONFIG='{
      "environment": "cloud",
      "cluster": {
        "master": ["worker0:2222"],
        "worker": ["worker1:2222"],
        "ps": ["ps0:2222"]
      },
      "task": {"type": "ps", "index": "0"}
    }'
  • 82. OPTIMIZER, ESTIMATOR API + TPU’S
    # Inside model_fn: wrap the optimizer so gradients are aggregated across TPU shards.
    optimizer = tpu_optimizer.CrossShardOptimizer(
        tf.train.GradientDescentOptimizer(learning_rate=…))
    train_op = optimizer.minimize(loss)
    estimator_spec = tf.estimator.EstimatorSpec(train_op=train_op, loss=…)

    # Driver code:
    run_config = tpu_config.RunConfig()
    estimator = tpu_estimator.TpuEstimator(model_fn=model_fn, config=run_config)
    estimator.train(input_fn=input_fn, num_epochs=10, …)
  • 83. SEPARATE TRAINING + EVALUATION § Separate Training and Evaluation Clusters § Evaluate Upon Checkpoint § Avoid Resource Contention § Let Training Continue in Parallel with Evaluation [diagram: training cluster and evaluation cluster sharing a parameter server cluster]
  • 84. LET’S TRAIN DISTRIBUTED TENSORFLOW § Navigate to the following notebook: 05_Train_Model_Distributed_CPU or 05a_Train_Model_Distributed_GPU § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 86. BREAK § Please 🌟 this GitHub Repo! § All slides, code, notebooks, and Docker images here: https://github.com/PipelineAI/pipeline/tree/master/gpu.ml Need Help? Use the Chat!
  • 87. AGENDA Part 1: Optimize TensorFlow Model Training § GPUs and TensorFlow § Train, Inspect, and Debug TensorFlow Models § TensorFlow Distributed Model Training on a Cluster § Optimize Training with JIT XLA Compiler
  • 88. XLA FRAMEWORK § XLA: “Accelerated Linear Algebra” § Reduce Reliance on Custom Operators § Improve Execution Speed § Improve Memory Usage § Reduce Mobile Footprint § Improve Portability Helps TensorFlow Stay Flexible, Yet Still Performant
  • 89. XLA HIGH LEVEL OPTIMIZER (HLO) § HLO: “High Level Optimizer” § Compiler Intermediate Representation (IR) § Independent of source and target language § XLA Step 1 Emits Target-Independent HLO § XLA Step 2 Emits Target-Dependent LLVM § LLVM Emits Native Code Specific to Target § Supports x86-64, ARM64 (CPU), and NVPTX (GPU)
  • 90. JIT COMPILER § JIT: “Just-In-Time” Compiler § Built on XLA Framework § Reduce Memory Movement - Especially with GPUs § Reduce Overhead of Multiple Function Calls § Similar to Operator Fusing in Spark 2.0 § Unroll Loops, Fuse Operators, Fold Constants, … § Scopes: session, device, `with jit_scope():`
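  Two ways to opt in, sketched for TensorFlow 1.x (session-wide JIT via global_jit_level, or per-scope via the experimental jit_scope in tf.contrib):
    import tensorflow as tf

    # Option 1: session-wide JIT - XLA clusters and compiles eligible subgraphs.
    config = tf.ConfigProto()
    config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
    sess = tf.Session(config=config)

    # Option 2: scope-level JIT for hand-picked subgraphs.
    jit_scope = tf.contrib.compiler.jit.experimental_jit_scope
    with jit_scope():
        a = tf.random_normal([1000, 1000])
        b = tf.random_normal([1000, 1000])
        c = tf.matmul(a, b)  # candidate for XLA operator fusion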
  • 91. VISUALIZING JIT COMPILER IN ACTION § Trace a training step before and after enabling JIT, then compare the timelines (Before JIT vs. After JIT) § Google Web Tracing Framework: http://google.github.io/tracing-framework/
    from tensorflow.python.client import timeline

    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(train_op, options=run_options, run_metadata=run_metadata)  # train_op = your training step

    trace = timeline.Timeline(step_stats=run_metadata.step_stats)
    with open('timeline.json', 'w') as trace_file:
        trace_file.write(trace.generate_chrome_trace_format(show_memory=True))
  • 92. VISUALIZING FUSING OPERATORS § hlo_*.dot files are generated by XLA § GraphViz: http://www.graphviz.org
    pip install graphviz
    dot -Tpng /tmp/hlo_graph_1.w5LcGs.dot -o hlo_graph_1.png
  • 93. LET’S TRAIN WITH XLA CPU § Navigate to the following notebook: 06_Train_Model_XLA_CPU § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 94. LET’S TRAIN WITH XLA GPU § Navigate to the following notebook: 06a_Train_Model_XLA_GPU § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 95. AGENDA Part 0: Latest PipelineAI Research Part 1: Optimize TensorFlow Model Training Part 2: Optimize TensorFlow Model Serving
  • 96. AGENDA Part 2: Optimize TensorFlow Model Serving § AOT XLA Compiler and Graph Transform Tool § Key Components of TensorFlow Serving § Deploy Optimized TensorFlow Model § Optimize TensorFlow Serving Runtime
  • 97. AOT COMPILER § Standalone, Ahead-Of-Time (AOT) Compiler § Built on XLA framework § tfcompile § Creates executable with minimal TensorFlow Runtime needed § Includes only dependencies needed by subgraph computation § Creates functions with feeds (inputs) and fetches (outputs) § Packaged as cc_library header and object files to link into your app § Commonly used for mobile device inference graph § Currently, only CPU x86-64 and ARM are supported - no GPU
  • 98. GRAPH TRANSFORM TOOL (GTT) § Post-Training Optimization to Prepare for Inference § Remove Training-only Ops (checkpoint, drop out, logs) § Remove Unreachable Nodes between Given feed -> fetch § Fuse Adjacent Operators to Improve Memory Bandwidth § Fold Final Batch Norm mean and variance into Variables § Round Weights/Variables to improve compression (ie. 70%) § Quantize (FP32 -> INT8) to Speed Up Math Operations
  • 99. AFTER TRAINING, BEFORE OPTIMIZATION [diagram: the full training graph, in which the user feeds inputs and fetches outputs while TensorFlow flows tensors, performs operations, and trains variables; training-only nodes are still present (?!)]
  • 100. POST-TRAINING GRAPH TRANSFORMS
    transform_graph
      --in_graph=tensorflow_inception_graph.pb    ← Original Graph
      --out_graph=optimized_inception_graph.pb    ← Transformed Graph
      --inputs='Mul'                              ← Feed (Input)
      --outputs='softmax'                         ← Fetch (Output)
      --transforms='                              ← List of Transforms
        strip_unused_nodes
        remove_nodes(op=Identity, op=CheckNumerics)
        fold_constants(ignore_errors=true)
        fold_batch_norms
        fold_old_batch_norms
        quantize_weights
        quantize_nodes'
  • 101. AFTER STRIPPING UNUSED NODES § Optimizations § strip_unused_nodes § Results § Graph much simpler § File size much smaller
  • 102. AFTER REMOVING UNUSED NODES § Optimizations § strip_unused_nodes § remove_nodes § Results § Pesky nodes removed § File size a bit smaller
  • 103. AFTER FOLDING CONSTANTS § Optimizations § strip_unused_nodes § remove_nodes § fold_constants § Results § Placeholders (feeds) -> Variables* (*Why Variables and not Constants?)
  • 104. AFTER FOLDING BATCH NORMS § Optimizations § strip_unused_nodes § remove_nodes § fold_constants § fold_batch_norms § Results § Graph remains the same § File size approximately the same
  • 105. AFTER QUANTIZING WEIGHTS § Optimizations § strip_unused_nodes § remove_nodes § fold_constants § fold_batch_norms § quantize_weights § Results § Graph is same, file size is smaller, compute is faster
  • 106. WEIGHT QUANTIZATION § FP16 and INT8 Are Smaller and Computationally Simpler § Weights/Variables are Constants § Easy to Linearly Quantize
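  A toy sketch of 8-bit linear (affine) weight quantization in NumPy - an illustration of the idea, not the exact Graph Transform Tool implementation:
    import numpy as np

    w = np.random.randn(1024).astype(np.float32)       # stand-in for a weight tensor

    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / 255.0                    # map [w_min, w_max] -> [0, 255]
    q = np.round((w - w_min) / scale).astype(np.uint8) # quantize to INT8 codes

    w_restored = q.astype(np.float32) * scale + w_min  # dequantize
    print('max error:', np.abs(w - w_restored).max())  # bounded by roughly scale/2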
  • 107. LET’S OPTIMIZE FOR INFERENCE § Navigate to the following notebook: 07_Optimize_Model* *Why just CPU version? Why not GPU? § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 109. ACTIVATION QUANTIZATION § Activations Not Known Ahead of Time § Depends on input, not easy to quantize § Requires Additional Calibration Step § Use a “representative” dataset § Per Neural Network Layer… § Collect histogram of activation values § Generate many quantized distributions with different saturation thresholds § Choose threshold to minimize… KL_divergence(ref_distribution, quant_distribution) § Not Much Time or Data is Required (Minutes on Commodity Hardware)
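  A toy sketch of the KL-divergence calibration idea - a simplification for illustration, not the exact production algorithm:
    import numpy as np
    from scipy.stats import entropy   # entropy(p, q) computes KL(p || q)

    def best_saturation_threshold(activations, num_bins=2048, num_quant_levels=128):
        # Histogram of activation magnitudes from a "representative" dataset.
        hist, bin_edges = np.histogram(np.abs(activations), bins=num_bins)
        best_threshold, best_kl = bin_edges[-1], np.inf
        for i in range(num_quant_levels, num_bins + 1):
            ref = hist[:i].astype(np.float64).copy()
            ref[-1] += hist[i:].sum()          # saturate outliers into the last bin
            # Simulate quantization by pooling the reference into num_quant_levels
            # bins, then expanding back for a like-for-like comparison
            # (remainder bins dropped for simplicity).
            chunk = i // num_quant_levels
            trimmed = ref[:chunk * num_quant_levels]
            pooled = trimmed.reshape(num_quant_levels, chunk).sum(axis=1)
            expanded = np.repeat(pooled / chunk, chunk)
            kl = entropy(trimmed + 1e-10, expanded + 1e-10)
            if kl < best_kl:
                best_kl, best_threshold = kl, bin_edges[i]
        return best_threshold

    threshold = best_saturation_threshold(np.random.randn(100000))  # stand-in activations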
  • 110. AFTER ACTIVATION QUANTIZATION § Optimizations § strip_unused_nodes § remove_nodes § fold_constants § fold_batch_norms § quantize_weights § quantize_nodes (activations) § Results § Larger graph, needs calibration! Requires Additional freeze_requantization_ranges
  • 111. LET’S OPTIMIZE FOR INFERENCE § Navigate to the following notebook: 08_Optimize_Model_Activations § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 112. FREEZING MODEL FOR DEPLOYMENT § Optimizations § strip_unused_nodes § remove_nodes § fold_constants § fold_batch_norms § quantize_weights § quantize_nodes § freeze_graph § Results § Variables -> Constants Finally! We’re Ready to Deploy!!
  • 113. AGENDA Part 2: Optimize TensorFlow Model Serving § AOT XLA Compiler and Graph Transform Tool § Key Components of TensorFlow Serving § Deploy Optimized TensorFlow Model § Optimize TensorFlow Serving Runtime
  • 114. MODEL SERVING TERMINOLOGY § Inference § Only Forward Propagation through Network § Predict, Classify, Regress, … § Bundle § GraphDef, Variables, Metadata, … § Assets § ie. Map of ClassificationID -> String § {9283: “penguin”, 9284: “bridge”} § Version § Every Model Has a Version Number (Integer) § Version Policy § ie. Serve Only Latest (Highest), Serve Both Latest and Previous, …
  • 115. TENSORFLOW SERVING FEATURES § Supports Auto-Scaling § Custom Loaders beyond File-based § Tune for Low-latency or High-throughput § Serve Diff Models/Versions in Same Process § Customize Models Types beyond HashMap and TensorFlow § Customize Version Policies for A/B and Bandit Tests § Support Request Draining for Graceful Model Updates § Enable Request Batching for Diff Use Cases and HW § Supports Optimized Transport with GRPC and Protocol Buffers
  • 116. PREDICTION SERVICE § Predict (Original, Generic) § Input: List of Tensor § Output: List of Tensor § Classify § Input: List of tf.Example (key, value) pairs § Output: List of (class_label: String, score: float) § Regress § Input: List of tf.Example (key, value) pairs § Output: List of (label: String, score: float)
  • 117. PREDICTION INPUTS + OUTPUTS § SignatureDef § Defines inputs and outputs § Maps external (logical) to internal (physical) tensor names § Allows internal (physical) tensor names to change
    from tensorflow.python.saved_model import utils
    from tensorflow.python.saved_model import signature_constants
    from tensorflow.python.saved_model import signature_def_utils

    graph = tf.get_default_graph()
    x_observed = graph.get_tensor_by_name('x_observed:0')
    y_pred = graph.get_tensor_by_name('add:0')

    inputs_map = {'inputs': x_observed}
    outputs_map = {'outputs': y_pred}

    predict_signature = signature_def_utils.predict_signature_def(
        inputs=inputs_map, outputs=outputs_map)
  • 118. MULTI-HEADED INFERENCE § Inputs Pass Through Model One Time § Model Returns Multiple Predictions: 1. Human-readable prediction (ie. “penguin”, “church”,…) 2. Final layer of scores (float vector) § Final Layer of floats Pass to the Next Model in Ensemble § Optimizes Bandwidth, CPU/GPU, Latency, Memory § Enables Complex Model Composing and Ensembling
  • 119. BUILD YOUR OWN MODEL SERVER § Adapt GRPC (Google) <-> HTTP (REST of the World) § Perform Batch Inference vs. Request/Response § Handle Requests Asynchronously § Support Mobile, Embedded Inference § Customize Request Batching § Add Circuit Breakers, Fallbacks § Control Latency Requirements § Reduce Number of Moving Parts
    #include "tensorflow_serving/model_servers/server_core.h"

    using tensorflow::serving::ServerCore;

    class MyTensorFlowModelServer {
     public:
      void Init() {
        ServerCore::Options options;
        // set options (model name, path, etc)
        std::unique_ptr<ServerCore> core;
        TF_CHECK_OK(ServerCore::Create(std::move(options), &core));
      }
    };
    // Compile and Link with libtensorflow.so
  • 120. RUNTIME OPTION: NVIDIA TENSOR-RT § Post-Training Model Optimizations § Specific to Nvidia GPU § Similar to TF Graph Transform Tool § GPU-Optimized Prediction Runtime § Alternative to TensorFlow Serving § PipelineAI Supports TensorRT!
  • 121. AGENDA Part 2: Optimize TensorFlow Model Serving § AOT XLA Compiler and Graph Transform Tool § Key Components of TensorFlow Serving § Deploy Optimized TensorFlow Model § Optimize TensorFlow Serving Runtime
  • 122. SAVED MODEL FORMAT § Navigate to the following notebook: 09_Deploy_Optimized_Model § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 123. AGENDA Part 2: Optimize TensorFlow Model Serving § AOT XLA Compiler and Graph Transform Tool § Key Components of TensorFlow Serving § Deploy Optimized TensorFlow Model § Optimize TensorFlow Serving Runtime
  • 124. REQUEST BATCH TUNING § max_batch_size § Enables throughput/latency tradeoff § Bounded by RAM § batch_timeout_micros § Defines batch time window, latency upper-bound § Bounded by RAM § num_batch_threads § Defines parallelism § Bounded by CPU cores § max_enqueued_batches § Defines queue upper bound, throttling § Bounded by RAM Reaching either threshold will trigger a batch
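  A sketch of a batching_parameters_file for tensorflow_model_server (values are illustrative, not recommendations; flag names as in TF Serving 1.x - check your version):
    # batching.config (BatchingParameters text proto)
    max_batch_size { value: 128 }          # throughput vs. latency, bounded by RAM
    batch_timeout_micros { value: 10000 }  # upper bound on the wait to fill a batch
    num_batch_threads { value: 8 }         # parallelism, bounded by CPU cores
    max_enqueued_batches { value: 100 }    # queue bound / throttling, bounded by RAM

    # Launch:
    tensorflow_model_server --port=9000 \
      --model_name=my_model --model_base_path=/models/my_model \
      --enable_batching=true --batching_parameters_file=batching.config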
  • 125. ADVANCED BATCHING & SERVING TIPS § Batch Just the GPU/TPU Portions of the Computation Graph § Batch Arbitrary Sub-Graphs using Batch / Unbatch Graph Ops § Distribute Large Models Into Shards Across TensorFlow Model Servers § Batch RNNs Used for Sequential and Time-Series Data § Find Best Batching Strategy For Your Data Through Experimentation § BasicBatchScheduler: Homogeneous requests (ie Regress or Classify) § SharedBatchScheduler: Mixed requests, multi-step, ensemble predict § StreamingBatchScheduler: Mixed CPU/GPU/IO-bound Workloads § Serve Only One (1) Model Inside One (1) TensorFlow Serving Process § Much Easier to Debug, Tune, Scale, and Manage Models in Production.
  • 126. LET’S DEPLOY OPTIMIZED MODEL § Navigate to the following notebook: 10_Optimize_Model_Server § https://github.com/PipelineAI/pipeline/tree/master/ gpu.ml/notebooks
  • 127. AGENDA Part 0: Latest PipelineAI Research Part 1: Optimize TensorFlow Model Training Part 2: Optimize TensorFlow Model Serving
  • 128. THANK YOU!! QUESTIONS? § https://github.com/PipelineAI/pipeline/ § Please Star 🌟 this GitHub Repo! § All slides, code, notebooks, and Docker images here: https://github.com/PipelineAI/pipeline/tree/master/gpu.ml Contact Me chris@pipeline.ai @cfregly