https://pipeline.ai
With PipelineAI, You Can…
* Generate Hardware-Specific Model Optimizations
* Deploy and Compare Models in Live Production
* Optimize Complete AI Pipeline Across Many Models
* Hyper-Parameter Tune Both Training & Predicting Phases
High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 - ... (Chris Fregly)
Online Workshop
Note: A GPU-based cloud instance will be provided to each attendee for the duration of this event!!
At 8am PT on the morning of this workshop, we will email the Webinar details to your email address registered with Eventbrite.
If this email address is not up to date - or you do not get the email by 8am PT - please email your Eventbrite confirmation to help@pipeline.ai and we'll send you the details.
http://pipeline.ai
Title
PipelineAI Distributed Spark ML + TensorFlow AI + GPU Workshop
Time
Start: 9:00am PT
End: 1:00pm PT
Highlights
We will each build an end-to-end, continuous TensorFlow AI model training and deployment pipeline on our own GPU-based cloud instance.
At the end, we will combine our cloud instances to create the LARGEST Distributed TensorFlow AI Training and Serving Cluster in the WORLD!
Pre-requisites
Just a modern browser, internet connection, and a good night's sleep! We'll provide the rest.
Agenda
Spark ML
TensorFlow AI
Storing and Serving Models with HDFS
Trade-offs of CPU vs. GPU, Scale Up vs. Scale Out
CUDA + cuDNN GPU Development Overview
TensorFlow Model Checkpointing, Saving, Exporting, and Importing
Distributed TensorFlow AI Model Training (Distributed TensorFlow)
TensorFlow's Accelerated Linear Algebra Framework (XLA)
TensorFlow's Just-in-Time (JIT) and Ahead-of-Time (AOT) Compilers
Centralized Logging and Visualizing of Distributed TensorFlow Training (TensorBoard)
Distributed TensorFlow AI Model Serving/Predicting (TensorFlow Serving)
Centralized Logging and Metrics Collection (Prometheus, Grafana)
Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
High-Performance and Fault-Tolerant Micro-services (NetflixOSS)
More Info including GitHub and Docker Repos
http://pipeline.ai
Hyper-Parameter Tuning Across the Entire AI Pipeline - GPU Tech Conference San ... (Chris Fregly)
Chris Fregly, Founder @ PipelineAI, will walk you through a real-world, complete, end-to-end pipeline-optimization example. We highlight hyper-parameters - and model pipeline phases - that have not been exposed until now.
While most hyper-parameter optimizers stop at the training phase (i.e., learning rate, tree depth, EC2 instance type, etc.), we extend model validation and tuning into a new post-training optimization phase that includes 8-bit reduced-precision weight quantization and neural-network layer fusing, among many other framework- and hardware-specific optimizations.
Next, we introduce hyper-parameters at the prediction phase, including request-batch sizing and chipset (CPU vs. GPU vs. TPU).
Lastly, we determine a PipelineAI Efficiency Score of our overall Pipeline including Cost, Accuracy, and Time. We show techniques to maximize this PipelineAI Efficiency Score using our massive PipelineDB along with the Pipeline-wide hyper-parameter tuning techniques mentioned in this talk.
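To make the post-training phase concrete, here is a minimal, hedged sketch of 8-bit weight quantization and batch-norm folding using TensorFlow 1.x's Graph Transform Tool; the file path is an illustrative assumption, and the 'x'/'add' node names mirror the deck's example rather than real code from this talk:

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load a frozen (trained) GraphDef from disk
graph_def = tf.GraphDef()
with open('frozen_model.pb', 'rb') as f:      # illustrative path
    graph_def.ParseFromString(f.read())

# Post-training transforms: quantize float32 weights down to 8-bit
# and fold batch-norm ops into the preceding conv/matmul layers
optimized = TransformGraph(
    graph_def,
    ['x'],                                    # input node names (assumed)
    ['add'],                                  # output node names (assumed)
    ['quantize_weights', 'fold_batch_norms'])

with open('optimized_model.pb', 'wb') as f:
    f.write(optimized.SerializeToString())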
Bio
Chris Fregly is Founder and Applied AI Engineer at PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco.
He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production with Kubernetes and GPUs."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017 (Chris Fregly)
http://pipeline.io
Title
PipelineAI Distributed Spark ML + TensorFlow AI + GPU Workshop
A GPU-based cloud instance will be provided to each attendee as part of this event
Highlights
We will each build an end-to-end, continuous TensorFlow AI model training and deployment pipeline on our own GPU-based cloud instance.
At the end, we will combine our cloud instances to create the LARGEST Distributed TensorFlow AI Training and Serving Cluster in the WORLD!
Pre-requisites
Just a modern browser, internet connection, and a good night's sleep! We'll provide the rest.
Agenda
Spark ML
TensorFlow AI
Storing and Serving Models with HDFS
Trade-offs of CPU vs. GPU, Scale Up vs. Scale Out
CUDA + cuDNN GPU Development Overview
TensorFlow Model Checkpointing, Saving, Exporting, and Importing
Distributed TensorFlow AI Model Training (Distributed TensorFlow)
TensorFlow's Accelerated Linear Algebra Framework (XLA)
TensorFlow's Just-in-Time (JIT) and Ahead-of-Time (AOT) Compilers
Centralized Logging and Visualizing of Distributed TensorFlow Training (TensorBoard)
Distributed TensorFlow AI Model Serving/Predicting (TensorFlow Serving)
Centralized Logging and Metrics Collection (Prometheus, Grafana)
Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
High-Performance and Fault-Tolerant Micro-services (NetflixOSS)
Bio
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
Github Repo
https://github.com/fluxcapacitor/pipeline
Video
https://youtu.be/oNf3I1fVmg8
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup... (Chris Fregly)
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, the JIT/AOT Compiler, and the Graph Transform Tool, I'll demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment.
This talk contains many Spark ML and TensorFlow AI demos using PipelineIO's 100% Open Source Community Edition. All code and Docker images are available to reproduce on your own CPU- or GPU-based cluster.
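For a rough idea of what enabling the XLA JIT looks like in TensorFlow 1.x (a generic sketch, not code from the talk):

import tensorflow as tf

# Enable XLA JIT compilation for the whole session (TF 1.x API)
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

with tf.Session(config=config) as sess:
    # Build and run the graph as usual; eligible ops are clustered
    # and compiled by XLA at runtime
    pass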
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
https://www.meetup.com/TensorFlow-Chicago/events/240267321/
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/240587698/
http://pipeline.io
https://github.com/fluxcapacitor/pipeline
Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ... (Chris Fregly)
This document discusses optimizing and profiling TensorFlow models for training and inference on GPUs. It covers optimizing training using GPUs, data pipelines, the XLA JIT compiler, and distributed training. For inference, it discusses optimizing using the XLA AOT compiler, graph transformation tools, and TensorFlow Serving. The talk compares optimization techniques in production settings.
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S... (Chris Fregly)
Pipeline.AI is a platform for deploying and optimizing machine learning models at scale. It allows users to package models with their runtime dependencies, perform load testing and optimizations, deploy models to production safely using techniques like canary deployments, and monitor models both offline and online. The platform aims to enable live, continuous model training directly in production environments.
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc... (Chris Fregly)
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, Chris will demonstrate how to optimize, profile, and deploy TensorFlow Models in a GPU-based production environment. This talk is 100% demo based with open source tools and completely reproducible through Docker on your own GPU cluster.
https://github.com/fluxcapacitor/pipeline/gpu.ml
http://pipeline.io
Speaker: Umayah Abdennabi
Agenda
* Intro Grammarly (Umayah Abdennabi, 5 mins)
* Meetup Updates and Announcements (Chris, 5 mins)
* Custom Functions in Spark SQL (30 mins)
Speaker: Umayah Abdennabi
Spark comes with a rich Expression library that can be extended to make custom expressions. We will look into custom expressions and why you would want to use them.
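The talk focuses on native Catalyst expressions (written in Scala against Spark's Expression API); as a simpler, hedged illustration of extending Spark SQL with custom logic, here is a PySpark UDF, with the data and masking logic invented for the example. Native expressions avoid the serialization overhead that makes Python UDFs slow, which is a common motivation for going native.

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("custom-expressions").getOrCreate()

# A custom function: mask all but the first two characters of a string
mask = udf(lambda s: s[:2] + '***' if s else s, StringType())

df = spark.createDataFrame([('alice',), ('bob',)], ['name'])
df.select(mask('name').alias('masked_name')).show()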
* TF 2.0 + Keras (30 mins)
Speaker: Francesco Mosconi
TensorFlow 2.0 was announced at the March TF Dev Summit, and it brings many changes and upgrades. The most significant change is the inclusion of Keras as the default model-building API. In this talk, we'll review the main changes introduced in TF 2.0 and highlight the differences between open source Keras and tf.keras.
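For reference, a minimal model in the tf.keras style that TF 2.0 makes the default (a generic sketch, not taken from the talk):

import tensorflow as tf

# tf.keras is the default model-building API in TF 2.0
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=5)  # train on your own data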
* SQUAD Deep-Dive: Question & Answer with Context (45 mins)
Speaker: Brett Koonce (https://quarkworks.co)
SQuAD (Stanford Question Answering Dataset) is an NLP challenge based around answering questions by reading Wikipedia articles, designed to be a real-world machine learning benchmark. We will look at several different ways to tackle the SQuAD problem, building up to state-of-the-art approaches in terms of time, complexity, and accuracy.
https://rajpurkar.github.io/SQuAD-explorer/
https://dawn.cs.stanford.edu/benchmark/#squad
Food and drinks will be provided. The event will be held at Grammarly's office at One Embarcadero Center on the 9th floor. When you arrive at One Embarcadero, take the escalator to the second floor where you will find the lobby and elevators to the office suites. Come on up to the 9th floor (no need to check in at security), and ring the Grammarly doorbell.
Quest for the Perfect Workflow for McrFRED (Andi Smith)
Andi Smith provides an overview of setting up an automated workflow for front-end development using Grunt or Gulp. They discuss choosing a task runner, common tasks for setup like concatenation and minification, tasks for development like autoprefixing and live reloading, and tasks for build like image optimization and compression. The presentation emphasizes setting up a workflow that focuses on speeding up the development process and only including necessary tasks.
Hands on Docker - Launch your own LEMP or LAMP stack - SunshinePHP (Dana Luther)
In this tutorial we will go over setting up a standard LEMP stack for development use and learn how to modify it to mimic your production/pre-production environments as closely as possible. We will go over how to switch from Nginx to Apache, upgrade PHP versions and introduce additional storage engines such as Redis to the equation. We'll also step through how to run both unit and acceptance suites using headless Selenium images in the stack. Leave here fully confident in knowing that whatever environment you get thrown into, you can replicate it and work in it comfortably.
Migrating to a Bazel-based CI System: 6 Learnings - Or Shachar (Wix Engineering)
Two years ago, we were given a big challenge: transform the Wix build system, then based on Maven and TeamCity, into a new system that would support our exponentially growing scale. Naturally, we chose Bazel.
But how could we move to a system so different in so many ways from the existing one? Furthermore, we were required not to break the current build system as we migrated to the new one.
Fast forward to today: the Wix backend CI system is fully migrated to Bazel! The system builds in a fraction of the time - even with our largest codebases. In this talk, Or Shachar will describe how we achieved this, why it took us so long, what tools we had to build on the way (and what we already have, and will, open source!), and share the principles that helped us.
You can watch it here:
https://www.wix.engineering/post/bazelcon-2019-lessons-learned-from-migrating-our-build-system-to-bazel
The document discusses a 100-day challenge by NTT Corporation and the Japan Cloud Foundry Group to deploy 100 open source apps to Cloud Foundry. They successfully deployed 97 apps using various buildpacks and services, with 3 apps failing to deploy due to app-specific issues. The document provides details on the challenge and lessons learned around deploying different programming languages and frameworks to Cloud Foundry.
Integrating multiple CDN providers at Etsy - Velocity Europe (London) 2013 (Marcus Barczak)
The document discusses Etsy's experience integrating multiple content delivery network (CDN) providers. Etsy began using a single CDN in 2008 but then investigated using multiple CDNs in 2012 to improve resilience, flexibility, and costs. They developed an evaluation criteria and testing process to initially configure and test the CDNs with non-critical traffic before routing production traffic. Etsy then implemented methods for balancing traffic across CDNs using DNS and monitoring the performance of the CDNs and origin infrastructure.
This document provides a summary of Mike Malone's talk on scaling Django web apps. It discusses how Pownce scaled to handle hundreds of requests per second and thousands of database operations per second while serving millions of users, relationships, notes, and terabytes of static data. It also covers some of the common bottlenecks Pownce encountered and eliminated in scaling their Django application, including using caching, load balancing, and queuing to improve performance and scalability.
Puppet is an open source tool for system configuration management and automation. It allows system administrators to define the desired state for systems using code and enforces that state. Puppet works by compiling configuration code into a catalog that is distributed to nodes to enforce the specified configuration. This model-driven approach allows organizations to provision, deploy, and manage thousands of systems consistently at scale.
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010 (Adrian Trenaman)
Want to know how to design, implement and deploy modular enterprise integration solutions using OSGi? The Apache Karaf OSGi shell, used by Apache Felix and Apache ServiceMix, enhances core OSGi implementations like Felix or Equinox with an easy to use, extendible command shell, providing logging, hot deployment, configuration, container administration, clustering, high availability and easy 'feature-based' dependency management In this session, you'll learn how Karaf works, and how you can leverage Karaf either on its own or embedded within ServiceMix to deploy business logic, RESTful services, EIP-based integration flows and web services. You'll learn how to extend the command shell with your own commands, and, use Spring-DM *or* OSGi BluePrint Services to make using OSGi a walk in the park.
The document discusses continuous integration and delivery for machine learning models. It describes wrapping machine learning code into Docker containers to allow for parameterized training. It also discusses deploying models using Kubernetes operators and packaging models as services to run on customer infrastructure for training and serving. The goal is to establish best practices for continuous training, testing, and deployment of machine learning models.
Similar to PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to Scalable Predicting - Strata Conference - San Jose - March 2018
High Performance Distributed TensorFlow with GPUs and Kubernetes (inside-BigData.com)
In this deck from the Stanford HPC Conference, Chris Fregly from PipelineAI presents: High Performance Distributed TensorFlow with GPUs and Kubernetes.
"Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements with TensorFlow, Kubernetes, OpenFaaS, GPUs, and PipelineAI.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment. This is the holy grail of data science - rapid and safe experiments of ML / AI models directly in production. Following the famous Netflix Culture that encourages "Freedom and Responsibility", I use this talk to demonstrate how Data Scientists can use PipelineAI to safely deploy their ML / AI pipelines into production using live data. Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong. Learn to be fast and strong by attending this talk!"
Watch the video: https://youtu.be/k4qAKQHakNg
Learn more: https://pipeline.ai/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI (Data Con LA)
Abstract:-
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I'll demonstrate how to optimize, profile, and deploy TensorFlow Models - and the TensorFlow Runtime - in a GPU-based production environment.
This talk is 100% demo based with open source tools and completely reproducible through Docker on your own GPU cluster.
Bio:-
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Pipeline.AI was also the recent winner of the O'Reilly Media AI Startup Showcase at the AI conference.
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
In this deck from the 2018 Swiss HPC Conference, Axel Koehler from NVIDIA presents: The Convergence of HPC and Deep Learning.
"The intersection of AI and HPC is extending the reach of science and accelerating the pace of scientific innovation like never before. The technology originally developed for HPC has enabled deep learning, and deep learning is enabling many usages in science. Deep learning is also helping deliver real-time results with models that used to take days or months to simulate. The presentation will give an overview about the latest hard- and software developments for HPC and Deep Learning from NVIDIA and will show some examples that Deep Learning can be combined with traditional large scale simulations."
Watch the video: https://wp.me/p3RLHQ-ijM
Learn more: http://nvidia.com
and
http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Performance Benchmarking of Clouds: Evaluating OpenStack (Pradeep Kumar)
Pradeep Kumar Surisetty presented on performance benchmarking of clouds and evaluating OpenStack. He discussed key cloud characteristics like elasticity and scalability. He then covered various performance-measuring tools like Rally, Browbeat, PerfKit Benchmarker, and the SPEC Cloud IaaS 2016 benchmark. He also discussed performance-monitoring tools like Ceilometer, Collectd/Graphite/Grafana, and Ganglia. Finally, he provided some tuning tips for hardware, instances, over-subscription, local storage, NUMA nodes, disk pinning, and deployment timings.
[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure D... (Andrew Liu)
Data analysts, data engineers, and application developers are supporting unprecedented rates of change, whether talking about latency requirements to the expanding arena of data usage scenarios. While the technology functionality must rapidly evolve to meet customer needs and respond to competitive pressures, how can we enhance the data platform to help manage this unpredictability?
To help address these realities, data practitioners from a diverse set of backgrounds are increasingly relying on schema-free, distributed, scalable, and high-performance data storage (also known as NoSQL databases). In this session, we will showcase a wide variety of customer scenarios, business goals, and technical challenges faced by real-world customers. More importantly, how adding Azure DocumentDB into a data practitioner's arsenal within the Microsoft/Azure data ecosystem will allow you to easily solve these complex design patterns at massive scale.
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra... (SQUADEX)
This document provides an overview of machine learning tooling on AWS, including data pipelines, modeling and training, and deployment. It discusses AWS products for streaming and batch data ingestion, machine learning services like Amazon Machine Learning, Amazon SageMaker, and AWS Deep Learning AMIs. It also provides best practices for notebooks, model maintenance, and ML lifecycle management using tools like MLFlow and KubeFlow. The document concludes that while AWS provides a strong foundation, operations require additional layers for successful and reproducible machine learning.
TensorFlow meetup: Keras - Pytorch - TensorFlow.js (Stijn Decubber)
Slides from the TensorFlow meetup hosted on October 9th at the ML6 offices in Ghent. Join our Meetup group for updates and future sessions: https://www.meetup.com/TensorFlow-Belgium/
OS for AI: Elastic Microservices & the Next Gen of ML (Nordic APIs)
AI has been a hot topic lately; while advances are constantly being made in what is possible, there has not been as much discussion of the infrastructure and scaling challenges that come with it. How do you support dozens of different languages and frameworks, and make them interoperate invisibly? How do you scale to run abstract code from thousands of different developers, simultaneously and elastically, while maintaining less than 15ms of overhead?
At Algorithmia, we’ve built, deployed, and scaled thousands of algorithms and machine learning models, using every kind of framework (from scikit-learn to tensorflow). We’ve seen many of the challenges faced in this area, and in this talk I’ll share some insights into the problems you’re likely to face, and how to approach solving them.
In brief, we’ll examine the need for, and implementations of, a complete “Operating System for AI” – a common interface for different algorithms to be used and combined, and a general architecture for serverless machine learning which is discoverable, versioned, scalable and sharable.
Quick trip around the Cosmos - Things every astronaut is supposed to know (Rafał Hryniewski)
Slides for my talk, which gives an overview of Microsoft's new(ish) multi-model cloud database, CosmosDB.
Recorded talk (in Polish) is available here: https://youtu.be/ZWpJne0kcds?t=1h52m45s
The document summarizes a meetup on data streaming and machine learning with Google Cloud Platform. The meetup consisted of two presentations:
1. The first presentation discussed using Apache Beam (Dataflow) on Google Cloud Platform to parallelize machine learning training for improved performance. It showed how Dataflow was used to reduce training time from 12 hours to under 30 minutes.
2. The second presentation demonstrated building a streaming pipeline for sentiment analysis on Twitter data using Dataflow. It covered streaming patterns, batch vs streaming processing, and a demo that ingested tweets from PubSub and analyzed them using Cloud NLP API and BigQuery.
Optimize + Deploy Distributed TensorFlow, Spark, and Scikit-Learn Models on GPUs (Chris Fregly)
Optimize + Deploy Distributed TensorFlow, Spark, and Scikit-Learn Models on GPUs @ Strata London, May 24 2017
Optimize + Deploy Distributed TensorFlow, Spark, and Scikit-Learn Models on GPUs - Advanced Spark and TensorFlow Meetup May 23 2017 @ Hotels.com London
We'll discuss how to deploy TensorFlow, Spark, and Scikit-learn models on GPUs with Kubernetes across multiple cloud providers including AWS, Google, and Azure - as well as on-premise.
In addition, we'll discuss how to optimize TensorFlow models for high-performance inference using the latest TensorFlow XLA (Accelerated Linear Algebra) framework including the JIT and AOT Compilers.
Github Repo (100% Open Source!)
https://github.com/fluxcapacitor/pipeline
http://pipeline.io
Managing and Scaling Puppet - PuppetConf 2014 (Puppet)
Miguel Zuniga presented on managing and scaling Puppet. The presentation covered using a Puppet master with a web cluster for scaling, adding caching to reduce load, using source control with Puppet, multi-datacenter configurations, masterless Puppet in the cloud, and future directions including search capabilities and dynamic configurations. Zuniga took questions at the end.
JConWorld: Continuous SQL with Kafka and Flink (Timothy Spann)
JConWorld: Continuous SQL with Kafka and Flink
In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas, and publishing data.
We will then cover consuming Kafka data, joining Kafka topics, and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips, and examples of how to do this.
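As a rough, hedged sketch of the pattern using the PyFlink Table API (the topic name, fields, and connector settings are invented for illustration):

from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming TableEnvironment: queries run continuously over unbounded input
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Declare a Kafka topic as a dynamic table
t_env.execute_sql("""
    CREATE TABLE orders (
        order_id STRING,
        amount DOUBLE
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# A continuous query: the running totals update as new events arrive
t_env.execute_sql(
    "SELECT order_id, SUM(amount) AS total FROM orders GROUP BY order_id"
).print()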
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html
https://www.youtube.com/channel/UCDIDMDfje6jAvNE8DGkJ3_w?view_as=subscriber
Talk given at the London AICamp meetup on 13 July 2023. It's an introduction to building open-source ChatGPT-like chat bots and some of the considerations to have while training/tuning them using Airflow.
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival (ScyllaDB)
GPS Insight is a leader in fleet vehicle management using IoT. Internally they use a combination of SQL and NoSQL big data technologies, including distributed SQL data analytics via Presto, an open-source query engine developed by Facebook. Learn how to set up, configure, and use Presto with Scylla for supporting ad hoc non-partition key queries for analytics and data scientists. Plus hear how to use Presto for a Data Archival approach with csv files on S3 or similar storage appliance.
Apache Samza is a stream processing framework that provides high-level APIs and powerful stream processing capabilities. It is used by many large companies for real-time stream processing. The document discusses Samza's stream processing architecture at LinkedIn, how it scales to process billions of messages per day across thousands of machines, and new features around faster onboarding, powerful APIs including Apache Beam support, easier development through high-level APIs and tables, and better operability in YARN and standalone clusters.
OSDC 2015: Mitchell Hashimoto | Automating the Modern Datacenter, Development... (NETWAYS)
Physical, virtual, containers. Public cloud, private cloud, hybrid cloud. IaaS, PaaS, SaaS. These are the choices that we're faced with when architecting a datacenter of today. And the choice is not one or the other; it is often a combination of many of these. How do we remain in control of our datacenters? How do we deploy and configure software, manage change across disparate systems, and enforce policy/security? How do we do this in a way that operations engineers and developers alike can rejoice in the processes and workflow?
In this talk, I will discuss the problems faced by the modern datacenter, and how a set of open source tools including Vagrant, Packer, Consul, and Terraform can be used to tame the rising complexity curve and provide solutions for these problems.
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ... (HostedbyConfluent)
Core banking systems are batch oriented, typically with heavy overnight batch cycles before business opens each morning. In this talk I will explain some of the common interface points between core-banking infrastructure and event streaming systems. Then I will focus on how to do stream processing using ksqlDB for core-banking-shaped data, showing how to do common operations using various ksqlDB functions. The key features are Avro record keys and multi-key joins (ksqlDB 0.15), schema management, and state-store planning.
AWS re:Invent 2022 re:Cap - AI/ML and Data (Chris Fregly)
This document discusses Amazon Web Services (AWS) products and services for building end-to-end machine learning and data strategies. It covers topics such as ML infrastructure, governance, data preparation, model training, deployment, and education. Specific services mentioned include Amazon SageMaker, AWS Lake Formation, Amazon Redshift, Amazon EMR, AWS Glue, and AWS services for hardware acceleration like AWS Trainium and AWS Graviton.
Pandas on AWS - Let me count the ways (Chris Fregly)
Chris Fregly (Principal Solution Architect, AI and machine learning at AWS) will give a brief presentation on the various ways to perform scalable Pandas, Modin, and Ray workloads on AWS. He will then answer questions from the audience and the moderator, Alejandro Herrera of Ponder.
Chris Fregly is a Principal Solution Architect for AI and Machine Learning at Amazon Web Services (AWS) based in San Francisco, California. He is the organizer of the Global Data Science on AWS meetup. He is co-author of the O'Reilly Book, "Data Science on AWS."
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup (Chris Fregly)
RSVP Webinar: https://www.eventbrite.com/e/webinarkubeflow-tensorflow-tfx-pytorch-gpu-spark-ml-amazonsagemaker-tickets-45852865154
Talk #0: Introductions and Meetup Announcements By Chris Fregly and Antje Barth
Talk #1: Ray Overview, Ray AI Runtime on AWS using Amazon SageMaker, EC2, EMR, EKS by Chris Fregly, Principal Specialist Solution Architect, AI and Machine Learning @ AWS
Talk #2: Deep-dive Blueprints for Amazon Elastic Kubernetes Service (EKS) including Ray and Spark by Apoorva Kulkarni, Sr. Specialist Solution Architect, Containers and Kubernetes @ AWS
Zoom link: https://us02web.zoom.us/j/82308186562
Related Links
O'Reilly Book: https://www.amazon.com/dp/1492079391/
Website: https://datascienceonaws.com
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Smokey and the Multi-Armed Bandit Featuring BERT Reynolds - Updated (Chris Fregly)
The document discusses using multi-armed bandit tests to compare natural language models. It describes training BERT models with TensorFlow and PyTorch, and training a multi-armed bandit model with Vowpal Wabbit for reinforcement learning. It then demonstrates testing the BERT models with the bandit model and scaling multi-armed bandits on AWS.
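To make the bandit idea concrete, here is a minimal epsilon-greedy sketch in plain NumPy for shifting traffic between two models; it illustrates the logic only and is not Vowpal Wabbit's API or the code from the talk:

import numpy as np

rng = np.random.default_rng(42)
n_models = 2
counts = np.zeros(n_models)     # times each model served traffic
rewards = np.zeros(n_models)    # cumulative reward (e.g., clicks) per model
epsilon = 0.1                   # exploration rate

def choose_model():
    # Explore with probability epsilon (or until every model has data),
    # otherwise exploit the model with the best observed reward rate
    if rng.random() < epsilon or counts.min() == 0:
        return int(rng.integers(n_models))
    return int(np.argmax(rewards / counts))

def record_feedback(model, reward):
    counts[model] += 1
    rewards[model] += reward

# Simulated traffic: model 1 is secretly better (70% vs. 50% success)
true_rates = [0.5, 0.7]
for _ in range(10000):
    m = choose_model()
    record_feedback(m, float(rng.random() < true_rates[m]))

print(counts, rewards / counts)  # most traffic shifts toward model 1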
Amazon re:Invent 2020 Recap: AI and Machine Learning (Chris Fregly)
Amazon re:Invent 2020 Recap: AI and Machine Learning
Video here: https://youtu.be/YSXe02Y5pHM
NEW RELEASE! Build, Automate, Manage, and Scale ML Workflows with the NEW Amazon SageMaker Pipelines by Hallie Crosby Weishahn.
Description of Talk and Demo
AWS recently announced Amazon SageMaker Pipelines (https://aws.amazon.com/sagemaker/pipelines/), the first purpose-built, easy-to-use Continuous Integration and Continuous Delivery (CI/CD) service for machine learning.
SageMaker Pipelines has three main components which improve the operational resilience and reproducibility of your workflows: 1) pipelines, 2) model registry, and 3) projects.
In this talk and demo, Hallie will walk us through the new Amazon SageMaker Pipelines feature including MLOps support.
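For orientation, a heavily abbreviated sketch of defining and starting a pipeline with the SageMaker Python SDK; the step, estimator, inputs, and role are placeholders rather than the demo's actual code:

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

# `estimator`, `train_inputs`, and `role` are placeholders you would
# configure for your own account (e.g., a sagemaker.estimator.Estimator)
train_step = TrainingStep(name="TrainModel",
                          estimator=estimator,
                          inputs=train_inputs)

pipeline = Pipeline(name="demo-pipeline", steps=[train_step])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # kick off a run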
Date/Time
9-10am US Pacific Time (Third Monday of Every Month)
RSVP: https://www.eventbrite.com/e/1-hr-free-workshop-pipelineai-gpu-tpu-spark-ml-tensorflow-ai-kubernetes-kafka-scikit-tickets-45852865154
Meetup:
https://www.meetup.com/Data-Science-on-AWS/
Zoom:
https://zoom.us/j/690414331
Webinar ID: 690 414 331
Phone:
+1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
Related Links
Meetup: https://meetup.datascienceonaws.com
GitHub Repo: https://github.com/data-science-on-aws/
O'Reilly Book: https://datascienceonaws.com
YouTube: https://youtube.datascienceonaws.com
Slideshare: https://slideshare.datascienceonaws.com
Support: https://support.pipeline.ai
Monthly Workshop: https://www.eventbrite.com/e/full-day-workshop-kubeflow-gpu-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-tickets-63362929227
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod... (Chris Fregly)
The document discusses Amazon SageMaker Model Monitor and Debugger for monitoring machine learning models in production. SageMaker Model Monitor collects prediction data from endpoints, creates a baseline, and runs scheduled monitoring jobs to detect deviations from the baseline. It generates reports and metrics in CloudWatch. SageMaker Debugger helps debug training issues by capturing debug data with no code changes and providing real-time alerts and visualizations in Studio. Both services help detect model degradation and take corrective actions like retraining.
Quantum Computing with Amazon Braket
In this talk, I describe some fundamental principles of quantum computing including qu-bits, superposition, and entanglement. I will demonstrate how to perform secure quantum computing tasks across many Quantum Processing Units (QPUs) using Amazon Braket, IAM, and S3.
AI and Machine Learning, Quantum Computing, Amazon Braket, QPU
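A minimal Braket SDK sketch of the kind of task described: a Bell-pair circuit on the local simulator (swap in a managed QPU device for real hardware, in which case results land in S3):

from braket.circuits import Circuit
from braket.devices import LocalSimulator

# Bell pair: Hadamard on qubit 0, then a CNOT entangling qubits 0 and 1
bell = Circuit().h(0).cnot(0, 1)

# Run on the local simulator; a managed QPU would be addressed by ARN
result = LocalSimulator().run(bell, shots=1000).result()
print(result.measurement_counts)  # roughly half '00', half '11'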
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person (Chris Fregly)
In this talk, we present tips and best practices for scaling a large workshop for 1,000's of simultaneous attendees - both online and in-person. While our workshop is focused on AI and machine learning on AWS, we generalize our learnings for any domain or specialization.
The document provides an overview of announcements from Amazon Web Services' annual re:Invent conference in December 2019. Key details include:
- The conference had 65,000 attendees and 3,000 sessions.
- Announcements covered improving the developer experience, compute, storage, AI/ML, databases/analytics, networking, security, and extending AWS beyond regions.
- New services and features were announced for Lambda, API Gateway, Step Functions, EventBridge, Amplify, SageMaker, EC2, EKS, EBS, S3, Rekognition, Lex, Translate, Transcribe, Comprehend, Personalize, Forecast, Fraud Detector, and more.
This document provides an overview and agenda for a workshop on end-to-end machine learning pipelines using TFX, Kubeflow, Airflow and MLflow. The agenda covers setting up an environment with Kubernetes, using TensorFlow Extended (TFX) components to build pipelines, ML pipelines with Airflow and Kubeflow, hyperparameter tuning with Kubeflow, and deploying notebooks with Kubernetes. Hands-on exercises are also provided to explore key areas like TensorFlow Data Validation, TensorFlow Transform, TensorFlow Model Analysis and Airflow ML pipelines.
Title
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU
Video
https://youtu.be/vaB4IM6ySD0
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow.
First described in a 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most widely used pipeline orchestration framework in machine learning.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Setup ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Reproduce Model Training with TFX Metadata Store and Pachyderm
12. Deploy the Model to Production with TensorFlow Serving and Istio
13. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
Related Links
1. PipelineAI Home: https://pipeline.ai
2. PipelineAI Community Edition: http://community.pipeline.ai
3. PipelineAI GitHub: https://github.com/PipelineAI/pipeline
4. Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
5. YouTube Videos: https://youtube.pipeline.ai
6. SlideShare Presentations: https://slideshare.pipeline.ai
7. Slack Support: https://joinslack.pipeline.ai
8. Web Support and Knowledge Base: https://support.pipeline.ai
9. Email Support: support@pipeline.ai
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -... (Chris Fregly)
Traditional machine learning pipelines end with lifeless models sitting on disk in the research lab. These traditional models are typically trained on stale, offline, historical batch data. Static models and stale data are not sufficient to power today's modern, AI-first Enterprises that require continuous model training, continuous model optimizations, and lightning-fast model experiments directly in production. Through a series of open-source, hands-on demos and exercises, we will use PipelineAI to breathe life into these models using 4 new techniques that we’ve pioneered:
* Continuous Validation (V)
* Continuous Optimizing (O)
* Continuous Training (T)
* Continuous Explainability (E).
The Continuous "VOTE" techniques has proven to maximize pipeline efficiency, minimize pipeline costs, and increase pipeline insight at every stage from continuous model training (offline) to live model serving (online.)
Attendees will learn to create continuous machine learning pipelines in production with PipelineAI, TensorFlow, and Kafka.
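One hedged sketch of the continuous-training pattern, using kafka-python and scikit-learn's partial_fit; the topic name and event layout are invented for illustration:

import json
import numpy as np
from kafka import KafkaConsumer
from sklearn.linear_model import SGDClassifier

# An incrementally trainable model (anything that supports partial_fit)
model = SGDClassifier(loss='log_loss')
classes = np.array([0, 1])

consumer = KafkaConsumer(
    'training-events',                       # hypothetical topic
    bootstrap_servers='localhost:9092',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')))

# Continuously update the model as labeled events stream in
for message in consumer:
    event = message.value                    # e.g., {"features": [...], "label": 1}
    X = np.array([event['features']])
    y = np.array([event['label']])
    model.partial_fit(X, y, classes=classes)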
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer... (Chris Fregly)
Perform Online Predictions using Slack
A/B and multi-armed bandit model compare
Train Online Models with Kafka Streams
Create new models quickly
Deploy to production safely
Mirror traffic to validate online performance
Any Framework, Any Hardware, Any Cloud
Dashboard to manage the lifecycle of models from local development to live production
Generates optimized runtimes for the models
Custom targeting rules, shadow mode, and percentage-based rollouts to safely test features in live production
Continuous model training, model validation, and pipeline optimization
https://youtu.be/zpkH9oiIovU
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/258276286/
Related Links
PipelineAI Home: https://pipeline.ai
PipelineAI Community Edition: https://community.pipeline.ai
PipelineAI GitHub: https://github.com/PipelineAI/pipeline
PipelineAI Quick Start: https://quickstart.pipeline.ai
Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
YouTube Videos: https://youtube.pipeline.ai
SlideShare Presentations: https://slideshare.pipeline.ai
Slack Support: https://joinslack.pipeline.ai
Web Support and Knowledge Base: https://support.pipeline.ai
Email Support: help@pipeline.ai
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern... (Chris Fregly)
This document discusses distributed deep learning on the MapR Converged Data Platform. It provides an overview of MapR's enterprise big data journey and capabilities for distributed deep learning. It describes using containers and Kubernetes for deep learning model development and deployment, with NVIDIA GPUs for computation. It presents architectures and patterns for separating or collocating MapR and GPU clusters. Finally, it previews demos of parameter server/workers and real-time face detection using streams.
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to Scalable Predicting - Strata Conference - San Jose - March 2018
1. HIGH PERFORMANCE TENSORFLOW IN
PRODUCTION WITH KUBERNETES AND GPUS
STRATA CONFERENCE, SAN JOSE MARCH 2018
CHRIS FREGLY
FOUNDER @ PIPELINE.AI
2. KEY TAKE-AWAYS
With PipelineAI, You Can…
§ Generate Hardware-Specific Model Optimizations
§ Deploy and Compare Models in Live Production
§ Optimize Complete AI Pipeline Across Many Models
§ Hyper-Parameter Tune Both Training & Inference
3. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Routing
4. INTRODUCTIONS: ME
§ Chris Fregly, Founder & Engineer @PipelineAI
§ Formerly Netflix, Databricks, IBM Spark Tech
§ Founder @ Advanced Spark TensorFlow Meetup
§ Please Join Our 60,000+ Global Members!!
Contact Me
chris@pipeline.ai
@cfregly
Global Locations
* San Francisco
* Chicago
* Austin
* Washington DC
* Dusseldorf
* London
5. INTRODUCTIONS: YOU
§ Data Scientist, Data Engineer, Data Analyst, Data Curious
§ Want to Deploy ML/AI Models Rapidly and Safely
§ Need to Trace or Explain Model Predictions
§ Have a Decent Grasp of Computer Science Fundamentals
6. PIPELINE.AI IS 100% OPEN SOURCE
§ https://github.com/PipelineAI/pipeline/
§ Please Star this GitHub Repo!
§ “Each Star is Worth $1,500 in Seed Money”
- A Prominent Venture Capitalist in Silicon Valley
http://jrvis.com/red-dwarf/
10. WHY HEAVY FOCUS ON MODEL SERVING?
Model Training (100's of Training Jobs per Day)
§ Batch & Boring
§ Offline in Research Lab
§ Pipeline Ends at Training
§ No Insight into Live Production
§ Small Number of Data Scientists
§ Optimizations Are Very Well-Known
Model Serving (1,000,000's of Predictions per Sec)
§ Real-Time & Exciting!!
§ Online in Live Production
§ Pipeline Extends into Production
§ Continuous Insight into Live Production
§ Huuuuuuge Number of Application Users
§ Runtime Optimizations Not Yet Explored
11. CLOUD-BASED MODEL SERVING OPTIONS
§ AWS SageMaker
§ Released Nov 2017 @ re:Invent
§ Custom Docker Images for Training/Serving (i.e., PipelineAI Images)
§ Distributed TensorFlow Training through Estimator API
§ Traffic Splitting for A/B Model Testing
§ Google Cloud ML Engine
§ Mostly Command-Line Based
§ Driving TensorFlow Open Source API (i.e., Estimator API)
§ Azure ML
PipelineAI Supports SageMaker *and* Hybrid-Cloud Deployments
12. BUILD MODEL WITH THE RUNTIME
§ Package Model + Runtime into 1 Docker Image
§ Emphasizes Immutable Deployment and Infrastructure
§ Same Image Across All Environments
§ No Library or Dependency Surprises from Laptop to Production
§ Allows Tuning Model + Runtime Together
Build Local Model Server A:
pipeline predict-server-build --model-name=mnist --model-tag=A --model-type=tensorflow --model-runtime=tfserving --model-chip=gpu --model-path=./tensorflow/mnist/
13. RUN A LOADTEST LOCALLY!
§ Perform Mini-Load Test on Local Model Server
§ Immediate, Local Prediction Performance Metrics
§ Compare to Previous Model + Runtime Variations
§ Gain Intuition Before Push to Prod
pipeline predict-server-start --model-name=mnist
--model-tag=A
--memory-limit=2G
pipeline predict-http-test --model-endpoint-url=http://localhost:8080
--test-request-path=test_request.json
--test-request-concurrency=1000
Start Local Model Server
Start Local Load Test
14. TUNE MODEL + RUNTIME TOGETHER
§ Model Training Optimizations
§ Model Hyper-Parameters (ie. Learning Rate)
§ Reduced Precision (ie. FP16 Half Precision)
§ Model Serving (Post-Train) Optimizations
§ Quantize Model Weights + Activations From 32-bit to 8-bit
§ Fuse Neural Network Layers Together
§ Model Runtime Optimizations
§ Runtime Config: Request Batch Size, etc
§ Different Runtime: TensorFlow Serving CPU/GPU, Nvidia TensorRT
15. DETECT UNDERUTILIZED CPUS, GPUS
§ Instrument Code to Generate “Timelines”
§ Analyze with Google Web
Tracing Framework (WTF)
§ Monitor CPU with top, GPU with nvidia-smi
http://google.github.io/tracing-framework/
from tensorflow.python.client import timeline

trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as trace_file:
    trace_file.write(trace.generate_chrome_trace_format(show_memory=True))
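To produce the run_metadata used above, enable tracing on a session run (shown again later in the deck); a minimal sketch, where train_op stands in for whatever you fetch:

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(train_op,                    # train_op is an illustrative fetch
         options=run_options,
         run_metadata=run_metadata)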
16. SERVING (POST-TRAIN) OPTIMIZATIONS
§ Prepare Model for Serving
§ Simplify Network, Reduce Size
§ Reduce Precision -> Fast Math
§ Some Tools
§ Graph Transform Tool (GTT)
§ tfcompile
After Training
After
Optimizing!
pipeline optimize --optimization-list=[‘quantize_weights’,‘tfcompile’]
--model-name=mnist
--model-tag=A
--model-path=./tensorflow/mnist/model
--model-inputs=[‘x’]
--model-outputs=[‘add’]
--output-path=./tensorflow/mnist/optimized_model
Linear Regression Model Size: 70MB -> 70K (!)
17. NVIDIA TENSOR-RT RUNTIME
§ Post-Training Model Optimizations
§ Specific to Nvidia GPUs
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
18. TENSORFLOW LITE RUNTIME
§ Post-Training Model Optimizations
§ Currently Supports iOS and Android
§ On-Device Prediction Runtime
§ Low-Latency, Fast Startup
§ Selective Operator Loading
§ 70KB Min - 300KB Max Runtime Footprint
§ Supports Accelerators (GPU, TPU)
§ Falls Back to CPU without Accelerator
§ Java and C++ APIs
19. 3 DIFFERENT RUNTIMES, SAME MODEL
pipeline predict-server-build --model-name=mnist
--model-tag=C
--model-type=tensorflow
--model-runtime=tensorrt
--model-chip=gpu
--model-path=./tensorflow/mnist/
Build Local
Model Server C
pipeline predict-server-build --model-name=mnist
--model-tag=A
--model-type=tensorflow
--model-runtime=tfserving
--model-chip=cpu
--model-path=./tensorflow/mnist/
Build Local
Model Server A
pipeline predict-server-build --model-name=mnist
--model-tag=B
--model-type=tensorflow
--model-runtime=tfserving
--model-chip=gpu
--model-path=./tensorflow/mnist/
Build Local
Model Server B
Same Model,
Diff Runtime
20. PUSH IMAGE TO DOCKER REGISTRY
§ Supports All Public + Private Docker Registries
§ DockerHub, Artifactory, Quay, AWS, Google, …
§ Or Self-Hosted, Private Docker Registry
pipeline predict-server-push --model-name=mnist
--model-tag=A
--image-registry-url=<your-registry>
--image-registry-repo=<your-repo>
Push Images to
Docker Registry
21. DEPLOY MODELS SAFELY TO PROD
§ Deploy from CLI or Jupyter Notebook
§ Tear-Down and Rollback Models Quickly
§ Shadow Canary: Deploy to 20% Live Traffic
§ Split Canary: Deploy to 97-2-1% Live Traffic
Start Cluster A:
pipeline predict-kube-start --model-name=mnist
                            --model-tag=A

Start Cluster B:
pipeline predict-kube-start --model-name=mnist
                            --model-tag=B

Start Cluster C:
pipeline predict-kube-start --model-name=mnist
                            --model-tag=C

Route Live Traffic:
pipeline predict-kube-route --model-name=mnist
                            --model-split-tag-and-weight-dict='{"A":97, "B":2, "C":1}'
                            --model-shadow-tag-list='[]'
22. COMPARE MODELS OFFLINE & ONLINE
§ Offline, Batch Metrics
§ Validation + Training Accuracy
§ CPU + GPU Utilization
§ Online, Live Prediction Values
§ Compare Relative Precision
§ Newly-Seen, Streaming Data
§ Online, Real-Time Metrics
§ Response Time, Throughput
§ Cost ($) Per Prediction
23. ENSEMBLE PREDICTION AUDIT TRAIL
§ Necessary for Model Explainability
§ Fine-Grained Request Tracing
§ Used for Model Ensembles
24. REAL-TIME PREDICTION STREAMS
§ Visually Compare Real-time Predictions
[Dashboard: Models A, B, and C side by side, comparing Features/Inputs and Predictions/Confidences in real time]
28. SHIFT TRAFFIC TO MIN(CLOUD CO$T)
§ Based on Cost ($) Per Prediction
§ Cost Changes Throughout Day
§ Lose AWS Spot Instances
§ Google Cloud Becomes Cheaper
§ Shift Across Clouds & On-Prem
29. PSEUDO-CONTINUOUS TRAINING
§ Identify and Fix Borderline (Unconfident) Predictions
§ Fix Predictions Along Class Boundaries
§ Facilitate "Human in the Loop"
§ Retrain with Newly-Labeled Data
§ Game-ify the Labeling Process
§ Path to Crowd-Sourced Labeling
30. CONTINUOUS MODEL TRAINING
§ The Holy Grail of Machine Learning!
§ PipelineAI Supports Continuous Model Training!
§ Kafka, Kinesis
§ Spark Streaming, Flink
§ Storm, Heron
31. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Routing
32. AGENDA
Part 1: Optimize TensorFlow Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Cluster Model Training
§ Optimize Training with JIT XLA Compiler
33. SETTING UP TENSORFLOW WITH GPUS
§ Very Painful!
§ Especially inside Docker
§ Use nvidia-docker
§ Especially on Kubernetes!
§ Use the Latest Kubernetes (with Init Script Support)
§ http://pipeline.ai for GitHub + DockerHub Links
35. GPU HALF-PRECISION SUPPORT
§ FP32 is “Full Precision”, FP16 is “Half Precision”
§ Two(2) FP16’s in Each FP32 GPU Core for 2x Throughput!
§ Lower Precision is OK for Approx. Deep Learning Use Cases
§ The Network Matters Most – Not Individual Neuron Accuracy
§ Supported by Pascal P100 (2016) and Volta V100 (2017)
Set the following on GPUs with Compute Capability (CC) 5.3+:
TF_FP16_MATMUL_USE_FP32_COMPUTE=0
TF_FP16_CONV_USE_FP32_COMPUTE=0
TF_XLA_FLAGS=--xla_enable_fast_math=1
36. VOLTA V100 (2017) VS. PASCAL P100 (2016)
§ 84 Streaming Multiprocessors (SM’s)
§ 5,376 GPU Cores
§ 672 Tensor Cores (ie. Google TPU)
§ Mixed FP16/FP32 Precision
§ Matrix Dims Should be Multiples of 8
§ More Shared Memory
§ New L0 Instruction Cache
§ Faster L1 Data Cache
§ V100 vs. P100 Performance
§ 12x Training, 6x Inference
37. FP32 VS. FP16 ON AWS GPU INSTANCES
FP16 Half Precision
87.2 T ops/second for p3 Volta V100
4.1 T ops/second for g3 Tesla M60
1.6 T ops/second for p2 Tesla K80
FP32 Full Precision
15.4 T ops/second for p3 Volta V100
4.0 T ops/second for g3 Tesla M60
3.3 T ops/second for p2 Tesla K80
38. WHAT ABOUT GOOGLE CLOUD?
§ Currently Supports the Following:
§ Tesla K80
§ Pascal P100
§ Volta V100 Coming Soon?
§ TPUs (Only in Google Cloud)
§ Attach GPUs to CPU Instances
§ Similar to AWS Elastic GPU, except less confusing
39. V100 AND CUDA 9
§ Independent Thread Scheduling - Finally!!
§ Similar to CPU fine-grained thread synchronization semantics
§ Allows GPU to yield execution of any thread
§ Still Optimized for SIMT (Same Instruction Multi-Thread)
§ SIMT units automatically scheduled together
§ Explicit Synchronization
New CUDA Cooperative Thread Groups (P100 vs. V100):
https://devblogs.nvidia.com/cooperative-groups/
40. GPU CUDA PROGRAMMING
§ Barbaric, But Fun
§ Must Know Hardware Very Well
§ Hardware Changes are Painful
§ Use the Profilers & Debuggers
41. CUDA STREAMS
§ Asynchronous I/O Transfer
§ Overlap Compute and I/O
§ Keep GPUs Saturated!
§ Used Heavily by TensorFlow
[Diagrams: Bad vs. Good overlap of compute and I/O across CUDA streams]
43. PYCUDA AND NUMBA
§ https://devblogs.nvidia.com/numba-python-cuda-acceleration/
§ https://devblogs.nvidia.com/seven-things-numba/
44. LET’S SEE WHAT THIS THING CAN DO!
§ Navigate to the following notebook:
01a_Explore_GPU
01b_Explore_Numba
§ https://github.com/PipelineAI/notebooks
45. AGENDA
Part 1: Optimize TensorFlow Training
§ GPUs and TensorFlow
§ Feed, Train, and Debug TensorFlow Models
§ TensorFlow Distributed Cluster Model Training
§ Optimize Training with JIT XLA Compiler
46. TRAINING TERMINOLOGY
§ Tensors: N-Dimensional Arrays
§ ie. Scalar, Vector, Matrix
§ Operations: MatMul, Add, SummaryLog,…
§ Graph: Graph of Operations (DAG)
§ Session: Contains Graph(s)
§ Feeds: Feed Inputs into Placeholder
§ Fetches: Fetch Output from Operation
§ Variables: What We Learn Through Training
§ aka “Weights”, “Parameters”
§ Devices: Hardware Device (GPU, CPU, TPU, ...)
[Diagram: the User Feeds Inputs and Fetches Outputs; TensorFlow Performs Operations, Flows Tensors, and Trains Variables]

with tf.device("/cpu:0,/gpu:15"):
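For reference, a minimal sketch (TensorFlow 1.x) tying these terms together; the tiny linear model is purely illustrative:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 1])  # Placeholder: Fed by the User
W = tf.Variable(tf.zeros([1, 1]))                # Variables: Trained by TensorFlow
b = tf.Variable(tf.zeros([1]))
y = tf.add(tf.matmul(x, W), b)                   # Operations in the Graph (DAG)

with tf.Session() as sess:                       # Session: contains the Graph
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0]]}))   # Fetch output y, Feed input x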
48. TENSORFLOW GRAPH EXECUTION
§ Lazy Execution by Default
§ Similar to Spark
§ Eager Execution Now Supported (TensorFlow 1.4+)
§ Similar to PyTorch
§ "Linearize” Execution to Minimize RAM Usage
§ Useful on Single GPU with Limited RAM
49. OPERATION PARALLELISM
§ Inter-Op (Between-Op) Parallelism
§ By default, TensorFlow runs multiple ops in parallel
§ Useful for low core and small memory/cache envs
§ Set to one (1)
§ Intra-Op (Within-Op) Parallelism
§ Different threads can use same set of data in RAM
§ Useful for compute-bound workloads (CNNs)
§ Set to # of cores (>=2)
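Both knobs are set through the session config; a minimal sketch with illustrative thread counts:

import tensorflow as tf

config = tf.ConfigProto(
    inter_op_parallelism_threads=1,  # between-op parallelism: one (1) for small envs
    intra_op_parallelism_threads=8)  # within-op parallelism: # of cores
sess = tf.Session(config=config)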
50. TENSORFLOW MODEL
§ MetaGraph
§ Combines GraphDef and Metadata
§ GraphDef
§ Architecture of your model (nodes, edges)
§ Metadata
§ Asset: Accompanying assets to your model
§ SignatureDef: Maps external to internal tensors
§ Variables
§ Stored separately during training (checkpoint)
§ Allows training to continue from any checkpoint
§ Variables are “frozen” into Constants when preparing for inference
[Diagram: MetaGraph = GraphDef (nodes x, W, b; ops mul, add) + Metadata (Assets, SignatureDef, Tags, Version) + Variables (e.g. "W": 0.328, "b": -1.407)]
53. TENSORFLOW + SPARK OPTIONS
§ TensorFlow on Spark (Yahoo!)
§ TensorFrames <-Dead Project->
§ Separate Clusters for Spark and TensorFlow
§ Spark: Boring Batch ETL
§ TensorFlow: Exciting AI Model Training and Serving
§ Hand-Off Point is S3, HDFS, Google Cloud Storage
54. TENSORFLOW + KAFKA
§ TensorFlow Dataset API Now Supports Kafka!!
from tensorflow.contrib.kafka.python.ops import kafka_dataset_ops

repeat_dataset = kafka_dataset_ops.KafkaDataset(topics,
                                                group="test",
                                                eof=True).repeat(num_epochs)
batch_dataset = repeat_dataset.batch(batch_size)
…
55. TO UNDERSTAND TENSORFLOW I/O…
§ TFRecord File Format
§ TensorFlow Python and C++ Dataset API
§ Python Module and Packaging
§ Comfort with Python’s Lack of Strong Typing
§ C++ Concurrency Constructs
§ Protocol Buffers
§ Old Queue API
§ GPU/CUDA Memory Tricks
…And a Lot of Coffee!
56. FEED TENSORFLOW TRAINING PIPELINE
§ Training is Limited by the Ingestion Pipeline
§ Number One Problem We See Today
§ Scaling GPUs Up / Out Doesn’t Help
§ GPUs are Heavily Under-Utilized
§ Use tf.dataset API for best perf
§ Efficient parallel async I/O (C++)
[Chart: GPU utilization, Tesla K80 vs. Volta V100]
57. DON’T USE FEED_DICT!!
§ feed_dict Requires Python <-> C++ Serialization
§ Not Optimized for Production Ingestion Pipelines
§ Retrieves Next Batch After Current Batch is Done
§ Single-Threaded, Synchronous
§ CPUs/GPUs Not Fully Utilized!
§ Use Queue or Dataset APIs
§ Queues are old & complex
sess.run(train_step, feed_dict={…})
58. DETECT UNDERUTILIZED CPUS, GPUS
§ Instrument Code to Generate “Timelines”
§ Analyze with Google Web
Tracing Framework (WTF)
§ Monitor CPU with top, GPU with nvidia-smi
http://google.github.io/tracing-framework/
from tensorflow.python.client import timeline

trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as trace_file:
    trace_file.write(trace.generate_chrome_trace_format(show_memory=True))
59. QUEUES
§ More than Traditional Queue
§ Uses CUDA Streams
§ Perform I/O, Pre-processing, Cropping, Shuffling, …
§ Pull from HDFS, S3, Google Storage, Kafka, ...
§ Combine Many Small Files into Large TFRecord Files
§ Use CPUs to Free GPUs for Compute
§ Helps Saturate CPUs and GPUs
60. QUEUE CAPACITY PLANNING
§ batch_size
§ # examples / batch (ie. 64 jpg)
§ Limited by GPU RAM
§ num_processing_threads
§ CPU threads pull and pre-process batches of data
§ Limited by CPU Cores
§ queue_capacity
§ Limited by CPU RAM (ie. 5 * batch_size)
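These knobs map onto the (older) Queue API's batching helpers; a sketch with illustrative values, assuming image and label tensors from an upstream reader:

import tensorflow as tf

# Requires tf.train.start_queue_runners() once the session is created
images_batch, labels_batch = tf.train.shuffle_batch(
    [image, label],           # image, label are assumed upstream tensors
    batch_size=64,            # examples per batch, limited by GPU RAM
    num_threads=8,            # pre-processing threads, limited by CPU cores
    capacity=5 * 64,          # queue capacity, limited by CPU RAM
    min_after_dequeue=64)     # minimum buffer kept for decent shuffling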
61. TF.DTYPE
§ tf.float32, tf.int32, tf.string, etc
§ Default is usually tf.float32
§ Most TF operations support numpy natively
# Tuple of (tf.float32 scalar, tf.int32 array of 100 elements)
(tf.random_uniform([1]),
 tf.random_uniform([1, 100], maxval=10, dtype=tf.int32))  # int dtypes require maxval
62. TF.TRAIN.FEATURE
§ Three(3) Feature Types
§ Bytes
§ Float
§ Int64
§ Actually, They Are Lists of 0..* Values of 3 Types Above
§ BytesList
§ FloatList
§ Int64List
63. TF.TRAIN.FEATURES
§ Map of {String -> Feature}
§ Better Name is “FeatureMap”
§ Organize Feature into Categories
§ Access Feature Using
Features['feature_name']
65. TF.TRAIN.FEATURELISTS
§ Map of {String -> FeatureList}
§ Better Name is “FeatureListMap”
§ Organize FeatureList into Categories
§ Access FeatureList Using
FeatureLists['feature_list_name']
66. TF.TRAIN.EXAMPLE
§ Key-Value Dictionary
§ String -> tf.train.Feature
§ Not a Self-Describing Format (?!)
§ Must Establish Schema Upfront by Writers and Readers
§ Must Obey the Following Conventions
§ Feature K must be of Type T in all Examples
§ Feature K can be omitted, default can be configured
§ If Feature K exists as empty, no default is applied
67. TF.TFRECORD
§ Contains many tf.train.Example’s
=> tf.train.Example contains many tf.train.Feature’s
=> tf.train.Feature contains BytesList, FloatList, Int64List
§ Record-Oriented Format of Binary Strings (ProtoBuffer)
§ Must Convert tf.train.Example to Serialized String
§ Use tf.train.Example.SerializeToString()
§ Used for Large Scale ML/AI Training
§ Not Meant for Random or Non-Sequential Access
§ Compression: GZIP, ZLIB
uint64 length
uint32 masked_crc32_of_length
byte data[length]
uint32 masked_crc32_of_data
68. EMBRACE BINARY FORMATS!
§ Unreadable and Scary, But Much More Efficient
§ Better Use of Memory and Disk Cache
§ Faster Copying and Moving
§ Smaller on the Wire
69. CONVERTING MNIST DATA TO TFRECORD
def convert_to_tfrecord(data, name):
    images = data.images
    labels = data.labels
    num_examples = data.num_examples
    rows = images.shape[1]
    cols = images.shape[2]
    depth = images.shape[3]
    filename = os.path.join(FLAGS.directory, name + '.tfrecords')
    with tf.python_io.TFRecordWriter(filename) as writer:
        for index in range(num_examples):
            image_raw = images[index].tostring()
            example = tf.train.Example(
                features=tf.train.Features(
                    feature={'height': tf.train.Feature(int64_list=tf.train.Int64List(value=[rows])),
                             'width': tf.train.Feature(int64_list=tf.train.Int64List(value=[cols])),
                             'depth': tf.train.Feature(int64_list=tf.train.Int64List(value=[depth])),
                             # the label for this example, not the loop index
                             'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(labels[index])])),
                             'image_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_raw]))}))
            writer.write(example.SerializeToString())
70. READING TF.TFRECORD’S
§ tf.data.TFRecordDataset <- Preferred (Dataset API)
§ tf.TFRecordReader() <- Not Preferred (Queue API)
§ tf.python_io.tf_record_iterator <- Preferred
§ Used as Python Generator
for serialized_example in tf.python_io.tf_record_iterator(filename):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)
    image_raw = example.features.feature['image_raw'].bytes_list.value[0]
    height = example.features.feature['height'].int64_list.value[0]
    …
71. DE-SERIALIZING TF.TFRECORD’S
feature_map = {'height': tf.FixedLenFeature([], tf.int64),
               'width': tf.FixedLenFeature([], tf.int64),
               'depth': tf.FixedLenFeature([], tf.int64),
               'label': tf.FixedLenFeature([], tf.int64),
               'image_raw': tf.FixedLenFeature([], tf.string)}
deserialized_features = tf.parse_single_example(serialized_example, features=feature_map)
# Cast height from int64 to int32
height = tf.cast(deserialized_features['height'], tf.int32)
…
# Convert raw image bytes from string to float32
image_raw = tf.decode_raw(deserialized_features['image_raw'], tf.float32)
72. MORE TF.TRAIN.FEATURE CONSTRUCTS
§ tf.VarLenFeature
§ tf.FixedLenFeature, tf.FixedLenSequenceFeature
§ tf.SparseFeature
feature_map = {'height': tf.FixedLenFeature([], tf.int64, …),
               …
               'image_raw': tf.VarLenFeature(tf.string)}
deserialized_features = tf.parse_single_example(serialized_example, features=feature_map)
# Cast height from int64 to int32
height = tf.cast(deserialized_features['height'], tf.int32)
…
# Convert raw sparse image bytes from string to float32
image_raw = tf.decode_raw(deserialized_features['image_raw'].values, tf.float32)
73. TF.DATA.DATASET
tf.Tensor => tf.data.Dataset
Functional Transformations
Python Generator => tf.data.Dataset
Dataset.from_tensors((features, labels))
Dataset.from_tensor_slices((features, labels))
TextLineDataset(filenames)
dataset.map(lambda x: tf.decode_jpeg(x))
dataset.repeat(NUM_EPOCHS)
dataset.batch(BATCH_SIZE)
def generator():
while True:
yield ...
dataset.from_generator(generator, tf.int32)
Dataset => One-Shot Iterator
Dataset => Initializable Iter
iter = dataset.make_one_shot_iterator()
next_element = iter.get_next()
while …:
sess.run(next_element)
iter = dataset.make_initializable_iterator()
sess.run(iter.initializer, feed_dict=PARAMS)
next_element = iter.get_next()
while …:
sess.run(next_element)
TIP: Use Dataset.prefetch() and parallel version of Dataset.map()
76. CUSTOM TF.PY_FUNC() TRANSFORMATION
§ Custom Python Function
§ Similar to Spark Python UDF (Eek!)
§ You Will Suffer a Big Performance Penalty
§ Try to Use TensorFlow-Native Operations
§ Remember, you can build your own in C++!
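A minimal sketch of wrapping a custom Python function; the np.sinh body is purely illustrative:

import numpy as np
import tensorflow as tf

def my_func(x):
    # Runs in the Python interpreter, outside the TensorFlow runtime --
    # hence the serialization penalty
    return np.sinh(x)

inp = tf.placeholder(tf.float32)
y = tf.py_func(my_func, [inp], tf.float32)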
77. TF.DATA.ITERATOR TYPES
§ One Shot: Iterates Once Through the Dataset
§ Currently, best Iterator to use with Estimator API
§ Initializable: Runs iterator.initializer() Once
§ Re-Initializable: Runs iterator.initializer() Many
§ ie. Random shuffling between iterations (epochs) of training
§ Feedable: Switch Between Different Datasets
§ Uses Feed and Placeholder to explicitly feed the iterator
§ Doesn’t require initialization when switching
78. TF.DATA.ITERATOR SIMPLE EXAMPLE
dataset = tf.data.Dataset.range(5)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

# Typically `result` will be the output of a model, or an optimizer's
# training operation.
result = tf.add(next_element, next_element)

sess.run(iterator.initializer)
while True:
    try:
        sess.run(result)  # => 0, 2, 4, 6, 8
    except tf.errors.OutOfRangeError:
        print('End of dataset…')
        break
79. TF.DATA.ITERATOR TEXT EXAMPLE
filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.TextLineDataset(filenames)

filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.flat_map(
    lambda filename: (
        tf.data.TextLineDataset(filename)
        .skip(1)
        .filter(lambda line: tf.not_equal(tf.substr(line, 0, 1), "#"))))

§ Skip 1st Header Line and Comment Lines Starting with `#`
80. TF.DATA.ITERATOR NUMPY EXAMPLE
# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
    features = data["features"]
    labels = data["labels"]

# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]

features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# …Your Dataset Transformations…
iterator = dataset.make_initializable_iterator()

sess.run(iterator.initializer, feed_dict={features_placeholder: features,
                                          labels_placeholder: labels})
81. TF.DATA.ITERATOR TFRECORD EXAMPLE
filenames = tf.placeholder(tf.string, shape=[None])
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...) # Parse the record into tensors.
dataset = dataset.repeat() # Repeat the input indefinitely.
dataset = dataset.batch(32) # Batches of size 32
iterator = dataset.make_initializable_iterator()
# You can feed the initializer with the appropriate filenames for the current
# phase of execution, e.g. training vs. validation.
# Initialize `iterator` with training data.
training_filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
sess.run(iterator.initializer, feed_dict={filenames: training_filenames})
# Initialize `iterator` with validation data.
validation_filenames = ["/var/data/validation1.tfrecord", ...]
sess.run(iterator.initializer, feed_dict={filenames: validation_filenames})
82. FUTURE OF DATASET API
§ Replaces Queue API
§ More Functional Operators
§ Automatic GPU Data Staging
§ Under-utilized GPUs Assisting with Data Ingestion
§ Advanced, RL-based Device Placement Strategies
83. TF.ESTIMATOR.ESTIMATOR (1/2)
§ Supports Keras!
§ Unified API for Local + Distributed
§ Provide Clear Path to Production
§ Enable Rapid Model Experiments
§ Provide Flexible Parameter Tuning
§ Enable Downstream Optimizing & Serving Infrastructure
§ Nudge Users to Best Practices Through Opinions
§ Provide Hooks/Callbacks to Override Opinions
84. TF.ESTIMATOR.ESTIMATOR (2/2)
§ “Train-to-Serve” Design
§ Create Custom Estimator or Re-Use Canned Estimator
§ Hides Session, Graph, Layers, Iterative Loops (Train, Eval, Predict)
§ Hooks for All Phases of Model Training and Evaluation
§ Load Input: input_fn()
§ Train: model_fn() and train()
§ Evaluate: eval_fn() and evaluate()
§ Performance Metrics: Loss, Accuracy, …
§ Save and Export: export_savedmodel()
§ Predict: predict() (uses the slow sess.run())
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/customestimator/
85. TF.CONTRIB.LEARN.EXPERIMENT
§ Easier-to-Use Distributed TensorFlow
§ Same API for Local and Distributed
§ Combines Estimator with input_fn()
§ Used for Training, Evaluation, & Hyper-Parameter Tuning
§ Distributed Training Defaults to Data-Parallel & Async
§ Cluster Configuration is Fixed at Start of Training Job
§ No Auto-Scaling Allowed, but That’s OK for Training
§ Note: This is Likely to be Deprecated Soon
86. ESTIMATOR + EXPERIMENT CONFIGS
§ TF_CONFIG
§ Special environment variable for config
§ Defines ClusterSpec in JSON incl. master, workers, PS’s
§ Distributed mode: '{"environment":"cloud"}'
§ Local mode: '{"environment":"local", "task":{"type":"worker"}}'
§ RunConfig: Defines checkpoint interval, output directory, …
§ HParams: Hyper-parameter tuning parameters and ranges
§ learn_runner creates RunConfig before calling run() & tune()
§ schedule is set based on {"task":{"type":…}}
TF_CONFIG='{
  "environment": "cloud",
  "cluster": {
    "master": ["worker0:2222"],
    "worker": ["worker1:2222"],
    "ps": ["ps0:2222"]
  },
  "task": {"type": "ps", "index": "0"}
}'
87. ESTIMATOR + KERAS
§ Distributed TensorFlow (Estimator) + Easy to Use (Keras)
§ tf.keras.estimator.model_to_estimator()
# Instantiate a Keras Inception v3 model.
keras_inception_v3 = tf.keras.applications.inception_v3.InceptionV3(weights=None)

# Compile model with the optimizer, loss, and metrics you'd like to train with.
keras_inception_v3.compile(optimizer=tf.keras.optimizers.SGD(lr=0.0001, momentum=0.9),
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])

# Create an Estimator from the compiled Keras model.
est_inception_v3 = tf.keras.estimator.model_to_estimator(keras_model=keras_inception_v3)

# Treat the derived Estimator as you would any other Estimator. For example,
# the following derived Estimator calls the train method:
est_inception_v3.train(input_fn=my_training_set, steps=2000)
88. “CANNED” ESTIMATORS
§ Commonly-Used Estimators
§ Pre-Tested and Pre-Tuned
§ DNNClassifier, TensorForestEstimator
§ Always Use Canned Estimators If Possible
§ Reduce Lines of Code, Complexity, and Bugs
§ Use FeatureColumn to Define & Create Features
[Chart: Custom vs. Canned Estimator usage @ Google, August 2017]
89. ESTIMATOR + DATASET API
def input_fn():
    def generator():
        while True:
            yield ...
    my_dataset = tf.data.Dataset.from_generator(generator, tf.int32)
    # A one-shot iterator automatically initializes itself on first use.
    iter = my_dataset.make_one_shot_iterator()
    # The return value of get_next() matches the dataset element type.
    images, labels = iter.get_next()
    return images, labels

# The input_fn can be used as a regular Estimator input function.
estimator = tf.estimator.Estimator(…)
estimator.train(input_fn=input_fn, …)
91. TF.CONTRIB.LEARN.HEAD (OBJECTIVES)
§ Single-Objective Estimator
§ Single classification prediction
§ Multi-Objective Estimator
§ One (1) classification prediction
§ One(1) final layer to feed into next model
§ Multiple Heads Used to Ensemble Models
§ Treats neural network as a feature engineering step
§ Supported by TensorFlow Serving
92. TF.LAYERS
§ Standalone Layer or Entire Sub-Graphs
§ Functions of Tensor Inputs & Outputs
§ Mix and Match with Operations
§ Assumes 1st Dimension is Batch Size
§ Handles One (1) to Many (*) Inputs
§ Metrics are Layers
§ Loss Metric (Per Mini-Batch)
§ Accuracy and MSE (Across Mini-Batches)
93. TF.FEATURE_COLUMN
§ Used by Canned Estimator
§ Declaratively Specify Training Inputs
§ Converts Sparse to Dense Tensors
§ Sparse Features: Query Keyword, ProductID
§ Dense Features: One-Hot, Multi-Hot
§ Wide/Linear: Use Feature-Crossing
§ Deep: Use Embeddings
94. TF.FEATURE_COLUMN EXAMPLE
§ Continuous + One-Hot + Embedding
deep_columns = [
age,
education_num,
capital_gain,
capital_loss,
hours_per_week,
tf.feature_column.indicator_column(workclass),
tf.feature_column.indicator_column(education),
tf.feature_column.indicator_column(marital_status),
tf.feature_column.indicator_column(relationship),
# To show an example of embedding
tf.feature_column.embedding_column(occupation, dimension=8),
]
95. FEATURE CROSSING
§ Create New Features by Combining Existing Features
§ Limitation: Combinations Must Exist in Training Dataset
base_columns = [
education, marital_status, relationship, workclass, occupation, age_buckets
]
crossed_columns = [
tf.feature_column.crossed_column(
['education', 'occupation'], hash_bucket_size=1000),
tf.feature_column.crossed_column(
['age_buckets', 'education', 'occupation'], hash_bucket_size=1000)
]
96. SEPARATE TRAINING + EVALUATION
§ Separate Training and Evaluation Clusters
§ Evaluate Upon Checkpoint
§ Avoid Resource Contention
§ Training Continues in Parallel with Evaluation
[Diagram: separate Training Cluster and Evaluation Cluster, both sharing a Parameter Server Cluster]
97. BATCH (RE-)NORMALIZATION (2015, 2017)
§ Each Mini-Batch May Have Wildly Different Distributions
§ Normalize per Batch (and Layer)
§ Faster Training, Learns Quicker
§ Final Model is More Accurate
§ TensorFlow is already on 2nd Generation Batch Algorithm
§ First-Class Support for Fusing Batch Norm Layers
§ Final mean + variance Are Folded Into Graph Later
-- (Almost) Always Use Batch (Re-)Normalization! --
z = tf.matmul(a_prev, W)
a = tf.nn.relu(z)

a_mean, a_var = tf.nn.moments(a, [0])
scale = tf.Variable(tf.ones([depth]))   # one scale per channel
beta = tf.Variable(tf.zeros([depth]))   # one shift per channel
bn = tf.nn.batch_normalization(a, a_mean, a_var, beta, scale, 0.001)
98. DROPOUT (2014)
§ Training Technique
§ Prevents Overfitting
§ Helps Avoid Local Minima
§ Inherent Ensembling Technique
§ Creates and Combines Different Neural Architectures
§ Expressed as Probability Percentage (ie. 50%)
§ Boost Other Weights During Validation & Prediction
[Diagram: 50% Dropout during the Training Phase; 0% Dropout with boosted weights during the Validation & Prediction Phase]
99. BATCH NORM, DROPOUT + ESTIMATOR API
§ Must Specify Eval or Training Mode with Estimator API
§ These Will Behave Differently Depending on the Mode
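A minimal sketch of a mode-aware custom model_fn; layer sizes and the optimizer are illustrative:

import tensorflow as tf

def model_fn(features, labels, mode):
    training = (mode == tf.estimator.ModeKeys.TRAIN)
    net = tf.layers.dense(features['x'], units=128, activation=tf.nn.relu)
    net = tf.layers.batch_normalization(net, training=training)  # batch stats only while training
    net = tf.layers.dropout(net, rate=0.5, training=training)    # no-op at eval/predict time
    logits = tf.layers.dense(net, units=10)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=tf.argmax(logits, axis=1))

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    # Batch norm's moving averages update via UPDATE_OPS; attach them to the train op
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = tf.train.AdamOptimizer().minimize(
            loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)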
100. SAVED MODEL FORMAT
§ Different Format than Traditional Exporter
§ Contains Checkpoints, 1..* MetaGraph’s, and Assets
§ Export Manually with SavedModelBuilder
§ Estimator.export_savedmodel()
§ Hooks to Generate SignatureDef
§ Use saved_model_cli to Verify
§ Used by TensorFlow Serving
§ New Standard Export Format? (Catching on Slowly…)
101. TENSORFLOW DEBUGGER
§ Step through Operations
§ Inspect Inputs and Outputs
§ Wrap Session in Debug Session
from tensorflow.python import debug as tf_debug

sess = tf.Session(config=config)
sess = tf_debug.LocalCLIDebugWrapperSession(sess)

https://www.tensorflow.org/programmers_guide/debugger
102. LET’S DEBUG A MODEL
§ Navigate to the following notebook:
04_Debug_Model
§ https://github.com/PipelineAI/notebooks
103. AGENDA
Part 1: Optimize TensorFlow Training
§ GPUs and TensorFlow
§ Train, Inspect, and Debug TensorFlow Models
§ TensorFlow Distributed Cluster Model Training
§ Optimize Training with JIT XLA Compiler
104. SINGLE NODE, MULTI-GPU TRAINING
§ cpu:0
§ By default, all CPUs
§ Requires extra config to target a CPU
§ gpu:0..n
§ Each GPU has a unique id
§ TF usually prefers a single GPU
§ xla_cpu:0, xla_gpu:0..n
§ “JIT Compiler Device”
§ Hints TensorFlow to attempt JIT Compile
with tf.device("/cpu:0"):
with tf.device("/gpu:0"):
with tf.device("/gpu:1"):

[Diagram: ops pinned to GPU 0 and GPU 1]
105. DISTRIBUTED, MULTI-NODE TRAINING
§ TensorFlow Automatically Inserts Send and Receive Ops into Graph
§ Parameter Server Synchronously Aggregates Updates to Variables
§ Nodes with Multiple GPUs will Pre-Aggregate Before Sending to PS
[Diagram: Single Node (Worker0 with gpu0/gpu1) pre-aggregating to the Parameter Server vs. Multiple Nodes (Worker0, Worker1, Worker2), each with gpu0-gpu3]
106. DATA PARALLEL VS. MODEL PARALLEL
§ Data Parallel (“Between-Graph Replication”)
§ Send exact same model to each device
§ Each device operates on partition of data
§ ie. Spark sends same function to many workers
§ Each worker operates on their partition of data
§ Model Parallel (“In-Graph Replication”)
§ Send different partition of model to each device
§ Each device operates on all data
§ Difficult, but required for larger models with lower-memory GPUs
107. SYNCHRONOUS VS. ASYNCHRONOUS
§ Synchronous
§ Nodes compute gradients
§ Nodes update Parameter Server (PS)
§ Nodes sync on PS for latest gradients
§ Asynchronous
§ Some nodes delay in computing gradients
§ Nodes don’t update PS
§ Nodes get stale gradients from PS
§ May not converge due to stale reads!
108. CHIEF WORKER
§ Chief Defaults to Worker Task 0
§ Task 0 is guaranteed to exist
§ Performs Maintenance Tasks
§ Writes log summaries
§ Instructs PS to checkpoint vars
§ Performs PS health checks
§ (Re-)Initialize variables at (re-)start of training
109. NODE AND PROCESS FAILURES
§ Checkpoint to Persistent Storage (HDFS, S3)
§ Use MonitoredTrainingSession and Hooks
§ Use a Good Cluster Orchestrator (ie. Kubernetes, Mesos)
§ Understand Failure Modes and Recovery States
§ Stateless, Not Bad: Training Continues
§ Stateful, Bad: Training Must Stop (Dios Mio! Long Night Ahead…)
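A minimal sketch of fault-tolerant training with MonitoredTrainingSession; the HDFS path, hook, and task_index are illustrative, and server/train_op are assumed to come from your ClusterSpec setup:

import tensorflow as tf

hooks = [tf.train.StopAtStepHook(last_step=100000)]
with tf.train.MonitoredTrainingSession(
        master=server.target,                               # assumed tf.train.Server
        is_chief=(task_index == 0),                         # chief checkpoints + recovers
        checkpoint_dir='hdfs://namenode:8020/checkpoints',  # persistent storage survives failures
        hooks=hooks) as mon_sess:
    while not mon_sess.should_stop():
        mon_sess.run(train_op)                              # assumed training op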
110. AGENDA
Part 1: Optimize TensorFlow Training
§ GPUs and TensorFlow
§ Train, Inspect, and Debug TensorFlow Models
§ TensorFlow Distributed Cluster Model Training
§ Optimize Training with JIT XLA Compiler
111. XLA FRAMEWORK
§ XLA: “Accelerated Linear Algebra”
§ Reduce Reliance on Custom Operators
§ Intermediate Representation used by Hardware Vendors
§ Improve Portability
§ Increase Execution Speed
§ Decrease Memory Usage
§ Decrease Mobile Footprint
Helps TensorFlow Be Flexible AND Performant!!
112. XLA HIGH LEVEL OPTIMIZER (HLO)
§ HLO: “High Level Optimizer”
§ Compiler Intermediate Representation (IR)
§ Independent of source and target language
§ XLA Step 1 Emits Target-Independent HLO
§ XLA Step 2 Emits Target-Dependent LLVM
§ LLVM Emits Native Code Specific to Target
§ Supports x86-64, ARM64 (CPU), and NVPTX (GPU)
113. JIT COMPILER
§ JIT: “Just-In-Time” Compiler
§ Built on XLA Framework
§ Reduce Memory Movement – Especially with GPUs
§ Reduce Overhead of Multiple Function Calls
§ Similar to Spark Operator Fusing in Spark 2.0
§ Unroll Loops, Fuse Operators, Fold Constants, …
§ Scopes: session, device, with jit_scope():
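A sketch of the session-level and explicit-scope knobs (TensorFlow 1.x; x, W, b are assumed tensors):

import tensorflow as tf
from tensorflow.contrib.compiler import jit

# Session scope: JIT-compile all compilable ops
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
sess = tf.Session(config=config)

# Explicit scope: hint XLA to fuse just these ops
with jit.experimental_jit_scope():
    y = tf.add(tf.matmul(x, W), b)  # x, W, b defined elsewhere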
114. VISUALIZING JIT COMPILER IN ACTION
Before JIT After JIT
Google Web Tracing Framework:
http://google.github.io/tracing-framework/
from tensorflow.python.client import timeline

trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as trace_file:
    trace_file.write(trace.generate_chrome_trace_format(show_memory=True))

run_options = tf.RunOptions(trace_level=tf.RunOptions.SOFTWARE_TRACE)
run_metadata = tf.RunMetadata()
sess.run(fetches,                 # fetches = your training/eval op
         options=run_options,
         run_metadata=run_metadata)
116. LET’S TRAIN WITH XLA CPU
§ Navigate to the following notebook:
06_Train_Model_XLA_CPU
§ https://github.com/PipelineAI/notebooks
117. LET’S TRAIN WITH XLA GPU
§ Navigate to the following notebook:
06a_Train_Model_XLA_GPU
§ https://github.com/PipelineAI/notebooks
118. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Routing
120. AGENDA
Part 2: Optimize TensorFlow Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
121. AOT COMPILER
§ Standalone, Ahead-Of-Time (AOT) Compiler
§ Built on XLA framework
§ tfcompile
§ Creates executable with minimal TensorFlow Runtime needed
§ Includes only dependencies needed by subgraph computation
§ Creates functions with feeds (inputs) and fetches (outputs)
§ Packaged as cc_libary header and object files to link into your app
§ Commonly used for mobile device inference graph
§ Currently, only CPU x86-64 and ARM are supported - no GPU
122. GRAPH TRANSFORM TOOL (GTT)
§ Post-Training Optimization to Prepare for Inference
§ Remove Training-only Ops (checkpoint, drop out, logs)
§ Remove Unreachable Nodes between Given feed -> fetch
§ Fuse Adjacent Operators to Improve Memory Bandwidth
§ Fold Final Batch Norm mean and variance into Variables
§ Round Weights/Variables to improve compression (ie. 70%)
§ Quantize (FP32 -> INT8) to Speed Up Math Operations
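For reference, a sketch of invoking the Graph Transform Tool with the optimizations above; paths and input/output names are illustrative, and the tool is built from the TensorFlow repo:

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=frozen_mnist.pb \
  --out_graph=optimized_mnist.pb \
  --inputs='x' \
  --outputs='add' \
  --transforms='strip_unused_nodes remove_nodes(op=Identity)
                fold_constants(ignore_errors=true) fold_batch_norms quantize_weights'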
125. AFTER STRIPPING UNUSED NODES
§ Optimizations
§ strip_unused_nodes
§ Results
§ Graph much simpler
§ File size much smaller
126. AFTER REMOVING UNUSED NODES
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ Results
§ Pesky nodes removed
§ File size a bit smaller
127. AFTER FOLDING CONSTANTS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ Results
§ Placeholders (feeds) -> Variables*
(*Why Variables and not Constants?)
128. AFTER FOLDING BATCH NORMS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ Results
§ Graph remains the same
§ File size approximately the same
129. AFTER QUANTIZING WEIGHTS
§ Optimizations
§ strip_unused_nodes
§ remove_nodes
§ fold_constants
§ fold_batch_norms
§ quantize_weights
§ Results
§ Graph is same, file size is smaller, compute is faster
130. WEIGHT QUANTIZATION
§ FP16 and INT8 Are Smaller and Computationally Simpler
§ Weights/Variables are Constants
§ Easy to Linearly Quantize
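Because the weight range is known ahead of time, linear quantization is a simple affine mapping; a NumPy sketch (not TensorFlow's actual implementation):

import numpy as np

def quantize_weights(w):
    # Affine-map FP32 weights onto 256 UINT8 levels
    w_min, w_max = float(w.min()), float(w.max())
    scale = max((w_max - w_min) / 255.0, 1e-8)  # guard against constant weights
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, w_min, scale

def dequantize_weights(q, w_min, scale):
    # Recover approximate FP32 values at inference time
    return q.astype(np.float32) * scale + w_min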
132. ACTIVATION QUANTIZATION
§ Activations Not Known Ahead of Time
§ Depends on input, not easy to quantize
§ Requires Additional Calibration Step
§ Use a “representative” dataset
§ Per Neural Network Layer…
§ Collect histogram of activation values
§ Generate many quantized distributions with different saturation thresholds
§ Choose threshold to minimize…
KL_divergence(ref_distribution, quant_distribution)
§ Not Much Time or Data is Required (Minutes on Commodity Hardware)
136. AGENDA
Part 2: Optimize TensorFlow Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
137. MODEL SERVING TERMINOLOGY
§ Inference
§ Only Forward Propagation through Network
§ Predict, Classify, Regress, …
§ Bundle
§ GraphDef, Variables, Metadata, …
§ Assets
§ ie. Map of ClassificationID -> String
§ {9283: “penguin”, 9284: “bridge”}
§ Version
§ Every Model Has a Version Number (Integer)
§ Version Policy
§ ie. Serve Only Latest (Highest), Serve Both Latest and Previous, …
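For reference, a sketch of how such a version policy can be expressed in a TensorFlow Serving model config file (model name and path are illustrative; exact syntax varies across TF Serving versions):

model_config_list {
  config {
    name: "mnist"
    base_path: "/models/mnist"    # versions live in numbered subdirs: /models/mnist/1, /2, ...
    model_platform: "tensorflow"
    model_version_policy {
      latest { num_versions: 2 }  # serve both latest and previous
    }
  }
}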
138. TENSORFLOW SERVING FEATURES
§ Supports Auto-Scaling
§ Custom Loaders beyond File-based
§ Tune for Low-latency or High-throughput
§ Serve Diff Models/Versions in Same Process
§ Customize Models Types beyond HashMap and TensorFlow
§ Customize Version Policies for A/B and Bandit Tests
§ Support Request Draining for Graceful Model Updates
§ Enable Request Batching for Diff Use Cases and HW
§ Supports Optimized Transport with GRPC and Protocol Buffers
139. PREDICTION SERVICE
§ Predict (Original, Generic)
§ Input: List of Tensor
§ Output: List of Tensor
§ Classify
§ Input: List of tf.Example (key, value) pairs
§ Output: List of (class_label: String, score: float)
§ Regress
§ Input: List of tf.Example (key, value) pairs
§ Output: List of (label: String, score: float)
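A client-side sketch of the Predict API over gRPC; assumes the tensorflow-serving-api package (module paths vary slightly by version), and image is an assumed input array:

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'mnist'
request.inputs['x'].CopyFrom(tf.make_tensor_proto(image, shape=[1, 784]))
response = stub.Predict(request, 5.0)  # 5-second timeout
print(response.outputs['y'])           # output tensor name is illustrative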
141. MULTI-HEADED INFERENCE
§ Inputs Pass Through Model One Time
§ Model Returns Multiple Predictions:
1. Human-readable prediction (ie. “penguin”, “church”,…)
2. Final layer of scores (float vector)
§ Final Layer of floats Pass to the Next Model in Ensemble
§ Optimizes Bandwidth, CPU/GPU, Latency, Memory
§ Enables Complex Model Composing and Ensembling
142. BUILD YOUR OWN MODEL SERVER
§ Adapt GRPC(Google) <-> HTTP (REST of the World)
§ Perform Batch Inference vs. Request/Response
§ Handle Requests Asynchronously
§ Support Mobile, Embedded Inference
§ Customize Request Batching
§ Add Circuit Breakers, Fallbacks
§ Control Latency Requirements
§ Reduce Number of Moving Parts
#include
“tensorflow_serving/model_servers/server_core.h”
class MyTensorFlowModelServer {
ServerCore::Options options;
// set options (model name, path, etc)
std::unique_ptr<ServerCore> core;
TF_CHECK_OK(
ServerCore::Create(std::move(options), &core)
);
}
Compile and Link with
libtensorflow.so
143. RUNTIME OPTION: NVIDIA TENSOR-RT
§ Post-Training Model Optimizations
§ Specific to Nvidia GPU
§ Similar to TF Graph Transform Tool
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
144. AGENDA
Part 2: Optimize TensorFlow Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
145. AGENDA
Part 2: Optimize TensorFlow Serving
§ AOT XLA Compiler and Graph Transform Tool
§ Key Components of TensorFlow Serving
§ Deploy Optimized TensorFlow Model
§ Optimize TensorFlow Serving Runtime
146. REQUEST BATCH TUNING
§ max_batch_size
§ Enables throughput/latency tradeoff
§ Bounded by RAM
§ batch_timeout_micros
§ Defines batch time window, latency upper-bound
§ Bounded by RAM
§ num_batch_threads
§ Defines parallelism
§ Bounded by CPU cores
§ max_enqueued_batches
§ Defines queue upper bound, throttling
§ Bounded by RAM
Note: Reaching either the size or time threshold triggers a batch.
[Diagram: separate, non-batched requests vs. combined, batched requests]
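These four knobs live in a batching parameters file passed to the model server; a sketch with illustrative values:

# tensorflow_model_server --enable_batching
#                         --batching_parameters_file=batching.config
max_batch_size { value: 128 }
batch_timeout_micros { value: 1000 }
num_batch_threads { value: 8 }
max_enqueued_batches { value: 1000 }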
147. ADVANCED BATCHING & SERVING TIPS
§ Batch Just the GPU/TPU Portions of the Computation Graph
§ Batch Arbitrary Sub-Graphs using Batch / Unbatch Graph Ops
§ Distribute Large Models Into Shards Across TensorFlow Model Servers
§ Batch RNNs Used for Sequential and Time-Series Data
§ Find Best Batching Strategy For Your Data Through Experimentation
§ BasicBatchScheduler: Homogeneous requests (ie Regress or Classify)
§ SharedBatchScheduler: Mixed requests, multi-step, ensemble predict
§ StreamingBatchScheduler: Mixed CPU/GPU/IO-bound Workloads
§ Serve Only One (1) Model Inside One (1) TensorFlow Serving Process
§ Much Easier to Debug, Tune, Scale, and Manage Models in Production.
149. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Routing
150. AGENDA
Part 3: Advanced Model Serving + Routing
§ Kubernetes Ingress, Egress, Networking
§ Istio and Envoy Architecture
§ Intelligent Traffic Routing and Scaling
§ Metrics, Chaos Monkey, Production Readiness
151. KUBERNETES PRIORITY SCHEDULING
Workloads can…
§ Access the entire cluster, up to the autoscaler max size
§ Trigger autoscaling until a higher-priority workload arrives
§ "Fill the cracks" of resource usage of higher-priority work
(i.e., wait to run until resources are freed)
152. KUBERNETES INGRESS
§ Single Service
§ Can also use Service (LoadBalancer or NodePort)
§ Fan Out & Name-Based Virtual Hosting
§ Route Traffic Using Path or Host Header
§ Reduces # of load balancers needed
§ 404 Implemented as default backend
§ Federation / Hybrid-Cloud
§ Creates Ingress objects in every cluster
§ Monitors health and capacity of pods within each cluster
§ Routes clients to appropriate backend anywhere in federation
Fan Out (Path):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: gateway-fanout
  annotations:
    kubernetes.io/ingress.class: istio
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: s1
          servicePort: 80
      - path: /bar
        backend:
          serviceName: s2
          servicePort: 80

Name-Based Virtual Hosting:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: gateway-virtualhost
  annotations:
    kubernetes.io/ingress.class: istio
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - backend:
          serviceName: s1
          servicePort: 80
  - host: bar.foo.com
    http:
      paths:
      - backend:
          serviceName: s2
          servicePort: 80
153. KUBERNETES INGRESS CONTROLLER
§ Ingress Controller Types
§ Google Cloud: kubernetes.io/ingress.class: gce
§ Nginx: kubernetes.io/ingress.class: nginx
§ Istio: kubernetes.io/ingress.class: istio
§ Must Start Ingress Controller Manually
§ Just deploying Ingress is not enough
§ Not started by kube-controller-manager
§ Start Istio Ingress Controller
kubectl apply -f $ISTIO_INSTALL_PATH/install/kubernetes/istio.yaml
165. ISTIO AUTO-SCALING
§ Traffic Routing and Auto-Scaling Occur Independently
§ Istio Continues to Obey Traffic Splits After Auto-Scaling
§ Auto-Scaling May Occur In Response to New Traffic Route
166. A/B & BANDIT MODEL TESTING
§ Perform Live Experiments in Production
§ Compare Existing Model A with Model B, Model C
§ Safe Split-Canary Deployment
§ Pro Tip: Keep Ingress Simple – Use Route Rules Instead!
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: predict-mnist-20-5-75
spec:
  destination:
    name: predict-mnist
  precedence: 2    # Greater than global deny-all
  route:
  - labels:
      version: A
    weight: 20     # 20% still routes to model A
  - labels:
      version: B
    weight: 5      # 5% routes to new model B
  - labels:
      version: C
    weight: 75     # 75% routes to new model C

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: predict-mnist-1-2-97
spec:
  destination:
    name: predict-mnist
  precedence: 2    # Greater than global deny-all
  route:
  - labels:
      version: A
    weight: 1      # 1% routes to model A
  - labels:
      version: B
    weight: 2      # 2% routes to new model B
  - labels:
      version: C
    weight: 97     # 97% routes to new model C

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: predict-mnist-97-2-1
spec:
  destination:
    name: predict-mnist
  precedence: 2    # Greater than global deny-all
  route:
  - labels:
      version: A
    weight: 97     # 97% still routes to model A
  - labels:
      version: B
    weight: 2      # 2% routes to new model B
  - labels:
      version: C
    weight: 1      # 1% routes to new model C
167. AGENDA
Part 3: Advanced Model Serving + Routing
§ Kubernetes Ingress, Egress, Networking
§ Istio and Envoy Architecture
§ Intelligent Traffic Routing and Scaling
§ Metrics, Chaos Monkey, Production Readiness
170. SPECIAL THANKS TO CHRISTIAN POSTA
§ http://blog.christianposta.com/istio-workshop
171. AGENDA
Part 0: Introductions and Setup
Part 1: Optimize TensorFlow Training
Part 2: Optimize TensorFlow Serving
Part 3: Advanced Model Serving + Routing
173. THANK YOU!!
§ Please Star this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline
Contact Me
chris@pipeline.ai
@cfregly