Recent Advances in Machine Learning: Bringing
a New Level of Intelligence to Network
Reliability and Optimization
David Meyer
Brocade Chief Scientist and Fellow
dmm@{brocade.com,uoregon.edu,1-4-5.net,..}
Orange Gardens
28 Jul 2016
Paris, France
You might be surprised but what
is going to drive innovation in the
enterprise and in the public cloud
is machine learning.
Bill Coughran, Sequoia Capital #ONUGSpring16
Machine Learning is the way we
are going to automate your
automation.
Chris Wright, RedHat CTO #RHSummit
However…
[Figure: “You are here (again)”]
Agenda
• Who Am I?
• What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
Who Am I?
• Chief Scientist and Fellow at Brocade
• Adjunct Faculty, Computer Science, University of Oregon
• 15 years at Cisco
• 5 years at Sprint
• 4 years at Brocade
• IETF, NANOG, RIPE, OpenDaylight, ...
• Other areas I’ve been active in: Biology, Math, Control Theory, Law, ...
• Focus of last several years: Machine Learning theory and practice for networking
• See http://www.1-4-5.net/~dmm/vita.html for more detail
Aside: One Aspect Of Our Challenge
• The current distance between theory and practice in Machine
Learning is effectively zero
• What does this mean?
• Consider “sequence to sequence learning”
– https://arxiv.org/pdf/1409.3215v3.pdf
– Plus “thought vectors”
• Elapsed time from NIPS paper to Google Smart Reply [1]?
– A little over one year
• This means that the latest theory is being rapidly deployed in product
– The best ML theory folks are also the best ML coders
– And the field is moving at an astounding rate
– e.g., https://github.com/LeavesBreathe/tensorflow_with_latest_papers
• Which, not surprisingly, means that achieving state-of-the-art (SOA) ML results requires a deep
understanding of both theory and practice
• This presents a challenge (skills mismatch) for the networking field
[1] https://gmail.googleblog.com/2016/03/smart-reply-comes-to-inbox-by-gmail-on-the-web.html
Another Aspect Of Our Challenge
What is our “ImageNet”?
And is there a theory of “network”, or is every network a one-off?
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
First, What is Machine Learning?
The complexity in traditional computer programming is in the
code (programs that people write). In machine learning, learning
algorithms are in principle simple and the complexity (structure) is
in the data. Is there a way that we can automatically learn that
structure? That is what is at the heart of machine learning.
-- Andrew Ng
• Said another way, we want to discover the Data Generating Distribution that underlies
the data that we observe. This is the function that we want to learn.
• Moreover, we care primarily about the generalization accuracy of our model (function)
• Accuracy on examples we have not yet seen (BTW, how is this possible?)
• as opposed to the accuracy on the training set (note: overfitting; a small numpy sketch follows)
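A small numpy sketch of the train/test distinction (the data and polynomial models here are illustrative, not from the talk): as model capacity grows, training error keeps falling while error on unseen examples eventually rises.

```python
# Illustrative sketch: estimating generalization accuracy with a
# held-out set, and seeing overfitting directly.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.1, 60)   # noisy samples of the DGD

x_train, y_train = x[:40], y[:40]            # training set
x_test,  y_test  = x[40:], y[40:]            # unseen examples

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    print(f"degree {degree:2d}: train MSE {mse(x_train, y_train):.4f}, "
          f"test MSE {mse(x_test, y_test):.4f}")
# The degree-15 fit typically drives train MSE toward zero while test
# MSE grows: accuracy on the training set overstates generalization.
```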
Same Thing Said In Cartoon Form
[Figure: Traditional Programming takes Data + Program into a Computer and produces Output; Machine Learning takes Data + Output into a Computer and produces a Program.]
Supervised Learning: training set of the form {(xi, yi)}; yi = f(xi)
Unsupervised Learning: training set of the form {xi}
Reinforcement Learning: learn from interaction with the environment
(A minimal sketch of the three settings follows.)
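A minimal sketch of the three settings (the toy data and toy environment here are illustrative assumptions, not from the talk):

```python
import numpy as np

# Supervised: training set {(x_i, y_i)}; learn f with y_i = f(x_i).
X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([1., 3., 5., 7.])                 # here f(x) = 2x + 1
w, b = np.polyfit(X.ravel(), y, 1)             # least-squares line
assert np.allclose([w, b], [2., 1.])

# Unsupervised: training set {x_i} only -- find structure (2 clusters).
pts = np.array([0.1, 0.2, 0.15, 5.0, 5.1, 4.9])
labels = (pts > 2.5).astype(int)               # trivial 1-D "clustering"

# Reinforcement: no fixed training set; an agent acts, the environment
# returns (next_state, reward), and the agent improves its policy.
def environment(state, action):
    reward = 1.0 if action == state % 2 else 0.0
    return (state + 1) % 4, reward
```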
Many Beautiful Mysteries Remain
• Back-propagation
– Gradient-based optimization with gradients computed by back-prop (see the numpy sketch after this list)
– Why is such a simple algorithm so powerful?
– Optimization is an active area of research
– Many new techniques, e.g., Layer Normalization
– https://arxiv.org/pdf/1607.06450v1.pdf
• Neural Nets
– What are the units (artificial neurons) actually doing?
– Area of active research…
• Adversarial Images
– https://arxiv.org/abs/1412.6572
– Generative Adversarial Nets (GANs)
• …
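To ground the back-propagation bullet above, a minimal numpy sketch (the architecture, loss, and learning rate are arbitrary illustrative choices): one hidden layer trained with gradients computed by the chain rule.

```python
# One hidden layer, binary cross-entropy loss with a sigmoid output,
# plain gradient descent. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = (X[:, :1] * X[:, 1:] > 0).astype(float)    # XOR-like quadrant target

W1, b1 = rng.normal(size=(2, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)

for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))
    # Backward pass: chain rule, layer by layer.
    dp = (p - y) / len(X)                      # dLoss/dlogits (BCE + sigmoid)
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = dp @ W2.T * (1 - h ** 2)              # back through tanh
    dW1, db1 = X.T @ dh, dh.sum(0)
    # Gradient step.
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 1.0 * grad

print("accuracy:", ((p > 0.5) == y).mean())    # typically well above 0.95
```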
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
What is all the ML Excitement About?
ML Applications You Likely Interact with Everyday
Why this is relevant: Compute, Storage, Networking, Security and Energy (CSNSE)
use cases will all use similar technology (deep nets, …)
Think Object Recognition is Impressive?
Lip Reading?
Real-Time Language Translation
One More
Why is this all happening now?
• Before 2006 people thought deep neural networks couldn’t be trained
• So why now?
• Theoretical breakthroughs in 2006
• Learned how to train deep neural networks
• Technically: RBMs, Stacked Auto-encoders, sparse coding
• Nice overview of LBH DL journey: http://chronicle.com/article/The-Believers/190147/
• Compute
• CPUs were on the order of 2^20 times too slow
• Parallel processing/algorithms
• GPUs + OpenCL/CUDA
• FPGAs, custom hardware
• Datasets
• Massive data sets: Google, FB, Baidu, …
• Standardized data sets
• Alternate view of history?
• LBH Nature DL review: http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html
• Jürgen Schmidhuber’s critique: http://people.idsia.ch/~juergen/deep-learning-conspiracy.html
• LBH rebuttal: http://recode.net/2015/07/15/ai-conspiracy-the-scientists-behind-deep-learning/
Image courtesy Yoshua Bengio
BTW, What Kinds of Custom H/W?
Ok, We Know That Machines Are Getting Smarter, But Where Does Knowledge Come From?
[Figure: four sources of knowledge — Evolution, Experience, Culture, and Machines — with machines many orders of magnitude faster and larger than the others.]
So how can machines discover new knowledge?
How Can Machines Discover New Knowledge?
• Fill the gaps in existing knowledge
– Symbolists
– Technology: Induction/Inverse Deduction
• Emulate the brain
– Connectionists
– Technology: Deep neural nets
• Emulate evolution
– Evolutionaries
– Technology: Genetic Algorithms
• Systematically reduce uncertainty
– Bayesians
– Technology: Bayesian Inference
• Notice similarities between old and new
– Analogizers
– Technology: Kernel machines/Support Vector Machines
These correspond to the 5 major schools of thought in machine learning
https://en.wikipedia.org/wiki/The_Master_Algorithm
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
Integrated Approaches
Bringing Rigor to ML for Networking
• Clustering
– Categorical and continuous (e.g., LDA, K-means, …)
– Most “Machine Learning” systems you see today
– Anomaly detection (see the K-means sketch after this list)
• Deep Neural Networks
– FF/Recurrent/memory nets (LSTMs, NTMs, ...)
– Time series/long range dependencies
– Understand sequences such as network flows
• Reinforcement Learning
– Give Machine Learning agency
• learn feedback control of actions
– Non-stationary distributions
– Deep neural networks (value/policy networks)
– Understand/react in adversarial environments
• Standardized (and public) data sets
– Required to evaluate techniques
– Move the field forward
• e.g., MNIST, ImageNet, …
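A hedged sketch of the clustering bullet above (the synthetic features and the fit-on-baseline/score-new-points scheme are illustrative assumptions): score each flow's feature vector by its distance to the nearest K-means centroid learned from baseline traffic.

```python
# Clustering-based anomaly detection: far from every centroid = anomalous.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
blob1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(150, 2))  # baseline traffic
blob2 = rng.normal(loc=[4.0, 0.0], scale=0.5, size=(150, 2))  # second baseline mode
normal = np.vstack([blob1, blob2])
weird = np.array([[2.0, 4.0]])                                # an outlying flow
X = np.vstack([normal, weird])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(normal)  # fit baseline only
dist = np.min(km.transform(X), axis=1)        # distance to nearest centroid
print("most anomalous index:", int(np.argmax(dist)))   # -> 300, the outlier
```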
So What Kinds of Use Cases Are We
Working On?
• Security/Anomaly Detection
• Site Reliability Engineering
• NFV orchestration and optimization
• New automation tools for DevOps
• Predicting and remediating problems in mobile networks
• Network control plane optimization
• Network Gamification
• ...
Importantly, we can use deep
neural networks to capture
intuition from experts in these
problem domains
Example: Using Flow Data for Anomaly Detection
[Figure: generalization graph — a radial nested block state model with edge bundling.]
Generalized Anomaly Detection Setting
[Figure: learned flow representations with DNS tunneling traffic separated by a linear decision boundary; a second visualization of the same data shows the same linear decision boundary. A hedged feature sketch follows.]
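As a hedged illustration of one feature that could feed such a detector (this specific feature is an assumption, not the method on the slide): DNS tunneling encodes payload data in query names, which tend to be long and random-looking, so the Shannon entropy of the name is a simple, commonly used signal.

```python
# Character entropy of a DNS query name: tunneled names score high.
import math
from collections import Counter

def name_entropy(qname: str) -> float:
    """Shannon entropy (bits/char) of a DNS query name."""
    chars = qname.replace(".", "")
    counts = Counter(chars)
    n = len(chars)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

print(name_entropy("www.example.com"))                  # low
print(name_entropy("jb2gs43fmqqgc3lboq4dkmrv.t.evil.io"))  # notably higher
```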
What is Really Happening Here?
The target function represented by the input data is some twisted-up manifold. Deep nets disentangle the underlying explanatory factors in the data so as to make them linearly separable (a small demonstration follows).
[Graphic courtesy Christopher Olah: a tangled two-class manifold progressively untangled until a linear decision boundary separates the classes.]
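A small runnable demonstration of the disentangling claim (the two-moons data, the tiny relu net, and the library choice are illustrative assumptions, not from the talk): a linear model fails on the raw inputs but succeeds on the net's learned hidden features.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)

# A linear boundary on the raw inputs cannot separate the two moons well...
raw = LogisticRegression().fit(X, y)
print("linear on raw inputs:", raw.score(X, y))          # typically ~0.85-0.90

# ...but after a small relu layer transforms the data, the same linear
# model separates the classes almost perfectly.
net = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    max_iter=2000, random_state=0).fit(X, y)
hidden = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])  # hidden activations
lin = LogisticRegression().fit(hidden, y)
print("linear on learned features:", lin.score(hidden, y))      # typically ~1.0
```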
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
What is All the Excitement About, Redux
Reinforcement Learning Meets Deep Learning and Monte Carlo Tree Search
Reinforcement Learning?
[Figures: the reinforcement learning setup and a reinforcement learning example.]
One of the many ”AlphaGo Breakthroughs”
http://www.1-4-5.net/~dmm/ml/log_derivative_trick.pdf
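For reference, the log-derivative trick covered in the linked note is the identity underlying policy-gradient methods (stated here from standard sources; the derivation is in the PDF above):

```latex
\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}[f(x)]
  \;=\; \mathbb{E}_{x \sim p_\theta}\!\big[\, f(x) \, \nabla_\theta \log p_\theta(x) \,\big]
```

That is, the gradient of an expected reward can be estimated from samples of the agent's own behavior, which is what lets a policy network improve by playing.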
Training the SL Policy Network
A deep convolutional neural network captures the intuition of expert Go players
BTW, Why Is Go So Hard?
• Game Tree Complexity ≈ b^d (branching factor b, game depth d; see the arithmetic sketch after this list)
• Brute Force Search Intractable
– Search space is huge (10^721)
– Difficult to evaluate who is winning
• Can we characterize network state and action spaces?
– And why is this important?
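A back-of-the-envelope check of the numbers above, using the commonly cited estimates b ≈ 250 legal moves and d ≈ 150 moves per game (standard published figures, not from the slides; the slide's 10^721 counts a larger quantity, such as the number of possible game sequences):

```python
# Even the simpler b^d estimate is already far beyond brute force.
import math

b, d = 250, 150
game_tree_digits = d * math.log10(b)       # log10(b^d) = d * log10(b)
print(f"b^d ~ 10^{game_tree_digits:.0f}")  # ~10^360
```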
Typical Workflow Schematic
[Figure: analytics platform block diagram. Data Collection (packet brokers, flow data, …) feeds Preprocessing (Big Data, Hadoop, Data Science, …), which feeds Learning and Inference, then Business Logic, then the Presentation Layer; Domain Knowledge enters at every stage. The platform's Intelligence outputs (Topology, Anomaly Detection, Root Cause Analysis, Predictive Insight, …) drive Remediation/Optimization via Intent.]
Importantly, control is not learned
Static vs. Reinforcement Learning Architectures
• Today’s Static ML Architectures
• ML doesn’t have “agency”
• Hard-coded or open loop control
• Doesn’t learn control
• Assumes a stationary DGD (Data Generating Distribution)
• Largely off-line
• Reinforcement Learning Architecture
• Agent architecture
• Agent learns control/action selection
• Adapts to evolving environment
• Non-stationary distributions
• Network gamification
Network Gamification?
• Gamification is the application of game theoretic
approaches and design techniques to what have
traditionally been non-game problems, including
business, operations and social impact challenges
– Online learning/optimization
– Security
– Automation
– Almost any application that interacts with its environment
• Idea: Envision network and other automation tasks as
2-player games, and use Deep Learning, Reinforcement
Learning and Monte Carlo Tree Search to learn optimal
control in an online setting (a toy 2-player sketch follows)
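To make the 2-player framing concrete, a toy sketch (the game and the method are textbook examples, not from the talk): fictitious play, a classic online learning rule, converging to the optimal mixed strategy of matching pennies.

```python
# Each player repeatedly best-responds to the opponent's empirical mix;
# in this zero-sum game the mixes converge to the 50/50 equilibrium.
import numpy as np

payoff = np.array([[1, -1],    # row player's payoff: +1 if actions match
                   [-1, 1]])
row_counts, col_counts = np.ones(2), np.ones(2)

for _ in range(10000):
    row = np.argmax(payoff @ (col_counts / col_counts.sum()))   # row maximizes
    col = np.argmin((row_counts / row_counts.sum()) @ payoff)   # col minimizes
    row_counts[row] += 1
    col_counts[col] += 1

print("row mix:", row_counts / row_counts.sum())   # -> ~[0.5, 0.5]
print("col mix:", col_counts / col_counts.sum())   # -> ~[0.5, 0.5]
```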
Summary: Why Is AlphaGo Important/Relevant?
• Security: Agent can learn dynamic/evolving behavior of adversary
• DevOps: Agent can learn workflow automation (e.g. OpenStack Mistral/StackStorm)
• Orchestration: Agent can learn dynamic behavior of VNFs and system as a whole
• General: Deep learning can capture human intuition
Example: Gamifying Workflows
[Figure: workflow state-transition diagram; the policy maps states to actions, π(s) = a, where s ∈ S and a ∈ A(s).]
• Workflows can be learned/optimized
• Model as a POMDP: <S,A,T,R,Ω,O,𝛾>
• Estimate with MDP: <S,A,T,R,𝛾>
• Model free
• Deep Value (Q) and Policy (π) networks (a minimal tabular Q-learning sketch follows)
https://github.com/StackStorm/st2/blob/master/contrib/examples/actions/chains/echochain.yaml
https://github.com/StackStorm
https://gym.openai.com/
[Figure: a StackStorm “echochain” action chain with Boolean state tests/actions.]
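A minimal tabular Q-learning sketch of the idea (the 4-step "workflow" MDP, its rewards, and all parameters are invented for illustration; a real system would use the deep Q/policy networks above, trained against environments like those at https://gym.openai.com/):

```python
# Learn a workflow policy from reward alone: retry costs time, proceeding
# sometimes fails, finishing the workflow pays off.
import numpy as np

N_STEPS, ACTIONS = 4, 2        # workflow stages; actions: 0=retry, 1=proceed
Q = np.zeros((N_STEPS + 1, ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Proceeding advances the workflow but fails 20% of the time."""
    if a == 1 and rng.random() < 0.8:
        s2 = s + 1
        return s2, (10.0 if s2 == N_STEPS else 0.0), s2 == N_STEPS
    return s, -1.0, False      # retry (or a failed proceed) costs time

for episode in range(2000):
    s, done = 0, False
    while not done:
        a = rng.integers(ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])  # Q-learning update
        s = s2

print(np.argmax(Q[:N_STEPS], axis=1))   # learned policy: proceed at every stage
```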
Interested in Reinforcement Learning?
One of the Many Important Things OpenAI is Doing
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
Possible Areas of Collaboration
• Collaborative work on Machine Learning
– Orchestration
– Anomaly Detection
– Optimization
– Other use cases
– Design of new learning approaches
• Data sets
• Prototype implementations
• Brocade Funding of Events
• Internships and other forms of student funding
• Others
Visualizing & Classifying Flow Behavior
• Each “row” in the image is one time step of the flow (one flow record)
• H flow records per flow, in timestamp order (zero-padded)
• Each flow record has W fields → image size: W x H x D
• D = 4 (four channels (RGBA), one per octet of the IP address)
• Columns consist of the fields in the flow record
• e.g., source IP, source port, dest IP, bytes out, …
• The image is used to train a Convolutional Neural Net to recognize anomalous flows (a hedged encoding sketch follows)
[Figure: an end-to-end flow encoded as a W x H x D image.]
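A hedged numpy sketch of this encoding (the field layout, normalization, and padding scheme are illustrative assumptions; only the W x H x 4 shape and the one-octet-per-channel idea come from the slide):

```python
# Pack H flow records of W fields into an H x W x 4 array (one row per
# record), spreading each IPv4 address across the 4 channels.
import numpy as np

def encode_flow(records, W, H):
    """records: list of dicts with 'src_ip' (dotted quad) and numeric fields."""
    img = np.zeros((H, W, 4), dtype=np.float32)      # H rows, W fields, 4 channels
    for row, rec in enumerate(records[:H]):          # short flows stay zero-padded
        octets = [int(o) for o in rec["src_ip"].split(".")]
        img[row, 0, :] = np.array(octets) / 255.0    # field 0: src IP, 1 octet/channel
        img[row, 1, :] = rec["src_port"] / 65535.0   # scalar fields broadcast
        img[row, 2, :] = min(rec["bytes_out"], 1e6) / 1e6
    return img

flow = [{"src_ip": "10.1.2.3", "src_port": 443, "bytes_out": 1500}] * 3
print(encode_flow(flow, W=3, H=8).shape)             # (8, 3, 4)
```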
Finally…Explore Wacky Ideas
Q&A
Thank you