Recent Advances in Machine Learning: Bringing
a New Level of Intelligence to Network
Reliability and Optimization
David Meyer
Brocade Chief Scientist and Fellow
dmm@{brocade.com,uoregon.edu,1-4-5.net,..}
Orange Gardens
28 Jul 2016
Paris, France
You might be surprised but what
is going to drive innovation in the
enterprise and in the public cloud
is machine learning.
Bill Coughran, Sequoia Capital #ONUGSpring16
Machine Learning is the way we
are going to automate your
automation.
Chris Wright, RedHat CTO #RHSummit
However…
[Figure: “You are here (again)”]
Agenda
• Who Am I?
• What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
Who Am I?
• Chief Scientist and Fellow at Brocade
• Adjunct Faculty, Computer Science, University of Oregon
• 15 years at Cisco
• 5 years at Sprint
• 4 years at Brocade
• IETF, NANOG, RIPE, OpenDaylight, ...
• Other areas I’ve been active in: Biology, Math, Control Theory, Law, ...
• Focus of last several years: Machine Learning theory and practice for networking
• See http://www.1-4-5.net/~dmm/vita.html for more detail
Aside: One Aspect Of Our Challenge
• The current distance between theory and practice in Machine
Learning is effectively zero
• What does this mean?
• Consider “sequence to sequence learning”
– https://arxiv.org/pdf/1409.3215v3.pdf
– Plus “thought vectors”
• Elapsed time from NIPS paper to Google Smart Reply [1]?
– A little over one year
• This means that the latest theory is being rapidly deployed in product
– The best ML theory folks are also the best ML coders
– And the field is moving at an astounding rate
– e.g., https://github.com/LeavesBreathe/tensorflow_with_latest_papers
• Which, not surprisingly, means that achieving state-of-the-art (SOA) ML results requires a deep
understanding of both theory and practice
• This presents a challenge (skills mismatch) for the networking field
[1] https://gmail.googleblog.com/2016/03/smart-reply-comes-to-inbox-by-gmail-on-the-web.html
Another Aspect Of Our Challenge
What is our “ImageNet”?
And is there a theory of “network”, or is every network a one-off?
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
First, What is Machine Learning?
The complexity in traditional computer programming is in the
code (programs that people write). In machine learning, learning
algorithms are in principle simple and the complexity (structure) is
in the data. Is there a way that we can automatically learn that
structure? That is what is at the heart of machine learning.
-- Andrew Ng
• Said another way, we want to discover the Data Generating Distribution that underlies
the data that we observe. This is the function that we want to learn.
• Moreover, we care primarily about the generalization accuracy of our model (function)
• Accuracy on examples we have not yet seen (BTW, how is this possible?)
• as opposed to the accuracy on the training set (note: overfitting; a small numpy sketch follows)
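A small numpy sketch of the train/test distinction (the data and polynomial models here are illustrative, not from the talk): as model capacity grows, training error keeps falling while error on unseen examples eventually rises.

```python
# Illustrative sketch: estimating generalization accuracy with a
# held-out set, and seeing overfitting directly.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.1, 60)   # noisy samples of the DGD

x_train, y_train = x[:40], y[:40]            # training set
x_test,  y_test  = x[40:], y[40:]            # unseen examples

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    print(f"degree {degree:2d}: train MSE {mse(x_train, y_train):.4f}, "
          f"test MSE {mse(x_test, y_test):.4f}")
# The degree-15 fit typically drives train MSE toward zero while test
# MSE grows: accuracy on the training set overstates generalization.
```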
Same Thing Said In Cartoon Form
[Figure: Traditional Programming takes Data + Program into a Computer and produces Output; Machine Learning takes Data + Output into a Computer and produces a Program.]
Supervised Learning: training set of the form {(xi, yi)}; yi = f(xi)
Unsupervised Learning: training set of the form {xi}
Reinforcement Learning: learn from interaction with the environment
(A minimal sketch of the three settings follows.)
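A minimal sketch of the three settings (the toy data and toy environment here are illustrative assumptions, not from the talk):

```python
import numpy as np

# Supervised: training set {(x_i, y_i)}; learn f with y_i = f(x_i).
X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([1., 3., 5., 7.])                 # here f(x) = 2x + 1
w, b = np.polyfit(X.ravel(), y, 1)             # least-squares line
assert np.allclose([w, b], [2., 1.])

# Unsupervised: training set {x_i} only -- find structure (2 clusters).
pts = np.array([0.1, 0.2, 0.15, 5.0, 5.1, 4.9])
labels = (pts > 2.5).astype(int)               # trivial 1-D "clustering"

# Reinforcement: no fixed training set; an agent acts, the environment
# returns (next_state, reward), and the agent improves its policy.
def environment(state, action):
    reward = 1.0 if action == state % 2 else 0.0
    return (state + 1) % 4, reward
```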
Many Beautiful Mysteries Remain
• Back-propagation
– Gradient-based optimization with gradients computed by back-prop (see the numpy sketch after this list)
– Why is such a simple algorithm so powerful?
– Optimization is an active area of research
– Many new techniques, e.g., Layer Normalization
– https://arxiv.org/pdf/1607.06450v1.pdf
• Neural Nets
– What are the units (artificial neurons) actually doing?
– Area of active research…
• Adversarial Images
– https://arxiv.org/abs/1412.6572
– Generative Adversarial Nets (GANs)
• …
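To ground the back-propagation bullet above, a minimal numpy sketch (the architecture, loss, and learning rate are arbitrary illustrative choices): one hidden layer trained with gradients computed by the chain rule.

```python
# One hidden layer, binary cross-entropy loss with a sigmoid output,
# plain gradient descent. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = (X[:, :1] * X[:, 1:] > 0).astype(float)    # XOR-like quadrant target

W1, b1 = rng.normal(size=(2, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)

for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))
    # Backward pass: chain rule, layer by layer.
    dp = (p - y) / len(X)                      # dLoss/dlogits (BCE + sigmoid)
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = dp @ W2.T * (1 - h ** 2)              # back through tanh
    dW1, db1 = X.T @ dh, dh.sum(0)
    # Gradient step.
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 1.0 * grad

print("accuracy:", ((p > 0.5) == y).mean())    # typically well above 0.95
```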
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
What is all the ML Excitement About?
ML Applications You Likely Interact with Everyday
Why this is relevant: Compute, Storage, Networking, Security and Energy (CSNSE)
use cases will all use similar technology (deep nets, …)
Think Object Recognition is Impressive?
Lip Reading?
Real-Time Language Translation
One More
Why is this all happening now?
• Before 2006 people thought deep neural networks couldn’t be trained
• So why now?
• Theoretical breakthroughs in 2006
• Learned how to train deep neural networks
• Technically: RBMs, Stacked Auto-encoders, sparse coding
• Nice overview of LBH DL journey: http://chronicle.com/article/The-Believers/190147/
• Compute
• CPUs were on the order of 2^20 times too slow
• Parallel processing/algorithms
• GPUs + OpenCL/CUDA
• FPGAs, custom hardware
• Datasets
• Massive data sets: Google, FB, Baidu, …
• Standardized data sets
• Alternate view of history?
• LBH Nature DL review: http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html
• Jürgen Schmidhuber’s critique: http://people.idsia.ch/~juergen/deep-learning-conspiracy.html
• LBH rebuttal: http://recode.net/2015/07/15/ai-conspiracy-the-scientists-behind-deep-learning/
Image courtesy Yoshua Bengio
BTW, What Kinds of Custom H/W?
Ok, We Know That Machines Are Getting Smarter, But Where Does Knowledge Come From?
[Figure: four sources of knowledge — Evolution, Experience, Culture, and Machines — with machines many orders of magnitude faster and larger than the others.]
So how can machines discover new knowledge?
How Can Machines Discover New Knowledge?
• Fill the gaps in existing knowledge
– Symbolists
– Technology: Induction/Inverse Deduction
• Emulate the brain
– Connectionists
– Technology: Deep neural nets
• Emulate evolution
– Evolutionaries
– Technology: Genetic Algorithms
• Systematically reduce uncertainty
– Bayesians
– Technology: Bayesian Inference
• Notice similarities between old and new
– Analogizers
– Technology: Kernel machines/Support Vector Machines
These correspond to the 5 major schools of thought in machine learning
https://en.wikipedia.org/wiki/The_Master_Algorithm
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
Integrated Approaches
Bringing Rigor to ML for Networking
• Clustering
– Categorical and continuous (e.g., LDA, K-means, …)
– Most “Machine Learning” systems you see today
– Anomaly detection (see the K-means sketch after this list)
• Deep Neural Networks
– FF/Recurrent/memory nets (LSTMs, NTMs, ...)
– Time series/long range dependencies
– Understand sequences such as network flows
• Reinforcement Learning
– Give Machine Learning agency
• learn feedback control of actions
– Non-stationary distributions
– Deep neural networks (value/policy networks)
– Understand/react in adversarial environments
• Standardized (and public) data sets
– Required to evaluate techniques
– Move the field forward
• e.g., MNIST, ImageNet, …
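A hedged sketch of the clustering bullet above (the synthetic features and the fit-on-baseline/score-new-points scheme are illustrative assumptions): score each flow's feature vector by its distance to the nearest K-means centroid learned from baseline traffic.

```python
# Clustering-based anomaly detection: far from every centroid = anomalous.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
blob1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(150, 2))  # baseline traffic
blob2 = rng.normal(loc=[4.0, 0.0], scale=0.5, size=(150, 2))  # second baseline mode
normal = np.vstack([blob1, blob2])
weird = np.array([[2.0, 4.0]])                                # an outlying flow
X = np.vstack([normal, weird])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(normal)  # fit baseline only
dist = np.min(km.transform(X), axis=1)        # distance to nearest centroid
print("most anomalous index:", int(np.argmax(dist)))   # -> 300, the outlier
```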
So What Kinds of Use Cases Are We
Working On?
• Security/Anomaly Detection
• Site Reliability Engineering
• NFV orchestration and optimization
• New automation tools for DevOps
• Predicting and remediating problems in mobile networks
• Network control plane optimization
• Network Gamification
• ...
Importantly, we can use deep
neural networks to capture
intuition from experts in these
problem domains
Example: Using Flow Data for Anomaly Detection
[Figure: generalization graph — a radial nested block state model with edge bundling.]
Generalized Anomaly Detection Setting
[Figure: learned flow representations with DNS tunneling traffic separated by a linear decision boundary; a second visualization of the same data shows the same linear decision boundary. A hedged feature sketch follows.]
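As a hedged illustration of one feature that could feed such a detector (this specific feature is an assumption, not the method on the slide): DNS tunneling encodes payload data in query names, which tend to be long and random-looking, so the Shannon entropy of the name is a simple, commonly used signal.

```python
# Character entropy of a DNS query name: tunneled names score high.
import math
from collections import Counter

def name_entropy(qname: str) -> float:
    """Shannon entropy (bits/char) of a DNS query name."""
    chars = qname.replace(".", "")
    counts = Counter(chars)
    n = len(chars)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

print(name_entropy("www.example.com"))                  # low
print(name_entropy("jb2gs43fmqqgc3lboq4dkmrv.t.evil.io"))  # notably higher
```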
What is Really Happening Here?
The target function represented by the input data is some twisted-up manifold. Deep nets disentangle the underlying explanatory factors in the data so as to make them linearly separable (a small demonstration follows).
[Graphic courtesy Christopher Olah: a tangled two-class manifold progressively untangled until a linear decision boundary separates the classes.]
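A small runnable demonstration of the disentangling claim (the two-moons data, the tiny relu net, and the library choice are illustrative assumptions, not from the talk): a linear model fails on the raw inputs but succeeds on the net's learned hidden features.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)

# A linear boundary on the raw inputs cannot separate the two moons well...
raw = LogisticRegression().fit(X, y)
print("linear on raw inputs:", raw.score(X, y))          # typically ~0.85-0.90

# ...but after a small relu layer transforms the data, the same linear
# model separates the classes almost perfectly.
net = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    max_iter=2000, random_state=0).fit(X, y)
hidden = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])  # hidden activations
lin = LogisticRegression().fit(hidden, y)
print("linear on learned features:", lin.score(hidden, y))      # typically ~1.0
```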
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
What is All the Excitement About, Redux
Reinforcement Learning Meets Deep Learning and Monte Carlo Tree Search
Reinforcement Learning?
[Figures: the reinforcement learning setup and a reinforcement learning example.]
One of the many ”AlphaGo Breakthroughs”
http://www.1-4-5.net/~dmm/ml/log_derivative_trick.pdf
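For reference, the log-derivative trick covered in the linked note is the identity underlying policy-gradient methods (stated here from standard sources; the derivation is in the PDF above):

```latex
\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}[f(x)]
  \;=\; \mathbb{E}_{x \sim p_\theta}\!\big[\, f(x) \, \nabla_\theta \log p_\theta(x) \,\big]
```

That is, the gradient of an expected reward can be estimated from samples of the agent's own behavior, which is what lets a policy network improve by playing.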
Training the SL Policy Network
A deep convolutional neural network captures the intuition of expert Go players
BTW, Why Is Go So Hard?
• Game Tree Complexity ≈ b^d (branching factor b, game depth d; see the arithmetic sketch after this list)
• Brute Force Search Intractable
– Search space is huge (10^721)
– Difficult to evaluate who is winning
• Can we characterize network state and action spaces?
– And why is this important?
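A back-of-the-envelope check of the numbers above, using the commonly cited estimates b ≈ 250 legal moves and d ≈ 150 moves per game (standard published figures, not from the slides; the slide's 10^721 counts a larger quantity, such as the number of possible game sequences):

```python
# Even the simpler b^d estimate is already far beyond brute force.
import math

b, d = 250, 150
game_tree_digits = d * math.log10(b)       # log10(b^d) = d * log10(b)
print(f"b^d ~ 10^{game_tree_digits:.0f}")  # ~10^360
```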
Typical Workflow Schematic
[Figure: analytics platform block diagram. Data Collection (packet brokers, flow data, …) feeds Preprocessing (Big Data, Hadoop, Data Science, …), which feeds Learning and Inference, then Business Logic, then the Presentation Layer; Domain Knowledge enters at every stage. The platform's Intelligence outputs (Topology, Anomaly Detection, Root Cause Analysis, Predictive Insight, …) drive Remediation/Optimization via Intent.]
Importantly, control is not learned
Static vs. Reinforcement Learning Architectures
• Today’s Static ML Architectures
• ML doesn’t have “agency”
• Hard-coded or open loop control
• Doesn’t learn control
• Assumes a stationary DGD (Data Generating Distribution)
• Largely off-line
• Reinforcement Learning Architecture
• Agent architecture
• Agent learns control/action selection
• Adapts to evolving environment
• Non-stationary distributions
• Network gamification
Network Gamification?
• Gamification is the application of game theoretic
approaches and design techniques to what have
traditionally been non-game problems, including
business, operations and social impact challenges
– Online learning/optimization
– Security
– Automation
– Almost any application that interacts with its environment
• Idea: Envision network and other automation tasks as
2-player games, and use Deep Learning, Reinforcement
Learning and Monte Carlo Tree Search to learn optimal
control in an online setting (a toy 2-player sketch follows)
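To make the 2-player framing concrete, a toy sketch (the game and the method are textbook examples, not from the talk): fictitious play, a classic online learning rule, converging to the optimal mixed strategy of matching pennies.

```python
# Each player repeatedly best-responds to the opponent's empirical mix;
# in this zero-sum game the mixes converge to the 50/50 equilibrium.
import numpy as np

payoff = np.array([[1, -1],    # row player's payoff: +1 if actions match
                   [-1, 1]])
row_counts, col_counts = np.ones(2), np.ones(2)

for _ in range(10000):
    row = np.argmax(payoff @ (col_counts / col_counts.sum()))   # row maximizes
    col = np.argmin((row_counts / row_counts.sum()) @ payoff)   # col minimizes
    row_counts[row] += 1
    col_counts[col] += 1

print("row mix:", row_counts / row_counts.sum())   # -> ~[0.5, 0.5]
print("col mix:", col_counts / col_counts.sum())   # -> ~[0.5, 0.5]
```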
Summary: Why Is AlphaGo Important/Relevant?
• Security: Agent can learn dynamic/evolving behavior of adversary
• DevOps: Agent can learn workflow automation (e.g. OpenStack Mistral/StackStorm)
• Orchestration: Agent can learn dynamic behavior of VNFs and system as a whole
• General: Deep learning can capture human intuition
Example: Gamifying Workflows
[Figure: workflow state-transition diagram; the policy maps states to actions, π(s) = a, where s ∈ S and a ∈ A(s).]
• Workflows can be learned/optimized
• Model as a POMDP: <S,A,T,R,Ω,O,𝛾>
• Estimate with MDP: <S,A,T,R,𝛾>
• Model free
• Deep Value (Q) and Policy (π) networks (a minimal tabular Q-learning sketch follows)
https://github.com/StackStorm/st2/blob/master/contrib/examples/actions/chains/echochain.yaml
https://github.com/StackStorm
https://gym.openai.com/
[Figure: a StackStorm “echochain” action chain with Boolean state tests/actions.]
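A minimal tabular Q-learning sketch of the idea (the 4-step "workflow" MDP, its rewards, and all parameters are invented for illustration; a real system would use the deep Q/policy networks above, trained against environments like those at https://gym.openai.com/):

```python
# Learn a workflow policy from reward alone: retry costs time, proceeding
# sometimes fails, finishing the workflow pays off.
import numpy as np

N_STEPS, ACTIONS = 4, 2        # workflow stages; actions: 0=retry, 1=proceed
Q = np.zeros((N_STEPS + 1, ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Proceeding advances the workflow but fails 20% of the time."""
    if a == 1 and rng.random() < 0.8:
        s2 = s + 1
        return s2, (10.0 if s2 == N_STEPS else 0.0), s2 == N_STEPS
    return s, -1.0, False      # retry (or a failed proceed) costs time

for episode in range(2000):
    s, done = 0, False
    while not done:
        a = rng.integers(ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])  # Q-learning update
        s = s2

print(np.argmax(Q[:N_STEPS], axis=1))   # learned policy: proceed at every stage
```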
Interested in Reinforcement Learning?
One of the Many Important Things OpenAI is Doing
Agenda
• Who Am I?
• Level Set: What Is Machine Learning?
• What is all the (Machine Learning) Excitement About?
• Integrated Approaches
• Machine Learning Excitement Redux: Beyond Static Learning
• Possible Collaborations
• Technical explanations/code
– http://www.1-4-5.net/~dmm/ml
Possible Areas of Collaboration
• Collaborative work on Machine Learning
– Orchestration
– Anomaly Detection
– Optimization
– Other use cases
– Design of new learning approaches
• Data sets
• Prototype implementations
• Brocade Funding of Events
• Internships and other forms of student funding
• Others
Visualizing & Classifying Flow Behavior
• Each “row” in the image is one time step of the flow (one flow record)
• H flow records per flow, in timestamp order (zero-padded)
• Each flow record has W fields → image size: W x H x D
• D = 4 (four channels (RGBA), one per octet of the IP address)
• Columns consist of the fields in the flow record
• e.g., source IP, source port, dest IP, bytes out, …
• The image is used to train a Convolutional Neural Net to recognize anomalous flows (a hedged encoding sketch follows)
[Figure: an end-to-end flow encoded as a W x H x D image.]
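A hedged numpy sketch of this encoding (the field layout, normalization, and padding scheme are illustrative assumptions; only the W x H x 4 shape and the one-octet-per-channel idea come from the slide):

```python
# Pack H flow records of W fields into an H x W x 4 array (one row per
# record), spreading each IPv4 address across the 4 channels.
import numpy as np

def encode_flow(records, W, H):
    """records: list of dicts with 'src_ip' (dotted quad) and numeric fields."""
    img = np.zeros((H, W, 4), dtype=np.float32)      # H rows, W fields, 4 channels
    for row, rec in enumerate(records[:H]):          # short flows stay zero-padded
        octets = [int(o) for o in rec["src_ip"].split(".")]
        img[row, 0, :] = np.array(octets) / 255.0    # field 0: src IP, 1 octet/channel
        img[row, 1, :] = rec["src_port"] / 65535.0   # scalar fields broadcast
        img[row, 2, :] = min(rec["bytes_out"], 1e6) / 1e6
    return img

flow = [{"src_ip": "10.1.2.3", "src_port": 443, "bytes_out": 1500}] * 3
print(encode_flow(flow, W=3, H=8).shape)             # (8, 3, 4)
```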
Finally…Explore Wacky Ideas
Q&A
Thank you