Chris Fregly
Developer Advocate
AI and Machine Learning
@AWS
Smokey and the Multi-Armed Bandit
Featuring BERT Reynolds
Abstract
First, I will train and deploy multiple natural language understanding
(NLU) models and compare them in live production using reinforcement
learning to dynamically shift traffic to the winning model.
Second, I will describe the differences between A/B and multi-armed
bandit tests including exploration-exploitation, reward-maximization, and
regret-minimization.
Third, I will dive deep into the details of building and scaling a multi-
armed bandit deployment on AWS using a real-time, stream-based text
classifier with TensorFlow, PyTorch, and BERT on 150+ million reviews
from the Amazon Customer Reviews Dataset.
Me Developer Advocate
AI and Machine Learning @ AWS
(Based in San Francisco)
Co-Author of the O'Reilly Book,
"Data Science on AWS."
Founder of the Advanced
Kubeflow Meetup (Global)
https://www.datascienceonaws.com
data-science-on-aws
@cfregly
linkedin.com/in/cfregly
https://meetup.com/Data-Science-on-AWS
Data Science on AWS – Book and Workshop Outline
https://www.datascienceonaws.com/
Agenda
• Compare A/B Tests vs. Multi-Armed Bandit Tests
• Optimize Bandits with Reinforcement Learning
• Train 2 BERT Language Models with TensorFlow
• Train a Multi-Armed Bandit Model with Vowpal Wabbit
• Test 2 BERT Models with a Bandit
• DEMO: Scale Multi-Armed Bandits on AWS
Traditional A/B Tests
• Static
• Cannot Add New Models After Test Begins
• Static Traffic Split Between Models A and B
• May Negatively Impact Business Metrics
• Must Run Experiment to Completion
• No Concept of Reward for Winning Model
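To make the "static traffic split" concrete, here is a minimal sketch (not from the talk) of how a traditional A/B test typically assigns traffic: a deterministic hash of the user ID fixes each user to Model A or Model B, and the split never changes while the experiment runs.

```python
import hashlib

def ab_assign(user_id: str, split: float = 0.5) -> str:
    """Static A/B assignment: the same user always gets the same model,
    and the A/B split is fixed for the life of the experiment."""
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return "A" if (h % 10_000) / 10_000 < split else "B"

# Over many users, roughly half the traffic goes to each model,
# regardless of which model is actually performing better.
assignments = [ab_assign(f"user-{i}") for i in range(10_000)]
share_a = assignments.count("A") / len(assignments)
print(round(share_a, 2))
```

The key limitation shown here: nothing in the assignment function looks at model performance, so a clearly losing model keeps receiving its full traffic share until the experiment ends.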
Multi-Armed Bandit Tests
• Add New Models
• Dynamically Shift Traffic
• Explore-Exploit Strategy
• Finish Experiment Early - or Run Longer!
• Minimize Regret (Business Impact)
• Maximize Reward
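The explore-exploit, reward-maximization, and regret-minimization ideas above can be sketched with the simplest bandit strategy, epsilon-greedy. This is an illustrative simulation, not the talk's production code: the two arms stand in for Models A and B, and their true reward probabilities (0.6 and 0.8) are made-up numbers.

```python
import random

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """Explore a random arm with probability epsilon; otherwise
    exploit the arm with the best observed mean reward."""
    if random.random() < epsilon:
        return random.randrange(len(counts))  # explore
    means = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
    return max(range(len(means)), key=means.__getitem__)  # exploit

random.seed(42)
true_p = [0.6, 0.8]              # hypothetical per-model reward rates
counts, rewards = [0, 0], [0.0, 0.0]
for _ in range(5_000):
    arm = epsilon_greedy(counts, rewards)
    reward = 1.0 if random.random() < true_p[arm] else 0.0
    counts[arm] += 1
    rewards[arm] += reward

# Regret: reward lost versus always having picked the best arm.
regret = 5_000 * max(true_p) - sum(rewards)
print(counts, round(regret, 1))
```

Unlike the static A/B split, traffic shifts toward the better arm as evidence accumulates, which is exactly what keeps regret (the business impact of serving the weaker model) small.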
Train 2 BERT Models with TensorFlow (Models A & B)
• BERT Mania!
• Fine-Tuning BERT
Train a Bandit Model with Reinforcement Learning (RL)
• Popular Reinforcement Learning Strategies
• Epsilon Greedy
• Thompson Sampling
• Online Cover
• Bagging
• Implemented in Vowpal Wabbit (VW)!
• Try Our Open Source RL Containers
• https://github.com/aws/sagemaker-rl-container
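Of the strategies listed above, Thompson Sampling is worth a quick sketch: each arm keeps a Beta posterior over its reward rate, and on every request we sample from each posterior and route to the largest draw. This is a from-scratch illustration of the idea, not Vowpal Wabbit's implementation, and the reward rates are invented for the simulation.

```python
import random

def thompson_pick(successes, failures):
    """Draw one sample from each arm's Beta(successes+1, failures+1)
    posterior and pick the arm with the largest draw."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

random.seed(0)
true_p = [0.6, 0.8]              # hypothetical per-model reward rates
succ, fail = [0, 0], [0, 0]
for _ in range(5_000):
    arm = thompson_pick(succ, fail)
    if random.random() < true_p[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1

print(succ, fail)  # traffic concentrates on the stronger arm
```

The appeal over epsilon-greedy is that exploration is automatic: an uncertain arm has a wide posterior and still wins some draws, while a clearly inferior arm's posterior collapses and it stops receiving traffic.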
Test 2 BERT Models with a Multi-Armed Bandit Model
DEMO: Scale Multi-Armed Bandits on AWS
DEMO: Scale Multi-Armed Bandits on AWS
• BERT Model 1: TensorFlow
• BERT Model 2: PyTorch
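The demo's routing logic, two deployed model variants behind one bandit, can be sketched end to end. The model stubs below are stand-ins (with made-up accuracies) for the TensorFlow and PyTorch BERT endpoints, and the epsilon-greedy router is one of several strategies the demo could use.

```python
import random

# Hypothetical stand-ins for the two deployed BERT endpoints.
def model_tf(review: str) -> bool:    # Model 1: TensorFlow
    return random.random() < 0.70     # assume ~70% correct

def model_pt(review: str) -> bool:    # Model 2: PyTorch
    return random.random() < 0.85     # assume ~85% correct

models = [model_tf, model_pt]
random.seed(7)
counts, wins = [0, 0], [0, 0]
for i in range(3_000):
    # Epsilon-greedy routing of live traffic between the two endpoints.
    if random.random() < 0.1:
        arm = random.randrange(2)
    else:
        rates = [w / c if c else 0.0 for w, c in zip(wins, counts)]
        arm = max(range(2), key=rates.__getitem__)
    correct = models[arm](f"review-{i}")  # reward signal per request
    counts[arm] += 1
    wins[arm] += int(correct)

split = [round(c / sum(counts), 2) for c in counts]
print(split)  # most traffic shifts to the stronger model
```

In the real deployment the reward signal would come from downstream feedback (e.g., user-confirmed classifications) rather than an oracle, but the traffic-shifting mechanics are the same.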
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
DEMO!
More Resources
• O’Reilly Book - Data Science on AWS – Early Release Available!
• https://datascienceonaws.com
• GitHub Repo
• https://github.com/data-science-on-aws/workshop
• AWS Blog Post on Multi-Armed Bandits
• https://aws.amazon.com/blogs/machine-learning/power-contextual-bandits-using-continual-learning-with-amazon-sagemaker-rl/
• Bandit Algorithms
• https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Contextual-Bandit-algorithms
• Open Source SageMaker Reinforcement Learning Containers
• https://github.com/aws/sagemaker-rl-container
Thank you!
Chris Fregly
data-science-on-aws
@cfregly
linkedin.com/in/cfregly
