Chris Fregly
Developer Advocate
AI and Machine Learning
@AWS
Smokey and the Multi-Armed Bandit
Featuring BERT Reynolds
Abstract
First, I will train and deploy multiple natural language understanding
(NLU) models and compare them in live production using reinforcement
learning to dynamically shift traffic to the winning model.
Second, I will describe the differences between A/B and multi-armed
bandit tests including exploration-exploitation, reward-maximization, and
regret-minimization.
Third, I will dive deep into the details of building and scaling a multi-
armed bandit deployment on AWS using a real-time, stream-based text
classifier with TensorFlow, PyTorch, and BERT on 150+ million reviews
from the Amazon Customer Reviews Dataset.
Me Developer Advocate
AI and Machine Learning @ AWS
(Based in San Francisco)
Co-Author of the O'Reilly Book,
"Data Science on AWS."
Founder of the Advanced
Kubeflow Meetup (Global)
https://www.datascienceonaws.com
data-science-on-aws
@cfregly
linkedin.com/in/cfregly
https://meetup.com/Data-Science-on-AWS
Data Science on AWS – Book and Workshop Outline
https://www.datascienceonaws.com/
Agenda
• Compare A/B Tests vs. Multi-Armed Bandit Tests
• Optimize Bandits with Reinforcement Learning
• Train 2 BERT Language Models with TensorFlow
• Train a Multi-Armed Bandit Model with Vowpal Wabbit
• Test 2 BERT Models with a Bandit
• DEMO: Scale Multi-Armed Bandits on AWS
Traditional A/B Tests
• Static
• Cannot Add New Models After Test Begins
• Static Traffic Split Between Models A and B
• May Negatively Impact Business Metrics
• Must Run Experiment to Completion
• No Concept of Reward for Winning Model
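To make the "static traffic split" concrete, here is a minimal sketch (not from the talk) of how a traditional A/B test typically assigns traffic: a deterministic hash of the user ID fixes each user to Model A or Model B, and the split never changes while the experiment runs.

```python
import hashlib

def ab_assign(user_id: str, split: float = 0.5) -> str:
    """Static A/B assignment: the same user always gets the same model,
    and the A/B split is fixed for the life of the experiment."""
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return "A" if (h % 10_000) / 10_000 < split else "B"

# Over many users, roughly half the traffic goes to each model,
# regardless of which model is actually performing better.
assignments = [ab_assign(f"user-{i}") for i in range(10_000)]
share_a = assignments.count("A") / len(assignments)
print(round(share_a, 2))
```

The key limitation shown here: nothing in the assignment function looks at model performance, so a clearly losing model keeps receiving its full traffic share until the experiment ends.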
Multi-Armed Bandit Tests
• Add New Models
• Dynamically Shift Traffic
• Explore-Exploit Strategy
• Finish Experiment Early - or Run Longer!
• Minimize Regret (Business Impact)
• Maximize Reward
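The explore-exploit, reward-maximization, and regret-minimization ideas above can be sketched with the simplest bandit strategy, epsilon-greedy. This is an illustrative simulation, not the talk's production code: the two arms stand in for Models A and B, and their true reward probabilities (0.6 and 0.8) are made-up numbers.

```python
import random

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """Explore a random arm with probability epsilon; otherwise
    exploit the arm with the best observed mean reward."""
    if random.random() < epsilon:
        return random.randrange(len(counts))  # explore
    means = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
    return max(range(len(means)), key=means.__getitem__)  # exploit

random.seed(42)
true_p = [0.6, 0.8]              # hypothetical per-model reward rates
counts, rewards = [0, 0], [0.0, 0.0]
for _ in range(5_000):
    arm = epsilon_greedy(counts, rewards)
    reward = 1.0 if random.random() < true_p[arm] else 0.0
    counts[arm] += 1
    rewards[arm] += reward

# Regret: reward lost versus always having picked the best arm.
regret = 5_000 * max(true_p) - sum(rewards)
print(counts, round(regret, 1))
```

Unlike the static A/B split, traffic shifts toward the better arm as evidence accumulates, which is exactly what keeps regret (the business impact of serving the weaker model) small.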
Train 2 BERT Models with TensorFlow (Models A & B)
• BERT Mania!
• Fine-Tuning BERT
Train a Bandit Model with Reinforcement Learning (RL)
• Popular Reinforcement Learning Strategies
• Epsilon Greedy
• Thompson Sampling
• Online Cover
• Bagging
• Implemented in Vowpal Wabbit (VW)!
• Try Our Open Source RL Containers
• https://github.com/aws/sagemaker-rl-container
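Of the strategies listed above, Thompson Sampling is worth a quick sketch: each arm keeps a Beta posterior over its reward rate, and on every request we sample from each posterior and route to the largest draw. This is a from-scratch illustration of the idea, not Vowpal Wabbit's implementation, and the reward rates are invented for the simulation.

```python
import random

def thompson_pick(successes, failures):
    """Draw one sample from each arm's Beta(successes+1, failures+1)
    posterior and pick the arm with the largest draw."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

random.seed(0)
true_p = [0.6, 0.8]              # hypothetical per-model reward rates
succ, fail = [0, 0], [0, 0]
for _ in range(5_000):
    arm = thompson_pick(succ, fail)
    if random.random() < true_p[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1

print(succ, fail)  # traffic concentrates on the stronger arm
```

The appeal over epsilon-greedy is that exploration is automatic: an uncertain arm has a wide posterior and still wins some draws, while a clearly inferior arm's posterior collapses and it stops receiving traffic.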
Test 2 BERT Models with a Multi-Armed Bandit Model
DEMO: Scale Multi-Armed Bandits on AWS
DEMO: Scale Multi-Armed Bandits on AWS
• BERT Model 1: TensorFlow
• BERT Model 2: PyTorch
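The demo's routing logic, two deployed model variants behind one bandit, can be sketched end to end. The model stubs below are stand-ins (with made-up accuracies) for the TensorFlow and PyTorch BERT endpoints, and the epsilon-greedy router is one of several strategies the demo could use.

```python
import random

# Hypothetical stand-ins for the two deployed BERT endpoints.
def model_tf(review: str) -> bool:    # Model 1: TensorFlow
    return random.random() < 0.70     # assume ~70% correct

def model_pt(review: str) -> bool:    # Model 2: PyTorch
    return random.random() < 0.85     # assume ~85% correct

models = [model_tf, model_pt]
random.seed(7)
counts, wins = [0, 0], [0, 0]
for i in range(3_000):
    # Epsilon-greedy routing of live traffic between the two endpoints.
    if random.random() < 0.1:
        arm = random.randrange(2)
    else:
        rates = [w / c if c else 0.0 for w, c in zip(wins, counts)]
        arm = max(range(2), key=rates.__getitem__)
    correct = models[arm](f"review-{i}")  # reward signal per request
    counts[arm] += 1
    wins[arm] += int(correct)

split = [round(c / sum(counts), 2) for c in counts]
print(split)  # most traffic shifts to the stronger model
```

In the real deployment the reward signal would come from downstream feedback (e.g., user-confirmed classifications) rather than an oracle, but the traffic-shifting mechanics are the same.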
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
DEMO!
More Resources
• O’Reilly Book - Data Science on AWS – Early Release Available!
• https://datascienceonaws.com
• GitHub Repo
• https://github.com/data-science-on-aws/workshop
• AWS Blog Post on Multi-Armed Bandits
• https://aws.amazon.com/blogs/machine-learning/power-contextual-bandits-using-continual-learning-with-amazon-sagemaker-rl/
• Bandit Algorithms
• https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Contextual-Bandit-algorithms
• Open Source SageMaker Reinforcement Learning Containers
• https://github.com/aws/sagemaker-rl-container
Thank you!
Chris Fregly
data-science-on-aws
@cfregly
linkedin.com/in/cfregly
