Keras 2
“You have just found Keras”
Felipe Almeida
Rio Machine Learning Meetup / June 2017
First Steps
Content
● Intro
● Neural Networks
● Keras
● Examples
● Keras concepts
● Resources
Intro
● Neural nets are versatile, but there was a need for a simple
framework to design and experiment with them.
● Neural nets (particularly those with multiple layers) take a long
time to train.
● Recent advances in algorithms (layer-wise training, contrastive
divergence, etc.) and in hardware (leveraging GPUs for tensor
operations), together with the massive amounts of data now
available, have made deep learning popular.
Neural Networks
● Generally speaking, neural networks are nonlinear machine
learning models.
● They can be used for supervised or unsupervised learning.
● Deep learning refers to training neural nets with multiple layers.
○ Deeper networks are more powerful, but only if you have lots of
data to train them on.
● Keras is used to create neural network models.
Neural Networks - Sample Architectures
[Figures: six slides of sample architecture diagrams. Sources:
neuralnetworksanddeeplearning.com (four slides), University of Bonn,
AI GitBook]
Keras
● Models created with Keras are executed on a backend:
○ TensorFlow (default)
○ Theano
○ CNTK (beta)
○ MXNet (beta)
● Keras has built-in GPU support through CUDA
○ CUDA is NVIDIA's platform for running general-purpose
mathematical (tensor) operations on its GPUs
Keras
● Keras is the de facto deep learning frontend
Source: @fchollet, Jun 3 2017
Keras
● Keras is among the libraries supported by Apple’s Core ML
Source: @fchollet, Jun 5 2017
Example #1
● The MNIST dataset contains 70,000 labelled images of handwritten
digits: 60,000 for training and 10,000 for testing.
Example #1
● We can train a neural net to classify a digit’s pixels into one of the
10 digit classes:
NOTEBOOK - MNIST MLP
Example #2
● A multi-layer convolutional neural network (CNN) can also be
trained on the MNIST dataset.
○ The results with a regular NN are already good, but it is still
worth showing how to train a CNN
● NOTEBOOK - MNIST CNN
Example #2 - What are CNNs
● While the model is being trained, let’s look at what a CNN is
and what it’s good for.
● CNNs use convolution operations (usually followed by pooling) to
extract features that are largely position invariant.
○ In other words, they make it possible to train models that detect
a feature no matter where it appears in the input sample
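The position-invariance idea can be illustrated without any library. The sketch below is plain Python, not Keras code: it slides a small filter over a 1-D signal (a cross-correlation, which is what deep learning libraries usually call convolution) and then keeps only the strongest response (global max pooling). The function names, the filter and the toy signals are all made up for illustration.

```python
def conv1d(signal, kernel):
    """Slide `kernel` over `signal` (valid cross-correlation, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def global_max_pool(feature_map):
    """Keep only the strongest response, discarding where it occurred."""
    return max(feature_map)


# Hypothetical filter that responds most strongly to a small "bump".
kernel = [1, 2, 1]

# The same bump [1, 2, 1] placed early and late in the input.
signal_a = [0, 1, 2, 1, 0, 0, 0, 0]
signal_b = [0, 0, 0, 0, 1, 2, 1, 0]

map_a = conv1d(signal_a, kernel)
map_b = conv1d(signal_b, kernel)

print(map_a)  # [4, 6, 4, 1, 0, 0]
print(map_b)  # [0, 0, 1, 4, 6, 4]

# The feature maps are shifted copies of each other, but the pooled
# value is identical, so a classifier on top sees the same feature.
print(global_max_pool(map_a), global_max_pool(map_b))  # 6 6
```

The convolution output moves with the input; it is the pooling step that discards the position.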
Example #2 - What are CNNs
● For this reason, they are often used for image classification:
Example #3
● CNNs can also be used on text
○ In fact, they have produced state-of-the-art results in tasks
such as:
■ Text classification
■ Sentiment analysis
● Let’s train a CNN model to classify documents in the
newsgroup_20 dataset
● NOTEBOOK - IMDB CNN
Keras: Models
● Models are the most important part of Keras.
● Model = layers, a loss and an optimizer
● A model is the object you add layers to and then call compile()
and fit() on.
● Models can be saved and checkpointed for later use
Keras: Layers
● Layers define what your architecture looks like
● Examples of layers are:
○ Dense layers (the normal, fully-connected layer)
○ Convolutional layers (apply convolution operations to the output
of the previous layer)
○ Pooling layers (downsample the output of convolutional layers)
○ Dropout layers (used for regularization, to avoid overfitting)
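To make two of these layer types concrete, here is a plain-Python sketch (not Keras code) of what a fully-connected layer and a dropout layer compute. It assumes a ReLU activation and the common "inverted dropout" formulation; the function names, weights and inputs are hypothetical.

```python
import random


def dense(inputs, weights, biases):
    """Fully-connected layer: one weighted sum per output unit, then ReLU."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, unit_weights)) + bias
        outputs.append(max(0.0, z))  # ReLU activation (an assumption)
    return outputs


def dropout(activations, rate, rng):
    """Inverted dropout at training time: zero each value with
    probability `rate` and scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


x = [1.0, 2.0]
weights = [[0.5, -1.0], [1.0, 1.0], [0.0, 2.0]]  # 3 units, 2 inputs each
biases = [0.0, -1.0, 0.5]

hidden = dense(x, weights, biases)
print(hidden)  # [0.0, 2.0, 4.5]

# Which units get dropped depends on the random generator.
dropped = dropout(hidden, rate=0.5, rng=random.Random(0))
print(dropped)
```

Every output unit of the dense layer depends on every input, which is what "fully-connected" means. Dropout is only active during training; at prediction time the activations pass through unchanged.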
Keras: Loss Functions
● Loss functions compare the network’s predicted output with the
real output, in each pass of the backpropagation algorithm
○ The gradient of the loss with respect to the weights tells the
model how the weights should be updated
● Common loss functions are:
○ Mean squared error
○ Cross-entropy
○ etc.
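The two losses named above can be written out in a few lines. This is a plain-Python sketch for a single sample, not the Keras implementation; the function names and the example predictions are made up, and the cross-entropy version assumes a one-hot target and predicted class probabilities.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities.
    `eps` guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0.0, 1.0, 0.0]       # the correct class is index 1
confident = [0.1, 0.8, 0.1]    # mostly right
mistaken = [0.6, 0.2, 0.2]     # mostly wrong

mse_confident = mean_squared_error(y_true, confident)
mse_mistaken = mean_squared_error(y_true, mistaken)
ce_confident = categorical_cross_entropy(y_true, confident)
ce_mistaken = categorical_cross_entropy(y_true, mistaken)

# Both losses are smaller for the better prediction.
print(mse_confident < mse_mistaken)  # True
print(ce_confident < ce_mistaken)    # True
```

With a one-hot target, cross-entropy reduces to the negative log of the probability assigned to the correct class, so it penalizes confident mistakes heavily.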
Keras: Optimizers
● Optimizers are strategies for updating the network’s weights in
the backpropagation algorithm.
● The simplest optimizer is Stochastic Gradient Descent (SGD), but
there are many others you can choose from, such as:
○ RMSProp
○ Adagrad
Keras: Optimizers
● Most optimizers can be tuned using hyperparameters, such as:
○ The learning rate to use
○ Whether or not to use momentum
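To show what these two hyperparameters do, here is a plain-Python sketch of gradient descent with classical momentum on a one-dimensional toy problem. It is not Keras code: the function names and the objective f(w) = (w - 3)^2 are assumptions chosen so the minimum (w = 3) is known in advance.

```python
def sgd_minimize(grad_fn, w, learning_rate, momentum=0.0, steps=500):
    """Gradient descent with classical momentum:
        velocity = momentum * velocity - learning_rate * gradient
        w        = w + velocity
    With momentum=0.0 this is plain gradient descent."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w = w + velocity
    return w


def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_plain = sgd_minimize(grad, w=0.0, learning_rate=0.1)
w_momentum = sgd_minimize(grad, w=0.0, learning_rate=0.1, momentum=0.9)

# A learning rate that is too large overshoots further on every step.
w_diverged = sgd_minimize(grad, w=0.0, learning_rate=1.1, steps=50)

print(abs(w_plain - 3.0) < 1e-6)     # True
print(abs(w_momentum - 3.0) < 1e-6)  # True
print(abs(w_diverged - 3.0) > 1.0)   # True
```

The learning rate sets the step size, and momentum carries part of the previous step into the next one. Which settings work best depends on the problem, which is why they are treated as hyperparameters to tune.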
Keras: CPU / GPU
● If your computer has a CUDA-capable NVIDIA graphics card, it can
be used to speed up model training
● All models up to now were trained using the GPU.
○ Let’s see what happens if we disable the GPU and force Keras
to use the CPU instead.
Keras: Other information
● Feature preprocessing
○ You can use any method you like for feature preprocessing, but
Keras has a couple of utilities to help, such as:
■ to_categorical (to one-hot encode data)
■ Text preprocessing utilities, such as tokenizing
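The sketch below shows, in plain Python, the kind of transformation these utilities perform. It does not call Keras: `one_hot` and `texts_to_sequences` here are hypothetical stand-ins, and Keras's own utilities differ in details (for example, in how word indices are assigned).

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


def texts_to_sequences(texts):
    """Lowercase, split on whitespace, and replace each word by an
    integer index assigned in order of first appearance (from 1)."""
    word_index = {}
    sequences = []
    for text in texts:
        sequence = []
        for word in text.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
            sequence.append(word_index[word])
        sequences.append(sequence)
    return sequences, word_index


encoded = one_hot([0, 2, 1], num_classes=3)
print(encoded)
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

sequences, word_index = texts_to_sequences(["The cat sat", "the dog sat down"])
print(sequences)   # [[1, 2, 3], [1, 4, 3, 5]]
print(word_index)  # {'the': 1, 'cat': 2, 'sat': 3, 'dog': 4, 'down': 5}
```

One-hot labels are what a softmax output trained with cross-entropy expects, and integer sequences are the usual input for text models.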
Keras: Other information
● You can integrate Keras models into a scikit-learn Pipeline.
○ Keras provides wrappers that expose the methods expected of a
scikit-learn classifier, such as fit(), predict(),
predict_proba(), etc.
○ You can also use tools like scikit-learn’s grid search
(GridSearchCV) to do model selection on Keras models, i.e. to
decide which hyperparameters are best for a given task.
Keras: Other information
● Nearly everything in Keras can be regularized. In addition to the
Dropout layer, there are all sorts of other regularizers available,
such as:
○ Weight regularizers
○ Bias regularizers
○ Activity regularizers
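As an illustration of what a weight regularizer does, the plain-Python sketch below adds an L2 penalty to a loss value. L2 is only one common choice, and this is not Keras code: the function names, the weights and the data loss of 0.30 are all made up.

```python
def l2_penalty(weights, strength):
    """L2 penalty: `strength` times the sum of squared weights."""
    return strength * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, strength):
    """Total loss = loss on the data + penalty on the weights."""
    return data_loss + l2_penalty(weights, strength)


weights = [0.5, -2.0, 1.0]
data_loss = 0.30  # hypothetical loss value on a batch

total = regularized_loss(data_loss, weights, strength=0.01)

# Shrinking the largest weight lowers the penalty, which is how the
# regularizer pushes the optimizer towards smaller weights.
smaller = regularized_loss(data_loss, [0.5, -1.0, 1.0], strength=0.01)
print(smaller < total)  # True
```

Bias and activity regularizers follow the same pattern, with the penalty computed on the layer's biases or outputs instead of its weights.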
Resources
● Keras Cheat Sheet by DataCamp
