First steps with Keras 2: A tutorial with Examples
- 1. Keras 2
“You have just found Keras”
Felipe Almeida
Rio Machine Learning Meetup / June 2017
First Steps
- 3. Intro
● Neural nets are versatile, but there was a need for a simple
framework to design and experiment with them.
● Neural nets (particularly those with multiple layers) take a long
time to train.
● Recent advances in algorithms (layer-wise training, contrastive
divergence, etc.) and in hardware (leveraging GPUs for tensor
operations), as well as the massive amounts of available data, have
made deep learning popular.
- 4. Neural Networks
● Generally speaking, neural networks are nonlinear machine
learning models.
● They can be used for supervised or unsupervised learning.
● Deep learning refers to training neural nets with multiple layers.
○ They are more powerful but only if you have lots of data to train
them on.
● Keras is used to create neural network models
- 5. Neural Networks - Sample Architectures
Source: neuralnetworksanddeeplearning.com
- 11. Keras
● Models created with Keras are executed on a backend:
○ TensorFlow (default)
○ Theano
○ CNTK (beta)
○ MXNet (beta)
● Keras has built-in GPU support through CUDA
○ CUDA is NVIDIA’s framework for running mathematical (tensor)
operations on its GPUs
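A small sketch of how the backend choice shows up in code. Keras reads the `KERAS_BACKEND` environment variable when it is first imported, and `keras.backend.backend()` reports which backend is active. The snippet assumes Keras and at least one backend are installed; it does not force a particular one.

```python
import os

# Keras consults KERAS_BACKEND when it is first imported; if the variable
# is unset, the library's configured default is used.
print("KERAS_BACKEND env var:", os.environ.get("KERAS_BACKEND", "<not set>"))

import keras  # assumption: Keras plus at least one backend is installed

backend_name = keras.backend.backend()
print("Active backend:", backend_name)
```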
- 12. Keras
● Keras is the de facto deep learning frontend
Source: @fchollet, Jun 3 2017
- 13. Keras
● Keras is among the libraries supported by Apple’s CoreML
Source: @fchollet, Jun 5 2017
- 14. Example #1
● The MNIST dataset contains 60,000 labelled handwritten digits (for
training) and 10,000 for testing.
- 15. Example #1
● We can train a neural net to classify a digit’s pixels into one of the
10 digit classes:
● NOTEBOOK - MNIST MLP
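The notebook itself is not part of these slides, so what follows is only a minimal sketch of what an MLP digit classifier could look like in Keras. The layer sizes, dropout rate and optimizer are assumptions, and random arrays with MNIST's flattened shape (784 pixels) stand in for the real dataset so the snippet runs without a download.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import to_categorical

num_classes = 10
rng = np.random.RandomState(0)

# Stand-in data with MNIST's flattened shape (28 * 28 = 784 pixels in [0, 1]).
# Assumption: the real notebook would load the actual MNIST images instead.
x_train = rng.rand(256, 784).astype("float32")
y_train = to_categorical(rng.randint(0, num_classes, size=256), num_classes)

# Fully connected network: 784 inputs -> 128 hidden units -> 10 class scores.
model = Sequential()
model.add(Dense(128, activation="relu", input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation="softmax"))

model.compile(loss="categorical_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=1, verbose=0)

# Each row holds one probability per digit class.
probs = model.predict(x_train[:5], verbose=0)
print(probs.shape)
```

On random data the accuracy is meaningless; the point is only the sequence of steps: build, compile, fit, predict.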
- 16. Example #2
● A model can also be trained on the MNIST dataset using a multi-layer
convolutional neural network (CNN).
○ The results with a regular NN are already good, but it is worth
showing how to train a CNN
● NOTEBOOK - MNIST CNN
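As with the MLP, the notebook is not reproduced here. This is a hedged sketch of a small CNN for 28x28 grayscale images; the filter counts, kernel size and optimizer are assumptions, and the input layout assumes the "channels last" image format. Random images stand in for MNIST.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
# Convolution + pooling extract local features from the image.
model.add(Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Flatten the feature maps and classify with dense layers.
model.add(Flatten())
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))

model.compile(loss="categorical_crossentropy",
              optimizer="adadelta",
              metrics=["accuracy"])

# Stand-in batch of four random "images" (assumption: not real MNIST data).
images = np.random.RandomState(0).rand(4, 28, 28, 1).astype("float32")
cnn_probs = model.predict(images, verbose=0)
print(cnn_probs.shape)
```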
- 17. Example #2 - What are CNNs
● While the model is being trained, let’s understand what a CNN
looks like and what it’s good for.
● CNNs use convolution operations to extract features that are
position invariant.
○ In other words, they make it possible to train models that detect
features no matter what position they are in the input samples
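The position-invariance idea can be illustrated without Keras. In this NumPy sketch a hypothetical 3-element kernel slides over a 1-D signal; the position of the strongest response moves with the pattern, while the max-pooled response stays the same. The kernel, the signals and the helper names are made up for illustration.

```python
import numpy as np

kernel = np.array([1.0, 2.0, 1.0])  # hypothetical feature detector


def max_response(signal):
    # Slide the kernel over the signal (cross-correlation), then keep
    # the strongest response and where it occurred.
    feature_map = np.correlate(signal, kernel, mode="valid")
    return float(feature_map.max()), int(feature_map.argmax())


def signal_with_pattern(start, length=10):
    # All zeros except for one copy of the pattern at position `start`.
    signal = np.zeros(length)
    signal[start:start + 3] = kernel
    return signal


early = max_response(signal_with_pattern(2))
late = max_response(signal_with_pattern(6))
print(early, late)
```

The strongest response is identical in both cases; only its location differs, and pooling discards the location.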
- 18. Example #2 - What are CNNs
● For this reason, they are often used for image classification:
- 19. Example #3
● CNNs can also be used for text classification
○ In fact, they produce state-of-the-art results in tasks such as:
■ Text classification
■ Sentiment analysis
● Let’s train a CNN model to classify documents in the
newsgroup_20 dataset
● NOTEBOOK - IMDB CNN
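A rough sketch of a text-classification CNN, assuming documents have already been converted to fixed-length sequences of word indices. The vocabulary size, sequence length, layer sizes and the 20 output classes are assumptions chosen for illustration; random integers stand in for real documents. For binary sentiment data, the last layer would instead be a single sigmoid unit.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

vocab_size = 5000   # hypothetical vocabulary size
max_len = 100       # hypothetical document length in tokens
num_classes = 20    # assumption: one class per newsgroup

model = Sequential()
# Map each word index to a dense 50-dimensional vector.
model.add(Embedding(vocab_size, 50))
# 1-D convolution over windows of 5 consecutive words.
model.add(Conv1D(64, 5, activation="relu"))
# Keep the strongest response of each filter, wherever it occurs.
model.add(GlobalMaxPooling1D())
model.add(Dense(num_classes, activation="softmax"))

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

# Stand-in data: random word indices and random integer labels.
rng = np.random.RandomState(0)
docs = rng.randint(1, vocab_size, size=(32, max_len))
labels = rng.randint(0, num_classes, size=(32,))
model.fit(docs, labels, epochs=1, batch_size=16, verbose=0)

doc_probs = model.predict(docs[:3], verbose=0)
print(doc_probs.shape)
```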
- 20. Keras: Models
● The most important objects in Keras are models.
● Model = layers + a loss function + an optimizer
● A model is the object that you add layers to, and then call
compile() and fit() on.
● Models can be saved and checkpointed for later use
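A minimal sketch of the model lifecycle described above, ending with a save and reload. The toy data, layer sizes and the HDF5 file name are assumptions; saving to `.h5` also assumes the h5py package is available.

```python
import os
import tempfile

import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense

# Toy binary problem: is the sum of four features greater than 2?
rng = np.random.RandomState(0)
x = rng.rand(64, 4).astype("float32")
y = (x.sum(axis=1) > 2.0).astype("float32")

# 1. Add layers, 2. compile with a loss and an optimizer, 3. fit.
model = Sequential()
model.add(Dense(8, activation="relu", input_shape=(4,)))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="sgd")
model.fit(x, y, epochs=2, batch_size=16, verbose=0)

# 4. Save architecture + weights to disk and load them back.
path = os.path.join(tempfile.mkdtemp(), "model.h5")
model.save(path)
restored = load_model(path)

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict(x, verbose=0),
                   restored.predict(x, verbose=0),
                   atol=1e-5)
print(same)
```

For checkpointing during training, Keras also provides a `ModelCheckpoint` callback that can be passed to `fit()`; it is not shown here.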
- 21. Keras: Layers
● Layers are used to define what your architecture looks like
● Examples of layers are:
○ Dense layers (the standard fully connected layer)
○ Convolutional layers (apply convolution operations to the output
of the previous layer)
○ Pooling layers (downsample feature maps; typically used after
convolutional layers)
○ Dropout layers (used for regularization, to reduce
overfitting)
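To make two of these layer types concrete, here is a NumPy sketch of what a dense layer and a 2x2 max-pooling layer compute. This is an illustration of the arithmetic only, not Keras' implementation; the function names and the ReLU activation are assumptions.

```python
import numpy as np


def dense(x, weights, bias):
    # Fully connected layer with ReLU: every output unit sees every input.
    return np.maximum(0.0, x @ weights + bias)


def max_pool_2x2(feature_map):
    # Split the map into non-overlapping 2x2 blocks and keep each block's max.
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


rng = np.random.RandomState(0)
hidden = dense(rng.rand(2, 3), rng.rand(3, 4), np.zeros(4))
pooled = max_pool_2x2(np.arange(16).reshape(4, 4))

print(hidden.shape)  # (2, 4): 2 samples, 4 output units
print(pooled)        # [[ 5  7] [13 15]]
```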
- 22. Keras: Loss Functions
● Loss functions compare the network’s predicted output with the
real output on each pass of the backpropagation algorithm
○ The gradient of the loss with respect to the weights tells the
model how the weights should be updated
● Common loss functions are:
○ Mean squared error
○ Cross-entropy
○ etc.
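The two losses named above can be written out in a few lines of NumPy. This is a sketch for a single sample with a one-hot target; the function names and the clipping constant are assumptions, not Keras' own code.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between target and prediction.
    return np.mean((y_true - y_pred) ** 2)


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # Negative log-probability assigned to the true class.
    # Clipping avoids log(0) for predictions that are exactly zero.
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))


y_true = np.array([1.0, 0.0, 0.0])   # one-hot target: class 0
y_pred = np.array([0.7, 0.2, 0.1])   # predicted class probabilities

mse = mean_squared_error(y_true, y_pred)
xent = categorical_cross_entropy(y_true, y_pred)
print(mse, xent)
```

Here the MSE is (0.09 + 0.04 + 0.01) / 3 and the cross-entropy is -ln(0.7); both shrink as the prediction for the true class approaches 1.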
- 23. Keras: Optimizers
● Optimizers are strategies for updating the network’s weights from
the gradients computed by the backpropagation algorithm.
● The simplest optimizer is stochastic gradient descent (SGD), but
there are many others to choose from, such as:
○ RMSProp
○ Adagrad
- 24. Keras: Optimizers
● Most optimizers can be tuned using hyperparameters, such as:
○ The learning rate to use
○ Whether or not to use momentum
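The two hyperparameters above can be seen at work in a plain-Python sketch of the SGD update rule, with and without momentum. The toy objective f(w) = w², the starting point and the hyperparameter values are assumptions chosen for illustration.

```python
def sgd_minimize(grad_fn, w0, learning_rate=0.1, momentum=0.0, steps=200):
    # Gradient descent on a single scalar weight.
    # With momentum, a running "velocity" accumulates past gradients.
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w = w + velocity
    return w


def grad(w):
    # Gradient of the toy objective f(w) = w ** 2 (minimum at w = 0).
    return 2.0 * w


w_plain = sgd_minimize(grad, 5.0)
w_momentum = sgd_minimize(grad, 5.0, momentum=0.9)
print(w_plain, w_momentum)
```

Both runs end up very close to the minimum at 0. On this simple problem momentum is not needed; it tends to matter more on loss surfaces with narrow valleys or noisy gradients.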
- 25. Keras: CPU / GPU
● If your computer has a good graphics card, it can be used to speed
up model training
● All models up to now were trained using the GPU.
○ Let’s see what happens if we disable the GPU and force
Keras to use the CPU instead.
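The slides do not show how the GPU was disabled. One common approach, sketched here as an assumption, is to hide all CUDA devices from the process with the `CUDA_VISIBLE_DEVICES` environment variable before the backend is imported.

```python
import os

# Hide all CUDA devices from this process. This must run before the
# backend is imported, because devices are discovered at import time.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import keras  # imported after setting the variable on purpose

print("Backend:", keras.backend.backend())
print("CUDA_VISIBLE_DEVICES:", os.environ["CUDA_VISIBLE_DEVICES"])
```

Any model built after this point runs on the CPU, which makes it easy to compare training times against the GPU runs.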
- 26. Keras: Other information
● Feature preprocessing
○ Although you can use any other method for feature
preprocessing, Keras has a couple of utilities to help, such as:
■ to_categorical (to one-hot encode labels)
■ Text preprocessing utilities, such as tokenizing
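A short sketch of `to_categorical` on three made-up integer labels, with a NumPy one-liner alongside it as a cross-check of what "one-hot" means.

```python
import numpy as np
from keras.utils import to_categorical

labels = np.array([0, 2, 1])  # made-up class indices

# One row per label, with a 1 in the column of the label's class.
one_hot = to_categorical(labels, num_classes=3)
print(one_hot)

# Equivalent construction in plain NumPy: pick rows of the identity matrix.
one_hot_numpy = np.eye(3)[labels]
```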
- 27. Keras: Other information
● You can integrate Keras models into a scikit-learn Pipeline.
○ Keras provides wrapper classes (in Keras 2, KerasClassifier and
KerasRegressor in keras.wrappers.scikit_learn) that implement the
methods expected of a scikit-learn estimator, such as fit(),
predict(), predict_proba(), etc.
○ You can also use scikit-learn’s grid search (GridSearchCV) to do
model selection on Keras models, to find the best
hyperparameters for a given task.
- 28. Keras: Other information
● Nearly everything in Keras can be regularized. In addition to the
Dropout layer, there are all sorts of other regularizers available,
such as:
○ Weight regularizers
○ Bias regularizers
○ Activity regularizers
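The three kinds of regularizer listed above map onto keyword arguments of a layer. A minimal sketch on a Dense layer follows; the penalty types (L1 or L2) and strengths are assumptions picked for illustration.

```python
from keras import regularizers
from keras.layers import Dense

layer = Dense(
    64,
    activation="relu",
    # Penalize large weights.
    kernel_regularizer=regularizers.l2(0.01),
    # Penalize large biases.
    bias_regularizer=regularizers.l2(0.01),
    # Penalize large layer outputs (activations).
    activity_regularizer=regularizers.l1(0.001),
)
print(layer.kernel_regularizer, layer.bias_regularizer)
```

The penalties are added to the model's loss during training, so they need no extra handling in compile() or fit().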