DEEP IMAGE PROCESSING
Machine Learning Data Science Meetup
17/09/2018, LUISS ENLABS
ABOUT ME…
Just a brief introduction
Mirko Lucchese
MSc Computer Science
PhD Applied Mathematics
Several years working as a Data Scientist in different companies
Now: Manager in the Accenture Artificial Intelligence CoE
AGENDA
01 Introduction to Deep Learning and Image Processing/Computer Vision
02 Tools and Algorithms
03 Some Real-Life Examples
INTRODUCTION
Deep Learning and Image Processing / Computer Vision
DEEP LEARNING + COMPUTER VISION
PROBLEMS: Object Detection, Classification, Segmentation, Face Detection, Advanced Filtering (Image Generation)
TOOLS: CNN, AE, VAE, GAN, Residual
DEEP LEARNING TOOLS
Pills to be used later
CNN, RESIDUAL, AE, VAE, GAN
CNN
Convolutional layers compute their response on overlapping parts of the input.
By construction, the network learns the filters to apply to the input signal, filters that traditional algorithms hand-engineered.
Biological counterpart: the animal visual cortex.
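The convolution idea above can be sketched in a few lines of Python/NumPy. This is a minimal, single-channel illustration (stride 1, no padding, no learning); the Sobel kernel stands in for a hand-engineered filter, whereas a CNN would learn the kernel values during training.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply `kernel` to every overlapping window of `image` (stride 1, no padding).

    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value sees only a kh x kw patch of the input
            # (its receptive field); neighbouring patches overlap.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-engineered vertical-edge filter (Sobel). In a CNN the nine
# values of the kernel would instead be learned by backpropagation.
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

img = np.zeros((5, 5))
img[:, 3:] = 1.0  # dark left part, bright right part
print(conv2d_valid(img, sobel_x))  # each row is [0. 4. 4.]: non-zero only near the edge
```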
RESIDUAL
Residual networks implement alternative paths (shortcuts) that skip layers (or subnetworks) when needed.
By construction, the network learns when to skip layers; this makes it possible to build deeper networks whose training error keeps decreasing as depth increases.
Biological counterpart: pyramidal cells.
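A minimal NumPy sketch of the shortcut idea, assuming a fully connected block y = relu(x + F(x)) with arbitrary sizes and weights; real residual networks use convolutions and batch normalization inside F. It shows that when the weights of F are zero the block reduces to the identity (on non-negative inputs), so a deeper stack can at least represent what a shallower one does.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(x + F(x)), with F a small two-layer transform.

    The identity shortcut carries x around F: if the best mapping is close
    to the identity, the weights of F only need to move towards zero.
    """
    f = w2 @ relu(w1 @ x)  # residual branch F(x)
    return relu(x + f)     # shortcut: add the input back

d = 8
x = relu(rng.normal(size=d))

# With F's weights at zero every block is the identity on non-negative
# inputs, so stacking 50 of them leaves x unchanged.
w_zero = np.zeros((d, d))
y = x
for _ in range(50):
    y = residual_block(y, w_zero, w_zero)
print(np.allclose(y, x))  # True
```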
AE
Autoencoders learn an efficient representation of a set of data.
They are built with two subnetworks:
- Encoder
- Decoder
The code output by the encoder is the data representation we are looking for (latent variables).
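A toy linear autoencoder, trained with plain gradient descent in NumPy, illustrates the encoder/decoder split. The data, the sizes and the learning rate are illustrative assumptions; practical autoencoders for images use convolutional, non-linear encoders and decoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 5-D that actually lie on a 2-D plane,
# so a 2-D latent code is enough to describe them.
basis = rng.normal(size=(2, 5))
X = rng.normal(size=(200, 2)) @ basis

W_enc = rng.normal(scale=0.1, size=(5, 2))  # encoder: 5-D input -> 2-D code
W_dec = rng.normal(scale=0.1, size=(2, 5))  # decoder: 2-D code -> 5-D output

def encode(x):
    return x @ W_enc  # latent variables

def decode(z):
    return z @ W_dec  # reconstruction

def reconstruction_error():
    return np.mean((decode(encode(X)) - X) ** 2)

initial = reconstruction_error()
lr = 0.01
for _ in range(3000):
    Z = encode(X)
    err = decode(Z) - X
    # Gradients of the squared reconstruction error (up to a constant factor).
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(initial, reconstruction_error())  # the error should drop close to zero
```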
VAE
In a VAE (variational autoencoder), the latent code is assumed to follow a given statistical distribution (usually a standard Gaussian).
This lets us use the decoder as an image generation algorithm.
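Two ingredients of a VAE can be sketched in NumPy: the reparameterization trick used to sample a latent code, and the KL term that pulls the codes towards the assumed distribution (a standard Gaussian in this sketch). The encoder outputs `mu` and `log_var` are made-up values, and the decoder is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, I).

    In an autodiff framework this keeps the sample differentiable
    with respect to mu and log_var.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) per sample, summed over latent dimensions.

    This term of the VAE loss pulls the latent codes towards the prior.
    """
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

# Hypothetical encoder outputs for a batch of 3 images, 4-D latent code.
mu = rng.standard_normal((3, 4))
log_var = np.full((3, 4), -1.0)
z = reparameterize(mu, log_var)  # codes passed to the decoder during training
print(kl_to_standard_normal(mu, log_var))

# Generation: draw fresh codes from the prior N(0, I); a trained decoder
# (not shown here) would turn each row into a new image.
z_new = rng.standard_normal((5, 4))
```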
GAN
A GAN (generative adversarial network) is another generative algorithm. It is composed of two parts that are trained "separately", i.e. in alternation, each against the other:
- Generator
- Discriminator
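The alternating training of the two parts can be summarized by their loss functions. A NumPy sketch, assuming the standard binary cross-entropy formulation with the non-saturating generator loss; the discriminator outputs below are made-up probabilities, and the networks and optimizers are omitted.

```python
import numpy as np

EPS = 1e-12  # avoids log(0)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: D should output 1 on real images, 0 on generated ones."""
    return -np.mean(np.log(d_real + EPS) + np.log(1.0 - d_fake + EPS))

def generator_loss(d_fake):
    """Non-saturating generator loss: G wants D to output 1 on its samples."""
    return -np.mean(np.log(d_fake + EPS))

# Hypothetical discriminator probabilities for a batch of real and fake images.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])

# The two parts are updated in alternation:
#   step 1: update D's weights to decrease discriminator_loss (G frozen)
#   step 2: update G's weights to decrease generator_loss (D frozen)
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

When D cannot tell real from fake (it outputs 0.5 everywhere), its loss is 2 log 2, the value the adversarial game tends towards at equilibrium.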
IMAGE CLASSIFICATION
Milestones (top-5 error on the ImageNet ILSVRC benchmark):
LeNet (1998)
AlexNet (2012): top-5 error from 26% to 15%
VGG (2014): top-5 error 8%
Inception V2 (2015): top-5 error 5.6%
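The top-5 error quoted above counts a prediction as correct when the true class is among the five highest-scoring classes. A short NumPy sketch of the metric, with made-up scores for 7 classes:

```python
import numpy as np

def top_k_error(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of samples whose true label is NOT among the k highest scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]  # indices of the k best classes
    hit = np.any(top_k == labels[:, None], axis=1)
    return 1.0 - hit.mean()

scores = np.array([
    [0.10, 0.50, 0.20, 0.05, 0.05, 0.04, 0.06],  # true class 2: inside the top 5
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],  # true class 6: lowest score, missed
])
labels = np.array([2, 6])
print(top_k_error(scores, labels))     # 0.5
print(top_k_error(scores, labels, 1))  # 1.0
```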
FACE DETECTION/RECOGNITION
Face Detection done with HOG
1. Compute the gradient of each pixel.
2. Break up the image into small squares of 16x16 pixels each.
3. In each square, count how many gradients point in each major direction.
4. Replace that square in the image with the arrow directions that were the strongest.
To find faces in this HOG image, all we have to do is find the part of the image that looks most similar to a known HOG pattern extracted from a set of training faces.
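The HOG steps above can be sketched in NumPy. This is a simplified version (unsigned orientations in 9 bins, no block normalization, no interpolation between bins), and `best_match` is a plain nearest-template search assumed for illustration; production face detectors, such as dlib's, typically feed HOG features to a trained linear classifier instead.

```python
import numpy as np

def hog_cells(image, cell=16, n_bins=9):
    """Simplified HOG: one orientation histogram per cell x cell square.

    Returns an array of shape (rows, cols, n_bins). There is no block
    normalisation and no interpolation between bins.
    """
    img = image.astype(float)
    # 1. Gradient of every pixel (central differences).
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, quantised into n_bins bins.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)

    # 2. Break the image into cell x cell squares.
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, n_bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            # 3. Count the gradients per direction, weighted by their magnitude.
            hist[r, c] = np.bincount(bins[sl].ravel(), weights=magnitude[sl].ravel(),
                                     minlength=n_bins)
    return hist

def dominant_directions(hist):
    """4. For each square, the centre angle (degrees) of its strongest direction."""
    bin_width = 180.0 / hist.shape[-1]
    return hist.argmax(axis=-1) * bin_width + bin_width / 2

def best_match(hist, template):
    """Slide a HOG template (a few cells of a known face) over the cell grid and
    return the (row, col) of the window with the smallest Euclidean distance."""
    th, tw = template.shape[:2]
    best_dist, best_pos = np.inf, None
    for r in range(hist.shape[0] - th + 1):
        for c in range(hist.shape[1] - tw + 1):
            dist = np.linalg.norm(hist[r:r + th, c:c + tw] - template)
            if dist < best_dist:
                best_dist, best_pos = dist, (r, c)
    return best_pos
```

For example, on a 32x32 image with a horizontal edge in the middle, every 16x16 square reports a dominant direction of 90 degrees.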
FACE DETECTION/RECOGNITION
Face Landmark Estimation
Compute specific points (called landmarks) that exist on every face.
We can train a machine learning algorithm to find these points on any face.
Datasets with different numbers of landmarks are available:
- Kaggle (https://www.kaggle.com/drgilermo/face-images-with-marked-landmark-points)
- Helen dataset (http://www.ifp.illinois.edu/~vuongle2/helen/)
- …
FACE DETECTION/RECOGNITION
Encoding Faces
Train a deep CNN to generate 128 measurements (an embedding) for each face.
The training process looks at 3 face images at a time:
1. Load a training face image of a known person
2. Load another picture of the same known person
3. Load a picture of a totally different person
The algorithm then looks at the measurements it currently generates for each of the three images and tweaks the neural network slightly, so that the measurements for #1 and #2 get slightly closer while the measurements for #1 and #3 get slightly further apart (triplet loss).
We do not have to train the network ourselves: we can use the pre-trained models provided by OpenFace
(https://cmusatyalab.github.io/openface/)
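The training rule above corresponds to a triplet loss on the 128-D embeddings. A NumPy sketch of the loss, assuming squared Euclidean distances between L2-normalized embeddings and a margin of 0.2 (the margin value is an assumption); the random vectors below stand in for real face embeddings.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss, averaged over a batch of embeddings.

    anchor   = embedding of image #1 (known person)
    positive = embedding of image #2 (same person)
    negative = embedding of image #3 (different person)
    The loss is zero once #1 is closer to #2 than to #3 by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distance #1-#2
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # squared distance #1-#3
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

def unit(v):
    """Scale a vector to unit length before comparing distances."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
a = unit(rng.standard_normal(128))
p = unit(a + 0.05 * rng.standard_normal(128))  # slightly different photo, same person
n = unit(rng.standard_normal(128))             # someone else
print(triplet_loss(a, p, n))  # 0.0: this triplet is already well separated
print(triplet_loss(a, n, p))  # > 0: swapping the roles is penalised
```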
SEGMENTATION
Fully Convolutional Network
• End-to-end (E2E) convolutional network
• In the final layers:
  • the depth (number of channels) is higher
  • the spatial size is smaller
SegNet
• Encoder-decoder approach
• Less memory-intensive
And many others: DeepLab, RefineNet, PSPNet, DeepLab v3, U-Net, ...
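SegNet keeps its encoder-decoder light on memory by storing only the positions of the max-pooling maxima in the encoder and reusing them to upsample in the decoder. A minimal single-channel NumPy sketch with 2x2 windows; the function names are ours, not SegNet's.

```python
import numpy as np

def max_pool_with_indices(x: np.ndarray, k: int = 2):
    """k x k max pooling that also returns where each maximum came from."""
    h, w = x.shape[0] // k, x.shape[1] // k
    # Regroup the image into (h, w, k*k): one row of k*k values per window.
    windows = x[:h * k, :w * k].reshape(h, k, w, k).transpose(0, 2, 1, 3).reshape(h, w, k * k)
    idx = windows.argmax(axis=-1)  # position of the max inside each window
    pooled = windows.max(axis=-1)
    return pooled, idx

def max_unpool(pooled: np.ndarray, idx: np.ndarray, k: int = 2) -> np.ndarray:
    """Decoder upsampling: put each value back where its max was, zeros elsewhere."""
    h, w = pooled.shape
    out = np.zeros((h, w, k * k), dtype=pooled.dtype)
    np.put_along_axis(out, idx[..., None], pooled[..., None], axis=-1)
    return out.reshape(h, w, k, k).transpose(0, 2, 1, 3).reshape(h * k, w * k)

x = np.array([[1, 5, 2, 0],
              [3, 4, 8, 1],
              [0, 0, 1, 1],
              [9, 2, 1, 7]], dtype=float)
pooled, idx = max_pool_with_indices(x)  # encoder: keep only maxima and their positions
restored = max_unpool(pooled, idx)      # decoder: sparse map, maxima back in place
```

Only the small `idx` array has to be kept between encoder and decoder, instead of the full encoder feature map.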
OBJECT DETECTION
R-CNN
• Extract region proposals (~2K)
• Compute region features with a CNN
• Use an SVM to classify the region features
Fast R-CNN
• Feed the input image to the CNN to generate a convolutional feature map.
• From the convolutional feature map, identify the region proposals (found with a selective search algorithm) and, using a RoI pooling layer, reshape them into a fixed size.
• Use a softmax layer to predict the class of the proposed region and the offset values for the bounding box.
Faster R-CNN
• The image is provided as input to a convolutional network to compute the convolutional feature map.
• A separate network (the Region Proposal Network) is used to predict the region proposals.
• The predicted region proposals are reshaped with a RoI pooling layer, whose output is then used to classify the image within the proposed region and predict the offset values for the bounding boxes.
YOLO
• Split the image into an SxS grid; within each grid cell, take m bounding boxes.
• For each bounding box, the network outputs a class probability and offset values for the bounding box.
• The bounding boxes whose class probability is above a threshold are selected and used to locate the object within the image.
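Two of the steps above can be sketched in NumPy: RoI pooling, which turns a region of any size into a fixed-size grid (Fast and Faster R-CNN), and YOLO-style selection of boxes above a threshold. Assumptions: a single-channel feature map, RoIs already in feature-map coordinates, and one score per box (in the original YOLO this score combines the class probability with the box confidence). Real pipelines also apply non-maximum suppression, which is not shown.

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=(2, 2)):
    """Max-pool the region roi = (x0, y0, x1, y1) of a 2-D feature map into a
    fixed out_size grid, whatever the size of the (non-empty) region."""
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    oh, ow = out_size
    # Split the region into oh x ow roughly equal sub-windows.
    ys = np.linspace(0, region.shape[0], oh + 1).astype(int)
    xs = np.linspace(0, region.shape[1], ow + 1).astype(int)
    out = np.empty(out_size, dtype=feature_map.dtype)
    for i in range(oh):
        for j in range(ow):
            # max(..., start + 1) keeps every sub-window at least one cell wide.
            window = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                            xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = window.max()
    return out

def select_boxes(scores, boxes, threshold=0.5):
    """YOLO-style filtering.

    scores: (S, S, m) score of each of the m boxes in every cell of the S x S grid
    boxes:  (S, S, m, 4) box coordinates predicted for those boxes
    Returns the boxes (n, 4) and scores (n,) above the threshold.
    """
    keep = scores > threshold
    return boxes[keep], scores[keep]

fm = np.arange(16.0).reshape(4, 4)
print(roi_pool(fm, (0, 0, 4, 4)))  # 4x4 region -> 2x2 output
print(roi_pool(fm, (0, 0, 3, 4)))  # 3x4 region -> still 2x2 output
```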
DOCUMENT AUTOMATION
Overview
Process inbound documents to:
• Identify the document type
• Extract relevant information
Examples of document types:
…
NLP
DOCUMENT AUTOMATION
Handwritten letter recognition with a CNN
• Identify ROIs
• Extract characters
  • Thresholding and morphological operators help us build boxes around each character
• Classify the extracted characters
  • LeNet trained on EMNIST
  • We improved the model with a deeper and wider network, reaching 82% accuracy
Accuracy on real data: 75%
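The box-building step can be sketched without any imaging library. This is a minimal NumPy version assuming dark ink on a light page, a fixed threshold of 128 and a single 3x3 dilation; the slide does not state which operators were used (OpenCV's `threshold`, `dilate` and `findContours` would be common choices). Each resulting box would then be cropped and passed to the character classifier.

```python
from collections import deque
import numpy as np

def binarize(gray: np.ndarray, threshold: float = 128) -> np.ndarray:
    """Ink is dark on a light page: pixels below the threshold become True."""
    return gray < threshold

def dilate(mask: np.ndarray) -> np.ndarray:
    """3x3 binary dilation: closes small gaps inside a handwritten stroke."""
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def character_boxes(mask: np.ndarray):
    """Boxes (x0, y0, x1, y1) of 8-connected ink blobs, sorted left to right."""
    h, w = mask.shape
    seen = np.zeros_like(mask)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            # Breadth-first search over one connected component.
            queue = deque([(y, x)])
            seen[y, x] = True
            y0, x0, y1, x1 = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                y0, x0, y1, x1 = min(y0, cy), min(x0, cx), max(y1, cy), max(x1, cx)
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1 + 1, y1 + 1))  # exclusive right/bottom edge
    return sorted(boxes)

# Hypothetical page with two dark "characters".
page = np.full((10, 20), 255)
page[2:8, 2:4] = 0
page[2:8, 10:13] = 0
print(character_boxes(dilate(binarize(page))))  # [(1, 1, 5, 9), (9, 1, 14, 9)]
```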
DOCUMENT AUTOMATION
Graphical Element Analysis using Object Detection
We specialized the TF object detector (…) to recognize graphical objects inside the documents under analysis:
• Dataset creation: LabelImg to create an XML annotation file (PASCAL VOC format) for each image
https://github.com/tzutalin/labelImg
• Use of the training script provided with TF, adapted to our purposes
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md
• Updated the model's pipeline config file (protobuf text format)
https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs
• We started the training from a particular checkpoint of the model
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
DATAMATRIX - 97.5%
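LabelImg saves one PASCAL VOC XML file per image. A short Python sketch that reads such a file with the standard library; the file name and the `datamatrix` label in the example are hypothetical. Boxes parsed like this are what gets converted into the TFRecord files that the TF Object Detection API training script reads.

```python
import xml.etree.ElementTree as ET

def read_voc_annotation(xml_text: str):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from a PASCAL VOC file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = [int(float(bb.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return root.find("filename").text, boxes

example = """<annotation>
  <filename>doc_001.png</filename>
  <size><width>800</width><height>1100</height><depth>3</depth></size>
  <object>
    <name>datamatrix</name>
    <bndbox><xmin>620</xmin><ymin>40</ymin><xmax>760</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>"""
print(read_voc_annotation(example))
# ('doc_001.png', [('datamatrix', 620, 40, 760, 180)])
```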
IDENTITY RECOGNITION
Face Detection to Recognize Identity
CBIR
• Recognize identity through face detection for an event pass
• Collect user images through a registration process into a Content Based Image Retrieval (CBIR) system
• At the event, the system takes a picture of the user's face and queries the CBIR for all similar faces; this is done with an LSH (Locality-Sensitive Hashing) algorithm
• The identity is then checked by comparing the «query» face with the faces returned by the CBIR
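The retrieval and check steps can be sketched with random-hyperplane LSH, one common LSH family for embeddings compared by angle; the slide does not say which LSH variant was used. Assumptions: 128-D face embeddings, a single hash table with 12 bits, and a hypothetical acceptance threshold of 0.6 on the Euclidean distance. Random unit vectors stand in for real face encodings.

```python
from collections import defaultdict
import numpy as np

class HyperplaneLSH:
    """Random-hyperplane LSH: similar vectors tend to get the same bit signature."""

    def __init__(self, dim: int = 128, n_bits: int = 12, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)

    def _key(self, v: np.ndarray) -> bytes:
        # One bit per hyperplane: which side of the plane the vector falls on.
        return (self.planes @ v > 0).tobytes()

    def add(self, identity: str, embedding: np.ndarray) -> None:
        """Registration: store the face under its hash bucket."""
        self.buckets[self._key(embedding)].append((identity, embedding))

    def query(self, embedding: np.ndarray):
        """Candidate faces sharing the query's bucket (may be empty)."""
        return self.buckets.get(self._key(embedding), [])

def verify(query, candidates, max_distance=0.6):
    """Final check: identity of the closest candidate if close enough, else None."""
    best = min(candidates, key=lambda c: np.linalg.norm(c[1] - query), default=None)
    if best is None or np.linalg.norm(best[1] - query) > max_distance:
        return None
    return best[0]

rng = np.random.default_rng(1)

def random_face():
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

lsh = HyperplaneLSH()
alice, bob = random_face(), random_face()
lsh.add("alice", alice)
lsh.add("bob", bob)
print(verify(alice, lsh.query(alice)))  # alice
```

With a single table, a genuine match can occasionally land in a different bucket; using several tables with independent hyperplanes reduces that risk at the cost of more memory.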
THANK YOU
