Mirko Lucchese - Deep Image Processing
- 2. ABOUT ME…
Just a brief introduction
Mirko Lucchese
MSc Computer Science
PhD Applied Mathematics
Worked for several years as a Data Scientist at different companies
Now: Manager at Accenture, Artificial Intelligence CoE
- 3. AGENDA
01 Introduction to Deep Learning and Image Processing/Computer Vision
02 Tools and Algorithms
03 Some Real-Life Examples
- 4. INTRODUCTION
Deep Learning and Image Processing / Computer Vision
DEEP LEARNING + COMPUTER VISION
PROBLEMS: object detection, classification, segmentation, face detection, advanced filtering (image generation)
TOOLS: CNN, AE, VAE, GAN, residual networks
- 6. DEEP LEARNING TOOLS
Key concepts to be used later
Tools: CNN, residual networks, AE, VAE, GAN
CNN (convolutional neural networks)
Their response is computed on overlapping parts of the input.
By construction, the network learns the filters to apply to the input signal, which in traditional algorithms were hand-engineered.
Biological counterpart: animal visual cortex
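A minimal sketch of the "overlapping parts of the input" idea, in plain Python with no libraries. The function name `conv2d`, the toy image and the hand-written edge filter are assumptions for illustration; in a CNN the kernel values would be learned rather than written by hand.

```python
def conv2d(image, kernel):
    """Slide a kernel over a list-of-lists image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value looks at one kh x kw patch; neighbouring
            # patches overlap, so the response is computed on overlapping parts.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# Hand-engineered vertical-edge filter; a CNN would learn such weights.
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

# Toy image: dark left half, bright right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]
response = conv2d(image, edge_kernel)
```

The response is non-zero only where a patch straddles the dark-to-bright edge.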
RESIDUAL (residual networks)
They implement alternative paths (shortcuts) that skip layers (or subnets) when needed.
By construction, the network learns when to skip layers. This makes it possible to build deeper networks whose training error keeps decreasing as depth increases.
Biological counterpart: pyramidal cells
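The shortcut idea can be sketched as output = F(x) + x. The helper names and the tiny fully connected layer below are assumptions for illustration, not a specific published architecture.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, weights):
    """Multiply vector v by a square weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def residual_block(v, weights):
    # F(v) is the learned transformation; the shortcut adds v back unchanged.
    f = relu(linear(v, weights))
    return [a + b for a, b in zip(f, v)]

x = [1.0, -2.0, 3.0]

# With all-zero weights F(v) = 0, so the block reduces to the identity:
# the layer is effectively skipped.
zero_weights = [[0.0] * 3 for _ in range(3)]
skipped = residual_block(x, zero_weights)

# With identity weights F(v) = relu(v), which is then added to v.
identity_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
transformed = residual_block(x, identity_weights)
```

Because the identity is always available through the shortcut, adding such a block should not make the network harder to fit than the shallower one.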
AE / VAE (autoencoders / variational autoencoders)
They learn an efficient representation of a set of data.
They are built from 2 subnets:
- Encoder
- Decoder
The code output by the encoder is the data representation we are looking for (latent variables).
In a VAE the latent code is assumed to follow a given statistical distribution.
This lets us use the decoder as an image generation algorithm.
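A minimal sketch of using a decoder as a generator, assuming the latent distribution is a standard normal (the usual choice, but an assumption here). The linear `decode` function and its weights are hypothetical stand-ins for a trained decoder network.

```python
import random

def decode(z, weights, bias):
    """Hypothetical linear decoder: maps a latent code to pixel values."""
    return [sum(w * zi for w, zi in zip(row, z)) + b
            for row, b in zip(weights, bias)]

def sample_latent(dim, rng):
    # Assumption: latent codes follow a standard normal distribution N(0, 1).
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(0)

# Stand-in decoder parameters; a trained VAE would supply the real ones.
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8], [0.2, 0.2]]
bias = [0.5, 0.5, 0.5, 0.5]

z = sample_latent(2, rng)             # draw a latent code, no encoder needed
generated = decode(z, weights, bias)  # a 4-"pixel" generated sample
```

Since the distribution of the latent code is known, new samples can be drawn from it directly, without passing any input image through the encoder.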
GAN (generative adversarial network) is another generative algorithm. It is
composed of two parts that are trained
"separately":
- Generator
- Discriminator
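A sketch of the two objectives, assuming the commonly used cross-entropy GAN losses (the slide does not say which loss is used). `d_real` and `d_fake` are hypothetical discriminator outputs in (0, 1) for one real and one generated sample.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Low when the discriminator scores the generated sample as real."""
    return -math.log(d_fake)

# The discriminator tells real from generated: low discriminator loss.
confident = discriminator_loss(d_real=0.9, d_fake=0.1)
# The generator fools the discriminator: higher discriminator loss.
fooled = discriminator_loss(d_real=0.9, d_fake=0.9)

# The same two situations seen from the generator's side.
gen_when_caught = generator_loss(d_fake=0.1)
gen_when_fooling = generator_loss(d_fake=0.9)
```

In training, the two networks are typically updated in alternation, each minimizing its own loss while the other is held fixed.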
- 8. FACE DETECTION/RECOGNITION
Face Detection done with HOG (Histogram of Oriented Gradients)
1. Compute the gradient for each pixel.
2. Break up the image into small squares of 16x16 pixels each.
3. In each square, count how many gradients point in each major direction.
4. Replace that square in the image with the arrow directions that were the strongest.
To find faces in this HOG image, all we have to do is find the
part of our image that looks most similar to a known HOG
pattern extracted from a set of training faces.
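The HOG steps above can be sketched in plain Python as follows. The 8 direction bins, the central-difference gradient and the magnitude-weighted voting are assumptions for illustration; production HOG implementations differ in these details.

```python
import math

def cell_histograms(image, cell=16, bins=8):
    """Per cell, a histogram of gradient directions weighted by magnitude."""
    h, w = len(image), len(image[0])
    rows, cols = h // cell, w // cell
    hists = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Gradient of each pixel from its horizontal and vertical neighbours.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0 or y // cell >= rows or x // cell >= cols:
                continue
            # Vote for one of `bins` major directions in the pixel's cell.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            hists[y // cell][x // cell][b] += mag
    return hists

def dominant_directions(hists):
    # Keep only the index of the strongest direction of each cell.
    return [[hist.index(max(hist)) for hist in row] for row in hists]

# Toy 32x32 image whose brightness increases from left to right.
image = [[x for x in range(32)] for _ in range(32)]
dirs = dominant_directions(cell_histograms(image))
```

For this toy image every cell reports direction bin 0 (gradient pointing right), giving a 2x2 grid of identical arrows.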
- 9. FACE DETECTION/RECOGNITION
Face Landmark Estimation
Compute the specific points (called landmarks) that exist on every face.
We can train a machine learning algorithm to be able to find these
specific points on any face.
Datasets with different numbers of landmarks are available:
- Kaggle (https://www.kaggle.com/drgilermo/face-images-with-marked-landmark-points)
- Helen dataset (http://www.ifp.illinois.edu/~vuongle2/helen/)
- …
- 10. FACE DETECTION/RECOGNITION
Encoding Faces
Train a deep CNN to generate 128 measurements for each face.
The training process works by looking at 3 face images at a time:
1. Load a training face image of a known person
2. Load another picture of the same known person
3. Load a picture of a totally different person
Then the algorithm looks at the measurements it is currently generating for
each of those three images. It then tweaks the neural network slightly so that
the measurements it generates for #1 and #2 are slightly closer,
while the measurements for #2 and #3 are slightly further apart.
We do not have to run the training ourselves: we can use the pre-trained models provided by OpenFace
(https://cmusatyalab.github.io/openface/)
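The three-image training step can be sketched with a triplet loss. This assumes the FaceNet-style formulation, which pushes the different person (#3) away from the first image (#1) rather than from #2; the margin value and the 3-value toy encodings (standing in for the 128 measurements) are also assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the different person is farther than the same person by `margin`."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)

# Toy encodings standing in for the 128 measurements of each face.
face_1 = [0.1, 0.9, 0.0]   # known person
face_2 = [0.2, 0.8, 0.1]   # same person, another picture
face_3 = [0.9, 0.1, 0.5]   # a totally different person

# Encodings already well separated: nothing left to tweak.
loss_good = triplet_loss(face_1, face_2, face_3)
# Roles swapped to simulate a bad encoder: large loss, so the network is adjusted.
loss_bad = triplet_loss(face_1, face_3, face_2)
```

Training repeatedly nudges the network weights in the direction that lowers this loss over many such triplets.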
- 11. SEGMENTATION
Fully Convolutional Network
• End-to-end (E2E) convolutional network
• In the final layers
• the depth (number of channels) is higher
• the spatial size is smaller
SegNet
• encoder and decoder approach
• it is less intensive on memory
and many others: DeepLab, RefineNet, PSPNet, DeepLab v3, UNet, ...
- 12. OBJECTS DETECTION
R-CNN
• Extract region proposals (~2K)
• Compute region features with a CNN
• Use an SVM to classify the region features
Fast R-CNN
• Feed the input image to the CNN to generate a convolutional feature map.
• On the convolutional feature map, identify the region proposals (obtained through a
selective search algorithm) and warp them into squares.
• Use a RoI pooling layer to reshape them into a fixed size.
• Use a softmax layer to predict the class of the proposed region and also the
offset values for the bounding box.
Faster R-CNN
• The image is provided as input to a convolutional network to compute the
convolutional feature map.
• A separate network is used to predict the region proposals.
• The predicted region proposals are then reshaped using a RoI pooling layer,
whose output is used to classify the image within the proposed region and
predict the offset values for the bounding boxes.
YOLO
• Split the image into an SxS grid; within each grid cell we take m bounding boxes.
• For each bounding box, the network outputs a class probability and offset values for the
bounding box.
• The bounding boxes whose class probability is above a threshold value are selected and used to
locate the object within the image.
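The final YOLO step (keep only the bounding boxes whose class probability exceeds a threshold) can be sketched as below. The dictionary layout, the labels and the 0.5 threshold are hypothetical; a real detector emits tensors and usually also applies non-maximum suppression.

```python
def select_boxes(predictions, threshold=0.5):
    """Keep predicted boxes whose class probability exceeds the threshold."""
    return [p for p in predictions if p["prob"] > threshold]

# Hypothetical network output: one entry per bounding box of a grid cell.
# "box" holds (x, y, width, height) relative to the image size.
predictions = [
    {"cell": (0, 0), "box": (0.1, 0.1, 0.3, 0.3), "label": "dog", "prob": 0.92},
    {"cell": (0, 1), "box": (0.5, 0.1, 0.2, 0.2), "label": "cat", "prob": 0.08},
    {"cell": (1, 1), "box": (0.6, 0.6, 0.3, 0.4), "label": "car", "prob": 0.71},
]

kept = select_boxes(predictions)
kept_labels = [p["label"] for p in kept]
```

Raising the threshold trades missed objects for fewer false detections.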
- 15. DOCUMENT AUTOMATION
Handwritten letter recognition through CNN
• Identify ROIs
• Extract characters
  - Thresholding and morphological operators help us build boxes around each character
• Classify the extracted characters
  - LeNet trained on EMNIST
  - We improved on it with a deeper and wider model, reaching an accuracy of 82%
Accuracy on real data: 75%
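A sketch of the character extraction step in plain Python. It uses thresholding followed by connected-component labelling to build the boxes; this is a simplified stand-in and omits the morphological operators mentioned on the slide. The threshold level and the toy page are assumptions.

```python
def threshold(image, level=128):
    """Binarize: dark ink pixels become 1, light background becomes 0."""
    return [[1 if px < level else 0 for px in row] for row in image]

def character_boxes(binary):
    """Bounding box (top, left, bottom, right) of each 4-connected ink blob."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Flood-fill one blob while tracking its extreme coordinates.
            stack, top, left, bottom, right = [(y, x)], y, x, y, x
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda b: b[1])  # left-to-right reading order

W, K = 255, 0  # white paper, black ink
page = [
    [W, W, W, W, W, W, W],
    [W, K, K, W, W, K, W],
    [W, K, W, W, W, K, W],
    [W, K, K, W, W, K, W],
    [W, W, W, W, W, W, W],
]
boxes = character_boxes(threshold(page))
```

Each box can then be cropped, resized to the classifier's input size and passed to the CNN.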
- 16. DOCUMENT AUTOMATION
Graphical Element Analysis using Object Detection
We specialized the TF object detector (…) to recognize graphical objects
inside the documents under analysis:
• Dataset creation: LabelImg to create an XML file (PASCAL VOC) for
each image
https://github.com/tzutalin/labelImg
• Training: we used the training script provided with TF, adapted to our
purposes
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md
• We updated the config file for the model (XML)
https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs
• We started training from a particular checkpoint of the model
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
DATAMATRIX - 97.5%
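The PASCAL VOC XML files produced in the dataset creation step can be read with the Python standard library, as in this sketch. The sample annotation (file name, label and coordinates) is invented for illustration, and the function name `read_voc_boxes` is hypothetical.

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) for every object in a VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        coords = [int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes

# Made-up annotation; real files also carry image size and other fields.
sample = """<annotation>
  <filename>page_001.png</filename>
  <object>
    <name>datamatrix</name>
    <bndbox><xmin>40</xmin><ymin>60</ymin><xmax>120</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>"""

boxes = read_voc_boxes(sample)
```

A reader like this is a convenient sanity check on the labels before they are converted to the format expected by the training script.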
- 17. IDENTITY RECOGNITION
Face Detection to Recognize Identity
• Recognize identity through face detection for an event pass
• Collect images from users through a registration process and store them in a Content-Based Image Retrieval (CBIR) system
• At the event, the system takes a picture of the user's face and queries the CBIR system
for all similar faces. This is done using an LSH (locality-sensitive hashing) algorithm
• The identity is then checked by comparing the «query» face with
the faces returned by the CBIR system
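A minimal sketch of the LSH lookup, assuming the random-hyperplane variant (the slide does not say which LSH scheme was used). The names, the 3-value toy encodings and the number of hash bits are hypothetical.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of it the vector falls on."""
    return tuple(int(sum(h * v for h, v in zip(plane, vector)) >= 0)
                 for plane in hyperplanes)

def build_index(faces, hyperplanes):
    index = {}
    for name, vec in faces.items():
        index.setdefault(lsh_key(vec, hyperplanes), []).append(name)
    return index

def query(index, vector, hyperplanes):
    # Candidates share the query's bucket; they still need an exact comparison.
    return index.get(lsh_key(vector, hyperplanes), [])

planes = make_hyperplanes(n_bits=4, dim=3)

# Registration: face encodings collected from users.
registered = {"alice": [0.9, 0.1, 0.2], "bob": [-0.8, 0.5, -0.3]}
index = build_index(registered, planes)

# At the event: look up the bucket of the freshly captured encoding.
candidates = query(index, [0.9, 0.1, 0.2], planes)
```

Similar encodings tend to fall into the same bucket, so only a few candidates need the final face-to-face comparison.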