• Rheumatic Heart Disease (RHD) is a heart condition caused by an abnormal immune
response to streptococcal infection.
• Streptococcus: a bacterium commonly associated with poor sanitation and
hygiene conditions.
• The burden of RHD is concentrated in low-income countries, where health
resources are scarce.
• Echocardiographic (echo) screening is the gold standard for diagnosis of latent
RHD, but personnel shortages limit broad implementation.
• To address this issue, we aimed to develop a machine-learning model for automatic
identification of RHD, to be used in later steps of our RHD screening solution to
prioritize follow-up.
Preprocessing phase
• Videos clipped to 16 frames
• Rotation and resizing to 128x171 pixels (required by the DNN chosen)
• Whitening (the mean of the videos in the original training data is subtracted
from the pixels of each video)
Video Pre-processing
[Figure: a frame of a video with Doppler and a frame of a video without Doppler, each shown before and after whitening]
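The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the choice of keeping the first 16 frames, and the nearest-neighbour resize are all hypothetical, the rotation step is omitted, and `train_mean` stands for whatever mean video was computed on the original training data.

```python
import numpy as np

CLIP_LEN = 16             # frames per clip (from the slides)
HEIGHT, WIDTH = 128, 171  # frame size expected by the network (from the slides)


def clip_frames(video, clip_len=CLIP_LEN):
    """Keep the first `clip_len` frames of a (T, H, W, C) video.

    Taking the first frames is an assumption; the slides only state
    that videos are clipped to 16 frames.
    """
    if video.shape[0] < clip_len:
        raise ValueError("video is shorter than the clip length")
    return video[:clip_len]


def resize_nearest(video, height=HEIGHT, width=WIDTH):
    """Nearest-neighbour resize of every frame (stand-in for a real resizer)."""
    rows = np.arange(height) * video.shape[1] // height
    cols = np.arange(width) * video.shape[2] // width
    return video[:, rows][:, :, cols]


def preprocess(video, train_mean):
    """Clip, resize, then subtract the training-set mean from every pixel."""
    clip = resize_nearest(clip_frames(video)).astype(np.float32)
    return clip - train_mean
```

Here `train_mean` is expected to have shape `(16, 128, 171, C)` or any shape that broadcasts against it, such as a per-channel mean of shape `(C,)`.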
Methodology
• Videos with and without Doppler were considered separately.
• Undersampling according to the borderline-RHD class
• Exams are classified directly, i.e., there is no view-classification step
• Use of the C3D neural network proposed by Tran et al. [2015], originally
trained on the Sports-1M dataset
• The classification layer was changed to match the problem formulation
adopted
• The parameters were fine-tuned on the training set
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning Spatiotemporal Features with 3D Convolutional Networks. ICCV 2015.
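One possible reading of the undersampling step is that every class is randomly reduced to at most the number of borderline-RHD samples. The sketch below illustrates that reading only; the function name, the class labels in the usage note, and the fixed seed are hypothetical, and the slides do not specify the exact procedure.

```python
import random
from collections import defaultdict


def undersample(labels, reference_class, seed=0):
    """Return sorted sample indices with no class larger than `reference_class`."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    if reference_class not in by_class:
        raise ValueError("reference class not present in labels")

    target = len(by_class[reference_class])
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    kept = []
    for indices in by_class.values():
        if len(indices) > target:
            indices = rng.sample(indices, target)  # drop the surplus at random
        kept.extend(indices)
    return sorted(kept)
```

For example, with labels such as `["normal"] * 10 + ["borderline"] * 3`, calling `undersample(labels, "borderline")` keeps three indices per class.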
Modified version of the C3D architecture (as shown below)
• Input: 16 frames from a video of an exam;
• 50 epochs with early stopping;
• Batch size of 16;
• Learning rate of 0.001 and a random crop strategy.
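Two of the training settings above, the random crop and early stopping, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the helper names, the patience value, and monitoring a validation loss are not stated in the slides. A 112x112 crop would match the original C3D setup, but the crop size used in this work is not given.

```python
import numpy as np


def random_crop(clip, crop_h, crop_w, rng):
    """Cut the same random spatial window out of every frame of a (T, H, W, C) clip."""
    _, height, width, _ = clip.shape
    top = int(rng.integers(0, height - crop_h + 1))
    left = int(rng.integers(0, width - crop_w + 1))
    return clip[:, top:top + crop_h, left:left + crop_w]


class EarlyStopping:
    """Stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, patience, max_epochs=50):
    """Count how many epochs would run for a given sequence of validation losses."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.should_stop(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

In a real training loop, `random_crop` would be applied to each clip as it is loaded, with `rng = np.random.default_rng()`, and `should_stop` would be called once per epoch.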
Network architecture
[Figure: visual feature extraction followed by a classifier that outputs Normal or RHD positive]
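The idea of swapping the classification layer of a pretrained 3D CNN and then fine-tuning all parameters can be sketched in PyTorch as below. `TinyC3D` is a deliberately small, hypothetical stand-in and not the real C3D network; `replace_head` and `make_optimizer` are illustrative names, and the choice of SGD is an assumption (only the learning rate of 0.001 comes from the slides).

```python
import torch
from torch import nn


class TinyC3D(nn.Module):
    """Toy 3D CNN: a feature extractor followed by a linear classifier."""

    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space only, keep time
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clips):
        # clips: (batch, channels, frames, height, width)
        return self.classifier(self.features(clips).flatten(1))


def replace_head(model, num_classes):
    """Swap the final layer for a freshly initialised one with `num_classes` outputs."""
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model


def make_optimizer(model, lr=0.001):
    """All parameters stay trainable, i.e. the whole network is fine-tuned."""
    return torch.optim.SGD(model.parameters(), lr=lr)
```

A model created as `TinyC3D(num_classes=487)` (Sports-1M has 487 classes) would become a two-class model after `replace_head(model, 2)`, while its feature-extraction weights are left untouched as the starting point for fine-tuning.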
Preliminary experiments to understand the capability of the network to extract visual
features and separate the 2 classes of interest.
We biased the training to maximize accuracy on the borderline-RHD class.
Per-video results from the confusion matrix, considering two classes, RHD positive and
negative:
• accuracy: 0.628 (95% CI, 0.573 – 0.682)
• specificity: 0.615 (95% CI, 0.435 – 0.795)
• sensitivity: 0.641 (95% CI, 0.432 – 0.850)
Results per video and 2 classes
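For reference, accuracy, sensitivity and specificity can be derived from a two-class confusion matrix as sketched below. The slides do not say how the 95% confidence intervals were obtained; the normal-approximation (Wald) interval used here is an assumption, and the function names are hypothetical.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) using a normal-approximation interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def binary_metrics(tp, fp, tn, fn):
    """Metrics for RHD positive (positive class) versus negative."""
    return {
        # fraction of all videos classified correctly
        "accuracy": proportion_ci(tp + tn, tp + fp + tn + fn),
        # fraction of RHD-positive videos detected
        "sensitivity": proportion_ci(tp, tp + fn),
        # fraction of negative videos correctly rejected
        "specificity": proportion_ci(tn, tn + fp),
    }
```

Each entry is a tuple `(estimate, lower, upper)`; the counts passed in would come from the per-video confusion matrix.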
• Hyperparameter tuning (Hyperband);
• Take advantage of visual features from the Doppler images;
• Analyze the visual features the networks use to classify the exams (interpretability)
and compare them with those used by doctors;
• Build a network architecture with 2 arms (see figure below), considering both
Doppler images and raw images from the exams.
Doing
[Figure: two-arm network taking a Doppler image and a raw image as inputs and outputting Normal or RHD positive]
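A two-arm design of this kind could look roughly like the PyTorch sketch below. It is purely illustrative: the arm layout, the layer sizes, and fusing the two arms by concatenating their features are assumptions, since the slides only state that one arm takes Doppler images and the other raw images.

```python
import torch
from torch import nn


def make_arm(in_channels, out_features=8):
    """One small 3D-convolutional arm producing a flat feature vector."""
    return nn.Sequential(
        nn.Conv3d(in_channels, out_features, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool3d(1),
        nn.Flatten(),
    )


class TwoArmNet(nn.Module):
    """Separate arms for Doppler and raw clips, fused before the classifier."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.doppler_arm = make_arm(3)
        self.raw_arm = make_arm(3)
        self.classifier = nn.Linear(16, num_classes)  # 8 features per arm

    def forward(self, doppler_clip, raw_clip):
        # Both inputs: (batch, channels, frames, height, width)
        fused = torch.cat(
            [self.doppler_arm(doppler_clip), self.raw_arm(raw_clip)], dim=1
        )
        return self.classifier(fused)
```

The two outputs correspond to Normal and RHD positive; each arm has its own weights, so the network can learn different features for the Doppler and raw inputs.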

On the development of a Visual-Temporal-awareness Rheumatic Heart Disease classifier for Echocardiographic Videos
