On the development of a Visual-Temporal-awareness Rheumatic Heart Disease classifier for Echocardiographic Videos

• Rheumatic Heart Disease (RHD) is a heart condition caused by an abnormal immune
response to streptococcal infection.
• Streptococcus is a bacterium commonly associated with poor sanitation and
hygiene conditions.
• The burden of RHD is concentrated in low-income countries, where health
resources are scarce.
• Echocardiographic (echo) screening is the gold standard for diagnosing latent
RHD, but personnel shortages limit broad implementation.
• To address this issue, we aimed to develop a machine-learning model for automatic
identification of RHD, to be used in later steps of our RHD screening solution to
prioritize follow-up.

Preprocessing phase
• Videos clipped to 16 frames
• Rotation and resizing to 128x171 pixels (required by the chosen DNN)
• Whitening (subtracting from each video's pixels the mean of the videos in the
original training data)
[Figure: Video pre-processing. Frames of a video with Doppler and of a video without Doppler, before and after whitening.]
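
The preprocessing steps listed above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, keeping the first 16 frames is an assumption (the slides do not say which frames are kept), 128x171 is read as height x width, nearest-neighbour resizing stands in for whatever resize routine was actually used, rotation is left out because no angle is given, and `train_mean` is assumed to be a precomputed mean clip.

```python
import numpy as np

CLIP_LEN = 16                   # frames per clip, as stated on the slide
TARGET_H, TARGET_W = 128, 171   # assumption: 128x171 means height x width


def clip_frames(video, clip_len=CLIP_LEN):
    """Keep the first `clip_len` frames (assumption: which frames are kept is not stated)."""
    if video.shape[0] < clip_len:
        raise ValueError("video is shorter than the clip length")
    return video[:clip_len]


def resize_nearest(video, height=TARGET_H, width=TARGET_W):
    """Nearest-neighbour resize of every frame of a (T, H, W, C) array."""
    _, h, w, _ = video.shape
    rows = np.arange(height) * h // height   # source row for each target row
    cols = np.arange(width) * w // width     # source column for each target column
    return video[:, rows[:, None], cols[None, :], :]


def whiten(video, train_mean):
    """Subtract the training-set mean from the pixel values."""
    return video.astype(np.float32) - train_mean


def preprocess(video, train_mean):
    """Clip, resize, then whiten a single video."""
    return whiten(resize_nearest(clip_frames(video)), train_mean)
```

A call such as `preprocess(video, train_mean)` on a `(T, H, W, 3)` array returns a `float32` array of shape `(16, 128, 171, 3)`; `train_mean` may be a full mean clip of that shape or anything that broadcasts against it.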

Methodology
• Videos with and without Doppler were considered separately
• Undersampling according to the borderline-RHD class
• Exams are classified directly, i.e., without a view-classification step
• Use of the C3D neural network proposed by Tran et al. [2015], originally
trained on the Sports-1M dataset
• Replaced the classification layer to match our problem formulation
• Fine-tuned the parameters on the training set
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning Spatiotemporal Features with 3D Convolutional Networks. ICCV 2015.
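
The undersampling step mentioned above could look like the sketch below. It assumes that "according to the borderline-RHD class" means randomly reducing every larger class to the size of the borderline class; the slides do not confirm this reading. The function name, the `(video_id, label)` sample format, and the label strings are hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, reference_class, seed=0):
    """Randomly keep at most as many samples per class as `reference_class` has.

    `samples` is a list of (video_id, label) pairs (hypothetical format).
    """
    by_class = defaultdict(list)
    for video_id, label in samples:
        by_class[label].append((video_id, label))
    if reference_class not in by_class:
        raise ValueError(f"no samples with label {reference_class!r}")

    target = len(by_class[reference_class])
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for items in by_class.values():
        # Classes no larger than the reference class are kept whole.
        balanced.extend(items if len(items) <= target else rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced
```

For example, `undersample(samples, reference_class="borderline")` returns a shuffled list in which no class has more videos than the borderline class.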

Network architecture
Modified version of the C3D architecture (as shown below)
• Input: 16 frames from a video of an exam;
• 50 epochs with early stopping;
• Batch size of 16;
• Learning rate of 0.001;
• Random-crop strategy.
[Figure: Network architecture. Visual feature extraction followed by a classifier that outputs Normal or RHD positive.]
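
The training settings above can be illustrated with a framework-free sketch of the two generic mechanisms they name, random cropping and early stopping. Only the 16 frames, 50 epochs, batch size of 16, and learning rate of 0.001 come from the slides. The 112x112 crop size follows the original C3D setup of Tran et al. and is an assumption here, as are the patience value, the use of validation loss as the stopping criterion, and all function names.

```python
import random

# Values stated on the slide.
CONFIG = {"clip_len": 16, "max_epochs": 50, "batch_size": 16, "learning_rate": 0.001}

# Assumptions, not stated on the slide.
PATIENCE = 5             # epochs without improvement before stopping
CROP_H, CROP_W = 112, 112  # crop size used in the original C3D setup


def random_crop_box(height, width, crop_h=CROP_H, crop_w=CROP_W, rng=random):
    """Pick a random crop window; returns (top, left, bottom, right)."""
    top = rng.randint(0, height - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop_w)
    return top, left, top + crop_h, left + crop_w


def train_with_early_stopping(run_epoch, max_epochs=CONFIG["max_epochs"], patience=PATIENCE):
    """Call `run_epoch(epoch)` (returns a validation loss) until it stops improving.

    Returns (best_epoch, best_loss).
    """
    best_loss, best_epoch, stale = float("inf"), -1, 0
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss:
            best_loss, best_epoch, stale = val_loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss
```

In a real training loop, `run_epoch` would run one pass over the training set (applying `random_crop_box` to each 128x171 clip) and return the loss on a held-out validation set.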

Results per video and 2 classes
Preliminary experiments to assess the network's ability to extract visual
features and separate the 2 classes of interest.
We biased the training to maximize accuracy on the borderline class.
Per-video results derived from the confusion matrix, considering two classes
(RHD positive and RHD negative):
• accuracy: 0.628 (95% CI, 0.573 – 0.682)
• specificity: 0.615 (95% CI, 0.435 – 0.795)
• sensitivity: 0.641 (95% CI, 0.432 – 0.850)
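
For reference, the three metrics reported above are defined from a two-class confusion matrix as in the sketch below. The slides do not say how the 95% confidence intervals were obtained; the normal-approximation (Wald) interval shown here is only one common choice and is an assumption, as are the function names. No counts from the study are reproduced.

```python
import math


def wald_interval(p, n, z=1.96):
    """Normal-approximation 95% interval for a proportion p estimated from n cases."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: RHD-positive videos classified correctly/incorrectly.
    tn/fp: RHD-negative videos classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # fraction of positives detected
        "specificity": tn / (tn + fp),   # fraction of negatives detected
    }
```

Each interval uses the denominator of its own metric: all videos for accuracy, positive videos only for sensitivity, and negative videos only for specificity.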

Ongoing work
• Hyperparameter tuning (Hyperband);
• Take advantage of visual features from the Doppler images;
• Analyze the visual features the networks use to classify the exams (interpretability)
and compare them with those used by doctors;
• Build a network architecture with 2 arms (see figure below), considering both
Doppler images and raw images from the exams.
[Figure: Two-arm network taking a Doppler image and a raw image as inputs and outputting Normal or RHD positive.]
