This document discusses using deep learning for seismic tomography. It begins with an overview of seismic tomography and the forward and inverse problems. It then discusses using deep learning approaches like empirical risk minimization with neural networks to solve the inverse problem. Several deep learning architectures are evaluated, including those using semblance cubes, spectrograms of raw seismic data, and raw seismic data directly as input. Recurrent neural networks with LSTM and GRU cells are also explored for image reconstruction. The document concludes that while performance is good on simple models, more data and increased network capacity are needed for complex geology. It also lists several related publications.
This slide deck introduces recent anchor-free object detection methods for general objects and person detection, summarizing more than 10 papers on the topic.
Oral presentation at IEEE International Conference on Image Processing (ICIP), Hong Kong, September 2010. Abstract: Non-uniform filters are frequently used in many image processing applications to describe regions or to detect specific features. However, non-uniform filtering is a computationally complex task. This paper presents a method to perform fast non-uniform filtering using a reduced number of memory accesses. The idea is based on integral images, which are commonly used for box or Haar wavelet filtering. The disadvantage of those filters for several applications is their uniform shape. We describe a method to build Symmetric Weighted Integral Images that are tailored for a variety of kernels, and the process to perform fast filtering with them. We show a relevant speedup when compared to Kernel Integral Images, and a large one when compared to conventional non-uniform filtering, by reducing the computational complexity.
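The Symmetric Weighted Integral Images above generalize the plain integral image (summed-area table) that the paper builds on. As background, a minimal sketch of the standard technique: after one pass to build the table, the sum of any axis-aligned box is four lookups, independent of box size. Function names here are illustrative, not from the paper.

```python
import numpy as np

def integral_image(img):
    # Cumulative sums along rows then columns give the summed-area table:
    # ii[r, c] = sum of img[0:r+1, 0:c+1].
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1+1, c0:c1+1] from at most four table lookups, O(1) per box.
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```

This constant-time box sum is what makes box and Haar filtering fast; the paper's contribution is extending the trick beyond uniform (box-shaped) kernels.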
The document discusses object detection techniques including R-CNN, SPPnet, Fast R-CNN, and Faster R-CNN. R-CNN uses region proposals and CNN features to classify each region. SPPnet improves efficiency by computing CNN features once for the whole image. Fast R-CNN further improves efficiency by sharing computation and using a RoI pooling layer. Faster R-CNN introduces a region proposal network to generate proposals, achieving end-to-end training. The techniques showed improved accuracy and processing speed over prior methods.
The 45th talk of the TensorFlow Korea paper-reading group PR12 covered DeepLab, a semantic image segmentation algorithm, and explained several related papers building on it.
Faster R-CNN improves object detection by introducing a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. The RPN slides over feature maps and predicts object bounds and objectness at each position. During training, anchors are assigned positive or negative labels based on Intersection over Union with ground truth boxes. Faster R-CNN merges the RPN and the Fast R-CNN detector into a single network that can be trained end-to-end. This achieves state-of-the-art object detection speed and accuracy while eliminating the computationally expensive selective search for proposals.
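The IoU-based anchor labeling described above can be sketched in a few lines. The 0.7/0.3 thresholds follow the Faster R-CNN paper; for brevity this sketch omits the paper's extra rule that the highest-IoU anchor for each ground-truth box is also marked positive. Function names are illustrative.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt_boxes, pos=0.7, neg=0.3):
    # Positive above pos, negative below neg; anything in between is
    # ignored and contributes nothing to the training loss.
    best = max(iou(anchor, gt) for gt in gt_boxes)
    if best >= pos:
        return 1
    if best < neg:
        return 0
    return -1
```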
The Scientific Computing, Applied and Industrial Mathematics (SCAIM) Seminar at University of British Columbia. October 2019.
- Two convolutional neural network architectures are presented to reduce noise in low-dose CT images. The first network is inspired by dictionary learning methods. An efficient improved network is also presented. - Important parameters for each network are investigated to determine the best performance. The models are tested and results are compared to state-of-the-art methods, showing superior performance. - Future work could explore advanced deep learning methods like deep residual networks, generative adversarial networks, or improving contrast in DICOM images.
Slides by Miriam Bellver from the Computer Vision Reading Group at the Universitat Politecnica de Catalunya about the paper: Lu, Yongxi, Tara Javidi, and Svetlana Lazebnik. "Adaptive Object Detection Using Adjacency and Zoom Prediction." CVPR 2016 Abstract: State-of-the-art object detection systems rely on an accurate set of region proposals. Several recent methods use a neural network architecture to hypothesize promising object locations. While these approaches are computationally efficient, they rely on fixed image regions as anchors for predictions. In this paper we propose to use a search strategy that adaptively directs computational resources to sub-regions likely to contain objects. Compared to methods based on fixed anchor locations, our approach naturally adapts to cases where object instances are sparse and small. Our approach is comparable in terms of accuracy to the state-of-the-art Faster R-CNN approach while using two orders of magnitude fewer anchors on average. Code is publicly available.
This slide deck provides a brief summary of recent progress on object detection using deep learning. It introduces the concepts of selected previous works (the R-CNN series, YOLO, and SSD) and 6 recent papers (posted to arXiv between Dec 2016 and Mar 2017). Most of the papers focus on improving the performance of small-object detection.
For the full video of this presentation, please visit: http://www.embedded-vision.com/platinum-members/auvizsystems/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit For more information about embedded vision, please visit: http://www.embedded-vision.com Nagesh Gupta, Founder and CEO of Auviz Systems, presents the "Semantic Segmentation for Scene Understanding: Algorithms and Implementations" tutorial at the May 2016 Embedded Vision Summit. Recent research in deep learning provides powerful tools that begin to address the daunting problem of automated scene understanding. Modifying deep learning methods, such as CNNs, to classify pixels in a scene with the help of the neighboring pixels has provided very good results in semantic segmentation. This technique provides a good starting point towards understanding a scene. A second challenge is how such algorithms can be deployed on embedded hardware at the performance required for real-world applications. A variety of approaches are being pursued for this, including GPUs, FPGAs, and dedicated hardware. This talk provides insights into deep learning solutions for semantic segmentation, focusing on current state of the art algorithms and implementation choices. Gupta discusses the effect of porting these algorithms to fixed-point representation and the pros and cons of implementing them on FPGAs.
Locating objects in images (“detection”) quickly and efficiently enables object tracking and counting applications on embedded visual sensors (fixed and mobile). By 2012, progress on techniques for detecting objects in images – a topic of perennial interest in computer vision – had plateaued, and techniques based on histogram of oriented gradients (HOG) were state of the art. Soon, though, convolutional neural networks (CNNs), in addition to classifying objects, were also beginning to become effective at simultaneously detecting objects. Research in CNN-based object detection was jump-started by the groundbreaking region-based CNN (R-CNN). We’ll follow the evolution of neural network algorithms for object detection, starting with R-CNN and proceeding to Fast R-CNN, Faster R-CNN, “You Only Look Once” (YOLO), and up to the latest Single Shot Multibox detector. In this talk, we’ll examine the successive innovations in performance and accuracy embodied in these algorithms – which is a good way to understand the insights behind effective neural-network-based object localization. We’ll also contrast bounding-box approaches with pixel-level segmentation approaches and present pros and cons.
This document describes research on using region-oriented convolutional neural networks for object retrieval. It discusses using local CNNs like CaffeNet, Fast R-CNN, and SDS to extract visual features from object candidates in images. These features are used to match against query descriptors. Pooled regional features are ranked to retrieve relevant shots. Fine-tuning pre-trained networks on larger datasets like COCO can improve retrieval accuracy. Combining global and local approaches through re-ranking provides an additional boost in performance.
This document describes a proposed method for real-time object detection using Single Shot Multi-Box Detection (SSD) with the MobileNet model. SSD is a single, unified network for object detection that eliminates feature resampling and combines predictions. MobileNet is used to create a lightweight network by employing depthwise separable convolutions, which significantly reduces model size compared to regular convolutions. The proposed SSD with MobileNet model achieved improved accuracy in identifying real-time household objects while maintaining the detection speed of SSD.
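The parameter saving from depthwise separable convolutions can be made concrete with a quick count: a standard convolution learns one k x k x c_in kernel per output channel, while the separable version learns one k x k depthwise filter per input channel plus a 1x1 pointwise mixing layer. A minimal sketch (function names are illustrative):

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k x c_in kernel per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # One k x k depthwise filter per input channel,
    # plus a 1x1 pointwise convolution to mix channels.
    return k * k * c_in + c_in * c_out
```

For a 3x3 convolution mapping 64 channels to 128, this gives 73,728 versus 8,768 parameters, roughly an 8.4x reduction, which is why MobileNet backbones keep SSD lightweight.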
https://telecombcn-dl.github.io/2017-dlcv/ Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
ICANN 2017 26th International Conference on Artificial Neural Networks Alghero, Sardinia, Italy September 2017
The document discusses Mask R-CNN, an extension of Faster R-CNN object detection that also performs semantic segmentation. Mask R-CNN adds a branch for predicting segmentation masks on each Region of Interest independently of class. During training, the mask branch learns to segment objects regardless of class, and at test time predicts masks for all classes using a "winner takes all" approach. The document also compares Mask R-CNN to Faster R-CNN and FCN approaches.
This document summarizes the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". It introduces batch normalization, which reduces internal covariate shift by normalizing layer inputs, thereby speeding up the training of neural networks. Normalization statistics are computed over each mini-batch and applied to the inputs. This allows higher learning rates and acts as a regularizer. Experiments show batch normalization stabilizes and accelerates the training of neural networks on ImageNet classification.
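The training-time transform the paper defines is short enough to sketch directly: per-feature mean and variance over the mini-batch, normalization, then a learned scale gamma and shift beta that restore representational power. This sketch covers only the training path; inference uses running averages of the statistics, which is omitted here.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # x has shape (batch, features). Normalize each feature over the
    # mini-batch, then scale and shift with learned gamma and beta.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```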
1. Dimensionality reduction techniques like PCA can be used to optimize master event templates for cross-correlation based seismic event detection and location. 2. The document explores using various dimensionality reduction methods such as PCA, IPCA, and SSD on both real and synthetic seismic data to minimize the number of templates needed. 3. Representing seismic data as hypercomplex numbers or tensors can allow dimensionality reduction techniques to utilize the full multidimensional information from seismic arrays for improved master event design.
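The core PCA step behind template optimization can be sketched via the SVD: center the template matrix and keep the top-k right singular vectors as a compact basis that captures most of the template variance. This is a generic PCA sketch, not the document's specific IPCA or SSD variants; function names are illustrative.

```python
import numpy as np

def pca_basis(X, k):
    # Rows of X are waveform templates. Returns the mean and the top-k
    # principal directions, a reduced basis spanning most variance.
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, basis):
    # Coordinates of each template in the reduced basis.
    return (X - mean) @ basis.T
```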
This document summarizes an adaptive modular approach for mining sensor network data using machine learning techniques. It presents a two-layer architecture that uses an online compression algorithm (PCA) in the first layer to reduce data dimensionality and an adaptive lazy learning algorithm (KNN) in the second layer for prediction and regression tasks. Simulation results on a wave propagation dataset show the approach can handle non-stationarities like concept drift, sensor failures and network changes in an efficient and adaptive manner.
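The second layer's lazy learning idea can be illustrated with a minimal KNN regressor: no global model is fit, and each query is answered by averaging the targets of its k nearest stored samples, which is what makes adaptation to drift cheap (just update the stored samples). A generic sketch, not the document's exact algorithm:

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=3):
    # Lazy learning: defer all computation to query time and average
    # the targets of the k nearest training samples.
    d2 = np.sum((X_train - x) ** 2, axis=1)
    nearest = np.argsort(d2)[:k]
    return y_train[nearest].mean()
```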
ieee nss mic 2016 poster N30-21 Pixel Discrimination using Artificial Neural Network for Gamma Camera Module
1. The document describes using a deep neural network to detect changes between two SAR images by preclassifying the images, training the neural network on selected samples, and analyzing the results. 2. A similarity matrix and variance matrix are calculated during preclassification to identify and jointly label similar pixels, while different pixels are labeled separately. Good samples are selected to train the neural network. 3. The neural network is tested on images with different types and levels of noise and performs well at change detection, with performance increasing as noise decreases. Future work could focus on accelerating the training process.
This document summarizes research on using model counting approaches to analyze nonlinear numerical constraints that arise in applications like probabilistic inference, reliability analysis, and side-channel analysis. It presents two implementations of modular exponentiation with nonlinear constraints and evaluates the performance of various exact and approximate model counting tools on the path conditions extracted from symbolic execution. The results show that for small domains, brute force counting works best, while approximate model counting scales better to larger problems.
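The brute-force baseline mentioned above is simple to state: enumerate every assignment over the finite domain and count the satisfying ones, which is exact but only tractable while |domain| ** n_vars stays small. A minimal sketch with an illustrative nonlinear constraint (not one of the paper's benchmarks):

```python
from itertools import product

def brute_force_count(constraint, domain, n_vars):
    # Exact model count by exhaustive enumeration; cost grows as
    # |domain| ** n_vars, so this only wins on small domains.
    return sum(1 for vals in product(domain, repeat=n_vars)
               if constraint(*vals))
```

Approximate counters trade this exactness for scalability on the larger domains where enumeration becomes infeasible.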
Abstract The Comprehensive Nuclear-Test-Ban Treaty’s verification regime requires uniform distribution of monitoring capabilities over the globe. The use of waveform cross correlation as a monitoring technique demands waveform templates from master events outside regions of natural seismicity and test sites. We populated aseismic areas with masters having synthetic templates for predefined sets (from 3 to 10) of primary array stations of the International Monitoring System. Previously, we tested the global set of master events and synthetic templates using IMS seismic data for February 12, 2013 and demonstrated excellent detection and location capability of the matched filter technique. In this study, we test the global grid of synthetic master events using seismic events from the Reviewed Event Bulletin. For detection, we use the standard STA/LTA (SNR) procedure applied to the time series of the cross correlation coefficient (CC). Phase association is based on SNR, CC, and arrival times. Azimuth and slowness estimates based on f-k analysis of cross correlation traces are used to reject false arrivals.
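The STA/LTA detection statistic applied to the CC trace can be sketched as the ratio of a short-term to a long-term moving average of the absolute trace; a peak in this ratio flags a candidate detection. This simplified sketch uses centered windows for brevity, whereas operational detectors typically use causal windows with the LTA preceding the STA.

```python
import numpy as np

def sta_lta(cc, n_sta=10, n_lta=100):
    # SNR proxy on a cross-correlation trace: short-term average
    # divided by long-term average of the absolute signal.
    e = np.abs(cc)
    sta = np.convolve(e, np.ones(n_sta) / n_sta, mode="same")
    lta = np.convolve(e, np.ones(n_lta) / n_lta, mode="same")
    return sta / np.maximum(lta, 1e-12)
```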
1. The document discusses using an adaptive general regression neural network (GRNN) for automatic mapping of environmental data. 2. A GRNN is a modification of the Nadaraya-Watson nonparametric regressor that can perform nonlinear modeling, feature selection, and characterize uncertainties to produce quality maps. 3. An illustrative case study applies GRNN to precipitation data in Switzerland, showing it can automatically filter out irrelevant variables and produce accurate interpolation maps and uncertainty analyses of residuals.
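Since the GRNN is described as a modification of the Nadaraya-Watson regressor, the underlying estimator is worth sketching: the prediction at a query point is a Gaussian-weighted average of all training targets, with the bandwidth sigma controlling smoothness. This is the base Nadaraya-Watson form, not the document's adaptive variant with feature selection.

```python
import numpy as np

def grnn_predict(X_train, y_train, x, sigma=1.0):
    # Nadaraya-Watson kernel regression: weight every training target
    # by a Gaussian of its distance to the query, then average.
    d2 = np.sum((X_train - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.sum(w * y_train) / np.sum(w)
```

Small sigma makes the estimate nearly interpolate the training data; large sigma smooths toward the global mean, which is why bandwidth (and per-feature bandwidths, for feature selection) is the key tuning parameter.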
An artificial neural network was used to accurately identify the interaction positions of gamma photons in a gamma camera detector module. Training datasets were acquired along lines parallel to the x and y axes to simplify the training process and optimize the neural network structure. The proposed method improved discrimination accuracy at the edges of the detector compared to conventional algorithms and reduced the energy resolution from 22.8% to 15.7%, demonstrating its effectiveness for gamma camera systems.
(1) The document describes using neural networks called autoencoders to perform dimensionality reduction on data in a nonlinear way. Autoencoders use an encoder network to transform high-dimensional data into a low-dimensional code, and a decoder network to recover the data from the code. (2) The autoencoders are trained to minimize the discrepancy between the original and reconstructed data. Experiments on image and face datasets showed autoencoders outperforming principal components analysis at reconstructing the original data from the low-dimensional code. (3) Pretraining the autoencoder layers using restricted Boltzmann machines helps optimize the many weights in deep autoencoders and scale the approach to large datasets.
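The encoder/decoder structure and the training objective from point (1) and (2) can be sketched in a few lines. This is a single-hidden-layer forward pass with a tanh encoder and linear decoder for illustration only; the paper's deep autoencoders stack many layers and pretrain them with restricted Boltzmann machines, which is not shown here.

```python
import numpy as np

def encode(x, W_enc, b_enc):
    # Encoder: map high-dimensional input to a low-dimensional code.
    return np.tanh(x @ W_enc + b_enc)

def decode(code, W_dec, b_dec):
    # Decoder: reconstruct the input from the code (linear output).
    return code @ W_dec + b_dec

def reconstruction_error(x, W_enc, b_enc, W_dec, b_dec):
    # The training objective: mean squared discrepancy between the
    # input and its reconstruction through the bottleneck.
    x_hat = decode(encode(x, W_enc, b_enc), W_dec, b_dec)
    return np.mean((x - x_hat) ** 2)
```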
This document surveys pruning methods for person re-identification networks. It introduces how convolutional neural networks (CNNs) have achieved high accuracy in tasks like person re-identification but at the cost of high complexity. Siamese networks are used for person re-identification by extracting features from images using a shared-weight backbone network. Pruning techniques can significantly reduce the complexity of these networks by reducing parameters and computations while maintaining high accuracy. The document reviews different pruning methods like filter pruning, adaptive filter pruning, and compares their performance on re-identification datasets.