Modernizing Upstream Workflows with AWS Storage - John Mallory

“The Digital Oilfield is not merely about computer chips, processors and software. It is
about the melding of operations technology with information technology and the
Internet of Things. It involves a powerful combination of distributed network sensors,
ubiquitous mobile connectivity, cloud computing, advanced big data analytics and
artificial intelligence. It has the ability to “learn” from what works in the best producing
wells and apply those learnings to entire fields. It will predict equipment breakdown
before it happens and bring about “condition-based” maintenance rather than
“schedule-based” methods. It will track workers in the field, feed them the data they
need via various platforms, “coach” their work in real-time and remove them from
hazardous situations. Ultimately, it will produce more oil and gas for less cost.”
– Accenture 2016 Digital Oilfield Outlook

Digital Oilfield …(simplified)
Operations Technology
Information Technology

Digital Transformation is a Journey
Monitoring
Insights
New Use Cases
Optimization
Transformation

Analytics = Value From Data
1% of information gathered from the field is currently made available to oil and gas decision-makers
What Keeps Us From Using More?

Upstream Information Management
• Very large number of diverse, complex, multi-modal & multi-scale datasets
• Not a sequential series of separate tasks, rather a continuum of multiple scenario iterations
Source: Common Data Access Limited
Source: Schlumberger

Data Silos Are a Key Challenge
Hadoop/Stream
Analytics
Clusters
HPC
Clusters
SAP, EDW,
Databases
Exploration
Production Optimization
Operations & Planning

Enter the Data Lake Architecture
A data lake is a new and increasingly popular
architecture for storing and analyzing massive
volumes of heterogeneous data.
Benefits of a Data Lake
• All Data in One Place
• Quick Ingest
• Storage vs Compute
• Schema on Read
• Multi-User Environment

Cloud Data Migration
Direct Connect
Snow* data transport family
3rd Party
Connectors
Transfer
Acceleration
Storage
Gateway
Kinesis Firehose
AWS Storage Platform and Solutions
The AWS Storage Portfolio
Object
Amazon Glacier
Amazon S3
Block
Amazon EBS
(persistent)
Amazon EC2
Instance Store
(ephemeral)
File
Amazon EFS

Consolidate Data & Separate Storage & Compute
• Use Amazon S3 as the data lake storage tier, not a single analytics tool such as an
IoT streaming analytics cluster or a seismic processing HPC cluster
• Decoupled storage and compute is cheaper and more efficient to operate
• Decoupled storage and compute allows us to evolve to clusterless
architectures (e.g. AWS Lambda, Amazon Athena, Redshift Spectrum & AWS
Glue)
• Do not build data silos in Hadoop or HPC clusters
• Gain the flexibility to use all the analytics tools and compute options in the
ecosystem around S3 and future-proof the architecture

Why Choose Amazon S3
Durable
• Designed for 11 9s of durability
Secure
• Multiple encryption options
• Robust, highly flexible access controls
High performance
• Multipart upload
• Range GET
• Scalable throughput
Integrated
• Amazon EMR
• Amazon Redshift
• Amazon DynamoDB
• Amazon Athena
• Amazon Rekognition
• AWS Glue
Easy to use
• Simple REST API
• AWS SDKs
• Read-after-create consistency
• Event notification
• Lifecycle policies
• Simple management tools
• Hadoop compatibility
Scalable
• Store as much as you need
• Scale storage and compute independently
• Scale without limits
• Affordable
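
The "Range GET" item means a client can fetch pieces of a large object in parallel. As a rough sketch of how the byte ranges might be computed, the helper below produces HTTP `Range` header values; the function name and the 8 MiB part size are assumptions for illustration, and no request is actually sent.

```python
def byte_ranges(object_size, part_size=8 * 1024 * 1024):
    """Return HTTP Range header values covering an object of object_size bytes."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    ranges = []
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1  # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


# A 20 MiB object split into 8 MiB parts gives three ranges.
parts = byte_ranges(20 * 1024 * 1024)
```

Each value could then be supplied as the `Range` of a separate GET request, for example through the `Range` argument of boto3's `get_object`, with the parts downloaded concurrently.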

Choice of storage classes on Amazon S3
                S3 Standard     S3 Standard - Infrequent Access   Amazon Glacier
Use case        Active data     Infrequently accessed data        Archive data
Access latency  Milliseconds    Milliseconds                      Minutes to Hours
Price           $0.021/GB/mo    $0.0125/GB/mo                     $0.004/GB/mo
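
To make the price differences concrete, here is a small worked calculation using the per-GB prices listed for the three storage classes. The 100 TB archive and the 10/30/60 split across classes are hypothetical, 1 TB is taken as 1,000 GB, and request and retrieval charges are ignored.

```python
# Per-GB monthly prices taken from the slide; current AWS pricing may differ.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.021,
    "S3 Standard - Infrequent Access": 0.0125,
    "Amazon Glacier": 0.004,
}


def monthly_storage_cost(gb_by_class):
    """Sum storage cost in USD per month for a {storage class: GB} mapping."""
    return sum(PRICE_PER_GB_MONTH[cls] * gb for cls, gb in gb_by_class.items())


# Hypothetical 100 TB seismic archive kept entirely in S3 Standard.
all_standard = monthly_storage_cost({"S3 Standard": 100_000})

# The same data tiered by how often it is accessed (hypothetical split).
tiered = monthly_storage_cost({
    "S3 Standard": 10_000,
    "S3 Standard - Infrequent Access": 30_000,
    "Amazon Glacier": 60_000,
})
```

Under these assumptions the all-Standard layout costs about $2,100 per month and the tiered layout about $825 per month for storage alone.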

Amazon S3 Amazon Glacier
Object
Object Storage is Foundational
Lambda EC2 EMR Spark Kinesis
Athena DynamoDB Redshift
Data Query
Streaming Analytics
Compute
API
Gateway
QuickSight
Data Presentation

Upstream Integration – Linking Applications & Data
Geology &
Geophysics
Petrophysics
Reservoir
Engineering
Drilling &
Completions
Facility
Engineering
Production
Shared
Services
Policy-Based Data Management
AspenTech
PIMS
Landmark
OpenWorks
Schlumberger
Petrel
Intergraph
SPF
IHS
Petra
Paradigm
VoxelGeo
ESRI
ArcGIS
Other
SAP
Microsoft
SharePoint
Schlumberger
Eclipse
S3

What About Data Management?
• Do we have all the latest data for this well?
• Do we keep all the relevant data for this well?
• Do we have data from all domains?
• Do we have data in formats that I can use?
• Are there multiple copies of the same data?
• Do we have data adjacent to this area?
• Where do I find all the data I need?
• Do we know the history of all our data?

Catalog Your Data
S3
Put data in S3
Amazon
DynamoDB
Amazon Elasticsearch
Service
Metadata
What is in the data lake?
Documents the data lake
Summary statistics
Classification
Data Sources
Search capabilities
https://aws.amazon.com/answers/big-data/data-lake-solution/
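
One way to picture the metadata step is a small record built for every object put in S3. The sketch below is illustrative only: the `domain/well_id/filename` key layout, the field names and the bucket name are assumptions, and the actual write to DynamoDB or Elasticsearch is left out.

```python
import os
from datetime import datetime, timezone


def catalog_item(bucket, key, size_bytes, last_modified):
    """Build a metadata record describing one object in the data lake."""
    parts = key.split("/")
    filename = parts[-1]
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return {
        "object_uri": f"s3://{bucket}/{key}",
        # Assumed layout: first path element is the domain, second the well.
        "domain": parts[0] if len(parts) > 2 else "unknown",
        "well_id": parts[1] if len(parts) > 2 else "unknown",
        "format": extension or "unknown",
        "size_bytes": size_bytes,
        "last_modified": last_modified.isoformat(),
    }


item = catalog_item(
    "example-upstream-lake",
    "geophysics/W-001/survey_2017.segy",
    1_500_000_000,
    datetime(2017, 6, 1, tzinfo=timezone.utc),
)
```

A record like this could be stored as a DynamoDB item keyed on `object_uri` and indexed in Elasticsearch to provide the search capability.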

Glue Crawlers: auto-populate data catalogs
Automatic schema inference:
• Built-in classifiers detect file type and extract
schema: record structure and data types.
• Add your own or share with others in the Glue
community - It's all Grok and Python.
Auto-detects Hive-style partitions, grouping
similar files into one table.
Run crawlers on schedule to discover new data
and schema changes.
Serverless – only pay when crawls run.
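
Hive-style partitions are `name=value` folders in the object key. The sketch below is not Glue's implementation; it only illustrates the naming convention a crawler can recognize, using a hypothetical key and helper names.

```python
def parse_hive_partitions(key):
    """Extract name=value path segments (Hive-style partitions) from an object key."""
    partitions = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        name, sep, value = segment.partition("=")
        if sep and name and value:
            partitions[name] = value
    return partitions


def table_prefix(key):
    """Return the path before the first partition segment (the table location)."""
    prefix = []
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            break
        prefix.append(segment)
    return "/".join(prefix)


key = "production/daily/year=2017/month=06/part-0000.csv"
partitions = parse_hive_partitions(key)
location = table_prefix(key)
```

Files that share the same `location` and differ only in their partition values are the ones that would be grouped into a single table.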

Choose the Right Ingestion Methods
AWS Snowball & Snowmobile
• Accelerate PBs with AWS-provided
appliances
• 50, 80, 100 TB models
• 100PB Snowmobile
AWS Storage Gateway
• Instant hybrid cloud
• Up to 120 MB/s cloud upload rate
(4x improvement), and
Amazon Kinesis Firehose
• Ingest device streams directly into
AWS data stores
AWS Direct Connect
• COLO to AWS
• Use native copy tools
Native/ISV Connectors
• Sqoop, Flume, DistCp
• Commvault, Veritas, etc
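
Choosing between a network path and a shipped appliance is largely an arithmetic question. The sketch below estimates transfer time for a sustained rate; it uses decimal units, ignores protocol overhead and shipping time, and the helper name is an assumption.

```python
def transfer_days(data_tb, rate_mb_per_s):
    """Days needed to move data_tb terabytes at a sustained rate in MB/s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    megabytes = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = megabytes / rate_mb_per_s
    return seconds / 86_400


# 100 TB (the largest appliance size listed) at the 120 MB/s gateway upload rate.
days_over_gateway = transfer_days(100, 120)
```

At a sustained 120 MB/s, 100 TB takes roughly 9.6 days, which is the kind of figure to compare against appliance turnaround time when data volumes reach petabytes.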

AWS Snowball Edge
Petabyte-scale hybrid device with onboard compute and storage
• 100 TB local storage
• Local compute equivalent to an Amazon EC2
m4.4xlarge instance
• 10GBase-T, 10/25Gb SFP28, and 40Gb
QSFP+ copper, and optical networking
• Ruggedized and rack-mountable

Snowball Edge key features
S3-compatible endpoint
File interface (NFS)
Clustering
Run AWS Lambda functions
Faster data transfer
Encryption

What Do We Do With the Data?
Field Data
Well Data
Geophysical
Data
Geological
Data
Reservoir
Data
Production
Data
Reserves
Biometric
Data

What Do We Do With the Data? (Part 2)
• Well Placement Optimization
• Production Optimization
• Predictive Maintenance
• Fleet & Asset Management
• Improved Safety & Compliance

How Do We Do It? Choose the Right Tools
Amazon Redshift, Spectrum
Enterprise Data Warehouse
Amazon EMR
Hadoop/Spark
Amazon Athena
Clusterless SQL
AWS Glue
Clusterless ETL
Amazon Aurora
Managed Relational Database
Amazon Machine Learning
Predictive Analytics
Amazon QuickSight
Business Intelligence/Visualization
Amazon Elasticsearch Service
Elasticsearch
Amazon ElastiCache
Redis In-memory Datastore
Amazon DynamoDB
Managed NoSQL Database
Amazon Rekognition & Amazon Polly
Image Recognition & Text-to-Speech AI APIs
Amazon Lex
Voice or Text Chatbots

The Emerging Analytics Architecture
Amazon Athena
Interactive Query
AWS Glue
ETL & Data Catalog
Storage
Serverless
Compute
Data
Processing
Amazon S3
Exabyte-scale Object Storage
Amazon Kinesis Firehose
Real-Time Data Streaming
Amazon EMR
Managed Hadoop Applications
AWS Lambda
Trigger-based Code Execution
AWS Glue Data Catalog
Hive-compatible Metastore
Amazon Redshift Spectrum
Fast @ Exabyte scale
Amazon Redshift
Petabyte-scale Data Warehousing
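
"Trigger-based Code Execution" with AWS Lambda can be sketched as a handler that reacts to S3 event notifications. The handler below only lists the objects named in the event; the bucket name, the key and the returned structure are hypothetical, and real processing (cataloging, conversion, and so on) would replace the list-building step.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """List the objects named in an S3 event notification."""
    created = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        created.append(f"s3://{bucket}/{key}")
    return {"objects": created}


# Local invocation with a hand-written event (hypothetical bucket and key).
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-upstream-lake"},
                "object": {"key": "well+logs/W-001.las"}}}
    ]
}
result = handler(sample_event, None)
```

Because the function depends only on the event payload, it can be exercised locally with a sample event before being attached to a bucket notification.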

Amazon S3
Data Lake
Amazon Kinesis
Streams & Firehose
Hadoop / Spark
Streaming Analytics Tools
Amazon Redshift
Data Warehouse
Amazon DynamoDB
NoSQL Database
AWS Lambda
Spark Streaming on
EMR
Amazon Elasticsearch
Service
Relational Database
Amazon EMR
Amazon Aurora
Amazon Machine Learning
Predictive Analytics
Any Open Source Tool of
Choice on EC2
AWS Data Lake
Analytic
Capabilities
Data Science Sandbox
Visualization /
Reporting
Apache Storm
on EMR
Apache Flink
on EMR
Amazon Kinesis
Analytics
Serving Tier
Clusterless SQL Query
Amazon Athena
Data Sources
Transactional Data
AWS Glue
Clusterless ETL
Amazon ElastiCache
Redis
