OIL & GAS INDUSTRY DAY
Modernizing Upstream Workflows with AWS Storage - John Mallory
“The Digital Oilfield is not merely about computer chips, processors and software. It is
about the melding of operations technology with information technology and the
Internet of Things. It involves a powerful combination of distributed network sensors,
ubiquitous mobile connectivity, cloud computing, advanced big data analytics and
artificial intelligence. It has the ability to “learn” from what works in the best producing
wells and apply those learnings to entire fields. It will predict equipment breakdown
before it happens and bring about “condition-based” maintenance rather than
“schedule-based” methods. It will track workers in the field, feed them the data they
need via various platforms, “coach” their work in real-time and remove them from
hazardous situations. Ultimately, it will produce more oil and gas for less cost.”
– Accenture 2016 Digital Oilfield Outlook
Digital Oilfield …(simplified)
Operations
Technology
Information
Technology
Digital Transformation is a Journey
Monitoring
Insights
New Use Cases
Optimization
Transformation
Analytics = Value From Data
1%
of information gathered from the field is currently made available to oil and gas decision-makers
What Keeps Us From Using More?
Upstream Information Management
• Very large number of diverse, complex, multi-modal & multi-scale datasets
• Not a sequential series of separate tasks, rather a continuum of multiple scenario iterations
Source: Common Data Access Limited
Source: Schlumberger
Data Silos Are a Key Challenge
Hadoop/Stream
Analytics
Clusters
HPC
Clusters
SAP, EDW,
Databases
Exploration
Production Optimization
Operations & Planning
Enter the Data Lake Architecture
Data Lake is a new and increasingly popular
architecture to store and analyze massive
volumes and heterogeneous types of data.
Benefits of a Data Lake
• All Data in One Place
• Quick Ingest
• Storage vs Compute
• Schema on Read
• Multi-User Environment
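As a rough illustration of the "Schema on Read" benefit above: the raw file is stored unchanged, and each consumer applies its own schema only when it reads. This is a minimal sketch; the sample CSV, the column names, and the two schemas are hypothetical and not taken from the deck.

```python
import csv
import io

# Raw data lands in the lake untouched; every field is plain text at this point.
# The columns and values below are invented for illustration.
RAW_CSV = """well_id,timestamp,pressure_psi,status
W-001,2017-06-01T00:00:00,2150.5,flowing
W-002,2017-06-01T00:00:00,1980.0,shut-in
"""

# Hypothetical schemas: each consumer declares only the columns it needs
# and how to interpret them.
PRESSURE_SCHEMA = {"well_id": str, "pressure_psi": float}
STATUS_SCHEMA = {"well_id": str, "status": str}


def read_with_schema(raw_text, schema):
    """Parse raw CSV text, keeping and casting only the schema's columns."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        {column: cast(row[column]) for column, cast in schema.items()}
        for row in reader
    ]


# Two users read the same stored bytes with different schemas.
pressure_rows = read_with_schema(RAW_CSV, PRESSURE_SCHEMA)
status_rows = read_with_schema(RAW_CSV, STATUS_SCHEMA)
```

Because nothing is cast or dropped at ingest time, a new use case only needs a new schema, not a reload of the data.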
Cloud Data Migration
• Direct Connect
• Snow* data transport family
• 3rd Party Connectors
• Transfer Acceleration
• Storage Gateway
• Kinesis Firehose
AWS Storage Platform and Solutions
The AWS Storage Portfolio
• Object: Amazon S3, Amazon Glacier
• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
• File: Amazon EFS
Consolidate Data & Separate Storage & Compute
• Use Amazon S3 as the data lake storage tier, rather than a single analytics tool such as an
IoT streaming analytics cluster or a seismic processing HPC cluster
• Decoupled storage and compute is cheaper and more efficient to operate
• Decoupled storage and compute allows us to evolve to clusterless
architectures (e.g. AWS Lambda, Amazon Athena, Redshift Spectrum & AWS
Glue)
• Do not build data silos in Hadoop or HPC clusters
• Gain the flexibility to use all the analytics tools and compute options in the
ecosystem around S3 & future-proof the architecture
Why Choose Amazon S3
Durable
• Designed for 11 9s of durability
Secure
• Multiple Encryption Options
• Robust/Highly Flexible Access Controls
High performance
• Multipart upload
• Range GET
• Scalable Throughput
Integrated
• Amazon EMR
• Amazon Redshift
• Amazon DynamoDB
• Amazon Athena
• Amazon Rekognition
• AWS Glue
Easy to use
• Simple REST API
• AWS SDKs
• Read-after-create consistency
• Event notification
• Lifecycle policies
• Simple Management Tools
• Hadoop compatibility
Scalable
• Store as much as you need
• Scale storage and compute independently
• Scale without limits
• Affordable
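The high-performance features listed above (multipart upload and Range GET) both rest on splitting one large object into byte ranges that can be moved in parallel. The sketch below only plans those ranges and formats the standard HTTP Range header value; it makes no AWS calls. The function names and the 8 MiB part size are assumptions chosen for illustration.

```python
def plan_byte_ranges(object_size, part_size):
    """Return inclusive (start, end) byte offsets covering the whole object."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    ranges = []
    start = 0
    while start < object_size:
        # The last chunk may be shorter than part_size.
        end = min(start + part_size, object_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Format an HTTP Range header value for one chunk."""
    return f"bytes={start}-{end}"


MIB = 1024 * 1024
# Hypothetical 20 MiB object fetched in 8 MiB chunks.
chunks = plan_byte_ranges(20 * MIB, 8 * MIB)
headers = [range_header(start, end) for start, end in chunks]
```

Each header can then be issued as a separate request by a worker thread or process. For multipart uploads, S3 documents its own limits on part size and part count, so check those before settling on a part size.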
Choice of storage classes on Amazon S3
• S3 Standard: active data, millisecond access, $0.021/GB/mo
• S3 Standard - Infrequent Access: infrequently accessed data, millisecond access, $0.0125/GB/mo
• Amazon Glacier: archive data, access in minutes to hours, $0.004/GB/mo
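To make the price differences concrete, the following sketch compares monthly storage cost across the three classes using the per-GB prices quoted on this slide. The 500,000 GB dataset and the 10/30/60 split are hypothetical, and the calculation covers storage only: it ignores request, retrieval, and data transfer charges, and current prices may differ from the slide.

```python
# Per-GB monthly prices as quoted on the slide.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.021,
    "S3 Standard - Infrequent Access": 0.0125,
    "Amazon Glacier": 0.004,
}


def monthly_cost(size_gb, storage_class):
    """Storage-only monthly cost in dollars for size_gb in one class."""
    return round(size_gb * PRICE_PER_GB_MONTH[storage_class], 2)


def blended_monthly_cost(size_gb, shares):
    """Cost when the dataset is split across classes by fractional share."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    return sum(
        monthly_cost(size_gb * share, storage_class)
        for storage_class, share in shares.items()
    )


# Hypothetical 500,000 GB archive, first kept entirely in one class...
DATASET_GB = 500_000
single_class = {name: monthly_cost(DATASET_GB, name) for name in PRICE_PER_GB_MONTH}

# ...then tiered: 10% active, 30% infrequently accessed, 60% archived.
tiered = blended_monthly_cost(
    DATASET_GB,
    {
        "S3 Standard": 0.10,
        "S3 Standard - Infrequent Access": 0.30,
        "Amazon Glacier": 0.60,
    },
)
```

Under these assumptions, tiering costs well under half of keeping everything in S3 Standard, which is the argument for pairing storage classes with lifecycle policies.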
Object Storage is Foundational
• Object: Amazon S3, Amazon Glacier
• Compute: Lambda, EC2
• Streaming Analytics: EMR, Spark, Kinesis
• Data Query: Athena, DynamoDB, Redshift
• Data Presentation: API Gateway, QuickSight
Upstream Integration – Linking Applications & Data
Geology &
Geophysics
Petrophysics
Reservoir
Engineering
Drilling &
Completions
Facility
Engineering
Production
Shared
Services
Policy-Based Data Management
AspenTech
PIMS
Landmark
OpenWorks
Schlumberger
Petrel
Intergraph
SPF
IHS
Petra
Paradigm
VoxelGeo
ESRI
ArcGIS
Other
SAP
Microsoft
SharePoint
Schlumberger
Eclipse
S3
What About Data Management?
Do we have all
the latest data
for this well?
Do we keep all
the relevant
data for this
well?
Do we have
data from all
domains?
Do we have
data in
formats that I
can use?
Are there
multiple
copies of the
same data?
Do we have
data adjacent
to this area?
Where do I
find all the
data I need?
Do we know
the history of
all our data?
Lambda Metadata Extract
Catalog Your Data
S3
Put data in S3
Amazon
DynamoDB
Amazon Elasticsearch
Service
Metadata
What is in the data lake?
Documents the data lake
Summary statistics
Classification
Data Sources
Search capabilities
https://aws.amazon.com/answers/big-data/data-lake-solution/
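The cataloging flow above (put data in S3, extract metadata with Lambda, store it in DynamoDB or Amazon Elasticsearch Service) might look roughly like the handler below. It is a sketch, not the referenced solution's code: it only parses the S3 event notification and builds catalog records, and it leaves the write to the metadata store as a comment. The record fields and the rule that the first key segment names the data domain are assumptions.

```python
import posixpath
from urllib.parse import unquote_plus


def extract_metadata(event):
    """Turn an S3 event notification into a list of catalog records."""
    records = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        records.append({
            "bucket": s3_info["bucket"]["name"],
            "key": key,
            "size_bytes": s3_info["object"].get("size", 0),
            "event_time": record.get("eventTime"),
            # Assumed convention: the first path segment names the data domain.
            "domain": key.split("/", 1)[0] if "/" in key else "unclassified",
            "extension": posixpath.splitext(key)[1].lstrip(".").lower(),
        })
    return records


def handler(event, context):
    """Lambda entry point; persistence to the catalog store is left out."""
    items = extract_metadata(event)
    # A real deployment would write each item to DynamoDB and/or
    # Amazon Elasticsearch Service at this point.
    return {"cataloged": len(items), "items": items}


# Hypothetical event for local testing; bucket and key are made up.
SAMPLE_EVENT = {
    "Records": [{
        "eventTime": "2017-06-01T12:00:00.000Z",
        "s3": {
            "bucket": {"name": "example-upstream-lake"},
            "object": {"key": "seismic/survey+2017/line_01.SEGY", "size": 1048576},
        },
    }]
}
result = handler(SAMPLE_EVENT, None)
```

Keeping the parsing in a plain function separate from the handler makes it testable without any AWS resources.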
Glue Crawlers: auto-populate data catalogs
Automatic schema inference:
• Built-in classifiers detect file type and extract
schema: record structure and data types.
• Add your own or share with others in the Glue
community - It's all Grok and Python.
Auto-detects Hive-style partitions, grouping
similar files into one table.
Run crawlers on schedule to discover new data
and schema changes.
Serverless – only pay when crawls run.
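The Hive-style partitions mentioned above are a key-naming convention: folder names of the form name=value that tools can read back as columns. The sketch below builds and parses such keys in plain Python to show the layout. It is not Glue's implementation, and the table, partition, and file names are hypothetical.

```python
def build_partitioned_key(table, partitions, file_name):
    """Compose a key such as table/field=x/year=2017/file."""
    parts = [f"{name}={value}" for name, value in partitions.items()]
    return "/".join([table, *parts, file_name])


def parse_hive_partitions(key):
    """Split an object key into (partition dict, file name)."""
    *folders, file_name = key.split("/")
    partitions = {}
    for folder in folders:
        # Only name=value folders count as partitions; others are plain prefixes.
        if "=" in folder:
            name, value = folder.split("=", 1)
            partitions[name] = value
    return partitions, file_name


# Hypothetical production-data layout partitioned by field, year, and month.
key = build_partitioned_key(
    "production",
    {"field": "field_a", "year": "2017", "month": "06"},
    "part-0000.csv",
)
partitions, file_name = parse_hive_partitions(key)
```

Writing files under a consistent layout like this is what lets a crawler group them into one table with the partition names as columns.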
AWS Snowball & Snowmobile
• Accelerate PBs with AWS-provided
appliances
• 50, 80, 100 TB models
• 100PB Snowmobile
AWS Storage Gateway
• Instant hybrid cloud
• Up to 120 MB/s cloud upload rate
(4x improvement), and
Choose the Right Ingestion Methods
Amazon Kinesis Firehose
• Ingest device streams directly into
AWS data stores
AWS Direct Connect
• COLO to AWS
• Use native copy tools
Native/ISV Connectors
• Sqoop, Flume, DistCp
• Commvault, Veritas, etc
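One way to choose among these ingestion methods is to estimate how long a network transfer would take and compare that with shipping an appliance. The sketch below does that arithmetic in decimal units. The 100 TB dataset, the link speeds, and the 80% utilization factor are assumptions; the 120 MB/s figure is the Storage Gateway upload rate quoted on the slide.

```python
def transfer_days(size_tb, rate_mb_per_s):
    """Days needed to move size_tb terabytes at a sustained rate in MB/s."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = size_mb / rate_mb_per_s
    return seconds / 86_400


def link_rate_mb_per_s(link_gbps, utilization=0.8):
    """Usable MB/s on a network link; utilization is an assumed factor."""
    return link_gbps * 1000 / 8 * utilization


# Hypothetical 100 TB dataset over three paths.
DATASET_TB = 100
estimates = {
    "Storage Gateway at 120 MB/s": transfer_days(DATASET_TB, 120),
    "1 Gbps link at 80% utilization": transfer_days(DATASET_TB, link_rate_mb_per_s(1)),
    "10 Gbps link at 80% utilization": transfer_days(DATASET_TB, link_rate_mb_per_s(10)),
}
```

When the estimate runs to weeks or months, as it can for petabyte-scale seismic archives on modest links, a Snowball or Snowmobile shipment is worth comparing against the network option.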
AWS Snowball Edge
Petabyte-scale hybrid device with onboard compute and storage
• 100 TB local storage
• Local compute equivalent to an Amazon EC2
m4.4xlarge instance
• 10GBase-T, 10/25Gb SFP28, and 40Gb
QSFP+ copper, and optical networking
• Ruggedized and rack-mountable
Snowball Edge key features
S3-compatible endpoint
File interface (NFS)
Clustering
Run AWS Lambda functions
Faster data transfer
Encryption
What Do We Do With the Data?
Field Data
Well Data
Geophysical
Data
Geological
Data
Reservoir
Data
Production
Data
Reserves
Biometric
Data
What Do We Do With the Data? (Part 2)
• Well Placement Optimization
• Production Optimization
• Predictive Maintenance
• Fleet & Asset Management
• Improved Safety & Compliance
How Do We Do It? Choose the Right Tools
Amazon Redshift, Spectrum
Enterprise Data Warehouse
Amazon EMR
Hadoop/Spark
Amazon Athena
Clusterless SQL
AWS Glue
Clusterless ETL
Amazon Aurora
Managed Relational Database
Amazon Machine Learning
Predictive Analytics
Amazon QuickSight
Business Intelligence/Visualization
Amazon Elasticsearch Service
Elasticsearch
Amazon ElastiCache
Redis In-memory Datastore
Amazon DynamoDB
Managed NoSQL Database
Amazon Rekognition & Amazon Polly
Image Recognition & Text-to-Speech AI APIs
Amazon Lex
Voice or Text Chatbots
The Emerging Analytics Architecture
Amazon Athena
Interactive Query
AWS Glue
ETL & Data Catalog
Storage
Serverless
Compute
Data
Processing
Amazon S3
Exabyte-scale Object Storage
Amazon Kinesis Firehose
Real-Time Data Streaming
Amazon EMR
Managed Hadoop Applications
AWS Lambda
Trigger-based Code Execution
AWS Glue Data Catalog
Hive-compatible Metastore
Amazon Redshift Spectrum
Fast @ Exabyte scale
Amazon Redshift
Petabyte-scale Data Warehousing
Amazon S3
Data Lake
Amazon Kinesis
Streams & Firehose
Hadoop / Spark
Streaming Analytics Tools
Amazon Redshift
Data Warehouse
Amazon DynamoDB
NoSQL Database
AWS Lambda
Spark Streaming on
EMR
Amazon Elasticsearch
Service
Relational Database
Amazon EMR
Amazon Aurora
Amazon Machine Learning
Predictive Analytics
Any Open Source Tool of
Choice on EC2
AWS Data Lake
Analytic
Capabilities
Data Science Sandbox
Visualization /
Reporting
Apache Storm
on EMR
Apache Flink
on EMR
Amazon Kinesis
Analytics
Serving Tier
Clusterless SQL Query
Amazon Athena
Data Sources
Transactional Data
AWS Glue
Clusterless ETL
Amazon ElastiCache
Redis