Introducing Azure SQL Data Warehouse

James Serra
Data Platform Solution Architect
Microsoft

Microsoft & Data Warehouse
Timeline (years shown: 2008, 2010, 2011, 2013, 2014, 2015)
• Microsoft acquired DATAllegro: company viewed as the most efficient way to bring MPP to the SQL Server world
• Fast Track Data Warehouse launch: DW reference architectures based on SMP DW best practices, offered with leading H/W partners
• Parallel Data Warehouse v1: DATAllegro product on Windows & SQL. First DW appliance by MSFT, in partnership with Dell and HP
• Parallel Data Warehouse v2: re-architected product delivering new form factors and greatly improved price/performance
• Analytics Platform System (APS): introduction of a Hadoop region within the appliance and new naming to reflect broader Big Data capabilities
• SQL DW Service: introduction of the Azure SQL DW Service based on APS’s MPP capabilities

Customer challenges in managing data
• Increased data types and volumes
• Varied data sources
• Added complexity and cost

BI and analytics: Self-service, Corporate, Collaboration, Mobile, Machine learning
Data enrichment and federated query: Single query model; Extract, transform, load; Data quality; Master data management
Data management and processing (SQL Server and non-relational data): Box software, Appliances, Cloud
Data sources: OLTP, ERP, CRM, LOB, Devices, Web, Sensors, Social

Parallelism
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN
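
The shared-nothing idea behind MPP can be sketched in a few lines. This is only an illustration, not the product's actual distribution algorithm: the node count, the `customer_id` key, and the helper names `node_for` and `distribute` are all assumptions made up for the example. Each row is assigned to exactly one node by hashing a distribution key, so every node owns its own slice of the data.

```python
import zlib
from collections import defaultdict

NODE_COUNT = 4  # assumed number of nodes, for illustration only


def node_for(key: str, node_count: int = NODE_COUNT) -> int:
    """Map a distribution key to a node using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, key_column, node_count=NODE_COUNT):
    """Place each row on exactly one node; nodes share no storage."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(str(row[key_column]), node_count)].append(row)
    return nodes


# Hypothetical sample rows.
rows = [{"customer_id": f"C{i:03d}", "amount": i * 10} for i in range(12)]
placement = distribute(rows, "customer_id")

for node_id in sorted(placement):
    print(node_id, [r["customer_id"] for r in placement[node_id]])
```

Because the hash is stable, the same key always lands on the same node, which is what lets each node work on its slice without coordinating with the others.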

SQL DW Logical Architecture (overview)
[Diagram: a “Control” node (SQL + DMS) in front of four “Compute” nodes (each SQL + DMS) with balanced storage]
Compute Node – the “worker bee” of SQL DW
• Runs Azure SQL Database
• Contains a “slice” of each database
• Balanced storage keeps the CPU saturated
Control Node – the “brains” of SQL DW
• Also runs Azure SQL Database
• Holds a “shell” copy of each database
• Metadata, statistics, etc.
• The “public face” of the appliance
Data Movement Services (DMS)
• Part of the “secret sauce” of SQL DW
• Moves data around as needed
• Enables parallel operations among the compute nodes (queries, loads, etc.)

SQL DW Logical Architecture (overview)
[Diagram: “Control” node (SQL + DMS) connected to four compute nodes (each SQL + DMS)]
1) User connects to the appliance (control node) and submits query
2) Control node query processor determines best *parallel* query plan
3) DMS distributes sub-queries to each compute node
4) Each compute node executes query on its subset of data
5) Each compute node returns a subset of the response to the control node
6) If necessary, control node does any final aggregation/computation
7) Control node returns results to user
Queries run in parallel, each on a subset of the data, over separate pipes, effectively making the pipe larger
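
The seven steps above follow a scatter-gather pattern, which the sketch below imitates on a single machine. It is a toy model under stated assumptions: threads stand in for compute nodes, the data slices are made up, and `run_subquery` and `control_node_query` are hypothetical names, not part of SQL DW.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice held by one compute node.
compute_node_slices = [
    [120, 80, 45],
    [300, 10],
    [75, 75, 75, 25],
    [500],
]


def run_subquery(slice_rows):
    """Steps 4-5: a compute node aggregates only its own slice."""
    return {"sum": sum(slice_rows), "count": len(slice_rows)}


def control_node_query(slices):
    """Steps 2-3 and 6-7: fan out sub-queries, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(run_subquery, slices))
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    # The final average is built from partial sums and counts,
    # not by averaging the per-node averages.
    return {"sum": total, "count": count, "avg": total / count}


result = control_node_query(compute_node_slices)
print(result)
```

Returning a sum and a count from each node, rather than a per-node average, is what makes the final aggregation in step 6 correct when slices have different sizes.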

Elastic scale & performance
Real-time elasticity
• Resize in <1 minute
• On-demand compute: expand or reduce as needed

Elastic scale & performance
Scale
• Storage can be as big or small as required
• Customers can execute niche workloads without re-scanning data

End-to-end platform built for the cloud
Power of integration
• App Service
• Intelligent App
• Hadoop
• Azure Machine Learning
• Power BI
• Azure SQL Database
• Azure SQL Data Warehouse

End-to-end platform built for the cloud
Bring compute to data, keep data in its place
• Azure Data Factory
• Migration Accelerator
• ExpressRoute

Market leading price/performance
Bring your data warehouse to the cloud
Automated
Minimize cost
Policy-based
Secure data

Any data, any size, anywhere
Query unstructured data via PolyBase/T-SQL
[Diagram: PolyBase connects a SQL DW instance (scale-out compute) to Hadoop VMs / Azure Storage]

Hassle-free management
Infrastructure
Management
Azure support
With built-in ease of use

Save Costs with Dynamic Pause and Resume
When paused, pay only for storage
Use it only when you need it – no reloading / restoring of data
• When paused, the only charge is the minimal cost of cloud-scale storage
• Policy-based (e.g., nights/weekends)
• Automate via PowerShell/REST API
• Data remains in place
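
A nights/weekends policy like the one mentioned above could be expressed as a small decision function. The sketch below is an assumption-laden illustration: the 07:00 to 19:00 weekday window and the names `should_be_paused` and `paused_hours_per_week` are invented for the example, and the actual pause/resume call (made through PowerShell or the REST API) is deliberately left out.

```python
from datetime import datetime

# Assumed policy: keep compute running 07:00-19:00, Monday to Friday.
BUSINESS_START_HOUR = 7
BUSINESS_END_HOUR = 19


def should_be_paused(now: datetime) -> bool:
    """Return True when the assumed policy says compute should be paused."""
    is_weekend = now.weekday() >= 5  # 5 = Saturday, 6 = Sunday
    is_night = not (BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR)
    return is_weekend or is_night


def paused_hours_per_week() -> int:
    """Count the hours in one week during which compute is paused."""
    # 2016-02-01 was a Monday; any full week gives the same count.
    return sum(
        should_be_paused(datetime(2016, 2, 1 + day, hour))
        for day in range(7)
        for hour in range(24)
    )


paused = paused_hours_per_week()
print(f"Compute paused for {paused} of {7 * 24} hours per week")
```

Under this assumed schedule, compute is billed for 60 of the 168 hours in a week, while storage is billed throughout.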

Geo-storage replication
• Azure Storage Page Blobs, 3 copies locally
• High durability/availability
• Another 3 copies in a different region
Defend against regional disasters

Automatic backup and geo-restore
Recover from data deletion or alteration or disaster
• Auto backups, every 4 hours
• On-demand backups in Azure Storage
• REST API, PowerShell or Azure Portal
• Scheduled exports
• Near-online backup/restore
• Backup retention policy:
‐ Auto backups, up to 35 days
‐ On-demand backups retained indefinitely
[Diagram: SQL DW backups stored in Azure Storage, geo-replicated, with restore from backup]
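
The backup figures on this slide (every 4 hours, kept up to 35 days) imply some simple arithmetic, sketched below. The fixed 4-hour grid starting at `first_backup`, the sample timestamps, and the helper names are assumptions for illustration; the real service may schedule backups differently.

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(hours=4)  # from the slide: auto backups every 4 hours
RETENTION = timedelta(days=35)        # from the slide: kept up to 35 days


def max_restore_points() -> int:
    """Upper bound on automatic restore points inside the retention window."""
    return RETENTION // BACKUP_INTERVAL


def latest_restore_point(incident: datetime, first_backup: datetime) -> datetime:
    """Most recent automatic backup taken at or before the incident.

    Assumes backups fall on a fixed grid starting at first_backup.
    """
    elapsed = incident - first_backup
    return first_backup + (elapsed // BACKUP_INTERVAL) * BACKUP_INTERVAL


# Hypothetical timestamps.
first_backup = datetime(2016, 2, 1, 0, 0)
incident = datetime(2016, 2, 6, 10, 30)
restore_point = latest_restore_point(incident, first_backup)

print("Max automatic restore points:", max_restore_points())
print("Restore to:", restore_point, "losing", incident - restore_point, "of changes")
```

With a 4-hour interval, the worst-case gap between an incident and the previous automatic backup is just under 4 hours, which is where on-demand backups can help.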

Hybrid scenarios which work well
Both Analytics Platform System and Azure SQL Data Warehouse have a Massively Parallel Processing (MPP) engine. Here are a few scenarios where they can be leveraged together.
• Dev/test: Test new ideas in SQL DW before rolling out to production in APS
• Archive: Archive cold data to blob storage for any workload execution
• Governance: Store data in APS that company policy prohibits from being in the cloud

Microsoft Data Platform
Comprehensive, Connected, Choice
Cloud, Relational: SQL Server Azure VM, Azure SQL DB, Azure SQL DW
Cloud, Beyond-Relational: Azure Data Lake Analytics, Azure Data Lake Store
On-premises, Relational: Fast Track for SQL Server, Analytics Platform System, SQL Server 2016 + Superdome X
On-premises, Beyond-Relational: Analytics Platform System, Hadoop
Also shown: Federated Query, Power BI, Azure Machine Learning, Azure Data Factory

SQL DW: Building on SQL DB Foundation
Elastic, Petabyte Scale
DW Optimized
99.99% uptime SLA, Geo-restore
Azure Compliance (ISO, HIPAA, EU, etc.)
True SQL Server Experience; Existing Tools Just Work
SQL DW
SQL DB
Service Tiers

Measure of power: Simply buy the query performance you need, not just hardware
Transparency: Quantified by workload objectives: how fast rows are scanned, loaded, copied
On demand: First DW service to offer compute power on demand, independent of storage
Scan Rate: 3.36M rows/sec
Loading Rate: 130K rows/sec
Table Copy Rate: 350K rows/sec
100 DWU = 297 sec
400 DWU = 74 sec
800 DWU = 37 sec
1,600 DWU = 19 sec
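
The four timings above can be checked for how close they come to linear scaling. The sketch below uses only the numbers from the slide; the function name `scaling_report` and the choice of 100 DWU (Data Warehouse Units) as the baseline are assumptions for the example.

```python
# Timings from the slide: DWU level -> elapsed seconds for the same workload.
timings = {100: 297, 400: 74, 800: 37, 1600: 19}


def scaling_report(timings, base_dwu=100):
    """Return (dwu, seconds, speedup, efficiency) relative to the baseline."""
    base_seconds = timings[base_dwu]
    report = []
    for dwu, seconds in sorted(timings.items()):
        scale = dwu / base_dwu             # how much more compute was bought
        speedup = base_seconds / seconds   # how much faster the workload ran
        report.append((dwu, seconds, speedup, speedup / scale))
    return report


report = scaling_report(timings)
for dwu, seconds, speedup, efficiency in report:
    print(f"{dwu:>5} DWU: {seconds:>3} s, speedup {speedup:5.2f}x, efficiency {efficiency:.0%}")
```

On these figures, 4x, 8x, and 16x the compute give roughly 4.0x, 8.0x, and 15.6x the speed, so the scaling shown is close to linear.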

What is Hadoop?
• Distributed, scalable system on commodity HW
• Composed of a few parts:
‐ HDFS – Distributed file system
‐ MapReduce – Programming model
‐ Other tools: Hive, Pig, SQOOP, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Storm
• Main players are Hortonworks, Cloudera, MapR
• WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead)
[Diagram: Hadoop stack of core, operational, and data services, showing HDFS, YARN, MapReduce, Hive & HCatalog, Pig, HBase, Falcon, Oozie, Ambari, and load & extract tools (Sqoop, Flume, NFS, WebHDFS)]
[Diagram: Hadoop cluster of compute & storage nodes]
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware

Use cases where PolyBase simplifies using Hadoop data
• Bringing islands of Hadoop data together
• High performance queries against Hadoop data (predicate pushdown)
• Archiving data warehouse data to Hadoop (move): Hadoop as cold storage
• Exporting relational data to Hadoop (copy): Hadoop as backup, analysis, on-prem use
• Importing Hadoop data into data warehouse (copy): Hadoop as staging area, sandbox, Data Lake
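
Predicate pushdown, mentioned in the use cases above, means filtering at the external source so fewer rows travel to the warehouse. The sketch below is a generic illustration of that idea, not PolyBase itself: the in-memory `external_rows` list stands in for Hadoop data, and both query functions are hypothetical.

```python
# Hypothetical rows stored in an external (Hadoop-like) source.
external_rows = [{"id": i, "year": 2010 + i % 6} for i in range(600)]


def query_without_pushdown(rows, predicate):
    """Ship every row to the warehouse, then filter there."""
    transferred = list(rows)  # every row crosses the network
    result = [r for r in transferred if predicate(r)]
    return result, len(transferred)


def query_with_pushdown(rows, predicate):
    """Filter at the source, so only matching rows are shipped."""
    transferred = [r for r in rows if predicate(r)]
    return transferred, len(transferred)


def is_2015(row):
    return row["year"] == 2015


plain_result, plain_moved = query_without_pushdown(external_rows, is_2015)
pushed_result, pushed_moved = query_with_pushdown(external_rows, is_2015)

print("rows moved without pushdown:", plain_moved)
print("rows moved with pushdown:", pushed_moved)
```

Both approaches return the same rows; the difference is only in how much data has to move before the answer is produced.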

Azure SQL Data Warehouse loading patterns and strategies: https://blogs.msdn.microsoft.com/sqlcat/2016/02/06/azure-sql-data-warehouse-loading-patterns-and-strategies/

Broad SQL Server Partner Ecosystem
+ Leverage Azure ML, HDInsight, Power BI, ADF, and more.
+ Industry’s broadest ecosystem of DW partners, including Tableau, Informatica, Attunity, and SAP.
Streamlined deployment with Azure Portal.
Deep tool integration with top partners including:
• Single-click configuration
• Optimized data movement
• Logical pushdown
[Diagram: Azure SQL DW with Azure ML, Azure Event Hub, Azure HDInsight]

Market-Leading Price/Performance
• Best On-Demand Price/Performance
‐ Advantages in elasticity and pause to reduce customer cost
• SQL DW starts small and can grow to PB+
• Pay for performance by scaling compute independently of storage
[Chart: performance as data grows from 100 GB to 1 TB, 2 TB, and 1+ PB]

How does SQL Data Warehouse differ from Redshift?
Amazon Redshift vs. SQL DW, compared on:
• Elasticity
• Pause/resume
• Simplicity
• Hybrid
• Compatibility

Summary: Azure SQL DW Service
A relational data warehouse-as-a-service, fully managed by Microsoft.
Industry’s first elastic cloud data warehouse with enterprise-grade capabilities.
Support your smallest to your largest data storage needs while handling queries up to 100x faster.

Azure getting started
• Free Azure account, $200 in credit, https://azure.microsoft.com/en-us/free/
• Startups: BizSpark, $750/month free Azure, BizSpark Plus - $120k/year free Azure, https://www.microsoft.com/bizspark/
• MSDN subscription, $150/month free Azure, https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
• Microsoft Educator Grant Program, faculty - $250/month free Azure for a year, students - $100/month free Azure for 6 months, https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
• Microsoft Azure for Research Grant, http://research.microsoft.com/en-us/projects/azure/default.aspx
• DreamSpark for students, https://www.dreamspark.com/Student/Default.aspx
• DreamSpark for academic institutions: https://www.dreamspark.com/Institution/Subscription.aspx
• Various Microsoft funds

Questions?
James Serra
jserra@microsoft.com
