Azure data platform overview

About Me
• Microsoft, Big Data Evangelist
• In IT for 30 years, worked on many BI and DW projects
• Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
• Been perm employee, contractor, consultant, business owner
• Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
• Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
• Blog at JamesSerra.com
• Former SQL Server MVP
• Author of book “Reporting with Microsoft SQL Server 2012”

I tried to understand the Microsoft data platform on my own…
And felt like I was body-slammed by Randy Savage.
Let’s prevent that from happening…

Data platform continuum
Hybrid Cloud
On premises
Shared
Lower cost
Dedicated
Higher cost
Higher administration Lower administration
Off premises

The “evolution” of data platforms
On-Premises → IaaS → PaaS → Pay per query

Microsoft Big Data Portfolio
SQL Server Stretch
Business intelligence
Machine learning analytics
Insights
SQL Server 2017
SQL Server 2016 Fast Track
Azure SQL DW
Databricks
Cosmos DB
HDInsight
Hadoop
Analytics Platform System
Sequential Scale Out + Across Scale Up
Key
Relational Non-relational
On-premises Cloud
Microsoft has solutions covering and connecting all four quadrants; that’s why SQL Server is one of the most utilized databases in the world
Azure SQL Database
SQL Server in Azure VM

• VM hosted on Microsoft Azure Infrastructure (“IaaS”)
  • From Microsoft images (gallery) or your own images (custom)
    • SQL Server 2008 R2 / 2012 / 2014 / 2016 / 2017; Web / Standard / Enterprise editions
    • Images refreshed with latest version, SP, CU
  • Windows Server 2008 R2 / 2012 R2 / 2016, Linux RHEL / Ubuntu
  • Fast provisioning (~10 minutes)
  • Accessible via RDP and PowerShell
  • Full compatibility with SQL Server “Box” software
• Pay per use
  • Per minute (only when running)
  • Cost depends on size and licensing
  • EA customers can use existing SQL licenses (BYOL)
  • Network: only outgoing (not incoming)
  • Storage: only used (not allocated)
• Elasticity
  • From 1 core / 2 GB mem / 1 TB up to 128 cores / 3.5 TB mem / 256 TB

Azure SQL Database
A relational database-as-a-service (“PaaS”), fully managed by Microsoft.
For cloud-designed apps when near-zero administration and enterprise-grade capabilities are key.
Perfect for organizations looking to dramatically increase the DB:IT ratio.

Azure SQL Database Managed Instance
Managed Instance: Instance-scoped programming model with high compatibility to on-premises databases. Best for modernization at scale with low cost and effort.
Single: Standalone managed database best for predictable and stable workloads.
Elastic pool: Shared resource model best for greater efficiency through multi-tenancy.

Supports compatibility modes (SQL Server 2005+), Instance sizes up to 8TB
Security
• TDE
• SQL Audit
• Row level security
• Always Encrypted

Scalable, high performance, reliable and available
Adapts on demand to your workload's needs, auto-scaling up to 100 TB per database.

Programming Model    General Purpose    Business Critical    Hyperscale                Elastic Pools
Instance (MI)        GA, 8TB            GA, 4TB              Private Preview, 100TB    April private preview
Database (Single)    GA, 4TB            GA, 4TB              Public Preview, 100TB     GA

AZURE DATABASE SERVICES FOR MYSQL, POSTGRESQL, AND MARIADB
More choices and full integration into Azure’s ecosystem and services
• Managed community MySQL, PostgreSQL, and MariaDB
• Scale in seconds with built-in high availability
• Secure and compliant
• Languages and frameworks of your choice
• Industry-leading global reach
• Easy lift and shift
• Enterprise ready

SMP vs MPP
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• SQL Server implementations traditionally have been SMP
• Mostly, the solution is housed on shared storage
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes
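
The shared-nothing idea can be sketched in a few lines. This is a toy simulation, not how SQL Data Warehouse or APS is actually implemented: rows are hash-distributed across hypothetical nodes, each node aggregates only its own slice, and a final step merges the partial results. The names `distribute`, `node_partial_sum`, and `mpp_sum` are invented for illustration.

```python
from collections import defaultdict
import zlib


def distribute(rows, node_count, key):
    """Assign each row to a node by hashing its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        # crc32 is stable across runs, unlike the built-in hash() for strings
        node_id = zlib.crc32(str(row[key]).encode()) % node_count
        nodes[node_id].append(row)
    return nodes


def node_partial_sum(rows, group_key, value_key):
    """Work done independently on one node, using only its local rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return totals


def mpp_sum(rows, node_count, group_key, value_key):
    """Scatter rows, aggregate per node, then merge the partial results."""
    merged = defaultdict(float)
    for local_rows in distribute(rows, node_count, group_key).values():
        partial = node_partial_sum(local_rows, group_key, value_key)
        for group, subtotal in partial.items():
            merged[group] += subtotal
    return dict(merged)


sales = [
    {"region": "east", "amount": 10.0},
    {"region": "west", "amount": 5.0},
    {"region": "east", "amount": 2.5},
]
print(mpp_sum(sales, node_count=4, group_key="region", value_key="amount"))
```

In an SMP system the same query would run against one shared pool of memory and disk; here the result is the same regardless of `node_count`, which is the property that lets an MPP system scale out.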

Azure SQL Data Warehouse
A relational data warehouse-as-a-service, fully managed by Microsoft.
The industry's first elastic cloud data warehouse with enterprise-grade capabilities.
Support your smallest to your largest data storage needs while handling queries up to 100x faster.

AZURE RELATIONAL DATABASE PLATFORM
Power BI, App Services, Data Factory, Analytics, ML, Cognitive, Bot…
SQL Data Warehouse, SQL Database, MySQL, PostgreSQL, MariaDB
Database Services Platform
Flexible: On-demand scaling, Resource governance
Trusted: HA/DR, Backup/Restore, Security, Audit, Isolation
Intelligent: Advisors, Tuning, Monitoring
Azure Compute, Azure Storage
Global Azure with 54 Regions

Azure Database Migration Service (DMS)
A seamless, end-to-end solution for moving on-premises SQL Server, Oracle, and other relational
databases to the cloud.
Azure Database Migration Guide
https://datamigration.microsoft.com/

Relational and non-relational defined
Relational databases (RDBMS, SQL Databases)
• Example: Microsoft SQL Server, Oracle Database, IBM DB2
• Mostly used in large enterprise scenarios
• Analytical RDBMS (OLAP, MPP) solutions are SQL DW, Redshift, Teradata, Netezza
Non-relational databases (NoSQL databases)
• Example: Azure Cosmos DB, MongoDB, Cassandra
• Four categories: Key-value stores, Wide-column stores, Document stores and Graph stores
Hadoop: Made up of Hadoop Distributed File System (HDFS), YARN and MapReduce (Ideal for data lake)
OLTP vs OLAP/DW
SMP vs MPP
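
To make the four NoSQL categories listed above concrete, here is one hypothetical customer record shaped four ways using plain Python structures. It only illustrates the data models; it does not use any real database client, and every field name is invented.

```python
# One invented customer, expressed in each of the four NoSQL data models.

# Key-value store: an opaque value looked up by a single key.
key_value = {"customer:42": '{"name": "Ada", "city": "Tampa"}'}

# Document store: the value is a structured, queryable document.
document = {
    "id": "42",
    "name": "Ada",
    "address": {"city": "Tampa"},
    "orders": [{"sku": "A1", "qty": 2}],
}

# Wide-column store: row key -> column family -> columns.
wide_column = {
    "42": {
        "profile": {"name": "Ada", "city": "Tampa"},
        "orders": {"A1": 2},
    }
}

# Graph store: nodes plus labeled edges between them.
graph = {
    "nodes": {
        "c42": {"label": "Customer", "name": "Ada"},
        "pA1": {"label": "Product", "sku": "A1"},
    },
    "edges": [("c42", "BOUGHT", "pA1")],
}


def products_bought(g, customer_id):
    """Follow BOUGHT edges out of a customer node."""
    return [dst for src, rel, dst in g["edges"]
            if src == customer_id and rel == "BOUGHT"]


print(products_bought(graph, "c42"))
```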

Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
Data models: Key-value, Column-family, Document, Graph
APIs: Table API, MongoDB API, Cassandra API
• Turnkey global distribution
• Elastic scale out of storage & throughput
• Guaranteed low latency at the 99th percentile
• Comprehensive SLAs
• Five well-defined consistency models

Advanced Analytics
Social
LOB
Graph
IoT
Image
CRM
INGEST STORE PREP MODEL & SERVE
(& store)
Data orchestration and monitoring
Big data store
Transform & Clean
Data warehouse
AI
BI + Reporting
Azure Data Factory
SSIS
Azure Data Lake Storage Gen2
Blob Storage
Azure Data Lake Storage Gen1
SQL Server 2019 Big Data Cluster
Azure Databricks
Azure HDInsight
PolyBase & Stored Procedures
Power BI Dataflow
Azure Data Lake Analytics
Azure SQL Data Warehouse
Azure Analysis Services
SQL Database (Single, MI, HyperScale)
SQL Server in a VM
Cosmos DB
Power BI Aggregations

Blob Storage Data Lake Store
Azure Data Lake Storage Gen2
Large partner ecosystem
Global scale – All 50 regions
Durability options
Tiered - Hot/Cool/Archive
Cost Efficient
Built for Hadoop
Hierarchical namespace
ACLs, AAD and RBAC
Performance tuned for big data
Very high scale capacity and throughput

LRS
• Multiple replicas across a datacenter
• Protect against disk, node, rack failures
• Write is ack’d when all replicas are committed
• Superior to dual-parity RAID
• 11 9s of durability
• SLA: 99.9%
GRS
• Multiple replicas across each of 2 regions
• Protects against major regional disasters
• Asynchronous to secondary
• 16 9s of durability
• SLA: 99.9%
RA-GRS
• GRS + Read access to secondary
• Separate secondary endpoint
• RPO delay to secondary can be queried
• SLA: 99.99% (read), 99.9% (write)
ZRS
• Replicas across 3 Zones (Zone 1, Zone 2, Zone 3)
• Protect against disk, node, rack and zone failures
• Synchronous writes to all 3 zones
• 12 9s of durability
• Available in 8 regions
• SLA: 99.9%
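
The "nines" figures are easier to compare as arithmetic. The sketch below converts N nines of annual durability into an expected number of objects lost per year, and an availability SLA into the downtime it allows over a 30-day month. The function names, the 30-day month, and the one-billion object count are my own simplifying assumptions, not figures from the deck.

```python
def annual_loss_probability(nines):
    """Probability of losing a given object in a year at N nines of durability."""
    return 10.0 ** (-nines)


def expected_objects_lost(object_count, nines):
    """Expected number of objects lost per year."""
    return object_count * annual_loss_probability(nines)


def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime per period that still meet the SLA."""
    return (1 - sla_percent / 100.0) * days * 24 * 60


for name, nines in [("LRS", 11), ("ZRS", 12), ("GRS", 16)]:
    lost = expected_objects_lost(1_000_000_000, nines)
    print(f"{name}: about {lost:.0e} of 1 billion objects lost per year")

print(f"99.9% SLA: {allowed_downtime_minutes(99.9):.1f} minutes per 30 days")
print(f"99.99% SLA: {allowed_downtime_minutes(99.99):.1f} minutes per 30 days")
```

Durability and availability are separate guarantees: the first is about data being lost, the second about data being temporarily unreachable.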

Caching Layer (Avere tech)
Extra Hot Tier - Premium (SSD + NVME)
Hot Tier (HDD)
Cool Tier (HDD)
Cooler Tier
Archive Tier
Deep Storage Tier (Glass, DNA, etc.)
Analytics Engines (Hadoop, Spark, SCOPE …)
High Performance Compute
AI / ML
Current Future
Edge
Azure File Sync
Azure Backup
Data Box
Data Box Edge
Azure Stack
REST HDFS NFS SMB …
Automatic Lifecycle Management
Avere FXT

File Sync
• Windows Srv <-> Azure
• Local caching
• With offline (Data Box) can 'sync' remainder
Fuse
• Mount blobs as local FS
• Commit on write
• Linux
Site Replication
• On-premises & cloud
• Windows, Linux
• Physical, virtual
• Hyper-V, VMware
Network Acceleration
• Aspera
• Signiant
AzCopy
• Throughput +30%
• S3 to Azure Blobs
• Sync to cloud
• Hi Latency 10-100%
NetApp
• CloudSync
• SnapMirror
• SnapVault
Data Factory
• On-premises & cloud sources
• Structured & unstructured
• Over 60 connectors
• UI design data flow
Partners
• Peer Global File Service
• Talon FAST
• Zerto
• …
Offline
• Data Box
• Data Box Heavy
• Data Box Disk
• Disk Import / Export
Fast Data Transfer
microsoft.com/en-us/garage/profiles/fast-data-transfer/

Data Box Heavy PREVIEW
• Capacity: 1 PB
• Weight 500+ lbs
• Secure, ruggedized appliance
• Preview September 2018
• Same service as Data Box, but targeted to petabyte-sized datasets.
Data Box Gateway PREVIEW
• Virtual device provisioned in your hypervisor
• Supports storage gateway, SMB, NFS, Azure blob, files
• Preview: September 2018
• Virtual network transfer appliance (VM), runs on your choice of hardware.
Data Box Edge PREVIEW
• Local Cache Capacity: ~12 TB
• Includes Data Box Gateway and Azure IoT Edge.
• Preview: September 2018
• Data Box Edge manages uploads to Azure and can pre-process data prior to upload.
Data Box
• Capacity: 100 TB
• Weight: ~50 lbs
• Secure, ruggedized appliance
• GA September 2018
• Data Box enables bulk migration to Azure when network isn’t an option.
Data Box Disk PREVIEW
• Capacity: 8TB ea.; 40TB/order
• Secure, ruggedized USB drives orderable in packs of 5 (up to 40TB).
• Currently in Preview
• Perfect for projects that require a smaller form factor, e.g., autonomous vehicles.
Order Fill Upload Send Return Cloud to Edge Edge to Cloud Pre-processing ML Inferencing
Network Data Transfer Edge Compute
Offline Data Transfer Online Data Transfer
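
As a rough sizing aid, the capacities listed above can be turned into a small lookup. This sketch uses only the figures on this slide (40 TB per Data Box Disk order, 100 TB per Data Box, 1 PB per Data Box Heavy, with 1 PB taken as 1000 TB). The function name and the rule of picking the smallest device that fits are my assumptions, and usable capacity on a real device may be lower than the listed figure.

```python
import math

# Listed capacities in TB, taken from the slide above.
OFFLINE_DEVICES = [
    ("Data Box Disk", 40),     # up to 5 x 8 TB disks per order
    ("Data Box", 100),
    ("Data Box Heavy", 1000),  # 1 PB, assuming 1 PB = 1000 TB
]


def pick_offline_device(dataset_tb):
    """Return (device, units): the smallest device that fits in one unit,
    or several units of the largest device when nothing fits."""
    if dataset_tb <= 0:
        raise ValueError("dataset_tb must be positive")
    for name, capacity_tb in OFFLINE_DEVICES:
        if dataset_tb <= capacity_tb:
            return name, 1
    name, capacity_tb = OFFLINE_DEVICES[-1]
    return name, math.ceil(dataset_tb / capacity_tb)


for size in (12, 75, 600, 2500):
    print(size, "TB ->", pick_offline_device(size))
```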

Exactly what is a data lake?
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed.
• Inexpensively store unlimited data
• Collect all data “just in case”
• Store data with no modeling – “Schema on read”
• Complements EDW
• Frees up expensive EDW resources
• Quick user access to data
• ETL Hadoop tools
• Easily scalable
• Place to move older data (archive)
• Place to back up data to
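
"Schema on read" means the raw records are stored untouched and a structure is imposed only when a consumer reads them. A minimal sketch, with an in-memory list standing in for files in the lake and invented field names:

```python
import json

# Raw events land in the lake as-is; records need not share a shape.
raw_lines = [
    '{"device": "t-1", "temp_c": "21.5", "ts": "2019-03-01T10:00:00"}',
    '{"device": "t-2", "temp_f": 70.1, "ts": "2019-03-01T10:00:05"}',
    'not even json',
]


def read_temperatures(lines):
    """Apply a schema at read time: yield (device, celsius), skipping bad rows."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in the lake; this reader ignores it
        if not isinstance(rec, dict) or "device" not in rec:
            continue
        if "temp_c" in rec:
            yield rec["device"], float(rec["temp_c"])
        elif "temp_f" in rec:
            celsius = (float(rec["temp_f"]) - 32) * 5 / 9
            yield rec["device"], round(celsius, 1)


print(list(read_temperatures(raw_lines)))
```

A different consumer could read the same raw lines with a different schema, which is what a schema-on-write warehouse table does not allow without reloading.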

Needs data governance so your data lake does not turn into a data swamp!

Objectives
• Plan the structure based on optimal data retrieval
• Avoid a chaotic, unorganized data swamp
Common ways to organize the data:
Data Retention Policy
• Temporary data
• Permanent data
• Applicable period (ex: project lifetime)
• etc…
Business Impact / Criticality
• High (HBI)
• Medium (MBI)
• Low (LBI)
• etc…
Confidential Classification
• Public information
• Internal use only
• Supplier/partner confidential
• Personally identifiable information (PII)
• Sensitive – financial
• Sensitive – intellectual property
• etc…
Probability of Data Access
• Recent/current data
• Historical data
• etc…
Owner / Steward / SME
Subject Area
Security Boundaries
• Department
• Business unit
• etc…
Time Partitioning
• Year/Month/Day/Hour/Minute
Downstream App/Purpose
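
One way to combine several of the dimensions on this slide is to encode them in the folder path. The layout below (zone, subject area, source, then year/month/day/hour) is a hypothetical convention, not one prescribed by the deck, and the function and parameter names are invented.

```python
from datetime import datetime
from pathlib import PurePosixPath


def lake_path(zone, subject_area, source, event_time, filename):
    """Build a time-partitioned folder path for one file in the lake."""
    return PurePosixPath(
        zone,
        subject_area,
        source,
        f"{event_time:%Y}",
        f"{event_time:%m}",
        f"{event_time:%d}",
        f"{event_time:%H}",
        filename,
    )


path = lake_path("raw", "sales", "pos", datetime(2019, 3, 1, 14, 30), "orders.json")
print(path)
```

Putting the time partitions last keeps security boundaries (which tend to follow subject area or department) near the top of the tree, where ACLs are simpler to manage.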

Data Warehouse
Serving, Security & Compliance
• Business people
• Low latency
• Complex joins
• Interactive ad-hoc query
• High number of users
• Additional security
• Large support for tools
• Dashboards
• Easily create reports (Self-service BI)
• Known questions

What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Best of Databricks Best of Microsoft
Designed in collaboration with the founders of Apache Spark
One-click set up; streamlined workflows
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)

Azure HDInsight
Hadoop and Spark as a Service on Azure
• Fully-managed Hadoop and Spark for the cloud
• 100% Open Source Hortonworks data platform
• Clusters up and running in minutes
• Managed, monitored and supported by Microsoft with the industry’s best SLA
• Familiar BI tools for analysis, or open source notebooks for interactive data science
• 63% lower TCO than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

Hortonworks Data Platform (HDP) 3.0
Simply put, Hortonworks ties all the open source products together (20)
(under the covers of HDInsight 4.0 – public preview)

[Diagram: SQL Server master instance (serving Analytics, Custom apps, BI); Compute pools of SQL Compute Nodes; Storage pool of Kubernetes pods, each with SQL Server, Spark, and an HDFS Data Node; SQL Data mart of SQL Data Nodes; IoT data read directly from HDFS; Persistent storage]

© Microsoft Corporation
Machine learning and AI portfolio
What engines do you want to use?
Deployment target
Which experience do you want?
Build your own or consume
pre-trained models?
Microsoft ML &
AI products
Build your own
Azure Machine
Learning
Code first
(On-prem)
ML Server
On-prem
Hadoop
SQL Server
(cloud)
BYOT
SQL Server Hadoop Azure Batch DSVM Spark
Visual tooling
(cloud)
AML Studio
Consume
Cognitive
services, bots
Spark ML,
SparkR, SparklyR
Notebooks Jobs
Azure Databricks
Spark
When to use what

Advanced analytics pattern in Azure
Azure Data Lake store
Azure Storage
HDInsight
Azure Databricks
Azure ML Services
ML server
Model training
Long-term storage Data processing
Azure Data
Lake Analytics
Azure ML
Studio
SQL Server
(in-database ML)
Azure Databricks
(Spark ML)
Data Science
VM
Cosmos DB
Serving storage
SQL DB
SQL DW
Azure Analysis
Services
Cosmos DB
Batch AI
SQL DB
Azure Data
Factory
Orchestration
Azure Container
Service
Trained model hosting
SQL Server
(in-database ML)
Data collection and understanding, modeling, and deployment
Sensors and IoT
(unstructured)
Logs, files, and media
(unstructured)
Business/custom apps
(structured)
Applications
Dashboards
Power BI

Artificial Intelligence Decision Tree
Big Data Decision Tree v4
Business Intelligence Solutions Decision Tree

Q & A ?
James Serra, Big Data Evangelist
Email me at: jamesserra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)

INGEST STORE PREP & TRAIN MODEL & SERVE
CLOUD DATA WAREHOUSE
Azure Data Lake Store Gen2
Logs (unstructured)
Azure Data Factory
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
Media (unstructured)
Files (unstructured)
PolyBase
(structured)
Azure SQL Data
Warehouse
Azure Analysis
Services
Power BI

MODERN DATA WAREHOUSE
Logs (unstructured)
Azure Data Factory
Azure Databricks
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
PolyBase
(structured)
Azure SQL Data
Warehouse
Azure Analysis
Services
Power BI

ADVANCED ANALYTICS ON BIG DATA
Cosmos DB
(structured)
Logs (unstructured)
Azure Data Lake Store Gen2
Azure Data Factory
Azure SQL Data Warehouse
Azure Analysis
Services
Power BI
PolyBase
SparkR
Azure Databricks
Microsoft Azure also supports other Big Data services like Azure HDInsight and Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs.
Real-time apps

REAL TIME ANALYTICS
Sensors and IoT
(unstructured)
Apache Kafka for
HDInsight
Cosmos DB
Logs (unstructured)
Azure Data Lake Store Gen2
Azure Data Factory
Azure Databricks
Real-time apps
(structured)
Azure SQL Data
Warehouse
Azure Analysis
Services
Power BI
Microsoft Azure also supports other Big Data services like Azure IoT Hub, Azure Event Hubs, and Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs.
PolyBase

INGEST STORE MODEL & SERVE
DATA MART CONSOLIDATION
Azure Data Lake Store Gen2
Azure SQL Data Warehouse
Azure Data Factory
Azure Analysis Services
Power BI
RDBMS data marts
Hadoop
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
PolyBase

HUB & SPOKE ARCHITECTURE FOR BI
Azure SQL
Data Warehouse
PolyBase
(structured)
Power BI
Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
Multiple Azure Analysis
Services instances
SQL
Multiple Azure SQL
Database instances
Data Marts
Data Cubes
Azure Databricks
Logs (unstructured)
Azure Data Lake Store Gen2
Azure Data Factory

AUTO SCALING DATA WAREHOUSE
Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
Azure Analysis
Services
Azure Functions
(Auto-scaling)
(structured)
Logs (unstructured)
Azure SQL
Data Warehouse
PolyBase
Power BI
Azure Data Lake Store Gen2
Azure Data Factory
Azure Databricks

DATA WAREHOUSE MIGRATION
Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
(structured)
Azure SQL Data
Warehouse
Logs (unstructured)
Azure Data Factory Azure Databricks
Azure Analysis
Services
Power BI
PolyBase
