Microsoft cloud big data strategy

About Me
• Microsoft, Big Data Evangelist
• In IT for 30 years, worked on many BI and DW projects
• Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
• Been perm employee, contractor, consultant, business owner
• Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
• Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
• Blog at JamesSerra.com
• Former SQL Server MVP
• Author of book “Reporting with Microsoft SQL Server 2012”

Agenda
• Big data defined
• Microsoft big data solution
• Azure data lake

Big Data is changing traditional data warehousing
“… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing.”
– Gartner, “The State of Data Warehousing”*
* Donald Feinberg, Mark Beyer, Merv Adrian, Roxane Edjlali (Gartner), The State of Data Warehousing in 2012 (Stamford, CT.: Gartner, 2012)
[Diagram: Data sources → ETL → Data warehouse → BI and analytics]

Big Data has new data characteristics
Data complexity: variety and velocity
Petabytes

Big Data is driving transformative changes
Data characteristics
• Traditional: relational data with highly modeled schema
• Big Data: all data with schema agility
Costs
• Traditional: specialized HW
• Big Data: commodity HW
Culture
• Traditional: operational reporting, focus on rear-view analysis
• Big Data: experimentation leading to intelligent action, with machine learning, graph, A/B testing

Big Data introduces new culture of experimentation
Understand customer patterns to
uncover cross-sell opportunities
Historical campaign
effectiveness
Generate year-end financial
reports
Financial monitoring with real-time
recommendations to increase revenue
Generate year-end financial
reports
Real-time product offers and
promotions based on behavior
Collect historical data on
equipment performance
Real-time monitoring to
identify proactive maintenance
Shipping features without
understanding success
Building successful features
correlating user action with
product experience

Action
Value
From data to decisions and actions

However, there are challenges to Big Data…
Obtaining skills
and capabilities
Determining how
to get value
Integrating with
existing IT investments
*Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)

But, Microsoft has done it before
We needed to better leverage data and analytics to do
more experimentation
So we:
• Designed a data lake for everyone to put their data into
• Built tools approachable by any developer
• Created machine learning tools for collaborating
across large experiment models
Result:
• Across Microsoft, ten thousand developers doing
experimentation leading to better insights
• Leading to growth in our Microsoft businesses:
• Office productivity revenue (45% YoY)*
• Intelligent Cloud (100% YoY)*
• Bing search share doubles
[Chart: Growth of data @ Microsoft, 2010 to 2015, on an axis running from petabytes to exabytes. Sources shown: Windows, SMSG, Live, Bing, CRM/Dynamics, Xbox Live, Office365, Malware Protection, Microsoft Stores, Commerce Risk, Skype, LCA, Exchange, Yammer]
* Microsoft. FY16 Q4 Results, URL: http://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast

Microsoft is now taking
everything we’ve
learned on this journey
and bringing it to our
customers
Technology. Cost. Culture.

Big Data as a cornerstone of Cortana Intelligence
[Diagram: Cortana Intelligence components, from data sources through to action]
Data Sources: Apps; Sensors and devices; Data
Information Management: Event Hubs; Data Catalog; Data Factory
Big Data Stores: Data Lake Store; SQL Data Warehouse
Machine Learning and Analytics: HDInsight (Hadoop and Spark); Stream Analytics; Data Lake Analytics; Machine Learning
Intelligence: Cortana; Bot Framework; Cognitive Services
Dashboards & Visualizations: Power BI
Action: People; Automated Systems; Apps; Web; Mobile; Bots

Bringing Big Data to everybody
Accelerate the pace of innovation through a state-of-the-art cloud platform
[Diagram axes: CONTROL to EASE OF USE; User Adoption]
BIG DATA ANALYTICS
• IaaS Hadoop: Azure Marketplace (HDP | CDH | MapR). Any Hadoop technology
• Managed Hadoop: Azure HDInsight. Workload optimized, managed clusters
• Big Data as-a-service: Azure Data Lake Analytics. Specific apps in a multi-tenant form factor
BIG DATA STORAGE
• Azure Data Lake Store
• Azure Storage

Microsoft Big Data Portfolio
SQL Server Stretch
Business intelligence
Machine learning analytics
Insights
Azure SQL Database
SQL Server 2016
SQL Server 2016 Fast Track
Azure SQL DW
Azure Data Lake
DocumentDB
HDInsight
Hadoop
Analytics Platform System
Sequential Scale Out + Across | Scale Up
Key
Relational | Non-relational
On-premises | Cloud
Microsoft has solutions covering
and connecting all four
quadrants – that’s why SQL
Server is one of the most utilized
databases in the world

Azure
HDInsight
A Cloud Spark and
Hadoop service for the
Enterprise
Reliable with an industry leading SLA
Enterprise-grade security and monitoring
Productive platform for developers and
scientists
Cost effective cloud scale
Integration with leading ISV applications
Easy for administrators to manage
63% lower TCO than deploying your own
Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

Hortonworks Data Platform (HDP) 2.5
Simply put, Hortonworks ties all the open source products together (22)
(under the covers of HDInsight)

Azure
Data Lake Store
A No limits Data Lake that
powers Big Data Analytics
Petabyte size files and Trillions of objects
Scalable throughput for massively parallel
analytics
HDFS for the cloud
Always encrypted, role-based security &
auditing
Enterprise-grade support

Azure
Data Lake Analytics
A No limits Analytics Job
Service to power intelligent
action
Start in seconds, scale instantly, pay per job
Develop massively parallel programs with
simplicity
Debug and optimize your big data programs
with ease
Virtualize your analytics
Enterprise-grade security, auditing and
support

Azure Data Lake
[Diagram: Azure Data Lake layers: Analytics (U-SQL) and HDInsight (Hive, R Server) running on YARN, over Store (HDFS)]
Store and analyze data of any kind and size
Develop faster, debug and optimize smarter
Interactively explore patterns in your data
No learning curve
Managed and supported
Dynamically scales to match your business
priorities
Enterprise-grade security
Built on YARN, designed for the cloud

Azure SQL Data Warehouse
A relational data warehouse-as-a-service, fully managed by Microsoft.
Industry's first elastic cloud data warehouse with enterprise-grade capabilities.
Integrated with on-premises and cloud assets.
Simple compute & storage billing
Pay for what you need
High performance without rewriting
applications
Low cost for latent data
Infrastructure, management and
support provided
Scales to petabytes of data with MPP processing
Resize compute nodes in under 1 minute
Faster time to insight than other SMP offerings
Designed for “on-demand” workload
Integrated with Azure platform and
other Microsoft services
Enables hybrid solutions
Built on SQL Server experience &
technology
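
The MPP (massively parallel processing) idea mentioned above can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not how Azure SQL Data Warehouse is implemented: the function names, the `order_id` distribution key, and the use of threads as stand-ins for compute nodes are all hypothetical. Rows are hash-distributed across nodes, each node aggregates its own slice, and a final step merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def distribute(rows, key, node_count):
    """Hash-distribute rows across nodes on a distribution key (hypothetical)."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        index = crc32(str(row[key]).encode("utf-8")) % node_count
        nodes[index].append(row)
    return nodes


def local_sum(rows):
    """Each node aggregates only its own slice of the data."""
    totals = Counter()
    for row in rows:
        totals[row["region"]] += row["amount"]
    return totals


def mpp_sum(rows, node_count=4):
    """Sum `amount` per region by combining per-node partial results."""
    nodes = distribute(rows, key="order_id", node_count=node_count)
    # Threads stand in for independent compute nodes in this sketch.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, nodes))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial sums
    return dict(merged)


sales = [
    {"order_id": 1, "region": "east", "amount": 10},
    {"order_id": 2, "region": "west", "amount": 25},
    {"order_id": 3, "region": "east", "amount": 30},
]
totals = mpp_sum(sales)
```

The result does not depend on how rows happen to be spread across nodes, which is what lets this style of processing scale by adding nodes.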

PolyBase
Query relational and non-relational data with T-SQL
In a preview early this year, PolyBase will support Teradata, Oracle,
SQL Server, MongoDB, Hadoop and Azure blob storage

Azure Event Hubs
Publish-subscribe data distribution
Managed PaaS (Platform as a Service) solution
Scales with your needs to millions of events per second
Provides a durable buffer between event publishers and event consumers
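
The "buffer between event publishers and event consumers" can be pictured with a toy in-memory model. This is a sketch under assumptions, not the Event Hubs SDK: `ToyEventBuffer` and its methods are hypothetical names, and an in-memory list is of course not durable. It only shows the shape of the pattern: events are appended to partitions, and each consumer group tracks its own read position, so reading never removes data for anyone else.

```python
from collections import defaultdict
from zlib import crc32


class ToyEventBuffer:
    """In-memory stand-in for a partitioned, append-only event log."""

    def __init__(self, partitions=4):
        self.partitions = [[] for _ in range(partitions)]
        # (consumer group, partition index) -> next offset to read
        self.offsets = defaultdict(int)

    def publish(self, key, event):
        # Events with the same key land in the same partition,
        # so their relative order is preserved.
        index = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(event)
        return index

    def consume(self, group, index, max_events=10):
        # Reading does not delete anything; each consumer group
        # advances only its own offset.
        start = self.offsets[(group, index)]
        batch = self.partitions[index][start:start + max_events]
        self.offsets[(group, index)] = start + len(batch)
        return batch


hub = ToyEventBuffer(partitions=2)
partition = hub.publish("sensor-1", {"temp": 21.5})
hub.publish("sensor-1", {"temp": 22.0})

dashboard_batch = hub.consume("dashboard", partition)
archiver_batch = hub.consume("archiver", partition)
```

Both hypothetical consumer groups see the same two events, because publishing and consuming are decoupled by the log.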

Azure Stream Analytics
Process real-time data in Azure
Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure,
and applications
Performs time-sensitive analysis using SQL-like language against multiple real-time streams and
reference data
Outputs to persistent stores, dashboards or back to devices
[Graphic: example event sources: Point of Service Devices, Self Checkout Stations, Kiosks, Smart Phones, Slates/Tablets, PCs/Laptops, Servers, Digital Signs, Diagnostic Equipment, Remote Medical Monitors, Logic Controllers, Specialized Devices, Thin Clients, Handhelds, Security, POS Terminals, Automation Devices, Vending Machines, Kinect, ATM]
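
As a rough illustration of the time-sensitive analysis described for Azure Stream Analytics, the Python sketch below averages readings per device over fixed, non-overlapping ("tumbling") windows. It is an assumption-laden toy: Stream Analytics itself uses a SQL-like language, and the function name, event shape, and device IDs here are hypothetical. It also processes a finite list, whereas a real streaming engine emits results as each window closes.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average `value` per device over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, device_id, value).
    Returns {(window_start, device_id): average}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, device, value in events:
        # Round the timestamp down to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        sums[(window_start, device)] += value
        counts[(window_start, device)] += 1
    return {key: sums[key] / counts[key] for key in sums}


readings = [
    (0, "kiosk-1", 10.0),
    (4, "kiosk-1", 20.0),
    (11, "kiosk-1", 30.0),
    (12, "atm-7", 5.0),
]
averages = tumbling_window_avg(readings, window_seconds=10)
```

With 10-second windows, the first two `kiosk-1` readings fall in the window starting at 0 and the remaining readings fall in the window starting at 10.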

Azure Machine Learning
Get started with just a browser
Requires no provisioning; simply log
on to your Azure subscription or try
it for free at azure.com/ml
Experience the power of choice
Choose from hundreds of algorithms
and packages from R and Python or
drop in your own custom code
Take advantage of business-tested
algorithms from Xbox and Bing
Deploy solutions in minutes
With the click of a button, deploy
the finished model as a web service
that can connect to any data,
anywhere
Connect to the world
Brand and monetize solutions on
our global Machine Learning
Marketplace
https://datamarket.azure.com/
Beyond business intelligence – machine intelligence
Microsoft Azure
Machine Learning Studio
Modeling environment (shown)
Microsoft Azure
Machine Learning API service
Model in production as a web service
Microsoft Azure
Machine Learning Marketplace
APIs and solutions for broad use

Azure Data Catalog
Enable enterprise-wide self-service data source registration and discovery
A metadata repository that allows users to register, enrich, understand, discover, and consume data sources
Delivers differentiated value through
‒ Data source discovery, rather than data discovery
‒ Support for data from any source: structured and unstructured, on premises and in the cloud
‒ Publishing, discovery and consumption through any tool
‒ Annotation crowdsourcing: empowering any user to capture and share their knowledge
This, while allowing IT to maintain control and oversight

Azure Data Factory
Connect to relational or non-
relational data that is on-
premises or in the cloud
Orchestrate data movement &
data processing
Publish to Power BI users as a
searchable data view
Operationalize (schedule,
manage, debug) workflows
Lifecycle management,
monitoring
Orchestrate trusted information production in Azure
[Diagram labels: C#, MapReduce, Hive, Pig, Stored Procedures, Azure Machine Learning]

Discovery & exploration
Custom visualizations
R integration

Microsoft
Cognitive
Services
Give your apps
a human side
Cognitive Services API Collection

Azure Analysis Services
Azure Analysis Services is based on the proven analytics engine that has helped
organizations turn complex data into a trusted, single source of truth for years.
Built for
hybrid data
Access and model
data on-premises,
in the cloud, or both
Interactive
visualization
Quick, highly interactive
self-service data discovery
with support of major
data visualization tools
Proven
technology
Powerful, proven tabular
models built from SQL Server
2016 Analysis Services
Cloud
powered
Easy to deploy, scale, and
manage as a platform-as-
a-service solution

[Diagram: Microsoft R offerings. Community: R Open. Commercial: R Server. Platforms shown: SQL Server R Services, Linux, Hadoop, Teradata, Windows]

Azure DocumentDB
Fully managed database service built on a native JSON data model
Application controlled schema with massive scale-out enables iterative development and evolving data models
Automatic indexing enables robust querying over schema-free data
Integrated transactional JavaScript processing + tunable consistency enable high performance application experiences

SQL Server on Linux
(Preview today, GA in
mid-2017)
Red Hat - Microsoft
Partnership
(Nov 2015)
Microsoft joins Eclipse
Foundation (Mar 2016).
HDInsight PaaS on
Linux GA (Sep 2015)
[Screenshot: Windows command prompt (C:\Users\markhill>) running bash (root@localhost: #)]
60% of all images in Azure Marketplace are based on Linux/OSS
In partnership with the Linux
Foundation, Microsoft releases the
Microsoft Certified Solutions Associate
(MCSA) Linux on Azure certification.
493,141,677 ?????? Microsoft Open Source Hub
Ross Gardler: President Apache Software
Foundation
Wim Coekaerts: Oracle’s Mr Linux
1 out of 4 VMs on Azure runs Linux, and the share grows every day
• 28.9% of all VMs are Linux
• >50% of new VMs

Azure Data Lake
Big Data made easy
Analytics on any data,
any size
Easier and more
productive for all users
Enterprise-ready

Petabyte size files and
Trillions of objects
• Store data in its native format
• PB sized files, 200x larger than anyone else
• Scalable throughput for massively parallel analytics
• No need to redesign applications or repartition data at higher scale
[Graphic: Store scaling from TBs to EBs]

Any type
of analytics
• Batch, interactive, streaming,
machine learning
• Allows for exploratory analytics
over data
• Analyze with Hadoop and
Microsoft solutions
Cortana Intelligence Suite
[Diagram: Azure Data Lake layers: Analytics (U-SQL) and HDInsight (Hive, R Server) running on YARN, over Store (HDFS)]

Start in seconds, Scale
instantly, Pay per job
with Analytics
• Process big data jobs in 30
seconds
• No infrastructure to worry
about (no servers, no VMs, no
clusters)
• Instantly scale analytic units up
or down (processing power)
• Architected for cloud scale and
performance
• Frees you up to focus only on
your business logic

Easy for administrators
to spin up quickly
• Deploy big data projects
in minutes
• No hardware to install,
tune, configure or deploy
• No infrastructure or
software to manage
• Scale from tens to thousands
of machines instantly

Debug and Optimize
your Big Data
programs with ease
• Deep integration with
Visual Studio, Visual Studio
Code, Eclipse, & IntelliJ
• Easy for novices to write
simple queries
• Integrated with U-SQL,
Hive, Storm, and Spark
• Actively offers recommendations
to improve performance and
reduce cost
• Playback visually displays job run

Develop massively
parallel programs with
simplicity
• U-SQL: a simple
and powerful language that’s
familiar and easily extensible
• Unifies the declarative
nature of SQL with expressive
power of C#
• Leverage existing libraries in
.NET languages, R and Python
• Massively parallelize code on
diverse workloads (ETL, ML,
image tagging, facial detection)

Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store
Benefits
• Avoid moving large amounts of data across the
network between stores (federated query/logical data
warehouse)
• Single view of data irrespective of physical location
• Minimize data proliferation issues caused by
maintaining multiple copies
• Single query language for all data
• Each data store maintains its own sovereignty
• Design choices based on the need
• Push SQL expressions to remote SQL sources
• Filters
• Joins
[Diagram: a U-SQL query in Azure Data Lake Analytics reaching Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB, Azure SQL Data Warehouse, and Azure Data Lake Storage]
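
The benefit of pushing filters to remote SQL sources can be sketched in Python. This is an illustrative analogy only, not U-SQL and not the Azure Data Lake Analytics API: an in-memory SQLite database stands in for a remote SQL source, and the table, column, and customer names are hypothetical. The filter runs inside the "remote" store, so only matching rows travel back to be joined with local data.

```python
import sqlite3


def query_remote(conn, min_amount):
    """Push the filter into the remote SQL source; only matches come back."""
    cursor = conn.execute(
        "SELECT customer_id, amount FROM orders WHERE amount >= ?",
        (min_amount,),
    )
    return cursor.fetchall()


# Stand-in for a remote SQL source (assumption: SQLite replaces Azure SQL here).
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
remote.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 5.0), (2, 50.0), (1, 75.0)],
)

# Stand-in for data held in a different store, such as files in a data lake.
customers = {1: "Contoso", 2: "Fabrikam"}

big_orders = query_remote(remote, 50.0)
report = sorted((customers[cid], amount) for cid, amount in big_orders)
```

Only two of the three order rows leave the remote store; the join with customer names then happens on that much smaller result.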

Easy for data scientists
with familiar R language
R Server for HDInsight
• Largest portable R parallel
analytics library
• Terabyte-scale machine
learning—1,000x larger than
in open source R
• Up to 100x faster performance
using Spark and optimized
vector/math libraries
• Enterprise-grade security
and support
*Applies to HDInsight only

Highest availability
guarantee in the industry
for peace of mind
• Managed, monitored and
supported by Microsoft
• Enterprise-leading SLA—
99.9% uptime
• No IT resources needed for
upgrades and patching
• Microsoft monitors your
deployment so you don’t
have to
99.9% SLA
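
To put the 99.9% figure in concrete terms, the short Python calculation below converts an uptime percentage into the downtime it still permits. This is plain arithmetic with a hypothetical helper name; the actual measurement window, exclusions, and remedies are defined by the service's SLA terms, not by this sketch.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Maximum downtime, in minutes, that still meets the SLA over `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# 30 days = 43,200 minutes, so 0.1% is roughly 43.2 minutes per month.
monthly_minutes = allowed_downtime_minutes(99.9)

# 365 days = 525,600 minutes, so 0.1% is roughly 8.76 hours per year.
yearly_hours = allowed_downtime_minutes(99.9, days=365) / 60
```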

Azure Regions
38 Regions Worldwide, 32 Generally Available
 100+ datacenters
 Top 3 networks in the world
 2.5x AWS, 7x Google DC Regions
 G Series – Largest VM in World, 32 cores, 448GB Ram, SSD…

Always encrypted,
Role-based security
& Auditing
• Always encrypted; in motion
using SSL, and at rest using keys
in Azure Key Vault
• Single sign-on, multi-factor
authentication and seamless
integration of on-premises
identities with Active Directory
• Fine-grained POSIX-based ACLs
for role-based access controls
• Auditing every access /
configuration change
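
How a POSIX-style ACL decides access can be sketched as follows. This is a simplified model under stated assumptions, not the Azure Data Lake Store implementation: it ignores the ACL mask and the owning group, and the `Acl` class, `is_allowed` function, and user and group names are hypothetical. The check proceeds in the usual POSIX order: owner first, then named users, then groups, then everyone else.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    owner: str
    owner_perms: str                                   # e.g. "rwx"
    named_users: dict = field(default_factory=dict)    # user -> perms
    named_groups: dict = field(default_factory=dict)   # group -> perms
    other_perms: str = "---"


def is_allowed(acl, user, groups, wanted):
    """Return True if `user` may perform `wanted` ('r', 'w' or 'x')."""
    if user == acl.owner:
        return wanted in acl.owner_perms
    if user in acl.named_users:
        return wanted in acl.named_users[user]
    matching = [acl.named_groups[g] for g in groups if g in acl.named_groups]
    if matching:
        # Any matching group entry that grants the permission is enough.
        return any(wanted in perms for perms in matching)
    return wanted in acl.other_perms


acl = Acl(
    owner="alice",
    owner_perms="rwx",
    named_users={"bob": "r-x"},
    named_groups={"analysts": "r--"},
)
owner_can_write = is_allowed(acl, "alice", [], "w")
bob_can_write = is_allowed(acl, "bob", [], "w")
analyst_can_read = is_allowed(acl, "carol", ["analysts"], "r")
stranger_can_read = is_allowed(acl, "dave", [], "r")
```

Note that a named-user entry takes precedence over group entries, so a user listed with restricted permissions does not gain more through group membership.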

Lower total cost
of ownership
• No hardware
• Hadoop support included with
Azure support
• Pay only for what you use
• Independently scale storage
and compute
• No need to hire specialized
operations team
• 63% lower total cost of
ownership than on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud
with Microsoft Azure HDInsight”

Recognized by
top analysts
Forrester Wave for Big Data
Hadoop Cloud
• Named industry leader by
Forrester with the most
comprehensive, scalable, and
integrated platforms*
• Recognized for its cloud-first
strategy that is paying off*
*The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016.

Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted under the “Presentations” tab)
