Introduction to Azure Data Lake
Athens May 26, 2017
Presenter Info
1982 I started working with computers
1988 I started my professional career in the computer industry
1996 I started working with SQL Server 6.0
1998 I earned my first Microsoft certification, Microsoft Certified Solution Developer (3rd in Greece)
1999 I started my career as a Microsoft Certified Trainer (MCT), with more than 30,000 hours of training to date
2010 I became a Microsoft MVP on Data Platform for the first time
I created SQL School Greece, www.sqlschool.gr
2012 I became MCT Regional Lead in the Microsoft Learning Program
2013 I was certified as MCSE: Data Platform
I was certified as MCSE: Business Intelligence
2016 I was certified as MCSE: Data Management & Analytics
Antonios Chatzipavlis
SQL Server Expert and Evangelist
Data Platform MVP
MCT, MCSE, MCITP, MCPD, MCSD, MCDBA,
MCSA, MCTS, MCAD, MCP, OCA, ITIL-F
SQLschool.gr
A source of information about Microsoft SQL Server for Greek
IT Professionals, DBAs, Developers, Information Workers, and also
hobbyists who simply like SQL Server.
Help line : help@sqlschool.gr
What we are doing here
• Articles about SQL Server
• SQL Server News
• SQL Nights
• Webcasts
• Downloads
• Resources
Follow us on social media
fb/sqlschoolgr
fb/groups/sqlschool
@antoniosch
@sqlschool
yt/c/SqlschoolGr
SQL School Greece group
SELECT KNOWLEDGE
FROM SQL SERVER
▪ Sign up for a free membership today at sqlpass.org.
▪ LinkedIn: http://www.sqlpass.org/linkedin
▪ Facebook: http://www.sqlpass.org/facebook
▪ Twitter: @SQLPASS
▪ PASS: http://www.sqlpass.org
PASS Virtual Chapters
Data Lake Overview
What is Azure Data Lake?
“A single store of all data… ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various forms including reporting, visualization, analytics, and machine learning”
What Does Azure Data Lake Offer?
Built on Open-Source
Azure Ecosystem Integration
Azure Data Lake
• Data Lake Analytics
• HDInsight
• Data Lake Store
• Develop, debug, and optimize big data programs with ease
• Integrates seamlessly with your existing IT investments
• Store and analyze petabyte-size files and trillions of objects
• Affordable and cost-effective
• Enterprise-grade security, auditing, and support
Data Lakes vs Data Warehouses
           | DATA WAREHOUSE                   | DATA LAKE
DATA       | Structured, processed            | Structured, semi-structured, unstructured, raw
PROCESSING | Schema-on-Write                  | Schema-on-Read
STORAGE    | Expensive for large data volumes | Designed for low-cost storage
AGILITY    | Less agile, fixed configuration  | Highly agile, configure and reconfigure as needed
SECURITY   | Mature                           | Maturing
USERS      | Business professionals           | Data scientists et al.
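The schema-on-write versus schema-on-read row can be illustrated with a small Python sketch. It is only an illustration under assumptions: the sample text, the column names, and the helper functions are hypothetical and do not come from any Azure API. The raw text is stored untouched, and each reader decides which types to apply when it reads.

```python
import csv
import io

# Raw data lands in the lake untouched: plain text, no declared schema.
RAW_TSV = "1\ten-gb\t73\n2\ten-us\t\n3\ten-gb\t614\n"

def optional_int(text):
    """Map an empty field to None, otherwise parse an integer."""
    return int(text) if text else None

def read_with_schema(raw, schema):
    """Apply (column name, converter) pairs to raw TSV text at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw), delimiter="\t"):
        rows.append({name: convert(value)
                     for (name, convert), value in zip(schema, fields)})
    return rows

# Two consumers read the same bytes with different schemas.
typed = read_with_schema(
    RAW_TSV, [("user_id", int), ("region", str), ("duration", optional_int)])
as_text = read_with_schema(
    RAW_TSV, [("user_id", str), ("region", str), ("duration", str)])
```

With schema-on-write, the conversion in `read_with_schema` would instead run once at load time, and rows that do not fit the declared types would be rejected before storage.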
Data Lake Store
What is Azure Data Lake Store?
• Enterprise-wide hyper-scale repository for big data analytic workloads.
- Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics.
• Can be accessed from Hadoop (available with an HDInsight cluster) using the WebHDFS-compatible REST APIs.
• Specifically designed to enable analytics on the stored data and tuned for performance in data analytics scenarios.
• Includes, out of the box, all the enterprise-grade capabilities essential for real-world enterprise use cases:
- security, manageability, scalability, reliability, and availability
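To give a feel for what "WebHDFS-compatible REST APIs" means, here is a minimal Python sketch that only builds request URLs and sends nothing. The account name `examplestore` is hypothetical, and the host suffix and `/webhdfs/v1/` path layout are assumptions based on the usual WebHDFS convention. A real call would also need an Azure Active Directory bearer token, which is outside this sketch.

```python
from urllib.parse import quote, urlencode

def webhdfs_url(account, path, op, **params):
    """Build a WebHDFS-style URL for a store account (no request is sent)."""
    # Assumption: host suffix follows the Data Lake Store endpoint convention.
    host = f"{account}.azuredatalakestore.net"
    query = urlencode({"op": op, **params})
    return f"https://{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"

# LISTSTATUS and OPEN are standard WebHDFS operations.
list_url = webhdfs_url("examplestore", "/Samples/Data", "LISTSTATUS")
open_url = webhdfs_url("examplestore", "/Samples/Data/SearchLog.tsv", "OPEN", offset=0)
```

Because the operation names and path layout follow WebHDFS, existing Hadoop tooling that speaks that protocol can address the store much as it would an HDFS cluster.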
Azure Data Lake Store vs Azure Blob Storage
             | AZURE DATA LAKE STORE | AZURE BLOB STORAGE
PURPOSE      | Optimized storage for big data analytics workloads | General purpose object store for a wide variety of storage scenarios
USE CASES    | Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets | Any type of text or binary data, such as application back end, backup data, media storage for streaming and general purpose data
KEY CONCEPTS | A Data Lake Store account contains folders, which in turn contain data stored as files | A storage account has containers, which in turn hold data in the form of blobs
STRUCTURE    | Hierarchical file system | Object store with flat namespace
SECURITY     | Based on Azure Active Directory identities | Based on shared secrets: Account Access Keys and Shared Access Signature Keys
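One practical consequence of the STRUCTURE row, hierarchical file system versus flat namespace, can be sketched with a toy model in Python. This is not how either service is implemented; the dictionaries and function names are hypothetical and only show why renaming a "folder" tends to cost more when folders are just key prefixes.

```python
def rename_prefix_flat(blobs, old, new):
    """Flat namespace: a 'folder' is only a key prefix, so every matching key is rewritten."""
    touched = 0
    for key in [k for k in blobs if k.startswith(old + "/")]:
        blobs[new + key[len(old):]] = blobs.pop(key)
        touched += 1
    return touched

def rename_folder_tree(root, old, new):
    """Hierarchical namespace: the folder is a node, so one entry is relinked."""
    root[new] = root.pop(old)
    return 1

flat = {"logs/2017/a.tsv": b"a", "logs/2017/b.tsv": b"b", "raw/c.tsv": b"c"}
tree = {"logs": {"2017": {"a.tsv": b"a", "b.tsv": b"b"}}, "raw": {"c.tsv": b"c"}}

flat_ops = rename_prefix_flat(flat, "logs", "archive")   # one operation per object
tree_ops = rename_folder_tree(tree, "logs", "archive")   # one operation in total
```

Analytics jobs often write to a temporary folder and rename it on success, which is one reason a hierarchical layout suits those workloads.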
Data Lake Analytics
What is Azure Data Lake Analytics?
• An on-demand analytics job service that simplifies big data analytics.
• Lets you focus on writing, running, and managing jobs rather than on operating distributed infrastructure.
• Can handle jobs of any scale instantly by setting the dial for how much power you need.
• You pay for a job only while it is running, making it cost-effective.
• The analytics service supports Azure Active Directory, letting you manage access and roles, integrated with your on-premises identity system.
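The pay-per-job point can be made concrete with a little arithmetic. This Python sketch rests on assumptions: the slides do not state a billing unit or a price, so `units`, `price_per_unit_hour`, and every number below are hypothetical placeholders, and real jobs rarely scale perfectly linearly.

```python
def job_cost(units, runtime_seconds, price_per_unit_hour):
    """Cost of one job: compute units x time actually run x hourly unit price."""
    return units * runtime_seconds * price_per_unit_hour / 3600

# Hypothetical job: 10 units for 30 minutes.
small_dial = job_cost(units=10, runtime_seconds=1800, price_per_unit_hour=2.0)
# Same job with the dial turned up tenfold, assuming it finishes ten times faster.
large_dial = job_cost(units=100, runtime_seconds=180, price_per_unit_hour=2.0)
# Nothing is charged while no job is running.
idle = job_cost(units=100, runtime_seconds=0, price_per_unit_hour=2.0)
```

Under that idealized scaling assumption, turning the dial up shortens the wait without changing the bill, which is the trade-off the "dial" bullet describes.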
Azure Data Lake Analytics Key Capabilities
• Dynamic scaling
• Develop faster, debug, and optimize smarter using familiar tools
• U-SQL: simple and familiar, powerful, and extensible
• Integrates seamlessly with your IT investments
• Affordable and cost-effective
• Works with all your Azure data
HDInsight
What is Azure HDInsight?
- The only fully managed cloud Apache Hadoop offering
- Provides optimized open-source analytic clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and Microsoft R Server
- Provides a 99.9% SLA
- Lets you deploy these big data technologies and ISV applications as managed clusters with enterprise-level security and monitoring.
U-SQL
What is U-SQL?
The new big data query language of the Azure Data Lake Analytics service.
It evolved out of Microsoft's internal big data language, SCOPE, described in:
SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
by Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, Jingren Zhou
http://www.vldb.org/pvldb/1/1454166.pdf
U-SQL combines
– a familiar SQL-like declarative language
– with the extensibility and programmability provided by C# types and the C# expression language
– and big data processing concepts such as “schema on read”, custom processors and reducers.
Provides the ability to query and combine data from a variety of data sources
– Azure Data Lake Store,
– Azure Blob Storage,
– Azure SQL DB, Azure SQL Data Warehouse,
– SQL Server instances running in Azure VMs.
It’s NOT ANSI SQL
– Its keywords, such as SELECT, have to be in UPPERCASE.
– Its expression language inside SELECT clauses, WHERE predicates, etc. is C#.
– This means, for example, that comparison operations inside a predicate follow C# syntax (e.g., a == "foo"),
– and that the language uses C# null semantics, which are 2-valued and not 3-valued as in ANSI SQL.
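The difference between 2-valued and 3-valued null semantics is easiest to see on a negated predicate. The following Python sketch models both behaviours; it is an illustration only, with Python's `None` standing in for both NULL and the UNKNOWN truth value, and all function names are hypothetical.

```python
UNKNOWN = None  # stands in for SQL's third truth value

def sql_equals(left, right):
    """ANSI SQL '=': any comparison with NULL is UNKNOWN."""
    if left is None or right is None:
        return UNKNOWN
    return left == right

def sql_not(value):
    """ANSI SQL NOT: UNKNOWN stays UNKNOWN."""
    return UNKNOWN if value is UNKNOWN else not value

def csharp_equals(left, right):
    """C#-style '==': null compares like any other value, so the result is always a bool."""
    return left == right

regions = ["en-gb", None, "en-us"]

# WHERE NOT (Region = 'en-gb'): a row is kept only when the predicate is exactly TRUE.
kept_sql = [r for r in regions if sql_not(sql_equals(r, "en-gb")) is True]
# WHERE !(Region == "en-gb"): the null row passes because null == "en-gb" is false.
kept_csharp = [r for r in regions if not csharp_equals(r, "en-gb")]
```

The null row is dropped under the 3-valued rules but kept under the 2-valued rules, so a predicate ported from T-SQL can return a different rowset in U-SQL.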
How does a U-SQL Script process Data?
• Azure Data Lake Analytics provides U-SQL for batch processing.
• U-SQL is written and executed in the form of a batch script.
• U-SQL also supports data definition statements such as CREATE TABLE to create metadata artifacts, either in separate scripts or sometimes even in combination with the transformation scripts.
• U-SQL scripts can be submitted in a variety of ways:
- Directly from within the Azure Data Lake Tools for Visual Studio
- From the Azure Portal
- Programmatically via the Azure Data Lake SDK job submission API
- Via the Azure PowerShell extension's job submission command
A U-SQL script follows this general processing pattern:
• Retrieve data from stored locations in rowset format
- Stored locations can be files that will be schematized on read with EXTRACT expressions
- Stored locations can be U-SQL tables that are stored in a schematized format
- Or they can be tables provided by other data sources such as an Azure SQL database
• Transform the rowset(s)
- Several transformations over the rowsets can be composed in a data flow format
• Store the transformed rowset data
- Store it in a file with an OUTPUT statement, or
- Store it in a U-SQL table with an INSERT statement
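U-SQL Scripts
// Input and output file paths.
DECLARE @in string = "/Samples/Data/SearchLog.tsv";
DECLARE @out string = "/output/result.tsv";
// Schema on read: column names and types are applied while extracting the TSV file.
@searchlog = EXTRACT UserId int, Start DateTime, Region string, Query string,
                     Duration int?, Urls string, ClickedUrls string
             FROM @in USING Extractors.Tsv();
// Keep only rows for the en-gb region; the predicate uses C# comparison syntax.
@rs1 = SELECT Start, Region, Duration FROM @searchlog WHERE Region == "en-gb";
// Refine the same rowset: keep searches that started on or after 2012/02/16.
@rs1 = SELECT Start, Region, Duration FROM @rs1
       WHERE Start >= DateTime.Parse("2012/02/16");
// Write the result as a TSV file.
OUTPUT @rs1
TO @out
USING Outputters.Tsv();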
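For readers who want to trace the retrieve, transform, and store pattern without an Azure account, the same flow can be sketched in plain Python. This is an analogy, not a U-SQL implementation: the four sample rows are hypothetical, the function names are made up for illustration, and everything runs in memory instead of against a Data Lake Store.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows in the seven-column layout the script extracts.
SAMPLE_TSV = (
    "399266\t2012-02-15T11:53:16\ten-us\tsome query\t73\turl1\turl1\n"
    "382045\t2012-02-16T12:34:17\ten-gb\tanother query\t614\turl2\t\n"
    "106479\t2012-02-17T08:01:02\ten-gb\tthird query\t\turl3\turl3\n"
    "906441\t2012-02-14T09:15:00\ten-gb\tfourth query\t183\turl4\turl4\n"
)

def extract(raw):
    """EXTRACT step: schematize raw TSV text into typed rows."""
    rows = []
    for user_id, start, region, query, duration, urls, clicked in csv.reader(
            io.StringIO(raw), delimiter="\t"):
        rows.append({
            "UserId": int(user_id),
            "Start": datetime.fromisoformat(start),
            "Region": region,
            "Query": query,
            "Duration": int(duration) if duration else None,  # int? in the script
            "Urls": urls,
            "ClickedUrls": clicked,
        })
    return rows

def transform(rows):
    """SELECT steps: filter by region, then by start date, keeping three columns."""
    rs1 = [r for r in rows if r["Region"] == "en-gb"]
    rs1 = [r for r in rs1 if r["Start"] >= datetime(2012, 2, 16)]
    return [(r["Start"], r["Region"], r["Duration"]) for r in rs1]

def output(rows):
    """OUTPUT step: serialize the rowset back to TSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for start, region, duration in rows:
        writer.writerow([start.isoformat(), region,
                         "" if duration is None else duration])
    return buffer.getvalue()

result_tsv = output(transform(extract(SAMPLE_TSV)))
```

Each function corresponds to one stage of the pattern, and the two list comprehensions in `transform` mirror the two successive assignments to `@rs1`.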
DEMO
– Create Data Lake Stores
– Create Data Lake Analytics accounts and
connect them to Data Lake Stores
– Import data into Azure Data Lake Stores
– Run U-SQL jobs in Azure Data Lake
Analytics
Ask your
Questions
☺
Thank you
SELECT KNOWLEDGE FROM SQL SERVER
Copyright © 2017 SQLschool.gr. All rights reserved.
PRESENTER MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION

More Related Content

What's hot

Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
HARIHARAN R
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx
BRIJESH KUMAR
 
Azure purview
Azure purviewAzure purview
Azure purview
Shafqat Turza
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
Sivakumar Ramar
 
Azure datafactory
Azure datafactoryAzure datafactory
Azure datafactory
Dimko Zhluktenko
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Mark Kromer
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Snowflake free trial_lab_guide
Snowflake free trial_lab_guideSnowflake free trial_lab_guide
Snowflake free trial_lab_guide
slidedown1
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
James Serra
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
Dimko Zhluktenko
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
James Serra
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
David Giard
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
Matthew W. Bowers
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
Snowflake Computing
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
Antonios Chatzipavlis
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
Trivadis
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
Sergio Zenatti Filho
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 

What's hot (20)

Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx
 
Azure purview
Azure purviewAzure purview
Azure purview
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
 
Azure datafactory
Azure datafactoryAzure datafactory
Azure datafactory
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview Slides
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Snowflake free trial_lab_guide
Snowflake free trial_lab_guideSnowflake free trial_lab_guide
Snowflake free trial_lab_guide
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 

Viewers also liked

Exploring sql server 2016
Exploring sql server 2016Exploring sql server 2016
Exploring sql server 2016
Antonios Chatzipavlis
 
Introduction to azure document db
Introduction to azure document dbIntroduction to azure document db
Introduction to azure document db
Antonios Chatzipavlis
 
Introduction to Machine Learning on Azure
Introduction to Machine Learning on AzureIntroduction to Machine Learning on Azure
Introduction to Machine Learning on Azure
Antonios Chatzipavlis
 
Row level security
Row level securityRow level security
Row level security
Antonios Chatzipavlis
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
Antonios Chatzipavlis
 
Microsoft SQL Family and GDPR
Microsoft SQL Family and GDPRMicrosoft SQL Family and GDPR
Microsoft SQL Family and GDPR
Antonios Chatzipavlis
 
Dynamic data masking sql server 2016
Dynamic data masking sql server 2016Dynamic data masking sql server 2016
Dynamic data masking sql server 2016
Antonios Chatzipavlis
 
Live Query Statistics & Query Store in SQL Server 2016
Live Query Statistics & Query Store in SQL Server 2016Live Query Statistics & Query Store in SQL Server 2016
Live Query Statistics & Query Store in SQL Server 2016
Antonios Chatzipavlis
 
Introduction to sql database on azure
Introduction to sql database on azureIntroduction to sql database on azure
Introduction to sql database on azure
Antonios Chatzipavlis
 
Azure SQL Data Warehouse
Azure SQL Data Warehouse Azure SQL Data Warehouse
Azure SQL Data Warehouse
Antonios Chatzipavlis
 

Viewers also liked (10)

Exploring sql server 2016
Exploring sql server 2016Exploring sql server 2016
Exploring sql server 2016
 
Introduction to azure document db
Introduction to azure document dbIntroduction to azure document db
Introduction to azure document db
 
Introduction to Machine Learning on Azure
Introduction to Machine Learning on AzureIntroduction to Machine Learning on Azure
Introduction to Machine Learning on Azure
 
Row level security
Row level securityRow level security
Row level security
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
 
Microsoft SQL Family and GDPR
Microsoft SQL Family and GDPRMicrosoft SQL Family and GDPR
Microsoft SQL Family and GDPR
 
Dynamic data masking sql server 2016
Dynamic data masking sql server 2016Dynamic data masking sql server 2016
Dynamic data masking sql server 2016
 
Live Query Statistics & Query Store in SQL Server 2016
Live Query Statistics & Query Store in SQL Server 2016Live Query Statistics & Query Store in SQL Server 2016
Live Query Statistics & Query Store in SQL Server 2016
 
Introduction to sql database on azure
Introduction to sql database on azureIntroduction to sql database on azure
Introduction to sql database on azure
 
Azure SQL Data Warehouse
Azure SQL Data Warehouse Azure SQL Data Warehouse
Azure SQL Data Warehouse
 

Similar to Introduction to Azure Data Lake

J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
MS Cloud Summit
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
James Serra
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Trivadis
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
Michael Rys
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake
Olga Zinkevych
 
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
DataConf
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
Shy Engelberg
 
AZURE Data Related Services
AZURE Data Related ServicesAZURE Data Related Services
AZURE Data Related Services
Ruslan Drahomeretskyy
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
CCG
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
Trivadis
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
FedoRam1
 
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
Mark Kromer
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
RTTS
 
Tech-Spark: Azure SQL Databases
Tech-Spark: Azure SQL DatabasesTech-Spark: Azure SQL Databases
Tech-Spark: Azure SQL Databases
Ralph Attard
 
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
Charley Hanania
 

Similar to Introduction to Azure Data Lake (20)

J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake Ai big dataconference_eugene_polonichko_azure data lake
Ai big dataconference_eugene_polonichko_azure data lake
 
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
 
AZURE Data Related Services
AZURE Data Related ServicesAZURE Data Related Services
AZURE Data Related Services
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Tech-Spark: Azure SQL Databases
Tech-Spark: Azure SQL DatabasesTech-Spark: Azure SQL Databases
Tech-Spark: Azure SQL Databases
 
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
 

More from Antonios Chatzipavlis

Data virtualization using polybase
Data virtualization using polybaseData virtualization using polybase
Data virtualization using polybase
Antonios Chatzipavlis
 
SQL server Backup Restore Revealed
SQL server Backup Restore RevealedSQL server Backup Restore Revealed
SQL server Backup Restore Revealed
Antonios Chatzipavlis
 
Migrate SQL Workloads to Azure
Migrate SQL Workloads to AzureMigrate SQL Workloads to Azure
Migrate SQL Workloads to Azure
Antonios Chatzipavlis
 
Machine Learning in SQL Server 2019
Machine Learning in SQL Server 2019Machine Learning in SQL Server 2019
Machine Learning in SQL Server 2019
Antonios Chatzipavlis
 
Workload Management in SQL Server 2019
Workload Management in SQL Server 2019Workload Management in SQL Server 2019
Workload Management in SQL Server 2019
Antonios Chatzipavlis
 
Loading Data into Azure SQL DW (Synapse Analytics)
Loading Data into Azure SQL DW (Synapse Analytics)Loading Data into Azure SQL DW (Synapse Analytics)
Loading Data into Azure SQL DW (Synapse Analytics)
Antonios Chatzipavlis
 
Introduction to DAX Language
Introduction to DAX LanguageIntroduction to DAX Language
Introduction to DAX Language
Antonios Chatzipavlis
 
Building diagnostic queries using DMVs and DMFs
Building diagnostic queries using DMVs and DMFs Building diagnostic queries using DMVs and DMFs
Building diagnostic queries using DMVs and DMFs
Antonios Chatzipavlis
 
Exploring T-SQL Anti-Patterns
Exploring T-SQL Anti-Patterns Exploring T-SQL Anti-Patterns
Exploring T-SQL Anti-Patterns
Antonios Chatzipavlis
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
Antonios Chatzipavlis
 
Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019
Antonios Chatzipavlis
 
SQLServer Database Structures
SQLServer Database Structures SQLServer Database Structures
SQLServer Database Structures
Antonios Chatzipavlis
 
Sqlschool 2017 recap - 2018 plans
Sqlschool 2017 recap - 2018 plansSqlschool 2017 recap - 2018 plans
Sqlschool 2017 recap - 2018 plans
Antonios Chatzipavlis
 
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018 Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Antonios Chatzipavlis
 
Statistics and Indexes Internals
Statistics and Indexes InternalsStatistics and Indexes Internals
Statistics and Indexes Internals
Antonios Chatzipavlis
 
Implementing Mobile Reports in SQL Sserver 2016 Reporting Services
Implementing Mobile Reports in SQL Sserver 2016 Reporting ServicesImplementing Mobile Reports in SQL Sserver 2016 Reporting Services
Implementing Mobile Reports in SQL Sserver 2016 Reporting Services
Antonios Chatzipavlis
 
Auditing Data Access in SQL Server
Auditing Data Access in SQL ServerAuditing Data Access in SQL Server
Auditing Data Access in SQL Server
Antonios Chatzipavlis
 
Stretch db sql server 2016 (sn0028)
Stretch db   sql server 2016 (sn0028)Stretch db   sql server 2016 (sn0028)
Stretch db sql server 2016 (sn0028)
Antonios Chatzipavlis
 
Troubleshooting sql server
Troubleshooting sql serverTroubleshooting sql server
Troubleshooting sql server
Antonios Chatzipavlis
 

More from Antonios Chatzipavlis (19)

Data virtualization using polybase
Data virtualization using polybaseData virtualization using polybase
Data virtualization using polybase
 
SQL server Backup Restore Revealed
SQL server Backup Restore RevealedSQL server Backup Restore Revealed
SQL server Backup Restore Revealed
 
Migrate SQL Workloads to Azure
Migrate SQL Workloads to AzureMigrate SQL Workloads to Azure
Migrate SQL Workloads to Azure
 
Machine Learning in SQL Server 2019
Introduction to Azure Data Lake

  • 2. Introduction to Azure Data Lake Athens May 26, 2017
  • 3. Presenter Info: Antonios Chatzipavlis, SQL Server Expert and Evangelist, Data Platform MVP
      - Certifications: MCT, MCSE, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA, ITIL-F
      - 1982: I started working with computers.
      - 1988: I started my professional career in the computer industry.
      - 1996: I started working with SQL Server 6.0.
      - 1998: I earned my first Microsoft certification, as Microsoft Certified Solution Developer (3rd in Greece).
      - 1999: I started my career as a Microsoft Certified Trainer (MCT), with more than 30,000 hours of training until now!
      - 2010: I became a Microsoft MVP on Data Platform for the first time. I created SQL School Greece, www.sqlschool.gr.
      - 2012: I became MCT Regional Lead for the Microsoft Learning Program.
      - 2013: I was certified as MCSE: Data Platform and as MCSE: Business Intelligence.
      - 2016: I was certified as MCSE: Data Management & Analytics.
  • 4. SQLschool.gr
      - A source of information about Microsoft SQL Server for Greek IT Professionals, DBAs, Developers, Information Workers, and also hobbyists who simply like SQL Server.
      - Help line: help@sqlschool.gr
      - What we are doing here: Articles about SQL Server, SQL Server News, SQL Nights, Webcasts, Downloads, Resources
      - Follow us on social media: fb/sqlschoolgr, fb/groups/sqlschool, @antoniosch, @sqlschool, yt/c/SqlschoolGr, SQL School Greece group
      - SELECT KNOWLEDGE FROM SQL SERVER
  • 5. Sign up for a free membership today at sqlpass.org.
      - LinkedIn: http://www.sqlpass.org/linkedin
      - Facebook: http://www.sqlpass.org/facebook
      - Twitter: @SQLPASS
      - PASS: http://www.sqlpass.org
  • 8. What is Azure Data Lake? “A single store of all data… ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various forms including reporting, visualization, analytics, and machine learning”
  • 11. What Does Azure Data Lake Offer?
      - Components: Data Lake Analytics, HDInsight, Data Lake Store
      - Develop, debug, and optimize big data programs with ease
      - Integrates seamlessly with your existing IT investments
      - Store and analyze petabyte-size files and trillions of objects
      - Affordable and cost effective
      - Enterprise-grade security, auditing, and support
  • 12. Data Lakes vs Data Warehouses
      - DATA. Data warehouse: structured, processed. Data lake: structured, semi-structured, unstructured, raw.
      - PROCESSING. Data warehouse: schema-on-write. Data lake: schema-on-read.
      - STORAGE. Data warehouse: expensive for large data volumes. Data lake: designed for low-cost storage.
      - AGILITY. Data warehouse: less agile, fixed configuration. Data lake: highly agile, configure and reconfigure as needed.
      - SECURITY. Data warehouse: mature. Data lake: maturing.
      - USERS. Data warehouse: business professionals. Data lake: data scientists et al.
  • 14. What is Azure Data Lake Store?
      - An enterprise-wide hyper-scale repository for big data analytic workloads. Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics.
      - Can be accessed from Hadoop (available with an HDInsight cluster) using the WebHDFS-compatible REST APIs.
      - Specifically designed to enable analytics on the stored data and tuned for performance in data analytics scenarios.
      - Includes, out of the box, all the enterprise-grade capabilities (security, manageability, scalability, reliability, and availability) essential for real-world enterprise use cases.
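To make the "WebHDFS-compatible REST APIs" point concrete, here is a minimal Python sketch that only builds the request URL for a directory listing. The host suffix `azuredatalakestore.net`, the `/webhdfs/v1/` prefix, the account name `myaccount`, and the helper name `webhdfs_url` are assumptions for illustration and are not taken from the slides. A real call would also need an Azure Active Directory bearer token, which this sketch does not obtain.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style URL for a Data Lake Store account (hypothetical helper)."""
    # Assumption: the store is reachable at <account>.azuredatalakestore.net
    # and exposes the standard WebHDFS "/webhdfs/v1/<path>?op=..." layout.
    host = f"{account}.azuredatalakestore.net"
    query = urlencode({"op": op, **params})
    return f"https://{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# LISTSTATUS is the standard WebHDFS operation for listing a directory.
url = webhdfs_url("myaccount", "/Samples/Data", "LISTSTATUS")
print(url)
```

Because the layout follows WebHDFS, an existing Hadoop client should in principle only need a different endpoint and credentials, which is the compatibility the slide refers to.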
  • 15. Azure Data Lake Store vs Azure Blob Storage
      - PURPOSE. Data Lake Store: optimized storage for big data analytics workloads. Blob Storage: general-purpose object store for a wide variety of storage scenarios.
      - USE CASES. Data Lake Store: batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets. Blob Storage: any type of text or binary data, such as application back end, backup data, media storage for streaming, and general-purpose data.
      - KEY CONCEPTS. Data Lake Store: an account contains folders, which in turn contain data stored as files. Blob Storage: a storage account has containers, which in turn hold data in the form of blobs.
      - STRUCTURE. Data Lake Store: hierarchical file system. Blob Storage: object store with flat namespace.
      - SECURITY. Data Lake Store: based on Azure Active Directory identities. Blob Storage: based on shared secrets (Account Access Keys and Shared Access Signature Keys).
  • 17. What is Azure Data Lake Analytics?
      - It is an on-demand analytics job service that simplifies big data analytics.
      - Focus on writing, running, and managing jobs rather than on operating distributed infrastructure.
      - Can handle jobs of any scale instantly by setting the dial for how much power you need.
      - You only pay for your job when it is running, making it cost-effective.
      - The analytics service supports Azure Active Directory, letting you manage access and roles, integrated with your on-premises identity system.
  • 18. Azure Data Lake Analytics Key Capabilities
      - Dynamic scaling
      - Develop faster, debug, and optimize smarter using familiar tools
      - U-SQL: simple and familiar, powerful, and extensible
      - Integrates seamlessly with your IT investments
      - Affordable and cost effective
      - Works with all your Azure Data
  • 20. What is Azure HDInsight?
      - The only fully managed cloud Apache Hadoop offering.
      - Provides optimized open-source analytic clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and Microsoft R Server.
      - Provides a 99.9% SLA.
      - Deploy these big data technologies and ISV applications as managed clusters with enterprise-level security and monitoring.
  • 21. U-SQL
  • 22. What is U-SQL?
      - It is the new big data query language of the Azure Data Lake Analytics service.
      - It evolved out of Microsoft's internal big data language called SCOPE.
      - Reference: "SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets" by Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, Jingren Zhou, http://www.vldb.org/pvldb/1/1454166.pdf
  • 23. U-SQL combines
      - a familiar SQL-like declarative language
      - with the extensibility and programmability provided by C# types and the C# expression language
      - and big data processing concepts such as "schema on read", custom processors, and reducers.
  • 24. Provides the ability to query and combine data from a variety of data sources
      - Azure Data Lake Storage,
      - Azure Blob Storage,
      - Azure SQL DB, Azure SQL Data Warehouse,
      - SQL Server instances running in Azure VMs.
  • 25. It's NOT ANSI SQL
      - Its keywords, such as SELECT, have to be in UPPERCASE.
      - Its expression language inside SELECT clauses, WHERE predicates, etc. is C#.
      - This means, for example, that comparison operations inside a predicate follow C# syntax (e.g., a == "foo"),
      - and that the language uses C# null semantics, which are 2-valued rather than 3-valued as in ANSI SQL.
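The practical effect of 2-valued versus 3-valued null semantics can be sketched in plain Python. This is an illustration only, not U-SQL: `None` stands in for both C# `null` and SQL `NULL`, and every function name below is hypothetical.

```python
from typing import List, Optional


def eq_two_valued(a: Optional[str], b: Optional[str]) -> bool:
    """C#-style equality: null compares like any other value, so the result is always True or False."""
    return a == b


def eq_three_valued(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """ANSI SQL-style equality: any comparison involving NULL yields UNKNOWN (modelled here as None)."""
    if a is None or b is None:
        return None
    return a == b


def not_three_valued(v: Optional[bool]) -> Optional[bool]:
    """NOT under 3-valued logic: NOT UNKNOWN is still UNKNOWN."""
    return None if v is None else not v


def where(rows: List[Optional[str]], predicate) -> List[Optional[str]]:
    """Keep only rows whose predicate is exactly True, as a WHERE clause does."""
    return [r for r in rows if predicate(r) is True]


regions = ["en-gb", None, "en-us"]

# "Region != en-gb" under each semantics
not_gb_2v = where(regions, lambda r: not eq_two_valued(r, "en-gb"))
not_gb_3v = where(regions, lambda r: not_three_valued(eq_three_valued(r, "en-gb")))

print(not_gb_2v)  # the null row is kept
print(not_gb_3v)  # the null row is dropped
```

Under the 2-valued rule the row with a missing region passes the "not equal" filter, while under the 3-valued rule it is silently dropped. This is the kind of difference to watch for when porting ANSI SQL predicates to U-SQL.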
  • 26. How does a U-SQL Script process Data?
      - Azure Data Lake Analytics provides U-SQL for batch processing.
      - U-SQL is written and executed in the form of a batch script.
      - U-SQL also supports data definition statements such as CREATE TABLE to create metadata artifacts, either in separate scripts or sometimes even in combination with the transformation scripts.
      - U-SQL scripts can be submitted in a variety of ways:
          - Directly from within the Azure Data Lake Tools for Visual Studio
          - From the Azure Portal
          - Programmatically via the Azure Data Lake SDK job submission API
          - Via the Azure PowerShell extension's job submission command
  • 27. How does a U-SQL Script process Data?
      It follows this general processing pattern:
      - Retrieve data from stored locations in rowset format
          - Stored locations can be files that will be schematized on read with EXTRACT expressions
          - Stored locations can be U-SQL tables that are stored in a schematized format
          - Or they can be tables provided by other data sources such as an Azure SQL database
      - Transform the rowset(s)
          - Several transformations over the rowsets can be composed in a data flow format
      - Store the transformed rowset data
          - Store it in a file with an OUTPUT statement, or
          - Store it in a U-SQL table with an INSERT statement
  • 28. U-SQL Scripts
      DECLARE @in string = "/Samples/Data/SearchLog.tsv";
      DECLARE @out string = "/output/result.tsv";
      @searchlog =
          EXTRACT UserId int,
                  Start DateTime,
                  Region string,
                  Query string,
                  Duration int?,
                  Urls string,
                  ClickedUrls string
          FROM @in
          USING Extractors.Tsv();
      @rs1 =
          SELECT Start, Region, Duration
          FROM @searchlog
          WHERE Region == "en-gb";
      @rs1 =
          SELECT Start, Region, Duration
          FROM @rs1
          WHERE Start >= DateTime.Parse("2012/02/16");
      OUTPUT @rs1
      TO @out
      USING Outputters.Tsv();
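For readers who want to trace the extract, transform, and output steps of this script without a Data Lake Analytics account, the same flow can be mimicked in plain Python. This is a sketch under stated assumptions, not how the service executes U-SQL. The three sample rows are invented, the timestamp format is assumed, and `extract`, `transform`, and `output` are hypothetical names that stand in for `EXTRACT`, the two `SELECT` steps, and `OUTPUT`.

```python
import csv
import io
from datetime import datetime

# Invented sample rows; the real SearchLog.tsv contents are not reproduced here.
SAMPLE_TSV = (
    "399266\t2012-02-15 11:53:16\ten-us\tlouvre\t73\turl1\turl1\n"
    "382045\t2012-02-16 12:11:55\ten-gb\tdominos\t\turl2\turl2\n"
    "106479\t2012-02-15 13:40:25\ten-gb\tcosmos\t1213\turl3\turl3\n"
)

COLUMNS = ["UserId", "Start", "Region", "Query", "Duration", "Urls", "ClickedUrls"]


def extract(text):
    """Schema on read: types are applied while parsing, like EXTRACT ... USING Extractors.Tsv()."""
    for raw in csv.reader(io.StringIO(text), delimiter="\t"):
        row = dict(zip(COLUMNS, raw))
        row["UserId"] = int(row["UserId"])
        row["Start"] = datetime.strptime(row["Start"], "%Y-%m-%d %H:%M:%S")
        # Duration is nullable (int? in the script): an empty field becomes None.
        row["Duration"] = int(row["Duration"]) if row["Duration"] else None
        yield row


def transform(rows):
    """Two chained filters, mirroring the two @rs1 assignments."""
    rs1 = (r for r in rows if r["Region"] == "en-gb")
    rs1 = (r for r in rs1 if r["Start"] >= datetime(2012, 2, 16))
    return [(r["Start"], r["Region"], r["Duration"]) for r in rs1]


def output(rows):
    """Serialise the rowset back to TSV text, standing in for OUTPUT ... USING Outputters.Tsv()."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for start, region, duration in rows:
        writer.writerow([start.isoformat(sep=" "), region, "" if duration is None else duration])
    return buf.getvalue()


result_rows = transform(extract(SAMPLE_TSV))
result = output(result_rows)
print(result)
```

Each stage consumes the previous stage's rowset, which is the data-flow composition the processing-pattern slide describes. In the service the stages are planned and run in parallel, whereas here they run sequentially in memory.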
  • 29. DEMO
      - Create Data Lake Stores
      - Create Data Lake Analytics accounts and connect them to Data Lake Stores
      - Import data into Azure Data Lake Stores
      - Run U-SQL jobs in Azure Data Lake Analytics
  • 32. SELECT KNOWLEDGE FROM SQL SERVER. Copyright © 2017 SQLschool.gr. All rights reserved. PRESENTER MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.