This document provides an overview of using PolyBase for data virtualization in SQL Server. It discusses installing and configuring PolyBase, connecting to external data sources such as Azure Blob Storage and SQL Server, using PolyBase DMVs for monitoring and troubleshooting, and techniques for optimizing performance such as predicate pushdown and creating statistics on external tables. The presentation aims to explain how PolyBase can be leveraged to virtually access and query external data using T-SQL, without needing to know the physical data locations or move the data.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I’ll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I’ll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
LivePersonDev is happy to host this meetup with Zvika Gutkin, an Oracle and Vertica expert DBA in LivePerson, and specialist in BI and Big Data.
At LivePerson, we handle enormous amounts of data. We use Vertica to analyse this data in real time.
In this lecture Zvika will cover the following:
1. Present the architecture of Vertica
2. Compare row store to column store
3. Explain how Vertica achieves fast query times
4. Show a few use cases
5. Explain what LivePerson does with Vertica and why we chose Vertica
6. Talk about why we love Vertica and why we hate it
7. Discuss whether Vertica is a SQL or NoSQL database, and whether it is consistent or eventually consistent
8. Explain how Vertica differs from other SQL and NoSQL technologies
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service): a tool for curating and processing massive amounts of data, developing, training, and deploying models on that data, and managing the whole workflow throughout the project. It is for those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming, and the Machine Learning Library (MLlib). It has built-in integration with many data sources, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.
Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)
AI-powered demand forecasting management, seen through a Korean construction equipment maker's adoption case - Changyoon Cho, AI/ML Team Lead, Bespin Global
This document discusses how a construction equipment company in Korea implemented machine learning and cloud technologies. It provides an overview of the company's existing IoT system for remotely monitoring construction equipment. It then describes a project to build an analytics environment using AWS services like SageMaker to enable data-driven decision making. This would allow for tasks like demand forecasting of equipment parts using time series models in SageMaker. The document shares the architecture designed, example SageMaker code, and discusses other potential use cases like anomaly detection that could leverage AWS machine learning services.
Overview of tools available in python for performing data visualization (statistical, geographical, reporting, etc). Prepared for Minsk DataViz Day (October 4, 2017)
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Version 7 of the Elastic Stack adds powerful new features to the popular open source platform for search, logging, and analytics. Come hear directly from Elastic engineers and architecture team members on powerful new additions like GIS functionality and frozen-tier search. Plus, hear about the full range of orchestration options for getting the most out of your deployments, however and wherever you choose to run them. This session is sponsored by Elastic.
This document discusses IBM's reference architecture for data and AI. It provides guidance on designing systems that use AI and analyze large amounts of data. The reference architecture covers strategies for collecting, storing, processing and analyzing data at large scales using technologies like Apache Spark, Hadoop and containers. It is intended to help organizations build systems that extract insights from data.
This document summarizes Netflix's use of Kafka in their data pipeline. It discusses how Netflix evolved from using S3 and EMR to introducing Kafka and Kafka producers and consumers to handle 400 billion events per day. It covers challenges of scaling Kafka clusters and tuning Kafka clients and brokers. Finally, it outlines Netflix's roadmap which includes contributing to open source projects like Kafka and testing failure resilience.
KB Kookmin Bank Has Started - A Strategy for Quick and Easy Cloud Governance Adoption - Byungeok Kang, AWS Solutions Architect / Kanghong Jang, Deputy General Manager, Cloud Platform Group, ...
This session introduces a multi-account cloud governance strategy that lets you quickly and easily apply the safety measures required to use cloud services, even as diverse new workloads are added. It also looks at how KB Kookmin Bank came to adopt the cloud, how it addressed the regulatory requirements that financial companies must meet when adopting the cloud, and how it is using the cloud governance environment it built to prepare effectively for expanding cloud workloads.
This document discusses designing a modern data warehouse in Azure. It provides an overview of traditional vs. self-service data warehouses and their limitations. It also outlines challenges with current data warehouses around timeliness, flexibility, quality and findability. The document then discusses why organizations need a modern data warehouse based on criteria like customer experience, quality assurance and operational efficiency. It covers various approaches to ingesting, storing, preparing, modeling and serving data on Azure. Finally, it discusses architectures like the lambda architecture and common data models.
Organizations are grappling with manually classifying and inventorying distributed, heterogeneous data assets in order to deliver value. However, the new Azure service for enterprises, Azure Synapse Analytics, is poised to help organizations fill the gap between data warehouses and data lakes.
Bespin Global Consulting Division
Jeongsik Choi (js.choi@bespinglobal.com)
Data Migration Seminar - Let's Fly with Data
Helping You Adopt Cloud | Asia's No. 1 cloud MSP as selected by Gartner, providing strategy, implementation, operations, and management services for successful cloud adoption
This document is a training presentation on Databricks fundamentals and the data lakehouse concept by Dalibor Wijas from November 2022. It introduces Wijas and his experience. It then discusses what Databricks is, why it is needed, what a data lakehouse is, how Databricks enables the data lakehouse concept using Apache Spark and Delta Lake. It also covers how Databricks supports data engineering, data warehousing, and offers tools for data ingestion, transformation, pipelines and more.
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
This document discusses MS SQL Server 2019's capabilities for big data processing through PolyBase and Big Data Clusters. PolyBase allows SQL queries to join data stored externally in sources like HDFS, Oracle and MongoDB. Big Data Clusters deploy SQL Server on Linux in Kubernetes containers with separate control, compute and storage planes to provide scalable analytics on large datasets. Examples of using these technologies include data virtualization across sources, building data lakes in HDFS, distributed data marts for analysis, and integrated AI/ML tasks on HDFS and SQL data.
Antonios Chatzipavlis presented on migrating SQL workloads to Azure. He discussed modernizing data platforms by discovering, assessing, planning, transforming, optimizing, testing and remediating. Key migration considerations include remaining, rehosting, refactoring, rearchitecting, rebuilding or replacing workloads. Tools for migrating data include Microsoft Assessment and Planning Toolkit, Data Migration Assistant, Database Experimentation Assistant, SQL Server Migration Assistant, and Azure Database Migration Service. Workloads can be migrated to Azure VMs, Azure SQL Databases or Azure SQL Managed Instances.
This document provides an agenda and summary for a Data Analytics Meetup (DAM) on March 27, 2018. The agenda covers topics such as disruption opportunities in a changing data landscape, transitioning from traditional to modern BI architectures using Azure, Azure SQL Database vs Data Warehouse, data integration with Azure Data Factory and SSIS, Analysis Services, Power BI reporting, and a wrap-up. The document discusses challenges around data growth, digital transformation, and the shrinking time for companies to adapt to disruption. It provides overviews and comparisons of Azure SQL Database, Data Warehouse, and related Azure services to help modernize analytics architectures.
Microsoft Azure is changing, and its database part (Windows Azure SQL Database) is changing even faster. In this session I would like to show those who have not seen it, and remind those who already know a bit, what WASD is about, what changes have taken place, and what we can expect from this database. For the brave, there will be an opportunity to connect to a cloud account and test these solutions yourself.
This document provides an overview and summary of SQL Azure and cloud services from Red Gate. The document begins with an introduction to SQL Azure, including compatibility with different SQL Server versions, limitations, and security requirements. It then covers topics like database sizing, naming conventions, migration support, and using indexes. The document next discusses cloud services from Red Gate for backup, restore, and scheduling of SQL Azure databases. It concludes with some example links and a short demo. The overall summary discusses key capabilities and services for managing SQL Azure databases and backups in the cloud.
This document discusses techniques for optimizing Power BI performance. It recommends tracing queries using DAX Studio to identify slow queries and refresh times. Tracing tools like SQL Profiler and log files can provide insights into issues occurring in the data sources, Power BI layer, and across the network. Focusing on optimization by addressing wait times through a scientific process can help resolve long-term performance problems.
Why does Microsoft care about NoSQL, SQL and Polyglot Persistence?
This webinar discusses polyglot persistence, which is the strategy of using multiple data storage technologies together to solve different data problems. It explains that while relational databases are good for transactions and consistency, NoSQL databases are better for scale and unstructured data. The webinar shows how to integrate SQL and NoSQL databases by routing requests based on data type or synchronizing data automatically between the databases. It provides an example architecture using a SQL database for legacy apps and reporting with a NoSQL database for mobile and web apps, and discusses benefits like scalability, accelerated development, and leveraging existing tools.
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, scaling ETL in the cloud, handling flexible schemas, and using ADF for orchestration. Key points include staging data in low-cost storage before processing, using ADF's integration runtime to process data both on-premises and in the cloud, and building resilient data flows that can handle schema drift.
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, the importance of scale and flexible schemas in cloud ETL, and how Azure Data Factory supports workflows, templates, and integration with on-premises and cloud data. It also provides examples of nightly ETL data flows, handling schema drift, loading dimensional models, and data science scenarios using Azure data services.
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D... (Databricks)
Many have dubbed the 2020s the decade of data. This is indeed an era of data zeitgeist.
From code-centric software development 1.0, we are entering software development 2.0, a data-centric and data-driven approach, where data plays a central theme in our everyday lives.
As the volume and variety of data garnered from myriad data sources continue to grow at an astronomical scale and as cloud computing offers cheap computing and data storage resources at scale, the data platforms have to match in their abilities to process, analyze, and visualize at scale and speed and with ease — this involves data paradigm shifts in processing and storing and in providing programming frameworks to developers to access and work with these data platforms.
In this talk, we will survey some emerging technologies that address the challenges of data at scale, how these tools help data scientists and machine learning developers with their data tasks, why they scale, and how they facilitate the future data scientists to start quickly.
In particular, we will examine in detail two open-source tools MLflow (for machine learning life cycle development) and Delta Lake (for reliable storage for structured and unstructured data).
Other emerging tools, such as Koalas, help data scientists do exploratory data analysis at scale in a language and framework they are already familiar with; we will also touch on emerging data + AI trends in 2021.
You will understand the challenges of machine learning model development at scale, why you need reliable and scalable storage, and what other open source tools are at your disposal to do data science and machine learning at scale.
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag... (James Serra)
Discover, manage, deploy, monitor – rinse and repeat. In this session we show how Azure Machine Learning can be used to create the right AI model for your challenge and then easily customize it using your development tools while relying on Azure ML to optimize them to run in hardware accelerated environments for the cloud and the edge using FPGAs and Neural Network accelerators. We then show you how to deploy the model to highly scalable web services and nimble edge applications that Azure can manage and monitor for you. Finally, we illustrate how you can leverage the model telemetry to retrain and improve your content.
Big data is simply a collection of structured and unstructured data. We need to be able to acquire, organize, analyze, and present it in a way that creates value for the business. MySQL is used in 80% of Hadoop implementations and has long been a loyal partner for Hadoop.
This document provides tips for optimizing performance in Power BI by focusing on different areas like data sources, the data model, visuals, dashboards, and using trace and log files. Some key recommendations include filtering data early, keeping the data model and queries simple, limiting visual complexity, monitoring resource usage, and leveraging log files to identify specific waits and bottlenecks. An overall approach of focusing on time-based optimization by identifying and addressing the areas contributing most to latency is advocated.
The new Microsoft Azure SQL Data Warehouse (SQL DW) is an elastic data warehouse-as-a-service and is a Massively Parallel Processing (MPP) solution for "big data" with true enterprise class features. The SQL DW service is built for data warehouse workloads from a few hundred gigabytes to petabytes of data with truly unique features like disaggregated compute and storage allowing for customers to be able to utilize the service to match their needs. In this presentation, we take an in-depth look at implementing a SQL DW, elastic scale (grow, shrink, and pause), and hybrid data clouds with Hadoop integration via Polybase allowing for a true SQL experience across structured and unstructured data.
This document provides an overview of Tableau for IT managers, covering Tableau's architecture, deployment models, security features, scalability, and data strategy. Tableau has a client-server architecture that allows for highly scalable deployments from simple single-server configurations up to large enterprise clusters. It provides role-based security, data security through user filters, and network security including SSL encryption. Tableau is highly scalable and supports deployments from small teams up to thousands of users at large companies.
This document summarizes key components of Microsoft Azure's data platform, including SQL Database, NoSQL options like Azure Tables, Blob Storage, and Azure Files. It provides an overview of each service, how they work, common use cases, and demos of creating resources and accessing data. The document is aimed at helping readers understand Azure's database and data storage options for building cloud applications.
This document discusses connecting Oracle Analytics Cloud (OAC) Essbase data to Microsoft Power BI. It provides an overview of Power BI and OAC, describes various methods for connecting the two including using a REST API and exporting data to Excel or CSV files, and demonstrates some visualization capabilities in Power BI including trends over time. Key lessons learned are that data can be accessed across tools through various connections, analytics concepts are often similar between tools, and while partnerships exist between Microsoft and Oracle, integration between specific products like Power BI and OAC is still limited.
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Trivadis
In dieser Session stellen wir ein Projekt vor, in welchem wir ein umfassendes BI-System mit Hilfe von Azure Blob Storage, Azure SQL, Azure Logic Apps und Azure Analysis Services für und in der Azure Cloud aufgebaut haben. Wir berichten über die Herausforderungen, wie wir diese gelöst haben und welche Learnings und Best Practices wir mitgenommen haben.
Azure SQL Database Managed Instance is a new flavor of Azure SQL Database that is a game changer. It offers near-complete SQL Server compatibility and network isolation to easily lift and shift databases to Azure (you can literally backup an on-premise database and restore it into a Azure SQL Database Managed Instance). Think of it as an enhancement to Azure SQL Database that is built on the same PaaS infrastructure and maintains all it's features (i.e. active geo-replication, high availability, automatic backups, database advisor, threat detection, intelligent insights, vulnerability assessment, etc) but adds support for databases up to 35TB, VNET, SQL Agent, cross-database querying, replication, etc. So, you can migrate your databases from on-prem to Azure with very little migration effort which is a big improvement from the current Singleton or Elastic Pool flavors which can require substantial changes.
Antonios Chatzipavlis presented on SQL Server backup and restore. The presentation covered database architecture basics including data files, transaction log files, and the buffer cache. It also discussed backup types like full, differential, transaction log, copy only and partial backups. Backup strategies and restore processes were explained, including restoring to a point in time and restoring system databases. The internals of how SQL Server performs backups using buffers and I/O threads was also summarized.
This document summarizes a webinar presentation about workload management in SQL Server 2019. It discusses how SQL Server's Resource Governor feature can be used to provide multitenancy, predictable performance, and isolation for multiple workloads running on a single SQL Server instance. Key concepts covered include resource pools, workload groups, and classification functions to assign sessions to different pools and groups. The presentation also reviews best practices for using lookup tables in classification functions and shows some DMVs for monitoring Resource Governor configuration and statistics.
This document provides an overview of loading data into Azure SQL DW (Synapse Analytics). It discusses extracting source data into text files, landing the data into Azure Data Lake Store Gen2, preparing the data for loading into staging tables using PolyBase or COPY commands, transforming the data, and inserting it into production tables. It also compares ETL vs ELT approaches and SSIS vs Azure Data Factory for data integration. The presenter then demonstrates loading data in Synapse SQL pool and invites any questions.
The document provides an overview of the DAX language. It discusses that DAX is the programming language used in Power BI, Power Pivot, and Analysis Services for data modeling, reporting, and analytics. It describes the basic components of a DAX data model including tables, columns, relationships, measures, and hierarchies. It also covers DAX syntax, functions, operators, and how context and filter context work in DAX calculations and queries.
The document introduces Diagnostic Management Views (DMVs) and Dynamic Management Functions (DMFs) in SQL Server. It discusses that DMVs and DMFs return server state information and can be used to monitor server health, diagnose problems, and tune performance. It provides examples of common DMVs and DMFs used for query execution and the query plan cache. Finally, it notes that the presentation will demonstrate troubleshooting with DMVs and DMFs.
This document summarizes common T-SQL anti-patterns that can negatively impact query performance, including using SELECT *, functions in predicates, OR operators, implicit conversions, unnecessary sorts, correlated subqueries, and dynamic SQL execution. The presentation provides explanations of why each anti-pattern hurts performance and recommendations for more optimized alternatives such as using indexes, temporary tables, parameterization, and execution plan analysis.
Modernizing Your Database with SQL Server 2019 discusses SQL Server 2019 features that can help modernize a database, including:
- The Hybrid Buffer Pool which supports persistent memory to improve performance on read-heavy workloads.
- Memory-Optimized TempDB Metadata which stores TempDB metadata in memory-optimized tables to avoid certain blocking issues.
- Intelligent Query Processing features like Adaptive Query Processing, Batch Mode processing on rowstores, and Scalar UDF Inlining which improve query performance.
- Approximate Count Distinct, a new function that provides an estimated count of distinct values in a column faster than a precise count.
- Lightweight profiling, enabled by default, which provides query plan information with minimal overhead.
The document provides details about an SQL expert's background and certifications. It summarizes the expert's career, starting with computers in 1982 and entering the computer industry in 1988. In 1996, they started working with SQL Server 6.0 and have since earned multiple Microsoft certifications. The expert now provides training and consultation services, and created an online school called SQL School Greece to teach SQL Server.
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018 (Antonios Chatzipavlis)
Azure SQL Database is a managed database service hosted in Microsoft's Azure cloud. Some key differences from SQL Server include: the service is paid by the hour based on the selected service tier; users can dynamically scale resources up or down; backups and high availability are managed by the service provider; and common administration tasks are handled by the provider rather than the user. The service offers automatic backups, point-in-time restore, and geo-restore capabilities along with built-in high availability through replication across three copies in the primary region.
The document discusses technologies within the Microsoft SQL family and Azure SQL that can help organizations address requirements of the General Data Protection Regulation (GDPR). It covers features for discovering and classifying personal data, managing access and controlling how data is used, and protecting data through encryption, auditing and other security controls. Built-in technologies like dynamic data masking, row-level security, authentication options, and transparent data encryption are described as ways SQL Server and Azure SQL Database can help organizations comply with GDPR.
The document provides biographical information about Antonios Chatzipavlis, a SQL Server expert and evangelist. It then summarizes his presentation on statistics and index internals in SQL Server, which covers topics like cardinality estimation, inspecting and updating statistics, index structure and types, and identifying missing indexes. The presentation includes demonstrations of analyzing cardinality estimation and picking the right index key.
This document provides an introduction and overview of Azure Data Lake. It describes Azure Data Lake as a single store of all data ranging from raw to processed that can be used for reporting, analytics and machine learning. It discusses key Azure Data Lake components like Data Lake Store, Data Lake Analytics, HDInsight and the U-SQL language. It compares Data Lakes to data warehouses and explains how Azure Data Lake Store, Analytics and U-SQL process and transform data at scale.
This document provides an overview of Azure SQL Data Warehouse. It discusses what Azure SQL Data Warehouse is, how it is provisioned and scaled, best practices for designing tables in Azure SQL DW including distribution keys and data types, and methods for loading and querying data including PolyBase and labeling queries for monitoring. The presentation also covers tuning aspects like statistics, indexing, and resource classes.
This document provides an introduction and overview of Azure DocumentDB. It discusses how DocumentDB is a fully managed NoSQL database service that provides fast and predictable performance for JSON data through SQL querying capabilities. It also describes how DocumentDB offers features like elastic scaling, high availability, global distribution and ease of development. The document then provides information on starting with DocumentDB, writing queries, and programming capabilities within DocumentDB like stored procedures and triggers.
This document provides an introduction and overview of machine learning concepts and Azure Machine Learning. It defines machine learning as finding patterns in data and using those patterns to predict the future. It outlines the machine learning workflow and lifecycle, including preparing data, applying algorithms to find patterns, iterating to create the best model, and deploying the final model. It also describes machine learning concepts like supervised and unsupervised learning, and different problem types like regression, classification, and clustering. Finally, it discusses options for using Azure Machine Learning, including free and full-featured paid accounts, and demonstrates its use.
This document provides an introduction and background about the presenter along with information about SQL Database. The presenter has over 30,000 hours of training experience with SQL Server and various Microsoft certifications. They created SQL School Greece as a resource for IT professionals and others interested in SQL Server. The presentation will cover what SQL Database is on Azure, its service tiers including basic, standard, and premium, database transaction units (DTUs), the Azure SQL Database logical server, management tools for SQL Database, and securing SQL Database. It concludes with an invitation to sign up for SQL PASS and follow the presenter on social media.
Implementing Mobile Reports in SQL Server 2016 Reporting Services (Antonios Chatzipavlis)
The document provides an overview of implementing mobile reports in SQL Server 2016 Reporting Services. It discusses preparing data for mobile reports, using the SQL Server Mobile Report Publisher tool, and publishing mobile reports. The presenter has extensive experience with SQL Server and provides their qualifications. The presentation also provides information on optimizing reports, formatting time data, using filters and Excel files in reports, and designing reports using navigators and visualizations in the Mobile Report Publisher tool. It demonstrates the tool's interface and capabilities.
### Data Description and Analysis Summary for Presentation
#### 1. **Importing Libraries**
Libraries used:
- `pandas`, `numpy`: Data manipulation
- `matplotlib`, `seaborn`: Data visualization
- `scikit-learn`: Machine learning utilities
- `statsmodels`, `pmdarima`: Statistical modeling
- `keras`: Deep learning models
#### 2. **Loading and Exploring the Dataset**
**Dataset Overview:**
- **Source:** CSV file (`mumbai-monthly-rains.csv`)
- **Columns:**
- `Year`: The year of the recorded data.
- `Jan` to `Dec`: Monthly rainfall data.
- `Total`: Total annual rainfall.
**Initial Data Checks:**
- Displayed first few rows.
- Summary statistics (mean, standard deviation, min, max).
- Checked for missing values.
- Verified data types.
**Visualizations:**
- **Annual Rainfall Time Series:** Trends in annual rainfall over the years.
- **Monthly Rainfall Over Years:** Patterns and variations in monthly rainfall.
- **Yearly Total Rainfall Distribution:** Distribution and frequency of annual rainfall.
- **Box Plots for Monthly Data:** Spread and outliers in monthly rainfall.
- **Correlation Matrix of Monthly Rainfall:** Relationships between different months' rainfall.
#### 3. **Data Transformation**
**Steps:**
- Ensured 'Year' column is of integer type.
- Created a datetime index.
- Converted monthly data to a time series format.
- Created lag features to capture past values.
- Generated rolling statistics (mean, standard deviation) for different window sizes.
- Added seasonal indicators (dummy variables for months).
- Dropped rows with NaN values.
**Result:**
- Transformed dataset with additional features ready for time series analysis.
#### 4. **Data Splitting**
**Procedure:**
- Split the data into features (`X`) and target (`y`).
- Further split into training (80%) and testing (20%) sets without shuffling to preserve time series order.
**Result:**
- Training set: `(X_train, y_train)`
- Testing set: `(X_test, y_test)`
#### 5. **Automated Hyperparameter Tuning**
**Tool Used:** `pmdarima`
- Automatically selected the best parameters for the SARIMA model.
- Evaluated using metrics such as AIC and BIC.
**Output:**
- Best SARIMA model parameters and statistical summary.
#### 6. **SARIMA Model**
**Steps:**
- Fit the SARIMA model using the training data.
- Evaluated on both training and testing sets using MAE and RMSE.
**Output:**
- **Train MAE:** Indicates accuracy on training data.
- **Test MAE:** Indicates accuracy on unseen data.
- **Train RMSE:** Measures average error magnitude on training data.
- **Test RMSE:** Measures average error magnitude on testing data.
#### 7. **LSTM Model**
**Preparation:**
- Reshaped data for LSTM input.
- Converted data to `float32`.
**Model Building and Training:**
- Built an LSTM model with one LSTM layer and one Dense layer.
- Trained the model on the training data.
**Evaluation:**
- Evaluated on both training and testing sets using MAE and RMSE.
**Output:**
- **Train MAE:** Accuracy on training data.
- **Test MAE:** Accuracy on unseen data.
- **Train RMSE** and **Test RMSE:** Average error magnitude on the training and testing data.
An LLM-powered contract compliance application that uses the advanced RAG method Self-RAG together with a knowledge graph for the first time.
It provides the highest accuracy for contract compliance recorded so far in the oil and gas industry.
3. A community for professionals who use the Microsoft Data Platform
Articles | Webinars | Videos | Presentations | Events | Resources | News
./c/sqlschool.gr | Sqlschool.gr Group | @antoniosch | @sqlschool | SQLschool.gr UG & Page
Connect / Explore / Learn
5. Presentation Content
• Overview
• Installing and Configuring PolyBase
• Data Virtualization using PolyBase
• DMVs and PolyBase
• Performance and Troubleshooting
6. Overview
• Proliferation of Data Platform technologies
• What is Data Virtualization?
• What is PolyBase?
• Data Virtualization using PolyBase
7. Connect / Explore / Learn
Proliferation of Data Platform technologies
The problem: a massively increasing amount of data, spread across a growing number of technologies beyond the traditional RDBMS.
8. Connect / Explore / Learn
What is Data Virtualization?
A modern take on the classic problem of ETL.
Data appears to come from one source system, while under the covers links define where the data really lives.
The end user or analyst:
• Can read this data using one SQL dialect.
• Can join structured data sets from different systems without needing to know the source of each data set.
• Has no dependency on database developers building ETL flows to move data from one system to the next.
9. Connect / Explore / Learn
What is PolyBase?
PolyBase has been available since 2010.
Generally available since SQL Server 2016.
PolyBase's original purpose was to integrate SQL Server with Hadoop by allowing us to run MapReduce jobs against a remote Hadoop cluster and bring the results back into SQL Server, reducing the computational burden on our relatively more expensive SQL Server instances.
PolyBase in SQL Server 2019 has grown and adapted to this era of data virtualization and gives us the ability to integrate with a variety of source systems: Hadoop clusters, Azure Blob Storage, other SQL Server instances, Oracle databases, Teradata, MongoDB, Cosmos DB, Apache Spark clusters, Apache Hive tables, and even Microsoft Excel.
The best part is that developers need only T-SQL.
PolyBase is no panacea, and there are trade-offs compared to storing all data natively in one source system, particularly around performance.
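To make the slide's point that developers need only T-SQL concrete, here is a minimal sketch; the table names are hypothetical and not taken from the presentation. Once an external table has been defined, it is queried and joined exactly like a local table:

-- dbo.RemoteSales is assumed to be an external table pointing at a remote system
-- (Hadoop, Oracle, another SQL Server, ...); dbo.Customers is a local table.
SELECT TOP (100)
       c.CustomerName,
       rs.OrderDate,
       rs.OrderTotal
FROM dbo.Customers AS c
JOIN dbo.RemoteSales AS rs
    ON rs.CustomerID = c.CustomerID
WHERE rs.OrderDate >= '2019-01-01';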
12. Connect / Explore / Learn
PolyBase Configuration: Scale-Out Group rules
• Each machine hosting SQL Server must be part of the same Active Directory domain.
• You must use the same Active Directory service account for each installation of the PolyBase Engine and PolyBase Data Movement services.
• Each machine hosting SQL Server must be able to communicate with all other Scale-Out Group members; keep them in close physical proximity and on the same network, avoiding geographically distributed servers and communication through the Internet.
• Each SQL Server instance must be running the same major version of SQL Server.
• PolyBase services are machine-level rather than instance-level services.
21. Connect / Explore / Learn
PolyBase vs. Linked Servers
• Object scope: a PolyBase external table is scoped at the database level, focusing on a single table; a linked server is scoped at the instance level.
• Operational intent: PolyBase external tables are read-only; linked servers support read and write.
• Scale-out: PolyBase can use Scale-Out Groups; linked servers have no scale-out capabilities.
• Expected data size: PolyBase suits large tables with analytic workloads; linked servers suit OLTP-style workloads querying a small number of rows.
22. Data Virtualization using PolyBase: PolyBase DMVs
• Metadata DMVs
• Service and Node Resources DMVs
• Data Movement Service DMVs
• Troubleshooting Queries DMVs
23. Connect / Explore / Learn
Metadata DMVs
use PolybaseDemo;
select * from sys.external_data_sources;
select * from sys.external_file_formats;
select * from sys.external_tables;
go
24. Connect / Explore / Learn
Service and Node Resources DMVs
use master;
select * from sys.dm_exec_compute_nodes;
select * from sys.dm_exec_compute_node_status;
select * from sys.dm_exec_compute_node_errors;
go
25. Connect / Explore / Learn
Data Movement Service DMVs
use master;
select * from sys.dm_exec_dms_services;
select * from sys.dm_exec_dms_workers;
go
26. Connect / Explore / Learn
Troubleshooting Queries DMVs
use PolybaseDemo;
select * from sys.dm_exec_external_work;
select * from sys.dm_exec_external_operations;
select * from sys.dm_exec_distributed_requests;
select * from sys.dm_exec_distributed_request_steps;
select * from sys.dm_exec_distributed_sql_requests;
go
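As a hedged illustration (not part of the deck), these troubleshooting DMVs are typically combined by taking an execution_id from sys.dm_exec_distributed_requests and drilling into its steps; the execution_id below is a placeholder.

-- Most recent PolyBase queries on the instance.
SELECT TOP (10) execution_id, status, total_elapsed_time, start_time
FROM sys.dm_exec_distributed_requests
ORDER BY start_time DESC;

-- Step-by-step breakdown for one of them (replace the placeholder execution_id).
SELECT execution_id, step_index, operation_type, location_type,
       status, total_elapsed_time, row_count, command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = 'QID1234'
ORDER BY step_index;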
28. Data Virtualization using PolyBase: Performance and Troubleshooting
• Statistics on External Tables
• Predicate Pushdown
• PolyBase Log Files
• Data Issues
29. Connect / Explore / Learn
Statistics on External Tables
• Fundamentally the same as statistics on regular tables.
• Because the data lives outside of SQL Server:
SQL Server cannot automatically create or maintain statistics against external tables.
We can create statistics from 100% of the data (the default) or from a sample of the data.
Disk space is needed during statistics creation, because all the data from the external table is streamed into a temporary table.
• Performance impact: external statistics can make a difference when they help the optimizer decide whether to push down a predicate or reorder joins to other tables, not in full scans.
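A minimal sketch of creating external table statistics (the table and column names are hypothetical, not from the demos); FULLSCAN is the default, while a sample keeps the temporary-table cost down on large external tables:

-- Full scan: streams all external data through a temp table.
CREATE STATISTICS st_RemoteSales_OrderDate
    ON dbo.RemoteSales (OrderDate)
    WITH FULLSCAN;

-- Sampled: cheaper to build on very large external tables.
CREATE STATISTICS st_RemoteSales_CustomerID
    ON dbo.RemoteSales (CustomerID)
    WITH SAMPLE 10 PERCENT;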
30. Connect / Explore / Learn
Predicate Pushdown
• Pushdown computation improves the performance of queries on external data sources.
• In SQL Server 2019 it is available for Hadoop, Oracle, Teradata, MongoDB, generic ODBC types, and SQL Server.
• SQL Server allows the following basic expressions and operators for predicate pushdown:
Binary comparison operators (<, >, =, !=, <>, >=, <=) for numeric, date, and time values.
Arithmetic operators (+, -, *, /, %).
Logical operators (AND, OR).
Unary operators (NOT, IS NULL, IS NOT NULL).
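As a hedged sketch (table name hypothetical), a predicate built only from the operators above can be evaluated at the external source, so only matching rows travel back to SQL Server; for Hadoop sources, pushdown can also be forced or disabled per query while troubleshooting:

-- Comparison operators combined with AND are pushdown-eligible.
SELECT CustomerID, OrderDate, OrderTotal
FROM dbo.RemoteSales
WHERE OrderDate >= '2019-01-01'
  AND OrderTotal > 1000;

-- For Hadoop data sources, compare behavior with and without pushdown.
SELECT COUNT(*)
FROM dbo.RemoteSales
WHERE OrderDate >= '2019-01-01'
OPTION (FORCE EXTERNALPUSHDOWN);   -- or OPTION (DISABLE EXTERNALPUSHDOWN)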
32. Connect / Explore / Learn
PolyBase Log Files
Located at %PROGRAMFILES%\Microsoft SQL Server\MSSQL##.MSSQLSERVER\MSSQL\Log\Polybase
33. Connect / Explore / Learn
Data Issues
• Structural
• Unsupported characters
• Date formats
• Limitations
The maximum possible row size (including the full length of variable-length columns) can't exceed 32 KB in SQL Server or 1 MB in Azure Synapse Analytics.
Text-heavy columns might be limited.
Hello and welcome to another SQL Night
I am Antonios Chatzipavlis
I am a Data Solutions Consultant and Trainer and
I have been in the Information Technology Industry since 1988
I have been an MCT since 2000 and
Microsoft Data platform MVP since 2010.
I started using SQL Server with version 6.0, which means I have more than 25 years of experience with this product in large-scale environments.
I have more than 60 (sixty) certifications, mostly in MS products
Finally I am the founder of SQLschool.gr
SQLschool.gr is a community for Greek professionals who use the Microsoft Data Platform.
In this you will find Articles, Webinars, Videos, Resources, news about Microsoft Data Platform.
You can join us as a member or follow us in social media to keep up with our community
This year SQLschool.gr became 10 years old and I would like to thank you all for your participation and support.
There are two components selected aside from Database Engine Services: the PolyBase Query Service for External Data and the Java connector for HDFS data sources.
The Java connector for HDFS data sources provides us support for connecting to Hadoop and Azure Blob Storage, which were the two endpoints available with PolyBase in SQL Server 2016 and SQL Server 2017; I refer to this throughout the book as PolyBase V1.
SQL Server 2019 also adds the PolyBase Query Service for External Data component, which includes support for services like Oracle, Teradata, MongoDB, Cosmos DB, and even other SQL Server instances. In order to install this component, SQL Server’s installer will also install the Microsoft Visual C++ 2017 Redistributable.
Before you begin installation, it is important to know whether you want to install PolyBase as a standalone service or as part of a Scale-Out Group, because you will not be able to switch between the two afterward without uninstalling and reinstalling the PolyBase features. If you are using SQL Server on Linux, the only option available to you at this time is to install standalone; SQL Server on Windows allows for both installation methods. All other things equal, a Scale-Out Group is preferable to a standalone installation. The reason for this is that PolyBase is a Massively Parallel Processing (MPP) technology. This means we can scale PolyBase horizontally, improving performance by adding additional servers. That only works if you incorporate your machine as part of a Scale-Out Group, however; as a standalone installation, your SQL Server instance will not be able to enlist the support of other SQL Server instances when using PolyBase to perform queries.
The preceding text makes sense when all other things are equal, but installing PolyBase as part of a Scale-Out Group has some requirements which standalone PolyBase does not. To wit, in order to install PolyBase as part of a Scale-Out Group, all of the following must be true:
The first option is to install the Azul Zulu Open JRE. This is a distribution of Oracle’s Open Java Runtime Environment which Azul Systems supports. Your license for SQL Server includes support for this particular distribution of Open JRE, meaning that you could contact Microsoft support for issues related to the JRE. The link on the installation page includes more information on this licensing agreement.
If you are already a licensed Oracle Standard Edition (SE) customer, you can of course install the Oracle SE version of the Java Runtime Environment. To do so, select the “Provide the location of a different version that has been installed to on this computer” option and navigate to your already-installed version of the Java Runtime Environment. SQL Server 2016 and 2017 supported JRE version 7 update 51 and later, as well as JRE version 8. SQL Server 2019 supports later versions of the Java Runtime Environment, including version 11.
If you are not a licensed Oracle SE customer, you can also install Oracle’s Open JRE. The downside to this is that your support options are limited to public forum access.
Configuration.sql
PolybaseBlob.sql
PolybaseSQL.sql
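The demo scripts themselves are not reproduced here, but a minimal sketch in the spirit of Configuration.sql and PolybaseBlob.sql might look like the following; the credential, container, and storage account names are hypothetical, and the 'hadoop connectivity' value should be checked against the documentation for your environment.

-- Configuration.sql (sketch): enable PolyBase and Hadoop/Azure Blob Storage connectivity.
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

-- Value 7 covers Hortonworks HDP and Azure Blob Storage; changing it requires a restart
-- of the SQL Server and PolyBase services.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;
GO

-- PolybaseBlob.sql (sketch): external data source and file format for Azure Blob Storage.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<UseAStrongPasswordHere!>';
GO

CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'PolyBaseUser',
     SECRET = '<storage account access key>';
GO

CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);
GO

CREATE EXTERNAL FILE FORMAT CsvFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', USE_TYPE_DEFAULT = TRUE)
);
GO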
Linked servers are a classic technique database administrators and developers can use to query another server’s data from the local server. On the plus side, there is extensive OLEDB driver support, and linked servers can reach out to technologies like Oracle, Apache Hive, other SQL Server instances, and even Excel. On the minus side, linked servers have an oft-deserved reputation for bringing over too much data from the remote server during queries and a somewhat undeserved reputation for being a security issue. Still, introducing the idea of an alternative for linked servers should excite many a DBA. Here is where I have mixed news for you: PolyBase can be superior to linked servers in some circumstances, but you will not want to replace all of your linked servers with external tables, as there are some cases where linked servers will be superior. Instead, think of these as two complementary technologies with considerable overlap.
Object Scopes
Linked servers are scoped at the instance level, which means that when you create a linked server, any database on that instance has access to the linked server. Furthermore, on the remote side, linked servers allow you to query any table or view on any database where the remote login has rights. The advantage to the linked server model is its flexibility: you can use linked servers for any number of queries across an indefinite number of remote tables or views. The biggest disadvantage of this approach is that it promotes the idea that perhaps you ought to make that cross-server join of two very large tables.
By contrast, PolyBase requires more deliberation: a database administrator or developer needs to create the external table link on a table-by-table or view-by-view basis before anybody can use it. This additional effort should make the creator think about whether a cross-server link is really necessary and can provide a bit of extra documentation about which tables the staff intend to use for cross-server queries. The downside to this is, if you have a large number of tables to query, it means writing a large number of external table definitions and also maintaining these definitions across table changes. This makes PolyBase a better choice for more stable data models and linked servers for more dynamic data models.
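To make that table-by-table effort concrete, here is a hedged sketch of a single external table definition against another SQL Server instance; the SQLCONTROL server name, the credential, and the PolyBaseRevealed.dbo.Person table are hypothetical stand-ins.

-- Credential used to authenticate against the remote SQL Server instance.
CREATE DATABASE SCOPED CREDENTIAL SqlServerCredential
WITH IDENTITY = 'PolyBaseReader',
     SECRET = '<password>';
GO

-- One external data source per remote instance...
CREATE EXTERNAL DATA SOURCE RemoteSqlServer
WITH (
    LOCATION = 'sqlserver://SQLCONTROL',
    PUSHDOWN = ON,
    CREDENTIAL = SqlServerCredential
);
GO

-- ...but one external table per remote table or view you intend to query.
CREATE EXTERNAL TABLE dbo.RemotePerson
(
    PersonID  INT          NOT NULL,
    FirstName NVARCHAR(50) NOT NULL,
    LastName  NVARCHAR(50) NOT NULL
)
WITH (
    LOCATION = 'PolyBaseRevealed.dbo.Person',
    DATA_SOURCE = RemoteSqlServer
);
GO

-- Once defined, the external table is queryable with plain T-SQL.
SELECT TOP(10) PersonID, FirstName, LastName
FROM dbo.RemotePerson;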
Operational Intent
Linked servers allow for reads as well as inserts, updates, and deletes. With PolyBase V1, we were able to read and insert but could not update or delete data. For the PolyBase V2 types, we are able to read but the engine prohibits any data modification, including inserts. If you attempt a data modification statement against a PolyBase V2 external table, you will get an error message similar to this one: Msg 46519 – DML Operations are not supported with external tables.
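A quick illustration, reusing the hypothetical dbo.RemotePerson external table from the earlier sketch:

-- Reads against a PolyBase V2 external table work as expected...
SELECT COUNT(*) FROM dbo.RemotePerson;

-- ...but any data modification is rejected by the engine:
DELETE FROM dbo.RemotePerson WHERE PersonID = 1;
-- Msg 46519: DML Operations are not supported with external tables.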
Scale-Out Capabilities
Linked servers offer no ability to scale out. One SQL Server instance may read from one SQL Server instance. If you experience performance problems, there is no way to add additional SQL Server instances to the mix to share the load. PolyBase, meanwhile, offers Scale-Out Groups for cases when three or four servers are better than one. In this regard, PolyBase is strictly superior.
Data Sizes
Tying in with scale-out capabilities, linked servers and PolyBase have different expectations for ideal data size. If you intend to pull back one row or a few rows from a small table, linked servers will generally be a superior option because there are fewer moving parts. As you get more complicated queries with larger data sets, PolyBase tends to do at least as well and often better.
Over the rest of this chapter, we will test the performance of PolyBase vs. linked servers in several scenarios to see when PolyBase succeeds and when linked servers come out ahead.
There are 13 Dynamic Management Views available in SQL Server 2019 which relate to PolyBase. In this section, we will review each of these at a high level, starting with basic metadata resources, followed by the DMVs which help with service and node setup, and finishing with DMVs for query troubleshooting.
External Data Sources
Returns one row per external data source
External File Formats
Shows the most important settings for each external file format
External Tables
Inherits several columns from sys.objects.
Contains PolyBase-specific columns.
External table
The useful external table columns include external data source and external file format IDs, allowing us to tie these three tables together. For PolyBase V2 tables, the file format ID will be 0, as we do not use external file formats for these data sources.
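A small query sketch that ties the three catalog views together; the LEFT JOIN accounts for PolyBase V2 tables, whose file format ID of 0 does not map to an external file format.

SELECT
    t.name      AS external_table,
    t.location  AS remote_location,
    ds.name     AS data_source,
    ds.location AS data_source_location,
    ff.name     AS file_format      -- NULL for PolyBase V2 external tables
FROM sys.external_tables t
    INNER JOIN sys.external_data_sources ds
        ON t.data_source_id = ds.data_source_id
    LEFT OUTER JOIN sys.external_file_formats ff
        ON t.file_format_id = ff.file_format_id
ORDER BY t.name;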
Compute Nodes
Returns one row for the head node and one row for each PolyBase compute node, including the server name and port, as well as its IP address.
If you have a standalone installation of PolyBase, you will get two rows back: one for the head and one for the local instance’s compute node.
If you are using a scale-out cluster, you will get back the two rows in a standalone installation as well as one row for each scale-out compute node you have in the cluster.
The sys.dm_exec_compute_node_status DMV connects to each compute node in order to determine if it is available. It retrieves server-level information such as allocated and available memory (in bytes), process and total CPU utilization (in ticks), the last communication time per node, and the latest error to have occurred as well. Figure 10-5 shows an example of some of the columns in this DMV.
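A minimal sketch joining the node metadata and status DMVs; the column list is a selection, not the full set.

SELECT
    n.compute_node_id,
    n.[type],
    n.name,
    n.address,
    s.is_available,
    s.allocated_memory,
    s.available_memory,
    s.total_cpu_usage,   -- in ticks
    s.received_time,
    s.error_id
FROM sys.dm_exec_compute_nodes n
    INNER JOIN sys.dm_exec_compute_node_status s
        ON n.compute_node_id = s.compute_node_id;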
When it comes to errors, however, we can see the value of all of the columns while on-premises by querying sys.dm_exec_compute_node_errors. This DMV holds a history of error messages and is a good place to look when troubleshooting failures on a system.
In addition to giving each error a unique ID, sys.dm_exec_compute_node_errors persists its data even after we restart the SQL Server services. Most Dynamic Management Views—for example, wait stat measures—reset when the database engine restarts, but compute node errors will stick around.
The first of these is sys.dm_exec_dms_services. This view returns one row per compute node—including one row for the head instance's compute node—and the status for each of these nodes. Figure 10-7 shows the output of this DMV.
We also have the ability to see the outputs of data movement service operations using the sys.dm_exec_dms_workers DMV. This gives us one row for each execution ID and execution step and includes performance measures such as bytes and rows processed, total elapsed time, CPU utilization time, and more.
To clear up potential confusion, the total elapsed time and query time values are in milliseconds, whereas CPU time is in ticks, where 10,000 ticks add up to a millisecond. Therefore, to get a clearer measure across the board, we want to divide the CPU time column by 10,000 to get a better picture of just how much CPU time we are actually using in relation to total elapsed time.
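For example, a query along these lines converts the tick-based CPU time into milliseconds so it lines up with the other timing columns:

SELECT
    execution_id,
    step_index,
    rows_processed,
    bytes_processed,
    total_elapsed_time       AS elapsed_ms,
    query_time               AS query_ms,
    cpu_time / 10000.0       AS cpu_ms     -- 10,000 ticks per millisecond
FROM sys.dm_exec_dms_workers
ORDER BY execution_id, step_index;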
In addition to these measures, we are also able to see the source SQL query for these operations, as well as the error ID if an operation fails. Unlike the compute node errors DMV, the DMS workers Dynamic Management View resets every time you restart the PolyBase engine service.
The final five Dynamic Management Views help us learn more about the SQL queries users run on our instances. Like the server and node resource DMVs we just looked at, these are all instance-level DMVs, meaning we will get the same results when running in any database. These views break down into two types, based on their names: external work and distributed requests. The external work results reset each time we restart the PolyBase engine, whereas the distributed requests DMVs persist even after service restarts.
First up in our set of views is sys.dm_exec_external_work. This Dynamic Management View returns one row for each of the last 1000 PolyBase queries we have run since the last time the PolyBase engine started, as well as any active queries currently running.
This DMV contains information on the current status of each execution, including the latest step for each compute node and Data Movement Service step.
We can see the type of operation, which is “File Split” for PolyBase V1 queries and “ODBC Data Split” for PolyBase V2 queries.
The input name tells us which file, folder, or table we are reading—for the SQL Server example on the first line, the input name is sqlserver://sqlcontrol/PolyBaseRevealed.dbo.Person. If we are reading from a file, the read_location field gives us the starting offset from 0 bytes. In the three cases in Figure 10-9, we read the file starting from the beginning. We can see the actual ODBC command next in the read_command column, which is a new field for SQL Server 2019. Finally, there are some columns containing top-level metrics, including bytes processed, file length (when reading files), start and end dates, the total elapsed time in milliseconds, and the status of each request. This status will be one of the following values: Pending, Processing, Done, Failed, or Aborted.
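A hedged example of pulling the most recent external work items; the column list mirrors the fields described above.

SELECT TOP(20)
    execution_id,
    [type],              -- 'File Split' (V1) or 'ODBC Data Split' (V2)
    input_name,
    read_location,
    read_command,        -- new in SQL Server 2019
    bytes_processed,
    total_elapsed_time,  -- milliseconds
    [status]             -- Pending, Processing, Done, Failed, or Aborted
FROM sys.dm_exec_external_work
ORDER BY start_time DESC;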
If you perform a predicate pushdown operation against a Hadoop cluster, the sys.dm_exec_external_operations Dynamic Management View will give you a rundown of these pushdown operations. Figure 10-10 shows an example of a pushdown MapReduce job which failed—we can see that the map and reduce progress values are both at 0%.
The sys.dm_exec_distributed_requests view returns one line per distributed operation. It provides us with one extremely helpful piece of information: a SQL handle, which we can use to return query text or an execution plan for our PolyBase queries. Figure 10-11 shows several rows from this table, including QID2260, which failed in the prior figure.
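Using the SQL handle, a sketch like the following retrieves the original query text for a given execution; QID2260 is simply the execution ID from the figure, so substitute one from your own instance.

SELECT
    dr.execution_id,
    dr.[status],
    dr.total_elapsed_time,
    dr.error_id,
    t.[text] AS query_text
FROM sys.dm_exec_distributed_requests dr
    CROSS APPLY sys.dm_exec_sql_text(dr.sql_handle) t
WHERE dr.execution_id = 'QID2260';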
The sys.dm_exec_distributed_request_steps view returns one row per execution ID and step. It is particularly useful when you already know an execution ID and want to understand what happened at each step along the way. Figure 10-12 gives us a glimpse at some of the most important columns here.
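When you already have an execution ID, a step-by-step breakdown might look like this (again using a hypothetical execution ID):

SELECT
    step_index,
    operation_type,
    location_type,        -- Head, Compute, or DMS
    distribution_type,
    [status],
    total_elapsed_time,   -- milliseconds
    row_count,
    command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = 'QID2260'
ORDER BY step_index;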
The sys.dm_exec_distributed_sql_requests Dynamic Management View is our final DMV of note. It contains one row per SQL-related step on each compute node and for each distribution. Figure 10-13 shows an example of this for execution ID QID2148.
This view makes clear the distributed nature of PolyBase: each distributed request step has eight separate SPIDs running on a single compute node. As with the distributed request steps DMV, we will look at this DMV in some greater detail next.
The first step creates the name of our temp table, TEMP_ID_XX. This appears to be an incrementing value, and the operation runs on the head node.
The second step has us create a temporary table on each compute node named TEMP_ID_XX. The shape of this table is the set of columns that we will need for our query: population type, year, and population.
The third step adds an extended property named IS_EXTERNAL_STREAMING_TABLE to each of the temp tables, presumably to make it easier to track which temp tables are used for loading external data.
Step 4 runs a statistics update, telling SQL Server that we expect the temp table to have 566 rows.
Our fifth step (i.e., step index 4) runs on the head node once more and is a MultiStreamOperation. There is no official documentation on this step, but it takes up 848 of the 919 total milliseconds of elapsed time and appears to be the operation which causes our compute nodes to do work.
From there, we see a HadoopShuffleOperation on the Data Movement Service. This returns all 13,607 rows in the population table. We can see from the cleaned-up query in Listing 10-3 that this is a simple query of all rows from our population table.
While we shuffle data across our compute nodes’ Data Movement Services, the next step runs, a StreamingReturnOperation . We can tell these are running concurrently because the shuffle operation takes 845 milliseconds and the streaming return operation 804 milliseconds, yet our entire query finished in under a second. This streaming query, which again runs on each of the compute nodes, queries TEMP_ID_73 and performs the aggregation we requested. Of interest is the fact that this query does not follow exactly the same shape as what we sent the database engine.
Polybase Log Files
C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Log\Polybase
DMS Errors
The DMS error log gives stack traces when an exception occurs in the data movement service. One of the more common errors you might find when reading through this log is System.Data.SqlClient.SqlException: Operation cancelled by user. This exception occurs when a user or application stops a query, such as when a user hits the “Stop” button in Azure Data Studio. You can safely ignore this error.
This particular log file tends to give you a high-level view of when errors occur but little information on the root cause or even the specific error. One of the more common errors I tend to see in this log is Internal Query Processor Error: The query processor encountered an unexpected error during the processing of a remote query phase. This phrase will not help me diagnose the problem, but this log file does tend to include information like the query ID and plan ID, which I can use to figure out which queries are failing.
DMS Movement
The data movement service writes a good amount of information to the DMS Movement log, including detailed information on what data moves over from Azure Blob Storage or Hadoop to SQL Server: the SQL queries the PolyBase data movement service generates, configuration settings such as the number of readers the DMS will use to migrate data, and detailed operational information at each step. Combined with the DMS error log, we can start to piece together our errors.
DWEngine Errors
Like the DMS error log, the DWEngine error log gives a higher-level overview of when errors occur, as well as stack traces. This file can help you pinpoint when an error occurs. The errors in this file tend to be a bit more descriptive than the ones in the DMS error log. For example, we can find errors relating to the maximum reject threshold in this file: Query aborted-- the maximum reject threshold (1 rows) was reached while reading from an external source: 2 rows rejected out of total 2 rows processed.
DWEngine Movement
This log provides us with more detail on queries and errors which the DWEngine error log captures. In some cases, this file has enough information to drive to the root cause. In Figure 5-3, we see an example of a clear error message where I defined a column in an ORC file as a string data type but am trying to use an integer data type to access it via PolyBase.
DWEngine Server
The DWEngine Server log contains a few pieces of useful information. One of the most useful is that it contains the create statements for external data sources, file formats, and tables. We can use this log to determine what our external resources looked like at the time of exception, just in case somebody changed one of them during troubleshooting.
This log also contains information on failed external table access attempts. If you have firewall or connection problems, this should be your first log to review. Figure 5-4 shows an example of a common HDFS bridge error whose root cause is insufficient permissions granted to the PolyBase pdw_user account.
DMS PolyBase
The DMS PolyBase log shows us something extremely important: any data translation failure. Figure 5-5 gives us three examples of data translation errors, including column conversion errors, data length errors, and string delimiter errors. We can also find cases where values are NULL, but the external table requires a non-nullable field, invalid date conversion attempts, and more.
DWEngine PolyBase
This file is much less interesting than most of the other logs. In my work, I have not seen it stretch to more than a few lines, and the most interesting thing in this log is the location of new Hadoop clusters as you create external data sources.
Structural Mismatch
The first common data problem is structural mismatch—that is, when you define your external table one way but the data does not comport to that structure. For example, you might define an external table as having eight columns, but the underlying data set has seven or nine columns. In that case, the PolyBase engine will reject rows because they do not fit the expected structure.
Caution
In production Hadoop systems, developers are liable to change the structure of files and leave old files as is. For example, a report with eight columns might suddenly populate with nine columns on a certain date. The PolyBase engine cannot support multiple data structures for the same external table and will reject at least one of the two structures. This might cause a previously working external table query to fail suddenly and unexpectedly.
Aside from column totals, there are several other mismatch problems which can cause queries to fail. For example, text files might have different schemas or delimiters: one type might be comma-delimited and another pipe-delimited. Some text files might use the quotation mark as a string delimiter, and others might use brackets or tildes. Any lack of consistency will cause the PolyBase engine to fail processing. If you do run into this scenario, an easy solution would be to create several external tables—one for each distinct file structure—and use a view to combine them together as one logical unit.
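A hedged sketch of that workaround, assuming two hypothetical external tables—dbo.SalesReport7 with the original seven-column layout and dbo.SalesReport8 with a newer eight-column layout—each pointed at a folder containing only files of its own shape:

CREATE VIEW dbo.SalesReport AS
SELECT
    SaleDate, StoreID, ProductID, Quantity, UnitPrice, TaxAmount, TotalAmount,
    CAST(NULL AS NVARCHAR(20)) AS PromotionCode   -- column that exists only in the newer layout
FROM dbo.SalesReport7
UNION ALL
SELECT
    SaleDate, StoreID, ProductID, Quantity, UnitPrice, TaxAmount, TotalAmount,
    PromotionCode
FROM dbo.SalesReport8;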
Unsupported Characters or Formats
PolyBase supports only a limited number of date formats. The safest route is to limit your text file dates to use supported formats. You can find these on Microsoft Docs (https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql).
PolyBase also struggles with newlines in text fields, so strip those out before trying to load data. Even within a quoted delimiter, newlines will cause the PolyBase engine to think it is starting a new record.
PolyBase Data Limitations
PolyBase also has limits to what data it can support. From Microsoft Docs (https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-versioned-feature-summary), we can see that the maximum row size cannot exceed 32KB for SQL Server or 1MB for Azure Synapse Analytics. In addition, if you save your data in ORC format, you might receive Java out-of-memory exceptions due to data size. For text-heavy files, it might be best to keep them as delimited files rather than ORC files.
The maximum possible row size, which includes the full length of variable length columns, can't exceed 32 KB in SQL Server or 1 MB in Azure Synapse Analytics.
When data is exported into an ORC file format from SQL Server or Azure Synapse Analytics, text-heavy columns might be limited. They can be limited to as few as 50 columns because of Java out-of-memory error messages. To work around this issue, export only a subset of the columns.
PolyBase can't connect to a Hortonworks instance if Knox is enabled.
If you use Hive tables with transactional = true, PolyBase can't access the data in the Hive table's directory.