Building Cloud Self-Service
Analytical Solutions
By Dmitry Anoshin, Data Engineer, Abebooks (Amazon Subsidiary)
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Outline
• About Myself
• About Abebooks
• Choosing ETL for the Cloud
• Data Acquisition Patterns with Matillion ETL
• Setting Up Self-Service BI
• Lessons Learned during the journey to the Cloud
About Myself
• Working with BI since 2007
• Implemented BI in Russia/Europe/Canada

Technical Skills Matrix (timeline: 2007, 2010, 2015, 2018)
• Databases (Oracle, Teradata, Vertica, Snowflake, Redshift, MySQL, PostgreSQL, MS SQL Server)
• ETL (Pentaho DI, Informatica, Matillion ETL)
• BI (SAP BusinessObjects, Tableau, MicroStrategy, Pentaho BI, SAS BI)
• Big Data (Cloudera Hadoop, Hive, Hue, Splunk, Hunk, ElasticSearch)
• Digital Marketing (GA, Piwik, Tealium, Adjust, Adobe)
• Data Analytics (R, Python)
My Books
#dimaworkplace
About Abebooks
• Online marketplace for books, art & collectibles
• Amazon subsidiary since 2008; a marketplace for used books and, increasingly, non-book collectibles
• 350 million listings
• 3 people on the ‘DB Team’
• 2 locations: Victoria, BC and Dusseldorf

Abebooks Data Flows
• Built by DBAs: DB links, PL/SQL, external tables, shell scripts
• Even before 2015, Redshift was the strategic direction, but an ETL rewrite was too expensive
[Diagram: Source Layer (SALES, INVENTORY, CS, SFTP) → ETL (PL/SQL) → Storage Layer (DW) → Access Layer (Ad-hoc SQL)]
Choosing an ETL Tool for the Cloud
Use Cases
• OLTP to S3
• S3 to Redshift
• SFTP/API to Redshift
• Data Transformation
• Dimensional Modelling
Tools
• Pentaho DI
• Informatica
• AWS Data Pipeline
• Talend
• Matillion
ETL Criteria
High:
• Native Redshift driver support
• Easy capture from relational DBs, including CDC
• Ease of use for BI/DW
• Covers our use cases
• On-Premise
Medium:
• Support NoSQL
• Company “Winner”
• Deployment/Architecture
• Encryption
• Ease of Use for non BI/DW
• Data Transformations
• Management
• Pricing
• Performance
Low:
• Version Control
• Linux OS
• ETL Monitoring
• Logging
• R/Python

Why We Picked Matillion
• Redshift-specific support: built around the Redshift platform
• Speed of ETL operations
• Speed of development
• Wide range of supported data sources
• Ease of use outside of DE/DBA expertise
• Native to AWS
• $$$
• The biggest risk: putting all our eggs in Matillion's basket, betting on a small, new player.
Data acquisition
patterns with
Matillion ELT
Abebooks Cloud Analytics Architecture
[Diagram components:
• Source systems: DynamoDB, Amazon RDS, Amazon Elastic Load Balancer, external APIs, SFTP, apps
• Event/notification services: SQS, SNS, Amazon Chime
• Storage: S3 data lake
• Abebooks DW account: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Redshift Spectrum
• Matillion ELT on EC2 (m4.large: 2 vCPU, 8 GB RAM)
• End-user access: Tableau Server, Tableau Web, Tableau Desktop, ad-hoc SQL]
Pattern 1: getting data via SFTP
• Scan the SFTP server, get all file names, and load the list into Redshift
• Identify only the new files
• Load one ${file_name} at a time (an If component chooses the right stream)
• Insert the processed ${file_name} into Redshift
• Load the next file
Takeaways:
• Python boto library for managing S3
• Matillion variables ${variable}
• Using Matillion iterators
• Executing SQL via Python
• If a file is missing, try again later
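The new-file loop above can be sketched in plain Python. This is a minimal sketch, not the Matillion job itself. The SFTP listing and the set of already-loaded names are passed in as arguments; in the job they come from the SFTP scan and a Redshift control table. `load_file` is a hypothetical callback that stands in for the per-file load. A missing file is retried a few times and is otherwise left for the next run.

```python
import time
from typing import Callable, Iterable, List, Set


def find_new_files(listed: Iterable[str], processed: Set[str]) -> List[str]:
    """Names seen on the SFTP server that are not yet recorded as loaded, sorted."""
    return sorted(set(listed) - processed)


def load_new_files(
    listed: Iterable[str],
    processed: Set[str],
    load_file: Callable[[str], None],
    retries: int = 3,
    wait_seconds: float = 0.0,
) -> List[str]:
    """Load new files one at a time; return the names still pending after all retries."""
    pending: List[str] = []
    for file_name in find_new_files(listed, processed):
        for _ in range(retries):
            try:
                load_file(file_name)  # hypothetical loader, e.g. S3 upload + COPY
            except FileNotFoundError:
                time.sleep(wait_seconds)  # file not there (yet): wait, then retry
                continue
            processed.add(file_name)  # stands in for the INSERT into the control table
            break
        else:
            pending.append(file_name)  # every attempt failed: leave it for the next run
    return pending
```

Processing one file per iteration keeps each load small. Recording each success right away means a failed run can resume without reloading files that were already loaded.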

Pattern 2: getting data via API
• Connect to the API via a Python script
• Fetch data via API calls and save it as CSV on the EC2 instance
• Upload the CSV to S3
• Load the CSV into Redshift
Takeaways:
• Using Python to connect to an external API
• Using AWS KMS to encrypt credentials
• Using SNS for email notifications
• Using Matillion system variables for ETL logs
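Steps 2 and 4 can be sketched as below, assuming the API returns a list of flat JSON records. The table name, bucket, and IAM role ARN are hypothetical placeholders. Two steps are left out so the sketch runs offline:
- the S3 upload in between (for example with boto3's `upload_file(Filename, Bucket, Key)`);
- the KMS decryption of credentials.

```python
import csv
from typing import Dict, List


def write_records_to_csv(records: List[Dict[str, object]], path: str) -> int:
    """Write API records (a list of dicts) to a CSV file with a header row; return rows written."""
    if not records:
        return 0
    fieldnames = list(records[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return len(records)


def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Redshift COPY statement for a headered CSV file that is already in S3."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )
```

`IGNOREHEADER 1` skips the header row that `csv.DictWriter` writes, so the header never lands in the table.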
Pattern 3: getting data from DynamoDB
Takeaways:
• Use the DynamoDB component (it generates the COPY command for you)
• You can't easily get incremental changes, so each load is a full reload
• Speed depends on two things: the "read ratio" and the per-table "read capacity". The rows loaded per hour scale with readRatio * tableReadCapacity.
• 51m rows with 35% read ratio and 300 read capacity = 9 hours
• 211m rows with 66% read ratio and 1500 read capacity = 4 hours
• Reloading once a week
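The two measurements above are consistent with throughput scaling linearly with readRatio × tableReadCapacity. The sketch calibrates a rows-per-second-per-read-unit constant from the first reload and uses it to predict the second. The constant comes out at about 15 here, but it is derived only from these two runs and will differ with item size and read consistency. Treat it as an assumption to re-measure. In Redshift's COPY from DynamoDB, the read ratio corresponds to the `READRATIO` parameter.

```python
def rows_per_hour(read_ratio: float, read_capacity: int, rows_per_unit_second: float) -> float:
    """Load throughput, assuming it scales linearly with read_ratio * read_capacity."""
    return read_ratio * read_capacity * rows_per_unit_second * 3600


def calibrate(rows: int, hours: float, read_ratio: float, read_capacity: int) -> float:
    """Rows per second per consumed read unit, derived from one observed full reload."""
    return rows / (hours * 3600 * read_ratio * read_capacity)


def estimate_hours(rows: int, read_ratio: float, read_capacity: int,
                   rows_per_unit_second: float) -> float:
    """Predicted duration of a full reload under the linear-throughput assumption."""
    return rows / rows_per_hour(read_ratio, read_capacity, rows_per_unit_second)


# Calibrate on the first reload: 51m rows, 35% read ratio, 300 read capacity, 9 hours.
k = calibrate(51_000_000, 9, 0.35, 300)
# Predict the second: 211m rows, 66% read ratio, 1500 read capacity (observed: 4 hours).
predicted = estimate_hours(211_000_000, 0.66, 1500, k)
print(f"k = {k:.1f} rows/s per unit, predicted = {predicted:.1f} h")
```

The prediction of about 3.9 hours matches the observed 4 hours. The same formula can be used to pick a read ratio that fits a weekly reload window.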
Pattern 4: getting data from external S3*
Getting data from another VPC: change the bucket policy, and the bucket then appears in the list of buckets in Matillion
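This sketch assumes the source bucket belongs to a different AWS account. In that case, "change the bucket policy" means granting that account list and read access. The code builds such a bucket policy; the bucket name and account ID are hypothetical. The reading side (for example, the IAM role used by the Matillion instance) still needs matching permissions in its own account.

```python
import json


def cross_account_read_policy(bucket: str, reader_account_id: str) -> str:
    """S3 bucket policy letting another AWS account list the bucket and read its objects."""
    principal = {"AWS": f"arn:aws:iam::{reader_account_id}:root"}
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket applies to the bucket ARN itself...
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # ...while s3:GetObject applies to the objects inside it.
                "Sid": "AllowRead",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```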
Pattern 5: Matillion connectors for Apps

Pattern 6: Using SQS for Triggering Job
Using the SQS service, we can trigger almost anything in Matillion or AWS
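This is a sketch of a message body that asks Matillion to run a job. The field names (group, project, version, environment, job, variables) follow the JSON format we understand Matillion's SQS listener to expect. Treat them as an assumption and check the documentation for your Matillion version. All values in the usage are hypothetical.

```python
import json
from typing import Dict, Optional


def matillion_trigger_message(
    group: str,
    project: str,
    version: str,
    environment: str,
    job: str,
    variables: Optional[Dict[str, str]] = None,
) -> str:
    """JSON body for an SQS message asking Matillion to run a job (field names assumed)."""
    return json.dumps(
        {
            "group": group,
            "project": project,
            "version": version,
            "environment": environment,
            "job": job,
            "variables": variables or {},
        }
    )
```

The body can then be sent with boto3, e.g. `boto3.client("sqs").send_message(QueueUrl=queue_url, MessageBody=body)`, where `queue_url` is the queue Matillion listens on.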
Improving End-User Experience
BI Survey
• ETL was a black box
• A lack of notifications
• A lack of documentation and training
• A lack of automation
• No dependency between reports and ETL process
• High dependency on the BI/DW team
BI Champions
The BI champion is the sheriff, ensuring the townspeople (business users) are productive and can do analytics quickly and smoothly.
The BI Champion is meant to be both an
evangelist and subject matter expert for BI
within the organization. The champion should
be well versed in the data important to their
team, and knowledgeable in the core BI
technologies and patterns used within
AbeBooks.

ETL Monitor and notifications
• An SNS topic sends the email notification; we can add any number of Matillion variables to the message
• Using an Amazon Chime webhook, a bash script executes a curl command to send a message to business users
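The Chime notification can be sketched with the Python standard library instead of curl. This assumes the webhook accepts a JSON body of the form `{"Content": "..."}`, which is how Amazon Chime incoming webhooks accept messages. The webhook URL, job name, and variables are hypothetical. Nothing is sent until `urllib.request.urlopen(request)` is called.

```python
import json
import urllib.request
from typing import Dict


def build_notification_text(job: str, status: str, variables: Dict[str, str]) -> str:
    """Message text; `variables` plays the role of the Matillion variables added to the message."""
    details = ", ".join(f"{key}={value}" for key, value in sorted(variables.items()))
    return f"ETL job {job} finished with status {status}" + (f" ({details})" if details else "")


def chime_webhook_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Equivalent of: curl -X POST -H 'Content-Type: application/json' -d '{"Content": ...}' URL"""
    body = json.dumps({"Content": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```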
ETL Monitor
Using Matillion system variables, we track all ETL events and visualize them in Tableau for end users, who can also create alerts in case of failure.
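The alert condition behind such a dashboard can be as simple as "the latest run of a job failed." A minimal in-memory sketch follows. The `RunEvent` fields are hypothetical stand-ins for the columns a log table might capture from Matillion system variables. In practice, the same logic would live in a SQL view that Tableau reads.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass
class RunEvent:
    """One ETL log row (hypothetical fields)."""
    job_name: str
    finished_at: datetime
    status: str  # "SUCCESS" or "FAILED"


def latest_runs(events: Iterable[RunEvent]) -> Dict[str, RunEvent]:
    """Keep only the most recent event for each job."""
    latest: Dict[str, RunEvent] = {}
    for event in events:
        current = latest.get(event.job_name)
        if current is None or event.finished_at > current.finished_at:
            latest[event.job_name] = event
    return latest


def jobs_to_alert(events: Iterable[RunEvent]) -> List[str]:
    """Jobs whose most recent run failed, i.e. the condition an alert would fire on."""
    return sorted(name for name, ev in latest_runs(events).items() if ev.status == "FAILED")
```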
ETL Trigger for Tableau
Task: Refresh the Tableau data sources (semantic layer) and workbooks when the FACT tables are refreshed.
Solution: Deploy the Tableau CLI tool on the Matillion EC2 instance and run it via a bash script.
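A sketch of the refresh step follows. It assumes the Tableau CLI in question is `tabcmd`, using its `login`, `refreshextracts`, and `logout` commands. The server URL, user, password file, and data source and workbook names are hypothetical. Data sources are refreshed before workbooks, matching the task above. With `dry_run` set, the commands are only printed.

```python
import shlex
import subprocess
from typing import List, Sequence


def refresh_commands(server: str, user: str, password_file: str,
                     datasources: Sequence[str], workbooks: Sequence[str]) -> List[List[str]]:
    """tabcmd invocations: log in, refresh data sources, then workbooks, then log out."""
    commands = [["tabcmd", "login", "-s", server, "-u", user, "--password-file", password_file]]
    commands += [["tabcmd", "refreshextracts", "--datasource", name] for name in datasources]
    commands += [["tabcmd", "refreshextracts", "--workbook", name] for name in workbooks]
    commands.append(["tabcmd", "logout"])
    return commands


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Passing argument lists rather than a shell string keeps names containing spaces intact. `check=True` makes a failed refresh stop the script so the ETL job can report it.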
Self-Service BI
• Change management: from a report-writing culture to a data-driven company
• Clear authority: executive support
• The analytic culture: business executives must have a vision for analytics and the willingness to invest in the people, processes, and technologies for the long haul to ensure a successful outcome.
• The right people (data engineers, BI engineers, business analysts)
• The right organizational structure: a BI Center of Excellence that establishes and inculcates best practices for building analytical applications
• The right data and architecture
• The right tools: Redshift, Matillion and Tableau are the best fit for self-service

Report Automation
After:
• Central BI portal
• Reusable Tableau data sources, a.k.a. the business layer
• Common WBR format
• Eliminate manual work
• No spreadsheets or ad-hoc SQL queries
• Data discovery
• ETL integration
• Friendly drag-and-drop GUI
Before (TL;DR: CTRL+C, CTRL+V, IT dependency):
• Lots of routine SQL and Excel work
• Each team defined its own report style and format
• Multiple definitions of the same metrics
• No visualization, no alerts
• Slow data discovery and hypothesis evaluation
Lessons Learned from Moving the DW to AWS (Cloud)
Five Points of Guidance for Redshift (SET DW)
1. Sort Keys:
• Choose up to 3 columns
• Ordered in increasing order of specificity, balanced with likelihood of use.
• Leave INTERLEAVED sort keys until your one-year anniversary.
2. Column Encoding:
• Compress all columns except for (at least) the first sort key.
3. Table Maintenance:
• VACUUM and ANALYZE tables weekly (use STL_ALERT_EVENT_LOG as a guide for frequency).
• ANALYZE PREDICATE COLUMNS is very useful for quick daily stats refresh.
4. Choose a Distribution Key that:
• Follows the common join pattern for the table.
• Evenly distributes the data across the database slices on the cluster.
• DISTSTYLE ALL is a great go-to for dimension tables < ~3 million rows.
• DISTSTYLE EVEN is a good fail-safe, but guarantees inter-node data redistribution when joining.
5. Workload Management (WLM) and Query Monitoring Rules (QMR):
• Start with up to 3 queues (in addition to what Redshift provides automatically).
• Put ETL in its own queue with a very low concurrency (perhaps as low as 1 or 2 slots). Monitor commit queuing.
• Split up the memory across the queues. Monitor the percent of each queue’s workload going to disk.
• Expect to change WLM settings as the workload changes (day vs. night, weekday vs. weekend).
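The five points can be illustrated by generating the statements they describe. This is a minimal sketch for a hypothetical fact table. The column names, the ZSTD encoding choice, the queue user groups, and the 40/40/20 memory split are illustrative assumptions, not recommendations from the talk. The WLM keys (`user_group`, `query_concurrency`, `memory_percent_to_use`) are those of Redshift's `wlm_json_configuration` parameter as we understand it; verify them against the current documentation.

```python
import json
from typing import Dict, List, Optional, Sequence, Tuple


def create_table_ddl(table: str, columns: Sequence[Tuple[str, str]], sort_keys: Sequence[str],
                     dist_key: Optional[str] = None, dist_all: bool = False) -> str:
    """Points 1, 2 and 4: compound sort key, column encodings, distribution style."""
    if not 1 <= len(sort_keys) <= 3:
        raise ValueError("choose between 1 and 3 sort key columns")
    column_sql = []
    for name, sql_type in columns:
        # Point 2: compress every column except the leading sort key.
        encoding = "RAW" if name == sort_keys[0] else "ZSTD"
        column_sql.append(f"  {name} {sql_type} ENCODE {encoding}")
    if dist_all:
        dist = "DISTSTYLE ALL"  # small dimension tables: a full copy on every node
    elif dist_key:
        dist = f"DISTSTYLE KEY DISTKEY ({dist_key})"  # follow the common join column
    else:
        dist = "DISTSTYLE EVEN"  # fail-safe, but joins will redistribute data
    return (f"CREATE TABLE {table} (\n" + ",\n".join(column_sql) + "\n)\n"
            + f"{dist}\nCOMPOUND SORTKEY ({', '.join(sort_keys)});")


def maintenance_sql(table: str) -> Dict[str, List[str]]:
    """Point 3: weekly VACUUM/ANALYZE plus a cheap daily statistics refresh."""
    return {
        "weekly": [f"VACUUM {table};", f"ANALYZE {table};"],
        "daily": [f"ANALYZE {table} PREDICATE COLUMNS;"],
    }


def wlm_config(etl_slots: int = 1) -> str:
    """Point 5: three manual queues; the last entry acts as the default queue."""
    queues = [
        {"user_group": ["etl"], "query_concurrency": etl_slots, "memory_percent_to_use": 40},
        {"user_group": ["bi_users"], "query_concurrency": 5, "memory_percent_to_use": 40},
        {"query_concurrency": 5, "memory_percent_to_use": 20},
    ]
    return json.dumps(queues)
```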
Lesson One. CHOOSE THE RIGHT MIGRATION STRATEGY
Lift & Shift
• Typical Approach
• Move all at once
• Move to the target platform, then evolve
• Gets you to the cloud quickly
• Relatively small barrier to learning the new technology, since it tends to be a close fit

Split & Flip
• Split the application into logical functional data layers
• Match the data functionality with the right technology
• Leverage the wide selection of tools on AWS to best fit the need
• Move data in phases: prototype, learn, and perfect
Lesson Two. CHANGE YOUR MINDSET
Take the time to learn
• It is critical to train on and learn the new technologies being used
• It is easy to fall into simply translating or converting what already exists
• We made many such changes: relational vs. non-relational, batch vs. streaming, service-based vs. procedural, etc.
Traditional DW: faster runtime is better
Cloud: if runtime is slower, it is easy to scale
Reality: the practical limitation to scaling is a fixed budget
• Query #1 uses 64 cores and runs in 1 min
• Query #2 uses 1 core and runs in 2 mins
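The "Reality" comparison is easiest to see in core-minutes (cores × wall-clock minutes). A small sketch using the slide's numbers and an assumed budget of 64 core-minutes:

```python
def core_minutes(cores: int, minutes: float) -> float:
    """Compute consumed by one query run: cores held times wall-clock minutes."""
    return cores * minutes


def runs_within_budget(budget_core_minutes: float, cores: int, minutes: float) -> int:
    """How many runs of a query fit into a fixed compute budget."""
    return int(budget_core_minutes // core_minutes(cores, minutes))


q1 = core_minutes(64, 1)  # Query #1: 64 cores for 1 minute -> 64 core-minutes
q2 = core_minutes(1, 2)   # Query #2: 1 core for 2 minutes -> 2 core-minutes
budget = 64               # assumed fixed budget, in core-minutes
print(runs_within_budget(budget, 64, 1), runs_within_budget(budget, 1, 2))  # prints: 1 32
```

Query #1 is twice as fast for a single user. Within the same fixed budget, however, Query #2's approach completes 32 times as much work.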
We Optimized for Cost in Redshift
• What is the most work that can be done within a given fixed budget?
• Focus on the total amount of work rather than optimizing for a single user
• Everything you use in the cloud comes at a cost, e.g.:
  ◦ DynamoDB performance
  ◦ Redshift vs. Spectrum (S3)
Cost is just one example of the many mindset changes we made.

Lesson Three. DON'T BE SCARED TO OPEN THE BLACK BOX
• All business logic is hidden in legacy ETL scripts
• Trade-off between a fast project and business users' expectations
• Learn about your business
• Discover and fix the issues
Lesson Four. BE AGILE AND INVOLVE BUSINESS
Agile Benefits
• See results earlier
• Constant feedback
• Serves your users
• Flexibility
• Quality Assurance
Lesson Five. PLAN YOUR EVOLUTION
Handling Less Efficient Queries
• Provide a separate cluster as a sandbox
• App developers design new queries that fit the constraints of hands-off operations
Example: create roll-up summary tables in Redshift.
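A roll-up summary table can be sketched with SQLite from the Python standard library, used here as a stand-in for Redshift. The `CREATE TABLE ... AS SELECT ... GROUP BY` pattern is the same in both. The table names and rows below are hypothetical toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_orders (order_date TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_orders VALUES (?, ?, ?)",
    [
        ("2018-03-01", "books", 12.0),
        ("2018-03-01", "books", 8.0),
        ("2018-03-01", "art", 40.0),
        ("2018-03-02", "books", 5.0),
    ],
)
# Built once by the ETL, so sandbox and dashboard queries read a few summary
# rows instead of rescanning the detail table on every request.
conn.execute(
    """
    CREATE TABLE daily_category_summary AS
    SELECT order_date, category, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM fact_orders
    GROUP BY order_date, category
    """
)
rows = conn.execute(
    "SELECT * FROM daily_category_summary ORDER BY order_date, category"
).fetchall()
```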
Q&A
Contact details: anoshind@amazon.com

Recommended for you

AWS re:Invent 2014 | (ARC202) Real-World Real-Time Analytics
AWS re:Invent 2014 | (ARC202) Real-World Real-Time AnalyticsAWS re:Invent 2014 | (ARC202) Real-World Real-Time Analytics
AWS re:Invent 2014 | (ARC202) Real-World Real-Time Analytics

Working with big volumes of data is a complicated task, but it's even harder if you have to do everything in real time and try to figure it all out yourself. This session will use practical examples to discuss architectural best practices and lessons learned when solving real-time social media analytics, sentiment analysis, and data visualization decision-making problems with AWS. Learn how you can leverage AWS services like Amazon RDS, AWS CloudFormation, Auto Scaling, Amazon S3, Amazon Glacier, and Amazon Elastic MapReduce to perform highly performant, reliable, real-time big data analytics while saving time, effort, and money. Gain insight from two years of real-time analytics successes and failures so you don't have to go down this path on your own.

awsreinvent2014reinvent
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing

Cloud computing is no longer a fad that is going around. It is for real and is perhaps the most talked about subject. Various players in the cloud eco-system have provided a definition that is closely aligned to their sweet spot –let it be infrastructure, platforms or applications. This presentation will provide an exposure of a variety of cloud computing techniques, architecture, technology options to the participants and in general will familiarize cloud fundamentals in a holistic manner spanning all dimensions such as cost, operations, technology etc

cloud computingcloud trainingcloud
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL

Serverless SQL provides a serverless analytics platform that allows users to analyze data stored in object storage without having to manage infrastructure. Key features include seamless elasticity, pay-per-query consumption, and the ability to analyze data directly in object storage without having to move it. The platform includes serverless storage, data ingest, data transformation, analytics, and automation capabilities. It aims to create a sharing economy for analytics by allowing various users like developers, data engineers, and analysts flexible access to data and analytics.


More Related Content

What's hot

Data Lake Overview
James Serra

Introduction to Microsoft’s Hadoop solution (HDInsight)
James Serra

Introducing Azure SQL Data Warehouse
James Serra

Microsoft Azure Big Data Analytics
Mark Kromer

How Celtra Optimizes its Advertising Platform with Databricks
Grega Kespret

Machine Learning and AI
James Serra

Modernize & Automate Analytics Data Pipelines
Carole Gunst

Modernizing Data Management Through Metadata
MANTA

Cortana Analytics Suite
James Serra

Module 3 - QuickSight Overview
Lam Le

Synapse for mere mortals
Michael Stephenson

Data warehouse con azure synapse analytics
Eduardo Castro

Digital Business Transformation in the Streaming Era
Attunity

RDX Insights Presentation - Microsoft Business Intelligence
Christopher Foot

Microsoft Data Platform - What's included
James Serra

Cepta The Future of Data with Power BI
Kellyn Pot'Vin-Gorman

Modern Data Warehouse Overview
John Chang

Ai & Data Analytics 2018 - Azure Databricks for data scientist
Alberto Diaz Martin

Data Vault Automation at the Bijenkorf
Rob Winters

Accelerating Big Data Analytics
Attunity

Similar to Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution

Big data and Analytics on AWS
2nd Watch

AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
Amazon Web Services

Taming the shrew Power BI
Kellyn Pot'Vin-Gorman

Kylin and Druid Presentation
argonauts007

Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Amazon Web Services

An overview of modern scalable web development
Tung Nguyen

ARC202:real world real time analytics
Sebastian Montini

ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY

Taming the shrew, Optimizing Power BI Options
Kellyn Pot'Vin-Gorman

PCM18 (Big Data Analytics)
Stratebi

AWS re:Invent 2014 | (ARC202) Real-World Real-Time Analytics
Socialmetrix

SpringPeople - Introduction to Cloud Computing
SpringPeople

Serverless SQL
Torsten Steinbach

Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Clustrix

FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
Amazon Web Services

Levelling up your data infrastructure
Simon Belak

Transform your DBMS to drive engagement innovation with Big Data
Ashnikbiz

Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Kent Graziano

5 Things that Make Hadoop a Game Changer
Caserta

Optimize Your Reporting In Less Than 10 Minutes
Alexandra Sasha Blumenfeld

More from Dmitry Anoshin

Cloud Analytics Use Cases and Architecture, Math Marketing Conference, Russia...
Dmitry Anoshin

Victoria Tableau User Group - Getting started with Tableau
Dmitry Anoshin

Hey, what is about data?
Dmitry Anoshin

Tableau API
Dmitry Anoshin

My experience of writing technical books
Dmitry Anoshin

Business objects activities web intelligence
Dmitry Anoshin

Splunk 6.2 new features
Dmitry Anoshin

Business Analytics Paradigm Change
Dmitry Anoshin

SAP BO and Teradata best practices
Dmitry Anoshin

Exploring Splunk
Dmitry Anoshin

Splunk Digital Intelligence
Dmitry Anoshin

Role of Tableau on the Data Discovery Market
Dmitry Anoshin

SAP Lumira - Building visualizations
Dmitry Anoshin

SAP Lumira - Acquiring data
Dmitry Anoshin

SAP Lumira - Enriching data
Dmitry Anoshin

Microstrategy for Retailer Company
Dmitry Anoshin

SAP BusinessObjects 4.1 Web Intelligence Report Development
Dmitry Anoshin

Sap BusinessObjects 4
Dmitry Anoshin

Business objects web intelligence training tasks
Dmitry Anoshin

Sap business objects 4 quick start manual
Dmitry Anoshin

Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution

  • 1. Building Cloud Self-Service Analytical Solutions By Dmitry Anoshin, Data Engineer, Abebooks (Amazon Subsidiary)
  • 3. Outline • About Myself • About Abebooks • Choosing ETL for the Cloud • Data Acquisition Patterns with Matillion ETL • Setting Up Self-Service BI • Lessons Learned on the Journey to the Cloud
  • 4. About Myself • Working with BI since 2007 • Implemented BI in Russia, Europe, and Canada
  • 5. Technical Skills Matrix (timeline: 2007, 2010, 2015, 2018) • Databases (Oracle, Teradata, Vertica, Snowflake, Redshift, MySQL, PostgreSQL, MS SQL Server) • ETL (Pentaho DI, Informatica, Matillion ETL) • BI (SAP BusinessObjects, Tableau, MicroStrategy, Pentaho BI, SAS BI) • Big Data (Cloudera Hadoop, Hive, Hue, Splunk, Hunk, Elasticsearch) • Digital Marketing (GA, Piwik, Tealium, Adjust, Adobe) • Data Analytics (R, Python)
  • 8. About Abebooks • Online marketplace for books, art & collectibles • Amazon subsidiary since 2008; a marketplace for used books and, increasingly, non-book collectibles • 350 million listings • 3 people in the DB team • 2 locations: Victoria, BC and Dusseldorf
  • 10. Abebooks Data Flows • Built by DBAs: db links, PL/SQL, external tables, shell scripts • Even before 2015, Redshift was the strategic target, but rewriting the ETL was too expensive • (Diagram: Source Layer (SALES, INVENTORY, CS, SFTP); ETL (PL/SQL); DW Storage Layer; Access Layer (ad-hoc SQL))
  • 11. Choosing an ETL Tool for the Cloud • Use cases: OLTP to S3; S3 to Redshift; SFTP/API to Redshift; data transformation; dimensional modelling • Tools considered: Pentaho DI, Informatica, AWS Data Pipeline, Talend, Matillion
  • 12. ETL Criteria • High: native Redshift driver support; easy capture from relational databases (CDC); ease of use for BI/DW; covers our use cases; on-premise • Medium: NoSQL support; company a “winner”; deployment/architecture; encryption; ease of use for non-BI/DW users; data transformations; management; pricing; performance • Low: version control; Linux OS; ETL monitoring; logging; R/Python
  • 13. Why We Picked Matillion • Redshift-specific support: built around the Redshift platform • Speed of ETL operations • Speed of development • Wide range of supported data sources • Ease of use outside DE/DBA expertise • Native to AWS • Price ($$$) • The biggest risk: putting our eggs in Matillion's basket and betting on a small, new player
  • 15. Abebooks Cloud Analytics Architecture (Diagram areas: Source Systems, Abebooks DW Account, Event/Notification Services, End Users Access. Components: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Redshift Spectrum, DynamoDB, Amazon RDS, Elastic Load Balancing, S3 Data Lake, SQS, SNS, Amazon Chime, External API, SFTP, Apps, Matillion ELT on EC2 m4.large (2 vCPU, 8 GB RAM), Tableau Server, Tableau Web, Tableau Desktop, ad-hoc SQL)
  • 16. Pattern 1: getting data via SFTP • Scan the SFTP server, get all file names, and load them into Redshift • Identify only the new files • Load one ${file_name} at a time (using an If component, we can choose the right stream) • Record each processed ${file_name} in Redshift • Load the next file. Takeaways: • Python boto library for managing S3 • Matillion variables ${variable} • Matillion iterators • Executing SQL via Python • If a file is missing, try again later
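The "identify only new files" step can be sketched in plain Python. This is a minimal sketch, not Matillion's implementation: the SFTP listing and the set of already-processed names are passed in as ordinary Python values, and `load_file` is a hypothetical callback standing in for the Redshift load. In the real job, the processed names would live in a Redshift audit table.

```python
from typing import Callable, Iterable, List, Set


def find_new_files(sftp_listing: Iterable[str], processed: Set[str]) -> List[str]:
    """Return names present on the SFTP server that have not been loaded yet."""
    return sorted(set(sftp_listing) - processed)


def load_new_files(
    sftp_listing: Iterable[str],
    processed: Set[str],
    load_file: Callable[[str], bool],
) -> List[str]:
    """Load new files one at a time and record each success.

    `load_file` is a hypothetical callback (e.g. a COPY into Redshift) that
    returns True on success. Failed or missing files are not recorded, so
    they are picked up again on the next scheduled run.
    """
    loaded: List[str] = []
    for file_name in find_new_files(sftp_listing, processed):
        if load_file(file_name):
            # Stands in for inserting file_name into a processed-files audit table.
            processed.add(file_name)
            loaded.append(file_name)
    return loaded


if __name__ == "__main__":
    done = {"sales_2018_01.csv"}
    listing = ["sales_2018_01.csv", "sales_2018_02.csv", "sales_2018_03.csv"]
    # Pretend the March file is still being uploaded, so its load fails.
    print(load_new_files(listing, done, lambda name: "03" not in name))
```

Keeping the "processed" record separate from the load means a failed file is simply retried later, which matches the "if a file is missing, try again later" takeaway.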
  • 17. Pattern 2: getting data via API • Connect to the API via a Python script • Get data via API calls and save it to CSV on EC2 • Upload the CSV to S3 • Load the CSV into Redshift. Takeaways: • Using Python to connect to an external API • Using AWS KMS to encrypt credentials • Using SNS for email notifications • Using Matillion system variables for ETL logs
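The CSV and load steps of this pattern might look roughly like the sketch below. It is a hedged illustration: the API call, the S3 upload (for example with boto3's `upload_file`), and KMS decryption of credentials are left out so the code runs offline, and the table name, bucket, and IAM role ARN in the usage lines are hypothetical. The generated statement uses Redshift's `COPY ... FORMAT AS CSV IGNOREHEADER 1` options.

```python
import csv
import io
from typing import Dict, List


def records_to_csv(records: List[Dict[str, object]], fieldnames: List[str]) -> str:
    """Serialize API records (already parsed from JSON) into CSV text with a header row."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops API fields that are not part of the target table.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for a CSV file with one header line."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


csv_text = records_to_csv(
    [{"order_id": 1, "status": "shipped", "debug": "x"}], ["order_id", "status"]
)
print(csv_text)
print(build_copy_sql(
    "stage.api_orders",
    "s3://example-bucket/api/orders.csv",
    "arn:aws:iam::123456789012:role/example-redshift-copy",
))
```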
  • 18. Pattern 3: getting data from DynamoDB. Takeaways: • The DynamoDB component generates the COPY command for you • You can't easily get incremental changes, so each load is a full reload • Speed depends on two things: the "read ratio" and the per-table "read capacity"; the actual rows per hour are based on readRatio * tableReadCapacity • 51m rows with a 35% read ratio and 300 read capacity = 9 hours • 211m rows with a 66% read ratio and 1500 read capacity = 4 hours • We reload once a week
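The two timings on this slide are consistent with a single throughput constant. Assuming rows per hour is proportional to read ratio times read capacity (the slide's rule of thumb; in Redshift's COPY from DynamoDB the read ratio is the `READRATIO` percentage), the sketch calibrates the constant on the 51m-row load and predicts the 211m-row load. The constant reflects this particular table's item sizes and is not a general DynamoDB figure.

```python
def rows_per_unit_hour(rows: float, hours: float, read_ratio: float, read_capacity: float) -> float:
    """Calibrate rows loaded per hour per unit of (read_ratio * read_capacity)."""
    return rows / (hours * read_ratio * read_capacity)


def estimate_hours(rows: float, read_ratio: float, read_capacity: float, k: float) -> float:
    """Estimate full-reload time, assuming throughput = k * read_ratio * read_capacity."""
    return rows / (k * read_ratio * read_capacity)


# Calibrate on the first observation: 51m rows, 35% read ratio, 300 read capacity, 9 hours.
k = rows_per_unit_hour(51e6, 9, 0.35, 300)
# Predict the second observation: 211m rows, 66% read ratio, 1500 read capacity.
predicted = estimate_hours(211e6, 0.66, 1500, k)
print(f"k = {k:,.0f} rows/hour per unit; predicted = {predicted:.1f} hours")
```

The prediction comes out at about 3.9 hours, in line with the 4 hours observed, so the same estimate can be used to size read capacity before scheduling the weekly reload.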
  • 19. Pattern 4: getting data from external S3. Getting data from another VPC: change the bucket policy, and the bucket appears in the list of buckets in Matillion
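A bucket policy granting read access could look like the one generated below. This sketch assumes the external bucket sits in another AWS account and that Matillion's EC2 instance runs under the given IAM role; the bucket name and role ARN are hypothetical, and the role also needs matching S3 permissions on its own side. The policy could be applied with boto3's `put_bucket_policy(Bucket=..., Policy=...)` by the bucket owner.

```python
import json


def cross_account_read_policy(bucket: str, reader_role_arn: str) -> str:
    """Bucket policy letting a role in another account list and read a bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": {"AWS": reader_role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowRead",
                "Effect": "Allow",
                "Principal": {"AWS": reader_role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }
    return json.dumps(policy, indent=2)


print(cross_account_read_policy(
    "example-partner-exports", "arn:aws:iam::111122223333:role/example-matillion"
))
```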
  • 20. Pattern 5: Matillion connectors for Apps
  • 21. Pattern 6: Using SQS to Trigger Jobs. With the SQS service, we can trigger almost anything in Matillion or AWS
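A trigger message could be built as in this sketch. It assumes the JSON message format that Matillion ETL documents for its SQS queue listener (group, project, version, environment, job, plus optional variables); check the documentation for your Matillion version. All group, project, and job names in the usage lines are hypothetical, and actually sending the message requires boto3 and AWS credentials.

```python
import json
from typing import Dict, Optional


def matillion_sqs_message(
    group: str,
    project: str,
    version: str,
    environment: str,
    job: str,
    variables: Optional[Dict[str, str]] = None,
) -> str:
    """Build the JSON body asking a Matillion ETL SQS listener to run a job."""
    message = {
        "group": group,
        "project": project,
        "version": version,
        "environment": environment,
        "job": job,
        # Job variables override the defaults defined in Matillion.
        "variables": variables or {},
    }
    return json.dumps(message)


body = matillion_sqs_message(
    "Abebooks", "DW", "default", "prod", "load_sales", {"file_name": "sales_2018_02.csv"}
)
# Sending it (requires boto3 and credentials; QUEUE_URL is hypothetical):
# boto3.client("sqs").send_message(QueueUrl=QUEUE_URL, MessageBody=body)
print(body)
```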
  • 23. BI Survey • ETL was a black box • A lack of notifications • A lack of documentation and training • A lack of automation • No dependency between reports and the ETL process • High dependency on the BI/DW team
  • 24. BI Champions. The BI champion is the sheriff, ensuring that the townspeople (the business users) are productive and can do analytics quickly and smoothly. The BI champion is meant to be both an evangelist and a subject matter expert for BI within the organization. The champion should be well versed in the data important to their team and knowledgeable in the core BI technologies and patterns used within AbeBooks.
  • 25. ETL Monitor and Notifications. An SNS topic sends email, and we can include any number of Matillion variables in the message. Using an Amazon Chime webhook, a bash script can run a curl command that sends a message to business users.
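The slide uses curl from bash; an equivalent Python sketch of the Chime call is below. It assumes Amazon Chime's incoming-webhook convention of a JSON POST with a `Content` field; the webhook URL and job name are placeholders. The request is built but not sent, so the sketch runs offline. An SNS email could be sent similarly with boto3's `publish(TopicArn=..., Subject=..., Message=...)`.

```python
import json
import urllib.request


def chime_webhook_request(webhook_url: str, job_name: str, status: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an Amazon Chime incoming webhook."""
    payload = {"Content": f"ETL job {job_name} finished with status: {status}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = chime_webhook_request(
    "https://hooks.chime.aws/incomingwebhooks/EXAMPLE?token=EXAMPLE", "load_sales", "SUCCESS"
)
# urllib.request.urlopen(req) would actually post the message to the chat room.
```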
  • 26. ETL Monitor. Using Matillion system variables, we track all ETL events, visualize them in Tableau for end users, and let users create alerts in case of failure.
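The alert condition behind such a dashboard can be sketched as "jobs whose most recent run failed". This is an illustrative assumption, not the author's implementation: `EtlEvent` and its fields are hypothetical stand-ins for audit rows written from Matillion system variables, and in practice the same logic would more likely be a SQL view that Tableau reads.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass
class EtlEvent:
    job_name: str
    finished_at: datetime
    status: str  # e.g. "SUCCESS" or "FAILED"


def jobs_failing_now(events: Iterable[EtlEvent]) -> List[str]:
    """Return jobs whose most recent run failed, the condition an alert would fire on."""
    latest: Dict[str, EtlEvent] = {}
    for event in events:
        current = latest.get(event.job_name)
        if current is None or event.finished_at > current.finished_at:
            latest[event.job_name] = event
    return sorted(name for name, ev in latest.items() if ev.status == "FAILED")
```

Looking only at the latest run keeps a job that failed overnight but was rerun successfully from raising a stale alert.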
  • 27. ETL Trigger for Tableau. Task: refresh Tableau data sources (the semantic layer) and workbooks when FACT tables are refreshed. Solution: deploy the Tableau CLI tool on the Matillion EC2 instance and run it via a bash script.
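Assuming the Tableau CLI tool here is `tabcmd`, the refresh step could be driven as in this sketch, which builds `tabcmd login`, `refreshextracts`, and `logout` invocations. The server URL, user, password-file path, and data source/workbook names are hypothetical; only the command builder runs offline, while `run_all` needs `tabcmd` installed.

```python
import subprocess
from typing import List


def tabcmd_commands(server: str, user: str, password_file: str,
                    datasources: List[str], workbooks: List[str]) -> List[List[str]]:
    """Build tabcmd invocations: log in, refresh each extract, log out."""
    commands = [["tabcmd", "login", "-s", server, "-u", user, "--password-file", password_file]]
    for name in datasources:
        commands.append(["tabcmd", "refreshextracts", "--datasource", name])
    for name in workbooks:
        commands.append(["tabcmd", "refreshextracts", "--workbook", name])
    commands.append(["tabcmd", "logout"])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first non-zero exit code."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


cmds = tabcmd_commands("https://tableau.example.com", "etl_user", "/etc/tableau/etl.pw",
                       ["Sales Semantic Layer"], ["WBR"])
```

Data sources are refreshed before workbooks so that workbooks built on the semantic layer pick up the new data.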
  • 28. Self-Service BI • Change management: from a report-writing culture to a data-driven company • Clear authority: executive support • An analytic culture: business executives must have a vision for analytics and the willingness to invest in the people, processes, and technologies for the long haul to ensure a successful outcome • The right people (data engineers, BI engineers, business analysts) • The right organizational structure: a BI Center of Excellence that establishes and inculcates best practices for building analytical applications • The right data and architecture • The right tools: Redshift, Matillion, and Tableau are the best fit for self-service
  • 29. Report Automation. Target state: • Central BI portal • Reusable Tableau data sources (a.k.a. the business layer) • Common WBR format • Eliminate manual work • No spreadsheets or ad-hoc SQL queries • Data discovery • ETL integration • Friendly drag-and-drop GUI. Previous state (TL;DR: CTRL+C, CTRL+V, IT dependency): • Lots of routine SQL and Excel work • Each team defined its own report style and format • Multiple definitions of the same metric • No visualization, no alerts • Slow data discovery and hypothesis evaluation
  • 30. Lessons Learned from moving DW into AWS (Cloud)
  • 31. Five Points of Guidance for Redshift (SET DW) 1. Sort keys: • Choose up to 3 columns • Order them in increasing order of specificity, balanced with likelihood of use • Leave INTERLEAVED sort keys until the one-year anniversary 2. Column encoding: • Compress all columns except (at least) the first sort key 3. Table maintenance: • VACUUM and ANALYZE tables weekly (use STL_ALERT_EVENT_LOG as a guide for frequency) • ANALYZE PREDICATE COLUMNS is very useful for a quick daily stats refresh 4. Choose a distribution key that: • Follows the common join pattern for the table • Distributes the data evenly across the slices on the cluster • DISTSTYLE ALL is a great go-to for dimension tables under ~3 million rows • DISTSTYLE EVEN is a good fail-safe, but guarantees inter-node data redistribution 5. Workload Management (WLM) and Query Monitoring Rules (QMR): • Start with up to 3 queues (in addition to those Redshift provides automatically) • Put ETL in its own queue with a very low active_statement count (perhaps as low as 1 or 2); monitor commit queuing • Split the memory across the queues; monitor the percentage of each queue's workload going to disk • Expect to change WLM settings as the workload changes (day vs. night, weekday vs. weekend)
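Points 1, 2, and 4 can be combined into a small DDL generator. This is a sketch under stated assumptions: the ZSTD encoding, the table and column names in the usage lines, and the helper names are illustrative choices, not the author's tooling. The first sort key is left `RAW`, every other column is compressed, small dimensions get `DISTSTYLE ALL`, and other tables use the join key if one is given, falling back to `EVEN`.

```python
from typing import List, Optional, Tuple

# Rule of thumb from the slide: DISTSTYLE ALL for dimensions under ~3 million rows.
DIM_ALL_THRESHOLD = 3_000_000


def dist_clause(row_count: int, is_dimension: bool, join_key: Optional[str]) -> str:
    """Pick a distribution style following guidance point 4."""
    if is_dimension and row_count < DIM_ALL_THRESHOLD:
        return "DISTSTYLE ALL"
    if join_key is not None:
        # Distribute on the column the table is most often joined on.
        return f"DISTSTYLE KEY DISTKEY ({join_key})"
    # Fail-safe: even spread, at the price of redistribution at join time.
    return "DISTSTYLE EVEN"


def create_table_ddl(
    table: str,
    columns: List[Tuple[str, str]],
    sort_keys: List[str],
    row_count: int,
    is_dimension: bool = False,
    join_key: Optional[str] = None,
) -> str:
    """Build CREATE TABLE DDL following guidance points 1, 2 and 4."""
    if not 1 <= len(sort_keys) <= 3:
        raise ValueError("choose between 1 and 3 sort key columns")
    col_lines = []
    for name, col_type in columns:
        # Point 2: leave the first sort key uncompressed, compress everything else.
        encoding = "RAW" if name == sort_keys[0] else "ZSTD"
        col_lines.append(f"  {name} {col_type} ENCODE {encoding}")
    parts = [
        f"CREATE TABLE {table} (",
        ",\n".join(col_lines),
        ")",
        dist_clause(row_count, is_dimension, join_key),
        f"COMPOUND SORTKEY ({', '.join(sort_keys)});",
    ]
    return "\n".join(parts)


print(create_table_ddl(
    "dw.fact_sales",
    [("sale_date", "DATE"), ("order_id", "BIGINT"), ("amount", "DECIMAL(12,2)")],
    sort_keys=["sale_date"],
    row_count=50_000_000,
    join_key="order_id",
))
```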
  • 32. Lesson One. CHOOSE THE RIGHT MIGRATION STRATEGY. Lift & Shift • Typical approach • Move everything at once • Move to the target platform, then evolve • Gets you to the cloud quickly • Relatively small barrier to learning the new technology, since it tends to be a close fit
  • 33. Lesson One. CHOOSE THE RIGHT MIGRATION STRATEGY. Split & Flip • Split the application into logical functional data layers • Match the data functionality with the right technology • Leverage the wide selection of tools on AWS to best fit the need • Move data in phases: prototype, learn, and perfect
  • 34. Lesson Two. CHANGE YOUR MINDSET. Take the time to learn • It is critical to train on and learn the new technologies being used • It is easy to think only in terms of translating or converting existing code • We made many such changes: relational vs. non-relational, batch vs. streaming, service-based vs. procedural, etc.
  • 35. Lesson Two. CHANGE YOUR MINDSET • Traditional DW: faster runtime is better • Cloud: if runtime is slower, it is easy to scale • Reality: Query #1 uses 64 cores and runs in 1 min; Query #2 uses 1 core and runs in 2 mins • Practical limitation to scaling: a fixed budget
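The query comparison on this slide is easiest to see as core-minutes. Assuming cost scales with cores held multiplied by wall-clock time (a simplification that ignores memory and I/O), the faster query is far more expensive:

```python
def core_minutes(cores: int, runtime_minutes: float) -> float:
    """Compute consumed by a query: cores held multiplied by wall-clock minutes."""
    return cores * runtime_minutes


query_1 = core_minutes(64, 1)  # 64 core-minutes
query_2 = core_minutes(1, 2)   # 2 core-minutes
print(query_1 / query_2)       # 32.0
```

Under a fixed budget, Query #2 is slower for its single user but leaves room for 32 times as much other work, which is the point of optimizing for total work rather than single-query runtime.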
  • 36. Lesson Two. CHANGE YOUR MINDSET. We Optimized for Cost in Redshift • What is the most work that can be done within a fixed budget? • Focus on the total amount of work rather than optimizing for a single user • Everything you use on the cloud comes at a cost, e.g. DynamoDB performance, Redshift vs. Spectrum (S3) • Cost is just one example of the many mindset changes we made
  • 37. Lesson Three. DON'T BE SCARED TO OPEN THE BLACK BOX • All business logic is hidden in legacy ETL scripts • Trade-off between a fast project and business users' expectations • Learn about your business • Discover and fix the issues
  • 38. Lesson Four. BE AGILE AND INVOLVE THE BUSINESS. Agile benefits • See results earlier • Constant feedback • Serves your users • Flexibility • Quality assurance
  • 39. Lesson Five. PLAN YOUR EVOLUTION. Handling Less Efficient Queries • Provide a separate cluster as a sandbox • App developers design new queries that fit the constraints of hands-off operations • Example: create roll-up summary tables in Redshift
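A roll-up summary table can be created with Redshift's `CREATE TABLE ... AS` (CTAS), as in the hedged sketch below. The helper and all table, column, and grain names in the usage lines are hypothetical; the idea is that expensive ad-hoc queries hit a small, pre-aggregated table instead of the full fact table, with the summary rebuilt after each fact load.

```python
from typing import List


def rollup_ctas(summary_table: str, source_table: str, time_column: str,
                grain: str, dimensions: List[str], measures: List[str]) -> str:
    """Build a Redshift CREATE TABLE AS statement that pre-aggregates a fact table."""
    period = f"DATE_TRUNC('{grain}', {time_column})"
    select_cols = [f"{period} AS period_start"] + dimensions
    select_cols += [f"SUM({m}) AS total_{m}" for m in measures]
    # Group by the period plus every dimension, referenced by position.
    group_positions = ", ".join(str(i) for i in range(1, len(dimensions) + 2))
    return (
        f"CREATE TABLE {summary_table}\n"
        f"SORTKEY (period_start)\n"
        f"AS SELECT {', '.join(select_cols)}\n"
        f"FROM {source_table}\n"
        f"GROUP BY {group_positions};"
    )


print(rollup_ctas("dw.sales_daily", "dw.fact_sales", "sold_at", "day",
                  ["site_id"], ["amount"]))
```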

Editor's Notes

  1. Company a “winner”: Will this tool be supported and fully usable in 3 to 5 years? Will it be adopted by Amazon? Will there be a community of users? Are there recommendations within Amazon (such as from AWS SAs)? Consider years in business, customers, and profitability. Management: built-in scheduling; intuitive views of DW processes, models, and schedules; does it help someone understand DW data flows? Deployment/architecture: AWS is better than local; Linux is better than Windows; it must be a patchable platform within Amazon guidelines.
  2. The biggest risk was the investment in a tool from a small player. Porting ETL processes off Matillion would be no less expensive than porting them off PL/SQL and dblinks.