© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Thiago Paulino
Enterprise Solutions Architect,
Big Data AWS
Executive Briefing Conference
September 20, 2018

What will we cover?
• Big Data and why organizations care
• Main challenges: What? How? Why?
• Demo
• Big Data and Machine Learning
• Architectures

Big Data is defined by several V's
Volume, Velocity, Variety, Veracity, Value, Visualization, Variability

What do the analysts say?
https://www.promptcloud.com
https://john-popelaars.blogspot.com
https://ww.signiant.com
https://www.linkedin.com/pulse/world-today-data-rich-information-poor-guru-p-mohapatra-pmp/

What really matters: getting value from data
Organizations that generate business value from their data tend to outperform their competitors. An Aberdeen survey found that companies that implemented a data lake and managed to extract value from that data achieved organic revenue growth 9 percentage points higher than their competitors (24% versus 15%).
Organic growth: leaders 24%, followers 15%
*Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence

Changing data | Analytics adoption
Capture and store new data at PB-EB scale
Run new types of analysis at low cost
New types of analysis
• Machine learning
• Big data processing
• Real-time analytics
• Full-text search

Data lakes: an extension of the traditional approach
• Relational and non-relational data
• Scale from TBs to EBs
• Use different tools for analysis
• Low-cost storage and analytics
[Diagram: data warehouse and business intelligence over OLTP, ERP, CRM, and LOB sources; data lake over devices, web, sensors, and social sources, serving big data processing, real-time, and machine learning]

Data lakes on the AWS cloud
• High durability and availability at exabyte scale
• Stronger security, compliance, and the ability to audit access and execution
• Fine-grained control down to the individual stored object
• Better performance when accessing data
• Several ways to send and store your data
• 2x more partner integrations
• Analyze with the broadest set of analytics and ML services
[Diagram: Data Lake on AWS, surrounded by analytics, machine learning, real-time data movement, and on-premises data movement]

AWS data lake, analytics, and IoT solutions
The broadest set of analytics services
Machine learning: Managed ML Service, Deep Learning AMIs, Video and Image Recognition, Conversational Interfaces, Deep-Learning Video Camera, Natural Language Processing, Language Translation, Speech Recognition, Text-to-Speech
Analytics: Interactive Analysis, Hadoop & Spark, Data Warehousing, Full-text Search, Real-time Analytics, Dashboards & Visualizations
On-premises data movement: Dedicated Network Connection, Secure Appliances, Ruggedized Shipping Container, Database Migration
Real-time data movement: Connect Devices to AWS, Real-time Data Streams, Real-time Video Streams
Data Lake on AWS: Storage | Archival Storage | Data Catalog

Data lake components
• Catalog & Search (metadata): access and search metadata. DynamoDB, Elasticsearch
• Access & User Interface (user access): give your users easy and secure access. API Gateway, Identity & Access Management, Cognito
• Central Storage: secure, cost-effective storage in Amazon S3
• Data Ingestion (data movement): get your data into S3 quickly and securely. Snowball, Database Migration Service, Kinesis Firehose, Direct Connect
• Processing & Analytics (analytics and machine learning): use of predictive and prescriptive analytics to gain better understanding. QuickSight, Amazon AI, EMR, Redshift, Athena, Kinesis, RDS
• Protect and Secure (security/governance): use entitlements to ensure data is secure and users’ identities are verified. Security Token Service, CloudWatch, CloudTrail, Key Management Service

Data lake components, with Glue ETL
[Diagram: the same components (Catalog & Search, Access & User Interface, Central Storage in Amazon S3, Data Ingestion, Processing & Analytics, Protect and Secure) and the same services, with Glue ETL added]

Common Big Data challenges

Which tool should I use?
One tool does not solve every challenge…
Organize your toolkit and use each tool for its intended purpose

Which tool should I use?
Purpose-built.
The right tool for the right job.

Which tool should I use?

What data do I have?
Gartner:
“Through 2018, 80% of data lakes will not include effective metadata management capabilities, making them inefficient.”
Data Lake on AWS
Storage | Archival Storage | Data Catalog

What data do I have? – AWS Glue
Data Catalog (Discover)
• Compatible with the Apache Hive Metastore
• Integrated with AWS services
• Automatic crawling
Job Authoring (Develop)
• Automatically generates ETL code
• Python and Apache Spark
• Edit, debug, and share
Job Execution (Deploy)
• Serverless execution
• Flexible scheduling
• Monitoring and alerting
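
To give a feel for what automatic crawling means in the Glue slide, here is a minimal sketch in plain Python that infers a column-to-type mapping from a few sample records. It is a toy illustration under stated assumptions, not how AWS Glue crawlers are implemented: the function name `infer_schema`, the sample records, and the type-widening rule are all hypothetical.

```python
from typing import Any, Dict, Iterable


def _type_name(value: Any) -> str:
    """Map a Python value to a Hive-style type name (hypothetical mapping)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Infer a column -> type mapping from sample records (toy crawler)."""
    schema: Dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            if value is None:
                continue  # nulls carry no type information
            seen = schema.get(column)
            found = _type_name(value)
            if seen is None or seen == found:
                schema[column] = found
            elif {seen, found} == {"bigint", "double"}:
                schema[column] = "double"  # widen integers to doubles
            else:
                schema[column] = "string"  # fall back to the most general type
    return schema


# Hypothetical sample data standing in for files discovered in storage.
sample = [
    {"user_id": 1, "amount": 10, "country": "BR", "active": True},
    {"user_id": 2, "amount": 12.5, "country": "US", "active": False},
]
schema = infer_schema(sample)
print(schema)
```

A real crawler also has to detect file formats and partitions; the sketch only covers the type-inference step.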

What data do I have?
[Diagram: your data (on-premises data, web app data, Amazon RDS, other databases, streaming data) and Amazon QuickSight]

Other ways to populate your data catalog
• Call the AWS Glue CreateTable API
• Create tables manually
• Run a Hive DDL statement
• Import from an Apache Hive metastore
[Diagram labels: Apache Hive Metastore, AWS Glue ETL, AWS Glue Data Catalog]
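
As an illustration of the CreateTable route, the sketch below builds the `TableInput` structure that the Glue `CreateTable` API expects for a CSV table stored in S3. The helper name `build_table_input`, the bucket, the database, and the column names are assumptions made up for this example; the actual call is left as a comment because it needs AWS credentials.

```python
from typing import Dict, List, Tuple


def build_table_input(name: str, location: str,
                      columns: List[Tuple[str, str]]) -> Dict:
    """Build a Glue TableInput for a comma-delimited text table in S3."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "csv"},
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
    }


# Hypothetical table, bucket, and columns.
table_input = build_table_input(
    "sales",
    "s3://example-bucket/sales/",
    [("order_id", "bigint"), ("amount", "double")],
)

# With AWS credentials configured, the call would look like this:
#   import boto3
#   boto3.client("glue").create_table(
#       DatabaseName="example_db", TableInput=table_input)
print(table_input["StorageDescriptor"]["Columns"])
```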

How can I get started?
Most important: when selecting an agile framework, start with a tool that serves the purpose.
Experiment, test and evaluate, then adopt the tool.
Let's look at an example:
The evolution of the Netflix data pipeline

TRY new things
Aggregate and load events into Hadoop/Hive for batch processing
Batch → Batch + Real-time

ADOPT your solution
Chukwa front end → Kafka
Kafka front end, Kafka

FOCUS on business value
“Amazon Kinesis Streams processes multiple terabytes of log data each day, yet events show up in our analytics in seconds,” says Bennett. “We can discover and respond to issues in real time, ensuring high availability and a great customer experience.”

AGILITY for the business
Choose an environment that lets you test several tools
Focus on tools that let you do as much as possible while staying focused on analysis…

Agility in analytics
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
Analytics | Machine learning
Real-time data movement | On-premises data movement

Agility - Hadoop/Spark analytics
• Distributed processing
• Several types of analysis
  • Batch/Script (Hive/Pig)
  • Interactive (Spark, Presto)
  • Real-time (Spark)
  • Machine Learning (Spark)
  • NoSQL (HBase)
• For a variety of use cases
  • Log and clickstream analysis
  • Machine learning
  • Real-time analytics
  • Large-scale analytics
  • Genomics
  • ETL
[Diagram: Batch, Script, Interactive, Real-time, Machine learning, and NoSQL engines on YARN (Hadoop Resource Manager), over the Data Lake on AWS]
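
To make the distributed processing idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a clickstream. It only illustrates the programming model and is not the Hadoop or Spark API; the log format (`user page`), the partitions, and the function names are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(log_lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (page, 1) for each clickstream line of the form 'user page'."""
    for line in log_lines:
        _user, page = line.split()[:2]
        yield page, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework would do between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts for each page."""
    return {page: sum(counts) for page, counts in groups.items()}


# Two hypothetical partitions, standing in for data split across worker nodes.
partitions = [
    ["u1 /home", "u2 /cart"],
    ["u1 /cart", "u3 /home", "u2 /cart"],
]
pairs = [pair for part in partitions for pair in map_phase(part)]
page_views = reduce_phase(shuffle(pairs))
print(page_views)
```

In a real cluster each partition would be mapped on a different node and the shuffle would move data over the network; the logic per key stays the same.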

Agility - Hadoop/Spark analytics on AWS
• Amazon EMR: managed Hadoop/Spark
• Amazon S3: object storage
[Diagram: Batch, Script, Interactive, Real-time, Machine learning, and NoSQL engines on YARN (Hadoop Resource Manager), over the Data Lake on AWS]

Amazon S3: the source of truth for multiple clusters
[Diagram: Amazon S3 is the source of truth for two Amazon EMR clusters, an interactive Spark cluster and a transient ETL job. Each cluster keeps only intermediates in EC2 instance memory, on local disk, or in HDFS.]
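
A transient ETL cluster of the kind shown in the diagram can be described as a set of parameters for the EMR `RunJobFlow` API. The sketch below is a hedged example, not a recommended configuration: the helper name, release label, instance types, bucket paths, and script location are all assumptions, and the roles are the default EMR role names. The key setting is `KeepJobFlowAliveWhenNoSteps: False`, which makes the cluster terminate once its steps finish, so the data survives only in S3.

```python
from typing import Dict


def build_transient_etl_cluster(script_uri: str, input_uri: str,
                                output_uri: str) -> Dict:
    """Build RunJobFlow parameters for a cluster that ends after one Spark step."""
    return {
        "Name": "transient-etl",          # hypothetical cluster name
        "ReleaseLabel": "emr-5.17.0",     # assumed release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m4.large", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m4.large", "InstanceCount": 2},
            ],
            # Terminate the cluster when all steps are done.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         script_uri, input_uri, output_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical S3 locations: input and output both live in the data lake.
params = build_transient_etl_cluster(
    "s3://example-bucket/jobs/etl.py",
    "s3://example-bucket/raw/",
    "s3://example-bucket/curated/",
)

# With AWS credentials configured, the call would look like this:
#   import boto3
#   boto3.client("emr").run_job_flow(**params)
print(params["Steps"][0]["HadoopJarStep"]["Args"])
```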

A common data catalog shared across clusters
[Diagram: an interactive Spark cluster and a transient ETL job on Amazon EMR read from Amazon S3, the source of truth. A shared catalog describes the data in S3. Customers have options: a MySQL DB instance or the Glue Data Catalog.]
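
One way to share a catalog across EMR clusters is to point Hive and Spark at the Glue Data Catalog through the cluster's configuration classifications. The sketch below builds that configuration list; the helper name is hypothetical, and the result would be passed as the `Configurations` parameter when the cluster is created.

```python
import json
from typing import Dict, List

# Client factory class that makes Hive and Spark SQL on EMR use the
# Glue Data Catalog as their metastore.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore."
    "AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations() -> List[Dict]:
    """Return EMR configuration classifications for Hive and Spark."""
    props = {"hive.metastore.client.factory.class": GLUE_FACTORY}
    return [
        {"Classification": "hive-site", "Properties": dict(props)},
        {"Classification": "spark-hive-site", "Properties": dict(props)},
    ]


configs = glue_catalog_configurations()
print(json.dumps(configs, indent=2))
```

Every cluster launched with these settings sees the same table definitions, so an interactive cluster and a transient ETL cluster can agree on what the data in S3 means.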

Amazon Athena is an interactive query service that makes it easy to analyze data stored directly in Amazon S3 using SQL.
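
As a sketch of what querying S3 data with SQL looks like from code, the example below builds the parameters for the Athena `StartQueryExecution` API. The database, table, column names, and result bucket are assumptions invented for the example; the call itself is left as a comment because it requires AWS credentials.

```python
from typing import Dict


def build_athena_request(database: str, query: str,
                         output_location: str) -> Dict:
    """Build StartQueryExecution parameters for a single SQL query."""
    if not output_location.startswith("s3://"):
        raise ValueError("Athena writes query results to an S3 location")
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table cataloged over files in S3.
QUERY = (
    "SELECT country, COUNT(*) AS orders "
    "FROM sales GROUP BY country ORDER BY orders DESC LIMIT 10"
)
request = build_athena_request(
    "example_db", QUERY, "s3://example-bucket/athena-results/")

# With AWS credentials configured, the call would look like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
print(request["QueryString"])
```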

Demo
Data catalog and analysis.

Machine Learning and Big Data

Big Data on top of Machine Learning
[Diagram: a cycle of more data (click stream, user activity, generated content, purchases, clicks, likes, sensor data), better decisions (object storage, databases, data warehouse, streaming analytics, BI, Hadoop, Spark/Presto, Elasticsearch), better products (machine learning, deep learning/AI), and more users]

Agility in machine learning
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
Analytics | Machine learning
Real-time data movement | On-premises data movement

Machine learning needs new tools
[Diagram labels: Machine Learning/Deep Learning, Business Reporting, Data Scientists, Data Engineer, IDE, Data Catalog, Central Storage, SageMaker]

FINRA – Data is our core mission
Reconstruct a market with trillions of events
• Data from brokerages and exchanges
• Stocks, options, fixed income
• Build a graph of market order events
Analyze the data with a focus on financial fraud
• Insider trading, product manipulation, and much more
• Looking for a needle in a haystack

FINRA – Scaling the data lake
[Diagram: "FINRA in Data Center" (labeled "Silo"): Database 1, Database 2, … Database n, each with its own storage, query/compute, and catalog. "FINRA in AWS" (labeled "Scales"): storage on Amazon S3; query/compute on EMR Spark, EMR Presto, EMR HBase, and Lambda; catalog in herd and the Hive metastore.]

Enable machine learning on the data lake
[Diagram labels: data scientists, AuthN, DS-in-a-box, Spark cluster, EMR cluster, catalog, logical "database", single source of data]

UDSP – Inventory – Much more than R
• R 3.2.5, Python (2.7.12 and 3.4.3)
• Packages
  • R: 300+, Python: 100+
• Tools for building packages
  • gcc, gfortran, make, java, maven, ant…
• IDEs
  • Jupyter, RStudio Server
• Deep learning
  • CUDA, cuDNN (if GPU present)
  • Theano, Caffe, Torch
  • TensorFlow

Best practices

Key points
• Decouple components while ensuring high performance
• Storage, analytics, metadata management, etc.
• Future-proof your analytics (how to deal with new environments)
• Choose the best tool for each task
• Elasticity and multiple clusters for dedicated purposes
• Replace capacity planning with a consumption model
• Don't forget to manage the metadata

Use the right storage service
Data structure → fixed schema, JSON, key-value
Access patterns → store data in the format in which it will be accessed
Data characteristics → hot, warm, cold
Cost → best cost
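
One way to act on the hot, warm, and cold distinction in Amazon S3 is a lifecycle configuration that moves objects to cheaper storage classes as they age. The sketch below is an assumption-laden example rather than a recommendation: the helper name, the prefix, the 30 and 90 day thresholds, and the mapping of warm to `STANDARD_IA` and cold to `GLACIER` are all choices made for illustration.

```python
from typing import Dict


def build_lifecycle_configuration(prefix: str, warm_after_days: int = 30,
                                  cold_after_days: int = 90) -> Dict:
    """Build an S3 lifecycle configuration that tiers objects by age."""
    if not 0 < warm_after_days < cold_after_days:
        raise ValueError("expected 0 < warm_after_days < cold_after_days")
    return {
        "Rules": [{
            "ID": f"tiering-{prefix.strip('/') or 'all'}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                # Warm data: infrequent access storage.
                {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                # Cold data: archival storage.
                {"Days": cold_after_days, "StorageClass": "GLACIER"},
            ],
        }]
    }


lifecycle = build_lifecycle_configuration("logs/")

# With AWS credentials configured, the call would look like this:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=lifecycle)
print(lifecycle["Rules"][0]["Transitions"])
```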

Thank you!


Big Data - EBC on the road Brazil Edition [Portuguese]
