Case Study
Elasticsearch Ingest @ Cisco Intercloud
Agenda
• Express Overview of StreamSets Data Collector
Kirit Basu, Product Management, StreamSets
• Introduction to Elastic
Catherine Johnson, Solutions Architect, Elastic
• Implementing Shipped Analytics Using StreamSets and Elasticsearch
Dmitri Chtchourov, Innovation Architect, Cloud Solutions CTO Group
Performance Management
for Data Flows
© 2015 StreamSets, Inc. All rights reserved. May not be copied, modified, or distributed in whole or part without written consent of StreamSets, Inc.
History Founded by Informatica and Cloudera veterans.
Mission Bring operational excellence to managing data in motion.
Challenge Move data efficiently and with quality in the face of change.
Solution Open source software enabling performance management of
data flows.
Use cases Hadoop Ingest, Search Ingest, Message Broker Enablement,
Log Shipping, Cloud Migration, IoT, ...
Momentum Thousands of downloads, hundreds of companies using.
StreamSets At a Glance
StreamSets Data Collector
Adaptable Flows for Efficiency
Design ingest pipelines with minimal coding and
maximum flexibility.
Data Flow KPIs for Control
Monitor and act on data flow performance and
data quality.
Containerized Architecture for Agility
Operate continuously in the face of constant
change.
Open source software for the rapid
development and reliable operation of
complex data flows.
Get Started with StreamSets
http://streamsets.com/opensource
https://github.com/streamsets/datacollector/
#streamsets
March 2016
Introduction to Elastic
Software that makes massive amounts of
structured and unstructured data usable for
search, logging, analytics, and more in mission
critical systems and applications
Examples: Elastic Stack Use Cases
Logging
IT Operations
Application Management
Security Analytics
Analytics Search
Marketing Insights
Business Development
Customer Sentiment
Website Search
Internal/Intranet Search
URL Search
Internal Systems/Applications External Systems/Applications
Developers IT/Ops Business Users
Elastic Solves Many Developer Use Cases
Social
Location
User Activity
Machine
(Log files)
Documents
Handles Complex
& Diverse Data
Meets Today’s Core
Developer Requirements
Developer requirements
Many users / use cases
Fast data processing
Large data volumes
Data quality & integrity
Cross-source insights
Solves Critical
Use Cases
Application
Search
Embedded
Search
Logging
Security
Analytics
Operational
Analytics
More …
The Elastic Stack
Ingest
Store, Index,
& Analyze
User Interface
Plugins Monitoring Security Alerting
Elastic Cloud: Hosted Elasticsearch
Thank you!
www.elastic.co
Implementing Shipped Analytics Using
StreamSets and Elasticsearch
Dmitri Chtchourov, Innovation Architect, Cloud Solutions CTO Group
Tymofii Polekhin, Software Engineer
Agenda
MANTL & Shipped
Shipped Analytics for Shipped
Why do we need Shipped Analytics?
Architecture and Data Flow
StreamSets Pipelines
End-to-end data flow and performance with Elasticsearch
Benefits of StreamSets
Demo
Microservices managed and scaled separately
Microservices managed by Mesos in a single platform
Microservices architecture for Mesos frameworks and other components
CIS/AWS/Metastack/vSphere/UCS…
Terraform
Spark
Executor N
Spark
Executor 1
Spark
Scheduler
Kafka
Broker N
Kafka
Broker 1
Kafka
Scheduler
Docker Docker
Traefik Microservices …
REST API
REST API
Scripted provisioning
Direct provisioning
Policy, Auto-scaling
VM1
or
BM1
VM2
or
BM2
VM3
or
BM3
VM4
or
BM4
VM5
or
BM5
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Shipped Analytics Cluster
Probe
Probe
Probe
• Both Shipped and Shipped Analytics running on MANTL
• Shipped Analytics – infra and app logs and metrics analysis
mesos-master
mesos-slave
marathon
zookeeper
consul
syslog
frameworks
collectd
cpu
memory
interface
disk
df
load
docker
zookeeper
marathon
mesos-slave
mesos-master
CollectD and Filebeat processes
running on every node in the
cluster.
Infrastructure Layer
Zookeeper Cluster Consul Cluster
Mesos Cluster
Marathon Framework
Kafka Cluster
topbeat filebeat
journalbeat dockerbeat
• Experimenting with Elastic Beats (unified arch., closer to micro-services model)
• Elastic Beats to replace collectd plugins and cAdvisor for containers
<file | top | *>beat collectd
logstash
DNS SRV beats.logstash.service.consul
Data normalization
Tagging
Cluster name decoration
Logstash runs as a single process per
cluster, discoverable through the
standard inter-cluster discovery mechanism.
It receives metrics from collectd and logs
from Filebeat on every slave, normalizes
the data, and sends it to the desired output.
DNS SRV collectd.logstash.service.consul
NOTE: Logstash currently runs in a Docker container on every node; we will be moving to Filebeat and a Logstash Mesos framework soon
logstash
Kafka 0.9.0.0 supports SSL
authentication and data
encryption for producers.
This is a must-have security
feature when sending data to an
external destination over a WAN.
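As a rough illustration of the producer-side settings this implies, the sketch below renders a Kafka client properties file with SSL enabled. The property keys are standard Kafka client settings; the broker address, file paths, and passwords are placeholders assumed for illustration, not values from the Shipped deployment.

```python
def render_ssl_producer_properties(bootstrap_servers, truststore_path,
                                   truststore_password, keystore_path,
                                   keystore_password, key_password):
    """Render Kafka producer properties for SSL encryption and client authentication."""
    props = {
        "bootstrap.servers": bootstrap_servers,
        # Switch the client from PLAINTEXT to SSL.
        "security.protocol": "SSL",
        # Truststore: lets the producer verify the broker certificate.
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
        # Keystore: lets the broker authenticate the producer.
        "ssl.keystore.location": keystore_path,
        "ssl.keystore.password": keystore_password,
        "ssl.key.password": key_password,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# All values below are hypothetical placeholders.
example = render_ssl_producer_properties(
    bootstrap_servers="kafka.example.internal:9093",
    truststore_path="/etc/kafka/client.truststore.jks",
    truststore_password="changeit",
    keystore_path="/etc/kafka/client.keystore.jks",
    keystore_password="changeit",
    key_password="changeit",
)
print(example)
```

The keystore entries are only needed when the brokers require client authentication; encryption alone needs just the protocol and truststore settings.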
Sending data to central SA
cluster for long-term analytics
SSL encryption
WAN
kafka
SSL authentication
Shipped cluster
Shipped Analytics
StreamSets runs in Spark cluster
mode on Mesos, processing
data from multiple source
Shipped clusters and storing it
in the Elasticsearch cluster.
kafka
elasticsearch
Streamsets Spark Streaming Cluster
Spark Job
Master instance
Spark Job Spark Job Spark Job
Lambda Reference Architecture
Monitoring / Analytics Cluster (local, Texas-3)
Global Monitoring / Analytics Cluster (global, Texas-1)
Monitoring / Analytics Cluster (local, Ams. -1 )
Monitoring / Analytics Cluster (local, Lon.-1)
Local components and deployment are the same as global, just smaller
Real-time and batch processing (Lambda), anomaly detection, visualization
SSL
Kafka
SSL
SSL
MQTT
Divide nodes by role for more
stable cluster operation and
ease of scalability
3 master/search nodes
5 live data nodes
3 archive data nodes
master/
search
master/
search
master/
search
live/
data
live/
data
live/
data
live/
data
live/
data
archive
/data
archive
/data
archive
/data
Shards=5 Replicas=4 Shards=5 Replicas=1
archive
/data
archive
/data
CPU=4
RAM=30GB
HDD=4TB
CPU=4
RAM=30GB
HDD=4TB
CPU=4
RAM=30GB
HDD=4TB
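The two shard settings on this slide (5 shards with 4 replicas, and 5 shards with 1 replica) translate into very different numbers of shard copies per index. The sketch below is only back-of-the-envelope arithmetic; the function names are made up for illustration, and it relies on the general Elasticsearch rule that two copies of the same shard are never placed on the same node.

```python
def total_shard_copies(primary_shards, replicas):
    """Primaries plus all replica copies for one index."""
    return primary_shards * (1 + replicas)


def min_nodes_for_full_allocation(replicas):
    """Copies of one shard must sit on distinct nodes, so at least
    replicas + 1 data nodes are needed to allocate every copy."""
    return replicas + 1


for shards, replicas in [(5, 4), (5, 1)]:
    print(
        f"Shards={shards} Replicas={replicas}: "
        f"{total_shard_copies(shards, replicas)} shard copies per index, "
        f"needs >= {min_nodes_for_full_allocation(replicas)} data nodes"
    )
```

With 4 replicas, every one of 5 data nodes can hold a full copy of the index, which fits the 5 live data nodes listed on the slide.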
StreamSets pipelines process
incoming messages and
transform them according to
business logic requirements:
normalizing metrics,
parsing log lines, and extracting
important information using
GROK filters or scripts.
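A GROK filter is essentially a library of named regular expressions. As a minimal sketch of the idea, the Python snippet below extracts fields from a log line with named groups. The log layout, the field names, and the sample line are assumptions for illustration; they are not the patterns used in the Shipped pipelines.

```python
import re

# Hypothetical layout: "<ISO timestamp> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)"
)


def parse_log_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = "2016-03-01T12:00:00Z ERROR task failed to start"
parsed = parse_log_line(sample)
print(parsed)
```

Lines that do not match return `None`, so a pipeline can route them to error handling instead of indexing half-parsed records.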
Cluster Name
Decorator
Fields Type
Normalization
Metrics/Logs
Stream Splitter
ES Logs Output
General GROK
Filters
Float Value
Truncate
ES Metrics
Output
Shipped GROK
Logic
Marathon
• StreamSets instances running in Docker containers in Marathon
o Easy deployment and scaling
o Fast upgrade to newer versions
• Issues we faced with this approach:
o Containers were killed by Marathon
o We needed to re-import the pipeline every time we launched a container
Marathon
• Working with StreamSets to resolve the OOM issue, we increased
container memory and SDC heap size
• At first, all looked normal and we thought that it had just been
starved of resources, but several days later SDC was killed again
• We increased memory and heap even more, to 16G, but that bought us just
another day or two before it was killed again
• It looked like the SDC heap was constantly filling with data
that did not go away, which eventually killed the container
• GC was also working hard, and sometimes we got freezes
of up to 60 seconds
• We decided to move away from Docker
Marathon
• StreamSets reads JSON messages from the Kafka cluster and outputs
them to the Elasticsearch cluster
o Deserializing and serializing JSON was very slow with a
single-threaded process
o A performance test of consuming from Kafka showed:
▪ JSON format: 5k records/sec avg
▪ Text format: 50k records/sec avg
▪ Binary format: 250k records/sec avg
• The StreamSets team was very proactive on this issue,
and within 2 days we received a fix for multi-threaded JSON parsing
o New testing showed:
▪ JSON format: 66k records/sec avg
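To put the before and after figures in perspective, the arithmetic below compares the two JSON rates from the slides. Only the rates come from the talk; the one-million-record batch is an arbitrary size chosen for illustration.

```python
JSON_RATE_BEFORE = 5_000   # records/sec, single-threaded JSON parsing
JSON_RATE_AFTER = 66_000   # records/sec, after the multi-threaded fix


def seconds_for(records, rate_per_sec):
    """Time needed to consume a batch at a steady rate."""
    return records / rate_per_sec


speedup = round(JSON_RATE_AFTER / JSON_RATE_BEFORE, 1)
batch = 1_000_000  # arbitrary illustrative batch size

print(f"speedup: {speedup}x")
print(f"before: {seconds_for(batch, JSON_RATE_BEFORE):.0f} s per million records")
print(f"after:  {seconds_for(batch, JSON_RATE_AFTER):.1f} s per million records")
```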
Marathon
• StreamSets never failed because of internal logic bugs,
but we kept seeing the oom-killer, and recovery was
not automated
• We decided to leave Docker and run SDC natively on the host,
still using Marathon for scaling and failover
• Without Docker, we can now upload our pipeline on SDC startup,
and it starts working as soon as the instance has loaded
We can freely scale up/down whenever we need
We also got rid of the oom-killer issue
Each of our 3 SDC instances has already processed ~3B messages, with no issues!
• StreamSets pipelines consume metrics gathered by collectd
and logs gathered by Logstash from 4 different clusters
(including itself), transform and decorate them, and send them to
Elasticsearch for storage and analytics.
• First, we consume messages from a Kafka topic at
an average of 5,000 messages per second. The consumer
itself parses the JSON format and passes the records on.
• The next stage is a JavaScript script that decorates messages
with the cluster name, based on the instance hostname in that
message
• We then exclude Marathon events from the stream, sending
them directly to ES
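The cluster name decoration step can be sketched as follows. The pipeline described here uses a JavaScript script stage; this sketch uses Python purely for illustration, and the `host` and `cluster` field names, the hostname prefixes, and the cluster labels are all hypothetical.

```python
# Hypothetical mapping from hostname prefix to cluster name.
CLUSTER_BY_HOST_PREFIX = {
    "tx3-": "texas-3",
    "ams1-": "amsterdam-1",
    "lon1-": "london-1",
}


def decorate_with_cluster(record, prefix_map, default="unknown"):
    """Return a copy of the record with a 'cluster' field derived from its hostname."""
    host = record.get("host", "")
    for prefix, cluster in prefix_map.items():
        if host.startswith(prefix):
            return {**record, "cluster": cluster}
    return {**record, "cluster": default}


decorated = decorate_with_cluster(
    {"host": "ams1-worker-07", "metric": "cpu", "value": 0.42},
    CLUSTER_BY_HOST_PREFIX,
)
print(decorated)
```

Returning a copy rather than mutating the input keeps the step easy to test in isolation.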
• The next stage splits the stream into 2 parts: logs and metrics
• Metrics are sent straight to ES without any transformation
• Logs are the most interesting part:
o We pull Docker container logs out of the stream,
delete the “time” field, which duplicates the timestamp, and
send them to ES
o We separate logs from specific clusters, because we
need to apply special logic to them
o Separation is done by mapping IPs to clusters in
the pipeline in real time
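The routing described above can be sketched as a single function that returns a destination label and the (possibly cleaned) record. This is an illustrative Python sketch, not the StreamSets stage configuration: the `type`, `source`, and `ip` field names, the destination labels, and the IP addresses are assumptions.

```python
def route(record, special_cluster_ips):
    """Pick a destination for a record, following the split described in the slides."""
    # Metrics go straight to Elasticsearch without transformation.
    if record.get("type") == "metric":
        return "es_metrics", record
    # Docker container logs: drop the duplicate "time" field, then index.
    if record.get("source") == "docker":
        cleaned = {k: v for k, v in record.items() if k != "time"}
        return "es_logs", cleaned
    # Logs from specific clusters (looked up by IP) need special logic.
    cluster = special_cluster_ips.get(record.get("ip"))
    if cluster is not None:
        return "special_logic", {**record, "cluster": cluster}
    return "es_logs", record


special_ips = {"10.0.0.5": "shipped-prod"}  # hypothetical IP-to-cluster map

print(route({"type": "metric", "value": 1}, special_ips))
print(route({"type": "log", "source": "docker", "time": "t", "msg": "up"}, special_ips))
print(route({"type": "log", "ip": "10.0.0.5", "msg": "deploy"}, special_ips))
```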
• We collect data from several Mesos clusters and need to
correlate container metrics with their logs
• We use appID, taskID and runID to identify a specific container's
logs
• Container logs themselves have all three of these, while mesos-master
and mesos-agent logs lack runID
• All unidentified data is discarded
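A minimal sketch of this identification step is shown below. The slides do not say exactly which fields make a record "identified", so the sketch assumes that appID and taskID are required and that runID is optional (since mesos-master and mesos-agent logs lack it). The record layout is hypothetical.

```python
def container_key(record):
    """Build a correlation key, or return None when the record cannot be identified.

    Assumption: appID and taskID are mandatory; runID may be missing.
    """
    app_id = record.get("appID")
    task_id = record.get("taskID")
    if not app_id or not task_id:
        return None
    return (app_id, task_id, record.get("runID"))


def keep_identified(records):
    """Discard every record that has no usable correlation key."""
    return [r for r in records if container_key(r) is not None]


records = [
    {"appID": "web", "taskID": "web.1", "runID": "r-9", "msg": "container log"},
    {"appID": "web", "taskID": "web.1", "msg": "mesos-agent log"},
    {"msg": "no identifiers"},
]
kept = keep_identified(records)
print(len(kept), [container_key(r) for r in kept])
```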
Current Shipped Analytics prod cluster configuration:
Kafka Cluster: 7 brokers with 4 CPUs and 16GB RAM each
Logstash topic for all incoming messages with 7 partitions and 2 replicas
Current data flow averages 5,000 messages/sec to Kafka
Current data size averages 1.2MB/sec to Kafka
StreamSets: 3 instances with identical pipeline configuration reading from the Kafka cluster
The 7 partitions are split between the 3 instances as 3/2/2
All 3 instances run natively on the host (non-Docker) under Marathon
Marathon restarts a failed instance with automatic pipeline upload and start
Elasticsearch: 7 nodes with 4 CPUs, 16GB RAM and 2TB storage each
Each metric is written to its own index, for a total of 15 indexes
Each index has 5 primary shards and 5 replica shards
Total doc count: 17.5B Total doc size: 3.84TB
1-day rate count: ~500M 1-day rate size: ~120GB
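The figures above can be sanity-checked with a few lines of arithmetic. The sketch below shows an even partition split and the daily volume implied by the average rates. The `split_partitions` helper is illustrative only; the actual assignment is decided by the Kafka consumer group, not by this function.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def split_partitions(num_partitions, num_consumers):
    """Spread partitions as evenly as possible across consumers."""
    base, extra = divmod(num_partitions, num_consumers)
    return [base + 1 if i < extra else base for i in range(num_consumers)]


partition_split = split_partitions(7, 3)
messages_per_day = 5_000 * SECONDS_PER_DAY
gigabytes_per_day = 1.2 * SECONDS_PER_DAY / 1000  # decimal GB from 1.2 MB/sec

print(partition_split)
print(f"{messages_per_day:,} messages/day")
print(f"{gigabytes_per_day:.1f} GB/day into Kafka")
```

The averages imply roughly 432M messages and about 104 GB per day entering Kafka, which is in the same range as the reported ~500M documents and ~120GB per day in Elasticsearch. The slides do not break down the difference.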
StreamSets is a great product to work with, and the team is super helpful and works fast
• Lots of input and output connectors, huge processing capabilities
• Very intuitive and rich user interface
• Easy to create pipelines visually, instead of writing code
• Clear data flow paths
• Small resource consumption relative to performance
• Can easily handle up to 10k records/sec to Elasticsearch with 1 CPU and 2GB RAM
• Simple configuration and deployment process
• Open source(!)
• Fast logic changes with minimum downtime
• Preview mode(!): check every stage before throwing all your data at it
• Rich data transformation possibilities
• GROK filters: easy to migrate from Logstash
• Smart error handling
• Reliable: not once did StreamSets crash by itself; the only failures came from Docker, Marathon, or Mesos issues
Thank You!

Enabling Microservices Frameworks to Solve Business Problems
Ken Owens
 

Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud

  • 1. Case Study Elasticsearch Ingest @ Cisco Intercloud
  • 2. Agenda
      • Express Overview of StreamSets Data Collector: Kirit Basu, Product Management, StreamSets
      • Introduction to Elastic: Catherine Johnson, Solutions Architect, Elastic
      • Implementing Shipped Analytics Using StreamSets and Elasticsearch: Dmitri Chtchourov, Innovation Architect, Cloud Solutions CTO Group
  • 4. StreamSets At a Glance
      • History: Founded by Informatica and Cloudera veterans.
      • Mission: Bring operational excellence to managing data in motion.
      • Challenge: Move data efficiently and with quality in the face of change.
      • Solution: Open source software enabling performance management of data flows.
      • Use cases: Hadoop Ingest, Search Ingest, Message Broker Enablement, Log Shipping, Cloud Migration, IoT, ...
      • Momentum: Thousands of downloads, hundreds of companies using.
      © 2015 StreamSets, Inc. All rights reserved. May not be copied, modified, or distributed in whole or part without written consent of StreamSets, Inc.
  • 5. StreamSets Data Collector: open source software for the rapid development and reliable operation of complex data flows.
      • Adaptable Flows for Efficiency: Design ingest pipelines with minimal coding and maximum flexibility.
      • Data Flow KPIs for Control: Monitor and act on data flow performance and data quality.
      • Containerized Architecture for Agility: Operate continuously in the face of constant change.
  • 6. Get Started with StreamSets http://streamsets.com/opensource https://github.com/streamsets/datacollector/ #streamsets
  • 8. Software that makes massive amounts of structured and unstructured data usable for search, logging, analytics, and more in mission critical systems and applications
  • 9. Examples: Elastic Stack Use Cases Logging IT Operations Application Management Security Analytics Analytics Search Marketing Insights Business Development Customer Sentiment Website Search Internal/Intranet Search URL Search Internal Systems/Applications External Systems/Applications Developers IT/Ops Business Users
  • 10. Elastic Solves Many Developer Use Cases
      • Handles Complex & Diverse Data: Social, Location, User Activity, Machine (Log files), Documents
      • Meets Today’s Core Developer Requirements: Many users / use cases, Fast data processing, Large data volumes, Data quality & integrity, Cross-source insights
      • Solves Critical Use Cases: Application Search, Embedded Search, Logging, Security Analytics, Operational Analytics, More …
  • 11. The Elastic Stack Ingest Store, Index, & Analyze User Interface Plugins Monitoring Security Alerting Elastic Cloud: Hosted Elasticsearch
  • 13. Implementing Shipped Analytics Using Streamsets and Elasticsearch Dmitri Chtchourov, Innovation Architect, Cloud Solutions CTO Group Tymofii Polekhin, Software Engineer
  • 14. Agenda
      • MANTL & Shipped
      • Shipped Analytics for Shipped
      • Why do we need Shipped Analytics?
      • Architecture and Data Flow
      • Streamsets Pipelines
      • End-to-end dataflow and performance with Elasticsearch
      • Benefits of Streamsets
      • Demo
  • 15. Microservices managed and scaled separately. Microservices managed by Mesos in a single platform. Microservices architecture for Mesos frameworks and other components. [Diagram labels: CIS/AWS/Metastack/vSphere/UCS…; Terraform; Spark Scheduler, Spark Executor 1 … N; Kafka Scheduler, Kafka Broker 1 … N; Docker; Traefik; Microservices …; REST API; Scripted provisioning; Direct provisioning; Policy, Auto-scaling; VM1–VM5 or BM1–BM5]
  • 17. Shipped Analytics Cluster Probe Probe Probe • Both Shipped and Shipped Analytics running on MANTL • Shipped Analytics – infra and app logs and metrics analysis
  • 19. Infrastructure Layer Zookeeper Cluster Consul Cluster Mesos Cluster Marathon Framework Kafka Cluster topbeat filebeat journalbeat dockerbeat • Experimenting with Elastic Beats (unified arch., closer to micro-services model) • Elastic Beats to replace collectd plugins and cAdvisor for containers
  • 20. Logstash is a single process per cluster, discoverable through the standard inter-cluster discovery mechanism. It receives metrics from collectd and logs from Filebeat on every slave, normalizes the data, and sends it to the desired output.
      • Inputs: <file | top | *>beat (DNS SRV beats.logstash.service.consul) and collectd (DNS SRV collectd.logstash.service.consul)
      • Processing: data normalization, tagging, cluster name decoration
      • NOTE: Logstash currently runs in a Docker container on every node; it will move to a Filebeat and Logstash Mesos framework soon
  • 21. Sending data to the central SA cluster for long-term analytics: Logstash in the Shipped cluster sends data over the WAN to Kafka in Shipped Analytics, using SSL authentication and SSL encryption. Kafka 0.9.0.0 supports SSL authentication and data encryption for producers, which is must-have security when sending data to an external destination over a WAN.
  • 22. StreamSets running in Mesos Spark Cluster mode processing data from multiple source Shipped clusters and storing it in Elasticsearch cluster. kafka elasticsearch Streamsets Spark Streaming Cluster Spark Job Master instance Spark Job Spark Job Spark Job
  • 23. Lambda Reference Architecture
      • Global Monitoring / Analytics Cluster (global, Texas-1)
      • Local Monitoring / Analytics Clusters: Texas-3, Ams.-1, Lon.-1
      • Local components and deployment are the same as global, just smaller
      • Real-time and batch processing (Lambda), anomaly detection, visualization
      • Transport labels in the diagram: SSL, Kafka, MQTT
  • 24. Divide nodes by role for more stable cluster operation and ease of scalability 3 master/search nodes 5 live data nodes 3 archive data nodes master/ search master/ search master/ search live/ data live/ data live/ data live/ data live/ data archive /data archive /data archive /data Shards=5 Replicas=4 Shards=5 Replicas=1 archive /data archive /data CPU=4 RAM=30GB HDD=4TB CPU=4 RAM=30GB HDD=4TB CPU=4 RAM=30GB HDD=4TB
  • 25. Streamsets pipelines process incoming messages and transform them according to business logic requirements: normalizing metrics, parsing log lines, and extracting important information using GROK filters or scripts. Pipeline stages: Cluster Name Decorator, Fields Type Normalization, Metrics/Logs Stream Splitter, ES Logs Output, General GROK Filters, Float Value Truncate, ES Metrics Output, Shipped GROK Logic
  • 26. Marathon
      • Streamsets instances running in Docker containers in Marathon
          o Easy deployment and scaling
          o Fast upgrade to a newer version
      • Issues we faced with this approach:
          o Containers were killed by Marathon
          o We needed to re-import the pipeline every time we launched a container
  • 27. Marathon
      • Working with Streamsets to resolve the OOM issue, we increased container memory and SDC heap size
      • At first all looked normal and we thought SDC had just been starved of resources, but several days later SDC was killed again
      • We increased memory and heap even more, to 16G, but that bought just another day or two before it was killed again
      • It looked like the SDC heap was constantly filling with data that never went away, which eventually killed the container
      • GC was also working hard, and sometimes we got freezes of up to 60 seconds
      • We decided to move out of Docker
  • 28. Marathon
      • Streamsets reads JSON messages from the Kafka cluster and outputs to the Elasticsearch cluster
          o De-serializing and serializing JSON was very slow with a single-threaded process
          o A performance test of consuming from Kafka showed:
              - JSON format: 5k records/sec avg
              - Text format: 50k records/sec avg
              - Binary format: 250k records/sec avg
      • The Streamsets team was very proactive with these issues, and within 2 days we received a fix for multi-threaded JSON parsing
          o New testing showed:
              - JSON format: 66k records/sec avg
  • 29. Marathon
      • Streamsets never failed because of internal logic bugs, but we kept seeing the oom-killer, and recovery was not automated
      • We decided to leave Docker and run SDC natively on the host, still using Marathon for scaling and failover
      • Without Docker, we can now upload our pipeline on SDC startup, and it starts working as soon as the instance has loaded
      • We can freely scale up and down whenever we need to
      • We also got rid of the oom-killer issue
  • 30. Each one of our 3 SDC instances already processes ~3B messages, with no issues!
  • 31. Streamsets pipeline, first stages
      • The Streamsets pipeline consumes metrics gathered by collectd and logs gathered by Logstash from 4 different clusters (including itself), transforms and decorates them, and sends them to Elasticsearch for storage and analytics.
      • First, we consume messages from a Kafka topic at an average of 5,000 messages per second. The consumer itself parses the JSON format and passes the records on.
      • The next stage is a JavaScript script that decorates messages with the cluster name, based on the instance hostname in the message
      • Finally, we exclude Marathon events from the stream, sending them directly to ES
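The cluster-name decoration step described on slide 31 can be sketched as follows. This is a minimal illustration in Python rather than the JavaScript used in the actual pipeline, and the hostname prefixes, the `host` field, and the `cluster_name` field are hypothetical, since the slides do not give the real naming scheme.

```python
from typing import Optional

# Hypothetical hostname prefixes; the real mapping is not given in the slides.
CLUSTER_BY_HOST_PREFIX = {
    "tx3-": "texas-3",
    "ams1-": "amsterdam-1",
}


def cluster_for_host(hostname: str) -> Optional[str]:
    """Return the cluster whose prefix matches the hostname, or None."""
    for prefix, cluster in CLUSTER_BY_HOST_PREFIX.items():
        if hostname.startswith(prefix):
            return cluster
    return None


def decorate(record: dict) -> dict:
    """Return a copy of the record with a cluster_name field added."""
    decorated = dict(record)
    hostname = record.get("host", "")  # "host" is an assumed field name
    decorated["cluster_name"] = cluster_for_host(hostname) or "unknown"
    return decorated
```

Returning a copy rather than mutating the input keeps the step easy to test in isolation; whether the real script mutates records in place is not stated.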
  • 32. Streamsets pipeline, stream splitting
      • The next stage splits the stream into 2 parts: logs and metrics
      • Metrics are sent straight to ES without any transformation
      • Logs are the most interesting part:
          o We take Docker container logs out of the stream, delete the “time” field (a duplicate timestamp), and send them to ES
          o We separate logs from specific clusters, because we need to apply special logic to them
          o Separation is done by mapping IPs to clusters in the pipeline in real time
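The splitting logic on slide 32 can be sketched roughly as below. Only the “time” field comes from the slide; the `type` and `source` fields, the destination labels, and the subnet in `SPECIAL_CLUSTER_NETWORKS` are assumptions made for illustration.

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical subnet-to-cluster mapping used to pick out clusters
# that need special log handling.
SPECIAL_CLUSTER_NETWORKS = {
    "shipped-prod": ipaddress.ip_network("10.1.0.0/16"),
}


def cluster_for_ip(ip: str) -> Optional[str]:
    """Map a source IP to a cluster name, or None if no subnet matches."""
    address = ipaddress.ip_address(ip)
    for name, network in SPECIAL_CLUSTER_NETWORKS.items():
        if address in network:
            return name
    return None


def route(record: dict) -> Tuple[str, dict]:
    """Return (destination, record) for one parsed message."""
    if record.get("type") == "metric":
        # Metrics go to Elasticsearch without any transformation.
        return "es_metrics", record
    if record.get("source") == "docker":
        # Docker logs carry a duplicate timestamp in "time"; drop it.
        cleaned = {k: v for k, v in record.items() if k != "time"}
        return "es_logs", cleaned
    return "es_logs", record
```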
  • 33. Correlating container metrics and logs
      • We collect data from several Mesos clusters and need to correlate container metrics with their logs
      • We use appID, taskID and runID to identify a specific container's logs
      • Container logs themselves have all three of these, while mesos-master and mesos-agent logs lack runID
      • All unidentified data is discarded
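A minimal sketch of the identification rule on slide 33. The field names match the IDs named on the slide, but treating a record as identified only when all three are present is an assumption: the slides do not say how mesos-master and mesos-agent logs, which lack runID, are handled.

```python
from typing import Iterable, Iterator, Optional, Tuple

ID_FIELDS = ("appID", "taskID", "runID")


def container_key(record: dict) -> Optional[Tuple[str, str, str]]:
    """Return (appID, taskID, runID) if all are present and non-empty."""
    values = tuple(record.get(field) for field in ID_FIELDS)
    if all(values):
        return values
    return None


def keep_identified(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only records that can be tied to a specific container."""
    for record in records:
        if container_key(record) is not None:
            yield record
```

The returned key can serve as a join key between a container's metrics and its log lines.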
  • 34. Current ShippedAnalytics prod cluster configuration:
      • Kafka cluster:
          o 7 brokers with 4 CPU and 16 GB RAM each
          o Logstash topic for all incoming messages, with 7 partitions and 2 replicas
          o Current data flow averages 5,000 messages/sec to Kafka
          o Current data size averages 1.2 MB/sec to Kafka
      • Streamsets:
          o 3 instances with identical pipeline configuration reading from the Kafka cluster
          o The 7 partitions are split between the 3 instances as 3/2/2
          o All 3 instances run natively on the host (non-Docker) under Marathon
          o Marathon restarts a failed instance with automatic pipeline upload and start
      • Elasticsearch:
          o 7 nodes with 4 CPU, 16 GB RAM and 2 TB storage each
          o Each metric is written to its own index, 15 indexes in total
          o Each index has 5 primary shards and 5 replica shards
          o Total doc count: 17.5B
          o Total doc size: 3.84 TB
          o 1-day rate, count: ~500M
          o 1-day rate, size: ~120 GB
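The figures on slide 34 can be cross-checked with a little arithmetic. The sketch below assumes an even spread of partitions over consumers (which reproduces the 3/2/2 split but is not Kafka's actual assignment algorithm) and extrapolates the average per-second rates to a full day.

```python
from typing import List


def split_partitions(partitions: int, consumers: int) -> List[int]:
    """Spread partitions as evenly as possible across consumers."""
    base, extra = divmod(partitions, consumers)
    return [base + 1 if i < extra else base for i in range(consumers)]


SECONDS_PER_DAY = 24 * 60 * 60

# Average rates quoted on the slide.
MESSAGES_PER_SEC = 5000
MEGABYTES_PER_SEC = 1.2

messages_per_day = MESSAGES_PER_SEC * SECONDS_PER_DAY    # 432,000,000
megabytes_per_day = MEGABYTES_PER_SEC * SECONDS_PER_DAY  # about 103,680 MB

print(split_partitions(7, 3))
print(messages_per_day, round(megabytes_per_day))
```

The extrapolated values, about 432M messages and roughly 104 GB per day, are of the same order as the ~500M documents and ~120 GB per day reported on the slide; the slide does not explain the difference.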
  • 35. Streamsets is a great product to work with, and the team is super helpful and works fast
      • Lots of input and output connectors, huge processing capabilities
      • Very intuitive and rich user interface
      • Easy to create pipelines visually instead of writing code
      • Clear data flow paths
      • Small resource consumption compared to performance
      • Can easily handle up to 10k records/sec to Elasticsearch with 1 CPU and 2 GB RAM
      • Simple configuration and deployment process
      • Open source(!)
      • Fast logic changes with minimum downtime
      • Preview mode(!): check every stage before throwing all your data at it
      • Rich data transformation possibilities
      • GROK filters: easy to migrate from Logstash
      • Smart error handling
      • Reliable: not once did Streamsets crash by itself; the only failures came from Docker, Marathon, or Mesos issues