Streaming meetup
- 2. Symantec Corporation
• Symantec
– Symantec is the world leader in providing security software for both enterprises and end users
– There are 300 million devices (PCs, tablets and phones) that rely on Symantec for their security needs
– There are also thousands of enterprises that rely on Symantec to help them secure their assets from attacks, including their data centers, emails and other sensitive data
• Cloud Platform Engineering (CPE)
– The Cloud Platform Engineering (CPE) organization at Symantec is responsible for building the next generation cloud platform for Symantec
– We use OpenStack for building the infrastructure cloud
– We use Hadoop/Storm/Kafka/Spark for building the analytics cloud
Symantec Analytics Platform
- 3. About us
• Karthik Karuppaiya
Karthik is a Principal Cloud Platform Engineer, leading the efforts on architecting and implementing Symantec's next generation Big Data Analytics Platform. He has extensive experience designing and engineering large scale distributed systems on Big Data technologies since 2010.
https://www.linkedin.com/in/karthikkrk
https://twitter.com/karthikkrk
• Raghavendra Nandagopal
Raghavendra Nandagopal is a Principal Cloud Platform Engineer with extensive experience architecting and engineering distributed systems in the Big Data space. He is also a contributor to the Apache Storm project.
https://www.linkedin.com/in/speaktoraghav
https://twitter.com/speaktoraghav
- 4. Agenda
• Analytics Platform Overview
• Real-time Streaming Architecture
• Lessons Learned
• Cluster Deployment Overview
• Performance Metrics Collection
• Monitoring
• Self Service Analytics Cluster
- 5. Analytics Platform Overview
[Diagram: platform stack. Labels recovered from the slide:]
• Nodes: bare metal, OpenStack VMs
• HDFS (Hadoop Distributed File System)
• YARN (Cluster Resource Management)
• Kafka
• Analytics engines: Pig, Oozie, Hive, Storm
• Gateway services: Knox, BDSE, SPaaS
• Monitoring and alerting services: QueryX, LMM, OpsView
• Deployment automation: Ambari, Puppet
• Also shown: Ganglia, Hue, MFC
- 6. Real-time Streaming Architecture
[Diagram: streaming data flow. Labels recovered from the slide:]
• Security Events (Kafka producers)
• Streaming Cluster: Kafka, Storm, Kafka
• Alert Events (Kafka consumers)
• Logstash, collectd
• Upload MetaData File
• LMM
- 7. Lessons Learned
• Kafka's lack of rack awareness
– With a replication factor of 3, all 3 replicas of a partition can end up on the same rack
• Kafka's JBOD limitations
– A Kafka broker shuts down when a disk fails
– E.g. if a broker has 10 disks configured, a single disk failure makes all 10 disks unavailable
– The time taken to re-replicate the data after the broker restarts will be longer
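The rack-awareness concern above can be checked mechanically once the replica assignment is known. The following is a minimal sketch, not the tooling used by the speakers: the broker IDs, rack names and assignment map are hypothetical, and in practice they would come from the cluster's own metadata.

```python
def single_rack_partitions(assignment, broker_rack):
    """Return the partitions whose replicas all sit on one rack.

    assignment:  partition id -> list of broker ids holding a replica
    broker_rack: broker id -> rack name
    """
    flagged = []
    for partition, replicas in sorted(assignment.items()):
        racks = {broker_rack[broker] for broker in replicas}
        if len(racks) == 1:
            flagged.append(partition)
    return flagged


# Hypothetical layout: 6 brokers over 2 racks, replication factor 3.
broker_rack = {1: "rack-a", 2: "rack-a", 3: "rack-a",
               4: "rack-b", 5: "rack-b", 6: "rack-b"}
assignment = {0: [1, 2, 3], 1: [2, 4, 5], 2: [4, 5, 6]}

# Partitions 0 and 2 would be lost entirely if their rack went down.
print(single_rack_partitions(assignment, broker_rack))
```

Any partition flagged this way is a candidate for manual replica reassignment.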
- 8. Lessons Learned
• Choosing Storm worker slots for a cluster
– Rule of thumb used for sizing, based on the recommendations from the Storm community
– (M) Total number of supervisors = 12
– (C) Total number of CPU cores per machine = 32
– (X) I/O-CPU-bound factor: a value between 1 (CPU bound) and 100 (I/O bound) = 10 (uses regex)
– (W) Number of workers in a topology = 33
– (P) Parallelism units = (M * C * X) - W = (12 * 32 * 10) - 33 = 3807
P is a rough estimate of how many parallelism units we have. We can then distribute that number among the components in the topology as parallelism hints.
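The rule of thumb above is plain arithmetic, so it can be sketched in a few lines. The formula and the input values are from the slide; the component names and the 1:6:3 weighting used to split P are hypothetical and only illustrate one way of turning P into parallelism hints.

```python
def parallelism_units(supervisors, cores_per_machine, io_factor, workers):
    """P = (M * C * X) - W, the sizing rule of thumb from the slide."""
    return supervisors * cores_per_machine * io_factor - workers


def distribute(total, weights):
    """Split `total` among components in proportion to their weights.

    Integer division rounds down, so the hints never add up to more than `total`.
    """
    weight_sum = sum(weights.values())
    return {name: total * weight // weight_sum for name, weight in weights.items()}


# Values from the slide: M = 12, C = 32, X = 10, W = 33.
P = parallelism_units(supervisors=12, cores_per_machine=32, io_factor=10, workers=33)

# Hypothetical topology components and relative weights.
hints = distribute(P, {"kafka_spout": 1, "parse_bolt": 6, "alert_bolt": 3})

print(P)
print(hints)
```

The weights are a judgment call per topology; heavier components (here the hypothetical regex-parsing bolt) get a larger share.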
- 9. Cluster Facts
• Kafka Nodes
– 10 nodes
– Each with 12 disks of 4 TB each
– Total 48 TB * 10 = 480 TB capacity
• Storm Nodes
– 12 Supervisors and 1 Nimbus
– 128 GB RAM and 32 cores
– 96 worker slots total
• Processing 300,000 events/sec
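The figures above can be cross-checked with a short calculation. This is a back-of-the-envelope sketch: the raw numbers are from the slide, while the replication factor of 3 is taken from the Lessons Learned slide and is only assumed to apply to this cluster, and the per-slot rate assumes every worker slot is in use and load is spread evenly.

```python
KAFKA_NODES = 10
DISKS_PER_NODE = 12
DISK_TB = 4
SUPERVISORS = 12
WORKER_SLOTS = 96
EVENTS_PER_SEC = 300_000
REPLICATION_FACTOR = 3  # assumption: the replication of 3 mentioned earlier

# Raw Kafka capacity: 12 disks * 4 TB = 48 TB per node, times 10 nodes.
raw_tb = KAFKA_NODES * DISKS_PER_NODE * DISK_TB

# Capacity left for distinct data once every message is stored 3 times.
usable_tb = raw_tb / REPLICATION_FACTOR

# Worker slots configured on each supervisor.
slots_per_supervisor = WORKER_SLOTS // SUPERVISORS

# Average event rate per slot if all slots were busy (an idealised figure).
events_per_slot = EVENTS_PER_SEC / WORKER_SLOTS

print(raw_tb, usable_tb, slots_per_supervisor, events_per_slot)
```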
- 10. Cluster Deployment Overview
• Goals set out for deployment
– Fully automated
– Use the same deployment scripts for all environments, to keep deployments consistent
– Easy deployment of Dev clusters to enable fast adoption
– Use only open source tools
– Use existing tools as much as possible and fill the remaining gaps
- 11. Cluster Deployment Overview
[Diagram: deployment flow. Labels recovered from the slide:]
• Deployment Automation Framework (DAO), Puppet, Ambari Server
• Steps shown: Provision Hardware; Install Ambari Server and Agents; Apply Blueprint
• Nodes shown: Ambari Server/API Node, Kafka/ZK 1..N, Storm 1..N, HDFS 1..N
- 13. Performance Metrics Collection
• Easy to run and collect metrics
• Easy to test multiple configurations
• No-op bolts in Storm
• Primarily geared towards testing Kafka read/write performance
– Kafka Write Throughput Topology
– Kafka Read Throughput Topology
– Kafka Read/Write Throughput Topology
• Generate as many events as possible
• Use Ganglia to collect metrics
• The tool will be open sourced soon
- 15. Monitoring
• OpsView
– Host level monitoring
  • CPU, memory, disk, network/ports
  • Service level monitoring
• QueryX
– Functional validation/monitoring
  • Validation from inside/outside the cloud
• Kafka JMX/Consumer Lag Monitoring
- 17. Monitoring
• Kafka JMX Metrics
– We have a collectd client that pulls metrics from Kafka JMX
– Runs every minute and pushes the metrics to LMM
• LMM
– Homegrown tool for collecting logs and metrics
– Uses most of the technologies that SPaaS is built on
– Logstash/Storm/Kafka/InfluxDB/ElasticSearch/Kibana/Grafana
– Easy to collect metrics and create dashboards
- 19. Monitoring (Kafka Consumer Lag Tool)
• Why?
– Available Kafka monitoring tools only cover traditional Kafka consumers
– One tool to track both traditional and Kafka spout consumers
• What?
– Built into our API layer, so it is easy to deploy and manage
– Provides options for both JSON and HTML output
– Stats are sent automatically through a "statsd" client
– The statsd client pushes the metrics to LMM
– We have a Grafana dashboard built on top of these metrics
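Consumer lag is usually computed per partition as the broker's latest offset minus the offset the consumer has committed. The sketch below illustrates that calculation and the statsd gauge line format (`name:value|g`); it is not the speakers' tool, and the metric prefix, offsets and function names are hypothetical. Fetching the offsets (from the brokers, or from ZooKeeper for Kafka spout consumers) is left out.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: latest broker offset minus the committed offset.

    Partitions with no committed offset are treated as committed at 0,
    and negative values are clamped to 0.
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


def statsd_gauges(prefix, lag):
    """Format the lag values as statsd gauge lines."""
    return [
        f"{prefix}.partition_{partition}.lag:{value}|g"
        for partition, value in sorted(lag.items())
    ]


# Hypothetical offsets for a two-partition topic.
lag = consumer_lag({0: 100, 1: 250}, {0: 90, 1: 250})
lines = statsd_gauges("kafka.mygroup.mytopic", lag)

print(lag)
print(lines)
```

Each line can be sent as a UDP datagram to a statsd daemon, which is what lets a dashboard plot lag per partition over time.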
- 23. Self Service Analytics Cluster
• Why?
– How do you enable thousands of engineers to write applications for the Big Data Analytics Platform?
– They need a safe place to experiment and learn
• What?
– A cluster for each engineer
– Easy-to-deploy cluster
– One-click deployment
– Takes only a few minutes to deploy the cluster
– Engineers can build and destroy clusters at will
– Manage resources and quotas for each engineer
- 24. Self Service Analytics Cluster
[Diagram: cluster creation flow. Labels recovered from the slide:]
• OpenStack / Keystone Identity API: authenticate, returns Auth Token
• SSA API: POST /cluster/create/5NodeTemplate
• Validate Token / Check Quota
• Call Nova / Neutron APIs to spin up VMs and networks
• Install Ambari using Puppet
• Apply Ambari Blueprint
• Return Ambari URL
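The flow in the diagram can be sketched as a single request handler. This is an in-memory illustration only, with no real OpenStack or Ambari calls: the class, the quota model, the template table and the returned URL are all hypothetical, and each step is just recorded so the order of operations is visible.

```python
from dataclasses import dataclass, field

# Hypothetical template table: template name -> number of VMs.
TEMPLATES = {"5NodeTemplate": 5}


class QuotaExceeded(Exception):
    """Raised when a request would push a user past their VM quota."""


@dataclass
class SelfServiceApi:
    quota_vms: dict                                 # user -> max VMs allowed
    used_vms: dict = field(default_factory=dict)    # user -> VMs already in use
    steps: list = field(default_factory=list)       # log of steps performed

    def create_cluster(self, user, token, template):
        # Step 1: validate the Keystone token (stubbed as a non-empty check).
        if not token:
            raise PermissionError("missing auth token")
        self.steps.append("validate token")

        # Step 2: check the user's quota before provisioning anything.
        nodes = TEMPLATES[template]
        used = self.used_vms.get(user, 0)
        if used + nodes > self.quota_vms.get(user, 0):
            raise QuotaExceeded(f"{user} would exceed quota")
        self.steps.append("check quota")

        # Steps 3-5: stand-ins for the Nova/Neutron, Puppet and Ambari calls.
        self.used_vms[user] = used + nodes
        self.steps.append(f"spin up {nodes} VMs and networks")
        self.steps.append("install Ambari using Puppet")
        self.steps.append("apply Ambari blueprint")

        # Step 6: return a placeholder Ambari URL.
        return f"http://ambari.{user}.example.invalid:8080"


api = SelfServiceApi(quota_vms={"alice": 5})
url = api.create_cluster("alice", "token-123", "5NodeTemplate")
print(url)
print(api.steps)
```

Checking the quota before any provisioning step means a rejected request leaves nothing to clean up.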
- 25. Thank you!
Copyright © 2013 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.
This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice.
We are hiring..!