Scaling Security
Workflows in
Government Agencies
September 28, 2017 | 11:00 AM ET
WEBINAR
Housekeeping
• Recording
• Today’s Slides
• Attachments
• Questions
• Feedback
Our Presenters
Keith Ober
Systems Engineer
Avere Systems
Bernie Behn
Principal Product Engineer
Avere Systems
What’s the problem?
Data, Data Everywhere
Security Analysis Workflow
• Acquire and Aggregate Inputs
• Normalize Data
• Archive Raw / Archive Normalized Data
• Analyze for Patterns
• Alert or Remediate
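The five stages above can be sketched as a minimal pipeline. Every function and field name here is illustrative, not from any specific product:

```python
# Toy sketch of the five workflow stages; all names are illustrative.

def acquire(sources):
    """Acquire and aggregate raw log lines from all input sources."""
    return [line for source in sources for line in source]

def normalize(raw_lines):
    """Normalize each raw line into a structured record."""
    return [{"raw": line, "fields": line.split()} for line in raw_lines]

def archive(raw_lines, records, store):
    """Archive both the raw and the normalized data."""
    store["raw"].extend(raw_lines)
    store["normalized"].extend(records)

def analyze(records):
    """Analyze normalized records for suspicious patterns."""
    return [r for r in records if "DENY" in r["raw"]]

def alert(findings):
    """Alert (or remediate) on each finding."""
    return [f"ALERT: {f['raw']}" for f in findings]

store = {"raw": [], "normalized": []}
sources = [["10.0.0.1 ALLOW tcp/443", "10.0.0.9 DENY tcp/22"]]
raw = acquire(sources)
records = normalize(raw)
archive(raw, records, store)
alerts = alert(analyze(records))
```

The point of the sketch is the shape: each stage feeds the next, and the archive step captures both forms of the data so later tools can re-analyze it.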
Types of Inputs
• Network Equipment: routers, switches, firewalls, VPN appliances, etc.
• IT: physical servers, VM infrastructure, virtual machines, directory services,
end user desktops/laptops
• Application Layer: log files, access logs for applications, web servers
• Miscellaneous: sensor data
Typical Ingest Workflow
Ingest Node(s)
Normalize/Filter
Storage
Analyze data
Report/Alert
Logs, streamed to
Ingest Nodes
Typical Ingest Workflow
Ingest Node(s)
Normalize/Filter
Storage
I/O at scale can slow
down storage, backing up
entire workflow
Typical Ingest Workflow
Ingest Node(s)
Normalize/Filter
Storage
Analyze data
Report/Alert
If analysis is not co-located,
latency can impede
analysis
Typical Analysis Workflow
Ingest Node(s)
Normalize/Filter
Storage
Analyze data
Report/Alert
If analysis is not co-located,
latency can impede
analysis
The meta-problem: Log File Ingest and Processing
All router1 log files from the beginning
of time... well, from when we started
gathering them...
All log files from firewall 1. All log files from server 1.
(Chart axes: Time and Volume)
NET: A lot of historical data accumulates over time, applying pressure
both in terms of storage and processing
Log Ingest Writ Large
The True Scale of
Enterprise Ingest!
Five Big Challenges
Let’s Break It Down.
5 challenges when engineering security workflows
1. Ingest Latency and Throughput
2. Vendor Lock-In
3. Life Cycle Management
4. Data Availability and Redundancy
5. Cloud Integration
Challenge 1: Ingest Latency and Throughput
Ingest Node
Normalize/Filter
Storage
I/O at scale can slow down
storage, backing up entire
workflow
Ingest Latency Scales Too
Storage
Scale forces multiple storage sites and,
on some products, requires a
replication mechanism, introducing more
cost, overhead, and latency.
Ingest Node(s)
Normalize/Filter
Storage
Storage
Storage
Storage
Volume of inputs will
drive the number of
ingest nodes required.
IoT devices are increasing the amount of
log data being generated and ingested.
Challenge 2: Vendor Lock-In
Storage
Ingest Node(s)
Normalize/Filter
Storage
Storage
Storage
Storage
As the solution scales:
1. Additional demand for storage increases costs and
lowers performance.
2. Deficiencies in the current storage solution amplify as the
deployment grows, causing longer upgrade outages.
3. Vendors often limit interoperability with other products
when it comes to replication and tiering.
Vendor Lock-In
Transferring data to a new solution is difficult:
• Business Continuity -- How do you stop ingesting and processing logs?
• Interoperability -- How do you ensure that your new/proposed storage solution will
work well in a high-performance environment?
: Ingest performance
: Read performance
: Scale
Challenge 3: Data Lifecycle Management
Storage
Ingest Node(s)
Normalize/Filter
Storage
Storage
Storage
More Expensive Storage
Value of the data
1. Lowering the cost to store means you can store
more and derive greater value.
2. New analytic methods/tools bring a fresh round of
analysis and burst workloads.
3. How can we begin to build AI based workloads?
Cheaper
Storage
Warm Data
Cold Data
Lifecycle Management
Storage Performance
• Ingest & analysis require higher-performance storage = more expensive
storage
• Over time there is simply too much data to store in the performance tier
• Is deletion of older data possible?
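The tiering trade-off above can be expressed as a simple age-based policy. A minimal sketch, with hypothetical day thresholds:

```python
# Hypothetical tiering policy: hot flash for recent data, cheaper bulk
# or cloud storage for warm data, archive (or deletion candidates)
# beyond the retention window. Thresholds are illustrative.
HOT_DAYS = 30
WARM_DAYS = 365

def tier_for(age_days):
    """Return the storage tier a log file of the given age belongs in."""
    if age_days <= HOT_DAYS:
        return "hot"    # high-performance (expensive) storage for ingest/analysis
    if age_days <= WARM_DAYS:
        return "warm"   # cheaper bulk or cloud storage
    return "cold"       # archive tier, or candidate for deletion
```

A lifecycle job would periodically run a policy like this over the data set and move (or expire) files whose tier assignment has changed.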
Challenge 4: Data Availability and Redundancy
Ingest Node
Normalize/Filter
Storage
Ingest Node
Normalize/Filter
Ingest Node
Normalize/Filter
Storage
Storage
Reporting and
Processing
Node
Reporting and
Processing
Node
Reporting and
Processing
Node
Data Availability and Redundancy
• Performance at scale requires distributing the reporting/analysis
• Geographical location of ingest may also be distributed
• Critical data: can’t lose it so avoid single point of failure
• Large data sets with streaming data are extremely difficult to back up with
traditional methods, and doing so is cost prohibitive
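Avoiding a single point of failure for in-flight data typically means committing each write to more than one node before acknowledging it. A minimal sketch, with hypothetical names:

```python
def mirrored_write(record, nodes, copies=2):
    """Write `record` to `copies` distinct nodes before acknowledging,
    so the loss of any single node does not lose in-flight data.
    `nodes` is a stand-in for cluster members; lists model their buffers."""
    if len(nodes) < copies:
        raise ValueError("not enough nodes for the requested redundancy")
    for node in nodes[:copies]:
        node.append(record)
    return "ack"
```

Real systems add failure detection and re-replication on node loss, but the invariant is the same: never acknowledge a write that exists in only one place.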
Challenge 5: Cloud Integration: Storage
Storage
Ingest Node(s)
Normalize/Filter
Storage
Storage
Storage
Storage
1. How do we start archiving data to cloud storage in
order to lower cost?
2. How can businesses leverage cloud-based AI
workloads against the same data they have
today?
Cloud Integration: Storage
Cloud Storage Pros
• Cloud Storage can be very
inexpensive
• Reduces the need to own and
maintain additional IT assets
• Public clouds have built-in
redundancy
Cloud Storage Cons
• Cloud storage is eventually
consistent... a query immediately
after a write may not succeed
• Lowest-cost storage is object
storage, which requires S3-compatible
application access
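The read-after-write caveat in the cons column is usually handled with a bounded retry. In this sketch, `fetch` is a stand-in for any object-store GET call, not a specific SDK:

```python
import time

def read_with_retry(fetch, key, attempts=5, delay=0.2):
    """Retry a read that may fail briefly after a write on an
    eventually consistent object store. `fetch` is any callable that
    raises KeyError while the object is not yet visible."""
    for i in range(attempts):
        try:
            return fetch(key)
        except KeyError:
            if i == attempts - 1:
                raise  # still not visible after all attempts
            time.sleep(delay * (2 ** i))  # exponential backoff
```

Analysis tools that read their own recent writes need this kind of guard (or a caching layer in front of the object store, as discussed next).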
And, so, what do we do?
Cache It If You Can
How can Avere address those challenges?
Challenges → Avere Solution
• Ingest Latency and Throughput → Avere write caching, Avere FXT NVRAM, 10GB+ bandwidth
• Vendor Lock-In / Life Cycle Management → Global Namespace, Flash Move & Mirror
• Data Availability and Redundancy → HA, Clustering, Flash Move & Mirror
• Cloud Integration → Avere vFXT compute-based appliance, Avere FlashCloud for AWS S3 and GCS
Speed ingest via write-behind caching
• Gather writes (acknowledging clients immediately) and flush them in parallel
• Hardware: NVRAM for write protection and caching
• Clustered caching solution distributes writes across multiple nodes
Accelerate read performance with distributed, read-ahead caching
• Read ahead on each request (fetch a bit more than what was requested)
• Cache requests for other readers (typical in analytic workloads)
• Writes are cached as written, speeding analysis workloads
The Power of High-Performance Caching
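The ack-then-flush behaviour described above can be sketched as a toy write-behind buffer. This is a simplified model, not Avere's implementation; all names are illustrative:

```python
class WriteBehindCache:
    """Toy write-behind cache: writes are acknowledged immediately and
    flushed to backing storage later in batches."""

    def __init__(self, backing_store, flush_threshold=4):
        self.backing_store = backing_store   # e.g. a slow NAS filer
        self.flush_threshold = flush_threshold
        self.dirty = []   # in real hardware, protected by NVRAM

    def write(self, record):
        self.dirty.append(record)            # ack the client immediately
        if len(self.dirty) >= self.flush_threshold:
            self.flush()
        return "ack"

    def flush(self):
        # One large sequential write instead of many small ones.
        self.backing_store.extend(self.dirty)
        self.dirty.clear()

store = []
cache = WriteBehindCache(store, flush_threshold=3)
acks = [cache.write(i) for i in range(5)]
# After five writes with a threshold of 3, the first batch has been
# flushed to backing storage and the last two records are still cached.
```

Clients see latency bounded by the cache, not by the filer, which is what smooths the ingest spikes described earlier.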
Caching Basic Architecture
Ingest Node
Normalize/Filter
Storage
● Ingest nodes write to NFS
mount points distributed across
a caching layer
Ingest Node
Normalize/Filter
Ingest Node
Normalize/Filter
● Writes are flushed to storage over time,
smoothing the ingest
● Writes are protected within the cluster via
HA mirror
Reporting /
Analysis
Node(s)
● Reporting / Analysis nodes access
data via the cluster.
● Reads are cached, eventually aged
● Written data in the near term is cached
and available
Avere FXT Cluster
Data Placement
Ingest Node
Normalize/Filter
Storage
Ingest Node
Normalize/Filter
Ingest Node
Normalize/Filter
Avere can mirror data to cloud
storage for longer-term archiving
Data is accessible
through the cluster, as
though it were on the
primary storage
Reporting /
Analysis
Node(s)
Does this Really Work?
From the Field Use Case
Avere Security Workflow
FXT Cluster
DMZ Network
Central Control
Container-based
applications to normalize
data
DATA
Syslog/NetFlow/…
DATA
Streaming data to Avere
with no direct access
Core Filers
Splunk
Configure Splunk to consume data from a
separate vServer (isolating traffic)
Splunk Data Consumers
Web access to visualize data ingested
and analyzed by Splunk
Mirror/Migrate/Cloud
Core Filers
SC17
November 12-17, 2017
Denver, Colorado
AIRI 2017
October 1-4, 2017
Washington DC
AWS re:Invent
Nov. 27- Dec. 1, 2017
Las Vegas, Nevada
Contact Us!
Keith Ober
Systems Engineer
Avere Systems
kober@averesystems.com
Bernie Behn
Principal Product Engineer
Avere Systems
bbehn@averesystems.com
AvereSystems.com
888.88 AVERE
askavere@averesystems.com
Twitter: @AvereSystems
