Data Science for Infrastructure:
Observe, Understand, Automate
Zain Asgar & Natalie Serrino
https://px.dev
Natalie Serrino (@nserrino)
Principal Engineer - TLM @ New Relic
Prior: Eng @ Observe, Eng @ Trifacta, Eng @ Intel

Zain Asgar (@zainasgar)
GM @ New Relic
Adjunct Professor of CS @ Stanford
Prior: Co-founder/CEO - Pixie Labs; Eng @ Google, Trifacta, NVIDIA
https://px.dev
We see observability as a data problem
- It’s easy for machines to generate GBs of data per second
- It’s hard to get complete coverage of applications, especially in distributed environments
- It’s hard to make sure this data is relevant
- It’s hard to distill the data into something usable
https://px.dev
What we learned in the data space
- Collecting the right data is half the battle
- Simple models on relevant data usually outperform complex models on a
skewed/incomplete dataset
- Important to be able to audit and inspect your data pipelines
https://px.dev
How to do data-driven automation?
Gather raw data!
- Need variety and depth in input data
- ⏰ Most time is spent here
Transform data into signal!
- Can be a simple rule set or a statistical/ML model
- 👀 Disproportionate emphasis
Do something based on signal!
- Huge possibilities here with the Kubernetes API
- 🤞 Ideally with limits + alerts
https://px.dev
How to do data-driven automation?
Gather raw data!
- Logs
- Application metrics
- Raw requests
Transform data into signal!
- Aggregates
- Anomaly detection
- Regex
- Machine learning models
Do something based on signal!
- Ping Slack/JIRA
- Scale deployment up/down
- Allocate more resources
https://px.dev
How to do data-driven automation?
Gather raw data!
- Logs
- Infrastructure utilization
- Application metrics
- Raw requests
- Application profiles
- Network connections
- Kubernetes state
Transform data into signal!
- Mostly data wrangling...
- Aggregates
- Anomaly detection
- Thresholds (see the sketch after this list)
- Regex/pattern-matching
- Linear regression
- Machine learning models
Do something based on signal!
- Ping Slack/JIRA
- Scale deployment up/down
- Restart pod/service
- Page someone
- Allocate more resources
- Roll back
- Disable/enable feature
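The transform step does not have to be fancy. A rough sketch of the gather → transform → act loop in plain Python (the threshold, the samples, and the scaling rule are illustrative assumptions, not Pixie behavior):

# Minimal sketch of the gather -> transform -> act loop.
# The threshold and the scaling rule are illustrative assumptions.
THRESHOLD_REQ_PER_S = 100.0  # assumed per-pod throughput target

def to_signal(samples):
    """Transform: collapse raw per-second samples into one signal (a simple average)."""
    return sum(samples) / len(samples) if samples else 0.0

def decide_replicas(signal, current_replicas):
    """Act: a simple rule set deciding how many replicas the deployment should have."""
    if signal > THRESHOLD_REQ_PER_S:
        return current_replicas + 1
    if signal < THRESHOLD_REQ_PER_S / 2 and current_replicas > 1:
        return current_replicas - 1
    return current_replicas

# Gather (stubbed): pretend these samples came from your telemetry pipeline.
print(decide_replicas(to_signal([130.0, 142.0, 138.0]), current_replicas=3))  # -> 4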
https://px.dev
We built Pixie to solve these problems
Auto-telemetry using eBPF
100% scriptable & API-driven
Kubernetes native
https://px.dev
Auto-Telemetry using eBPF
Application, network, and infrastructure data
Full-body request traces and flamegraphs!
Low overhead! <5% CPU
https://px.dev
Kubernetes Native
Query Kubernetes entities like pods, services, deployments, nodes!
Entirely in-cluster data storage and edge compute
https://px.dev
API driven & 100% Scriptable
Infrastructure as code!
Everything is a script and can be accessed via API
Easily integrate with Grafana, Slack, or other tools
import px

def http_data():
    # Pull the last 30s of HTTP events traced by Pixie.
    df = px.DataFrame(table='http_events', start_time='-30s')
    # Attach the Kubernetes pod name from the execution context.
    df.pod = df.ctx['pod']
    # Keep only the columns we care about.
    return df[['pod', 'http_req_path', 'http_resp_latency_ns']]

px.display(http_data())
🔍 Query
⛏ Collect
Don’t invent a new language
PxL provides a programmable API for Pixie
● Valid
import px

def http_data():
    df = px.DataFrame(table='http_events', start_time='-30s')
    df.pod = df.ctx['pod']
    return df[['pod', 'http_req_path', 'http_resp_latency_ns']]

px.display(http_data())
PxL is an embedded DSL
● Valid
● Valid
import px

def http_data():
    df = px.DataFrame(table='http_events', start_time='-30s')
    df.pod = df.ctx['pod']
    return df[['pod', 'http_req_path', 'http_resp_latency_ns']]

px.display(http_data())
PxL is an embedded DSL
● Valid
● Valid
● Built for data analysis and ML
import px

def http_data():
    df = px.DataFrame(table='http_events', start_time='-30s')
    df.pod = df.ctx['pod']
    return df[['pod', 'http_req_path', 'http_resp_latency_ns']]

px.display(http_data())
PxL is an embedded DSL
import px

def http_data():
    df = px.DataFrame(table='http_events', start_time='-30s')
    df.pod = df.ctx['pod']
    return df[['pod', 'http_req_path', 'http_resp_latency_ns']]

px.display(http_data())
PxL is a dataflow language
- PxL specifies the logical flow of data (declarative)
- Pixie plans & optimizes the execution
[Diagram: execution graph of operators with data flowing between them]
How do I Transform Data?
import px

def http_data():
    df = px.DataFrame(table='http_events', start_time='-30s')
    df.pod = df.ctx['pod']
    return df[['pod', 'http_req_path', 'http_resp_latency_ns']]

px.display(http_data())
All transforms = methods on a PxL DataFrame:
- Aggregate
- Join
- Filter
- ...etc
PxL scripts use transforms to analyze data (a short sketch follows below)
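For example, a rough sketch of a filter plus a groupby/aggregate on a PxL DataFrame (column names follow the http_events example above; the 250ms cutoff is arbitrary, and px.count is assumed to be the count aggregate):

import px

def slow_pods():
    # Start from raw HTTP events, as in the example above.
    df = px.DataFrame(table='http_events', start_time='-5m')
    df.pod = df.ctx['pod']
    # Filter: keep requests slower than 250ms (latency is in nanoseconds).
    df = df[df.http_resp_latency_ns > 250 * 1000 * 1000]
    # Aggregate: count slow requests per pod.
    df = df.groupby(['pod']).agg(slow_requests=('http_resp_latency_ns', px.count))
    return df

px.display(slow_pods())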
import px

def http_data():
    df = px.DataFrame(table='http_events', start_time='-30s')
    df.pod = df.ctx['pod']
    return df[['pod', 'http_req_path', 'http_resp_latency_ns']]

px.display(http_data())
Declarative +
Functional +
No implicit side effects
=
Composable
PxL scripts are composable
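A rough illustration of that composability, reusing http_data() from the slide above as a building block (the 500ms latency cutoff is an arbitrary example value):

import px

def http_data():
    df = px.DataFrame(table='http_events', start_time='-30s')
    df.pod = df.ctx['pod']
    return df[['pod', 'http_req_path', 'http_resp_latency_ns']]

def slow_http_data():
    # http_data() has no implicit side effects, so it composes cleanly.
    df = http_data()
    return df[df.http_resp_latency_ns > 500 * 1000 * 1000]

px.display(slow_http_data())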
https://px.dev
PxL provides an interface to work with data
It allows us to construct powerful, composable workflows.
The following demos demonstrate this capability:
1. Slack alert on SQL injection attacks
2. Auto-scale deployment by HTTP request throughput
Demos!
https://px.dev
> px deploy
Demo 1: Slack Alert for SQL Injection Attacks
Demo app: DVWA
https://github.com/digininja/DVWA
https://px.dev
What is a SQL injection?
“SQL injection is a code injection technique used to attack applications, in which malicious SQL statements are inserted into an entry field for execution.”
https://px.dev
Example SQL injection
User accesses
http://foobar.com?user_id=123
Application executes
SELECT * from users where user_id=123
Malicious actor accesses
http://foobar.com?user_id=123 or 1=1
Application executes
SELECT * from users where user_id=123 or 1=1
https://px.dev
How can we detect SQL injections?
💥 Rules 💥
- Parse the query to detect prohibited syntax (e.g. unions)
- Regexes to detect prohibited syntax (rough sketch below)
💭 Complication: What if your app has a legitimate use of union?
💥 Machine learning 💥
- Train a model on real-world examples
- Can theoretically learn that certain usages of syntax are okay
💭 Complication: Where to get the dataset?
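A rough sketch of the rules approach in PxL, flagging SQL requests that match a tiny regex rule set (the mysql_events table, the req_body column, and px.regex_match are assumptions based on Pixie's protocol tracing; the real script is linked at blog.px.dev/sql-injection below):

import px

def possible_sqli():
    # Raw SQL events traced by Pixie (assumed table/column names).
    df = px.DataFrame(table='mysql_events', start_time='-5m')
    df.pod = df.ctx['pod']
    # Illustrative rule set only; a real one would be much broader.
    df.is_suspicious = px.regex_match('(?i).*(union select|or 1=1|drop table).*', df.req_body)
    df = df[df.is_suspicious]
    return df[['pod', 'req_body']]

px.display(possible_sqli())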
https://px.dev
Vulnerability testing tool 🚀
SQL Vulnerability testing via
github.com/SQLMapproject/SQLMap
Live Demo 1!
https://px.dev
Slack Alert for SQL Injection Attacks
Gather raw data!
- Collect raw SQL events
Transform data into signal!
- Diagnose SQL injection events
Do something based on signal!
- Generate alert about SQL injections
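The "do something" step here can be as small as posting the suspicious queries to a Slack incoming webhook. A minimal Python sketch, assuming you already have the matching rows (get_suspicious_queries() is a hypothetical placeholder; the real demo drives this through the Pixie API, as described in the blog post linked at the end):

import os
import requests  # third-party: pip install requests

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # your incoming webhook (placeholder)

def get_suspicious_queries():
    # Hypothetical placeholder: in the real demo these rows come from running
    # the PxL detection script through the Pixie API.
    return ["SELECT * from users where user_id=123 or 1=1"]

def alert_on_sqli():
    queries = get_suspicious_queries()
    if not queries:
        return
    text = "Possible SQL injection detected:\n" + "\n".join(queries)
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)

if __name__ == "__main__":
    alert_on_sqli()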
Demo 2: Autoscale deployment by HTTP
request throughput
https://px.dev
Autoscaling
💭 How do you know how many pods your deployment should
have?
💭 How do you know the amount of resources to provision for
those pods?
https://px.dev
Possible autoscaling metrics
- CPU, memory of pod
- Avg / p90 / p99 request latency
- Latency of downstream dependencies
- # of outbound connections
- Application-specific metrics
- ….. Many more …...
https://px.dev
K8s Autoscalers
- Both “Horizontal” and “Vertical” scaling
- Some built-in autoscaling metrics:
- Pod CPU
- Pod Memory
- The custom metrics API allows scaling on custom metrics! 😎
https://github.com/kubernetes/metrics
Credit: kubernetes.io
https://px.dev
Very sophisticated demo app
https://px.dev
Other tools supporting this demo
Custom metrics server adapted from this project:
github.com/kubernetes-sigs/custom-metrics-apiserver
👆 Check it out to build your own K8s metrics server!
HTTP load testing via Hey
https://github.com/rakyll/hey
Live Demo 2!
https://px.dev
Autoscale deployment by HTTP request throughput
Gather raw data!
- Collect raw HTTP requests
Transform data into signal!
- Calculate HTTP req/s by pod
Do something based on signal!
- Autoscale # of pods by HTTP req/s
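A rough PxL sketch of the transform step: compute per-pod HTTP request throughput over the last 30 seconds (column names follow the earlier examples; the real end-to-end script is linked in the autoscaling blog post below):

import px

def http_throughput_per_pod():
    df = px.DataFrame(table='http_events', start_time='-30s')
    df.pod = df.ctx['pod']
    # Count requests per pod over the 30s window, then convert to req/s.
    df = df.groupby(['pod']).agg(requests=('http_req_path', px.count))
    df.requests_per_s = df.requests / 30.0
    return df[['pod', 'requests_per_s']]

px.display(http_throughput_per_pod())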
https://px.dev
We’d love to get your feedback
In these demos we showed some simple data workflows on Pixie.
- More details about SQL injection here: blog.px.dev/sql-injection
- More details about autoscaling: blog.px.dev/autoscaling-custom-k8s-metric
What’s next:
- We are working on XSS detection.
- We want to learn about more use cases. Find us on GitHub (pixie-io/pixie) or
Slack (slackin.px.dev).
Thanks!
Github: github.com/pixie-io/pixie
Blog: blog.px.dev
Website: px.dev