Benchmarking for
PostgreSQL
workloads in
Kubernetes (part 2)
Gabriele Bartolini
#109 Data on Kubernetes (DOK) Webinar
16 December 2021
2021 Copyright © EnterpriseDB Corporation All Rights Reserved
Today’s speakers
Gabriele Bartolini
VP of Cloud Native at EDB
PostgreSQL user since ~2000
• Community member since 2006
• Co-founder of PostgreSQL Europe
Previously at 2ndQuadrant, from 2008 to 2020
• Co-founder
• Head of Global Support
• Cloud Native Initiative Lead
• Founding member of Barman
DevOps evangelist
Twitter: @_GBartolini_ / @EDBPostgres
EDB and Kubernetes
Major sponsor
of the PostgreSQL project
Kubernetes Certified Service
Provider (KCSP)
Silver Member of CNCF &
Linux Foundation
Platinum founding sponsor of
the Data on Kubernetes
Community
We have contributed to the PostgreSQL community every year since 2006, making major feature contributions.
We had 32 contributors in PostgreSQL 14, including 7 code committers and 3 core members.
Bringing PostgreSQL to Kubernetes
Agenda
• Key takeaways from DoK #58
• A day in the life of a Postgres transaction
• Recommended architectures
• Our methodology
• Conclusions
DoK #58 webinar
Recap
Why Kubernetes? Why PostgreSQL?
Cloud native culture with a highly versatile SQL-driven database
● “Cloud Native” is much more than tools (Kubernetes)
○ Patterns/architectures (microservices, operators, ...)
○ Principles/culture (devops/lean/agile, velocity, automation, pervasive quality and security processes, …)
● Kubernetes is becoming popular for stateful workloads, including databases:
○ Please refer to dok.community/dokc-2021-report/ for details
○ Reasons: storage classes, local persistent volumes, the operator pattern
● PostgreSQL is based on 25+ years of evolutionary innovation
○ Linux : Operating System = Postgres : Database
○ Database of the year in 2017, 2018, and 2020 at db-engines.com
○ Some of its main features:
■ Native streaming replication, both physical and logical, sync and async, cascading
■ Online Continuous Backup and Point In Time Recovery
■ Declarative Partitioning
■ Parallel queries
■ Extensibility and extensions (e.g. PostGIS)
■ JSON support (SQL/noSQL hybrid databases)
■ ACID transactions
Benchmarking PostgreSQL
Workloads, storage and the database
● Storage
○ Write Ahead Log (WAL), or historically xlog
■ Sequential writes and fsync
○ Shared buffers cleaning
■ By checkpoint, bgwriter, or the single backend
■ Random writes (and OS cache)
○ Page reads
■ Random reads
○ Optimization: Table scans
■ Sequential reads
○ Capping on cloud environments
● Database
○ Workloads: in-memory, OLTP, and OLAP
○ Initial focus: TPS on large OLTP workloads (RAM < DB size)
○ pgbench
● We introduced cnp-bench
Know your storage
You need to trust it
● Make sure you benchmark your storage before you go into production
● Make sure you benchmark your storage before you test your database
○ Storage can become your bottleneck
■ If your storage is slow, your database will be slow
● Please refer to DoK #58 for more information on storage benchmarking
○ Use cnp-bench
○ Use fio directly
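As a starting point, a fio job that mimics the WAL's sequential, fsync-heavy write pattern might look like the sketch below; the path, size, and option values are illustrative assumptions, not cnp-bench defaults:

```ini
; wal-like.fio -- sequential 8 kB writes with a sync after every write,
; approximating the WAL write pattern described in DoK #58
[wal-like-seq-write]
rw=write            ; sequential writes, like WAL segments
bs=8k               ; 8 kB blocks (illustrative granularity)
fdatasync=1         ; sync after every write, as the WAL does
size=1G             ; total data written per job (illustrative)
directory=/var/lib/postgresql/bench   ; hypothetical mount to test
```

Run it with `fio wal-like.fio` and compare the reported IOPS and latency against a random-read/write job on the same volume before drawing conclusions about the database.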
A day in the life of a transaction
(VERY simplified view)
Disclaimer
Postgres internals are more complex than
this. The following is a simplified view for
clarity.
Another “brick” in the WAL
[Diagram: Disk (pg_wal) vs Disk (PGDATA). The Postgres backend appends 8 kB blocks to the current WAL file segment (the transaction log): sequential, fsync-ed writes to segments that are usually 16 MB in size and are recycled once no longer needed. Shared Buffers (the Postgres cache) hold 8 kB pages; the database cache is regularly flushed to disk (“dirty pages”) at checkpoint as random writes, while page reads are random reads and sequential scans are sequential reads. DISCLAIMER: simplified view for didactic purposes.]
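The dirty-page flushing sketched above is governed by a handful of settings; an illustrative postgresql.conf fragment follows (values are examples for discussion, not recommendations):

```ini
# postgresql.conf (illustrative values, not tuning advice)
checkpoint_timeout = 15min           # how often a checkpoint flushes dirty pages
max_wal_size = 4GB                   # WAL volume that triggers an extra checkpoint
checkpoint_completion_target = 0.9   # spread checkpoint I/O across the interval
bgwriter_lru_maxpages = 100          # pages the background writer cleans per round
```

During a benchmark, watching checkpoint frequency (for example in pg_stat_bgwriter) tells you whether the random-write load comes from checkpoints, the bgwriter, or individual backends.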
Recommended
architectures for
PostgreSQL
PostgreSQL architectures in Business Continuity
Always plan for benchmarking the final production architecture
● Start with one instance to spot the major bottlenecks
○ e.g. storage
● Then move to a real life production architecture
● Consider your Business Continuity goals
○ Disaster recovery - primarily focused on Recovery Point Objective (RPO)
○ High Availability - primarily focused on Recovery Time Objective (RTO)
○ Plan your production database architectures with both RTO and RPO in mind
● PostgreSQL provides the fundamental blocks for Business Continuity
○ Continuous backup and Point In Time Recovery
■ Base backups
■ WAL archiving
○ Native streaming replication based on the Write Ahead Log (WAL)
● The WAL is central in PostgreSQL
● To keep your data safe, managing the above in Kubernetes requires an operator written by
Postgres experts
The criticality of the WAL in day-to-day Postgres
[Diagram: WAL file segments in pg_wal feed two consumers. Segments marked “ready to be archived” are shipped by the archive_command to the WAL archive, while the wal_sender(s) stream WAL to the replicas (standby) via streaming replication; only afterwards is a segment ready to be recycled. Both paths are potential bottlenecks! DISCLAIMER: simplified view for didactic purposes.]
Recommended architecture
[Diagram: a Kubernetes cluster/namespace spanning three availability zones, each with one node using local storage: the primary in zone 1, a sync standby in zone 2, and a potential sync standby in zone 3, all connected via streaming replication. Continuous backup (WAL archiving) ships base backups and the WAL archive to a shared destination, from which WAL can be fetched back via the restore_command.]
Some potential bottlenecks and issues
A summary of the major issues that bottlenecks can cause
● WAL writing: local storage
○ Slow system
● WAL archiving: serialized process, network, remote storage, compression (if applicable)
○ Bottlenecks cause WAL files to pile up on the volume where pg_wal resides, eventually causing Postgres to halt
● Streaming replication: network, remote storage
○ wal_keep_segments/wal_keep_size
■ Beyond this threshold, WAL files are recycled on the primary and the standby falls out of sync
○ replication slots
■ The primary keeps track of the location in the WAL needed by a standby and keeps the WAL file
● Same issue as WAL archiving - WAL files pile up and Postgres risks halting
○ synchronous replication
■ A bottleneck here slows down writes on the primary
■ If all synchronous standby servers are down, the primary stops accepting writes (never use a single synchronous standby)
● Restore command: serialized process, network, remote storage, decompression (if applicable)
○ A standby cannot start streaming replication and relies on WAL files from the archive
○ Delayed standby - possible impact on RPO and RTO in case of failover of the primary
● Standby replay: single process
○ Delayed standby - possible impact on RPO and RTO in case of failover of the primary
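Several of the thresholds mentioned above are plain postgresql.conf settings; the fragment below is an illustrative sketch (the values are examples, not recommendations):

```ini
# postgresql.conf (illustrative values, not tuning advice)
wal_keep_size = 1GB             # WAL kept for standbys without slots (PG 13+;
                                # wal_keep_segments in older releases)
max_slot_wal_keep_size = 10GB   # cap WAL retained by replication slots (PG 13+),
                                # limiting how far WAL can pile up
# At least two synchronous candidates, so that a single failing
# standby does not stop writes on the primary:
synchronous_standby_names = 'ANY 1 (standby1, standby2)'
```

The ANY 1 quorum syntax means the primary waits for confirmation from one of the two listed standbys, which is why the slide warns against configuring a single synchronous standby.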
Dedicated resources
Prefer shared nothing architectures, even in Kubernetes
● If you can, dedicate a Kubernetes node to one Postgres instance only
○ Take advantage of Pod scheduling capabilities and availability zones (where available)
■ pod affinity/anti-affinity
■ node selectors
■ tolerations
○ Properly set resource requests and limits
■ Guaranteed QoS is recommended
● If you can, use local storage on the dedicated node
○ Benchmark throughput
○ In the public cloud, watch out for IOPS limitations
● Costs/benefits analysis
○ One more reason why benchmarking is fundamental in proper and effective capacity planning
○ It’s your choice, and yours only
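As a sketch, a Pod spec combining these scheduling hints with Guaranteed QoS might look like the following; the labels, taints, and sizes are illustrative assumptions (an operator would normally generate this for you):

```yaml
# Illustrative Pod sketch: dedicated node + Guaranteed QoS (names are assumptions)
apiVersion: v1
kind: Pod
metadata:
  name: pg-example
spec:
  nodeSelector:
    workload: postgres        # only schedule on nodes labeled for Postgres
  tolerations:
    - key: workload           # tolerate a taint that keeps other pods away
      value: postgres
      effect: NoSchedule
  containers:
    - name: postgres
      image: quay.io/enterprisedb/postgresql:14.1
      resources:              # requests == limits -> Guaranteed QoS class
        requests:
          cpu: "7"
          memory: 56Gi
        limits:
          cpu: "7"
          memory: 56Gi
```

Setting requests equal to limits for every container is what places the Pod in the Guaranteed QoS class, making it the last candidate for eviction under node pressure.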
Our methodology
How we’re benchmarking Postgres on K8s
Observe and let numbers and diagrams help you discover issues
● We rely on:
○ cnp-sandbox
■ Prometheus, Grafana, Cloud Native PostgreSQL operator (EDB)
○ cnp-bench
■ on existing clusters
○ pg_stat_statements
● You can use your own PostgreSQL setup
○ Your favourite operator
● You can use your favorite observability tools
○ Your own Prometheus/Grafana
○ Something else (you should know what to look for now!)
A sandbox for Cloud Native PostgreSQL
cnp-sandbox is an open source helm chart
● Deploys a sandbox environment in Kubernetes with:
○ Prometheus
○ Grafana
○ Cloud Native PostgreSQL with:
■ a selection of PostgreSQL metrics for the native Prometheus exporter in CNP
■ a custom Grafana dashboard developed by EDB for Cloud Native PostgreSQL
● Main goals:
○ Evaluate Cloud Native PostgreSQL’s observability with Prometheus and Grafana
○ Integrate benchmarks with real-time collected data
● Suitable for pre-production and staging environments
○ Production environments should have their own Prometheus and Grafana installations
○ Metrics and dashboards can be reused
● URL: github.com/EnterpriseDB/cnp-sandbox
Deployment
helm repo add cnp-sandbox https://enterprisedb.github.io/cnp-sandbox/
helm repo update
helm upgrade --install cnp-sandbox cnp-sandbox/cnp-sandbox
cnp-bench
Benchmarking the storage and a PostgreSQL database
● Storage benchmarking with fio
● Database benchmarking and stress testing with:
○ pgbench
○ HammerDB
● Can be run against an existing Postgres database
○ Integrated with Cloud Native PostgreSQL, including pgBouncer for connection pooling
● Suitable for pre-production and staging environments
● URL: https://github.com/EnterpriseDB/cnp-bench
An example of pgbench initialization
cnp:
  existingCluster: true
  existingCredentials: pg-14-app
  existingHost: pg-14-rw
  existingDatabase: pgbench
  image: quay.io/enterprisedb/postgresql:14.1
pgbench:
  nodeSelector:
    workload: pgbench
  scaleFactor: 8000
  initialize: true
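The scaleFactor above drives the database size. As a rough rule of thumb (an approximation, not an exact figure), each pgbench scale-factor unit adds 100,000 rows to pgbench_accounts and roughly 15 MB on disk:

```shell
# Back-of-envelope pgbench sizing: ~15 MB per scale-factor unit (approximate)
SCALE=8000
SIZE_MB=$((SCALE * 15))      # ~120,000 MB, i.e. roughly 120 GB
ROWS=$((SCALE * 100000))     # rows in pgbench_accounts
echo "scale ${SCALE}: ~${SIZE_MB} MB, ${ROWS} accounts rows"
```

A scale factor of 8000 therefore yields a database of roughly 120 GB, larger than the RAM of the nodes used later in this deck, which is exactly the "RAM < DB size" OLTP scenario targeted by these benchmarks.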
An example of pgbench run
cnp:
  existingCluster: true
  existingCredentials: pg-14-app
  existingHost: pg-14-rw
  existingDatabase: pgbench
  image: quay.io/enterprisedb/postgresql:14.1
pgbench:
  nodeSelector:
    workload: pgbench
  initialize: false
  skipVacuum: true
  reportLatencies: true
  time: 600
  clients: 64
  jobs: 128
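For reference, the run above roughly corresponds to the pgbench invocation below; the flag mapping is my reading of the option names, not taken from cnp-bench documentation:

```shell
# Assumed mapping of the cnp-bench options above to pgbench flags:
#   skipVacuum: true      -> -n  (no vacuum before the run)
#   reportLatencies: true -> -r  (report per-statement latencies)
#   time: 600             -> -T 600 (run for 600 seconds)
#   clients: 64           -> -c 64
#   jobs: 128             -> -j 128
PGBENCH_ARGS="-n -r -T 600 -c 64 -j 128"
echo "pgbench ${PGBENCH_ARGS} pgbench"
```

Note that pgbench itself requires the number of clients (-c) to be a multiple of the number of threads (-j), so in practice a tool driving it would cap jobs at the client count.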
Notes:
● 5 x 10min pgbench tests
● scale factor 8000 (120GB)
● 3 dedicated nodes:
○ AKS Standard_E8s_v4
○ 7 cores/56Gi RAM
○ Guaranteed QoS
○ Premium P80 storage class
● 1 MinSync replication
● Azure Blob Container (backup)
● pgbench on AKS Standard_D64s_v4
Another example showing ~ 13k tps with 32 cores
Bottleneck: serialized WAL archiving
Anticipate and avoid this scenario!
36k piled WALs!
What if the primary dies now?
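One way to watch for this situation from SQL (PostgreSQL 12 or later) is to count the WAL segments still waiting to be archived; a steadily growing number is the signal of an archiving bottleneck like the one above:

```sql
-- Count WAL segments waiting to be archived (PostgreSQL 12+).
-- Each .ready file in pg_wal/archive_status marks a segment that
-- archive_command has not yet shipped to the WAL archive.
SELECT count(*) AS ready_wals
FROM pg_ls_archive_statusdir()
WHERE name LIKE '%.ready';
```

Alerting on this count (for example via a Prometheus exporter query) lets you intervene well before the pg_wal volume fills up and Postgres halts.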
Parallel archiving (1)
Remediation: parallel WAL archiving and large segment size (64MB)
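WAL segment size is fixed at initdb time (initdb --wal-segsize=64 for 64 MB segments, PostgreSQL 11+), so it must be chosen when the cluster is created. As for parallel archiving, a hypothetical Cloud Native PostgreSQL manifest fragment might look like the following; the field names are assumptions that vary by operator version, so treat this as a sketch rather than a reference:

```yaml
# Hypothetical CNP Cluster fragment (field names are assumptions):
spec:
  backup:
    barmanObjectStore:
      wal:
        maxParallel: 8   # ship several WAL files per archive cycle
        compression: gzip
```

The idea is simply that each invocation of the archiver moves a batch of segments instead of one, so the archive can keep up with WAL generation under heavy write load.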
Parallel archiving (2)
Might be OK (bulk loads or vacuums)
There’s more ...
What’s next with cnp-bench
Our plan for H1/2022
● Manage increasing number of client connections
● Manage repetitions
● Support custom pgbench scripts
● Improve support for HammerDB
● Introduce application level benchmarking
○ Web application load generation with hey
○ Front end scalability
About Cloud Native PostgreSQL
The Kubernetes operator from EDB
● It is currently closed source
○ Available for trials
● Fully declarative
● Integrated with the Kubernetes API server (no external tool for failover)
● Directly manages persistent volumes
● Our intention is to open source Cloud Native PostgreSQL in 2022
● It is the component that manages PostgreSQL in the data layer of BigAnimal
Conclusions
Why is benchmarking PostgreSQL important?
● Data is the most important asset of an organization
● Data can live in Kubernetes, in reliable databases like PostgreSQL
● Don’t leave anything to chance
○ Benchmark your storage and know its limits
○ Benchmark your database and know its limits
● Benchmark before you go to production
○ You might not be able to benchmark when in production
● Strongly consider dedicating storage and nodes to a single PostgreSQL instance
○ First benchmark the single node, and focus on the storage primarily
○ Then benchmark the high availability cluster, with continuous backup and replicas
■ Pay attention to WAL archiving, streaming, WAL restore, replay, and so on …
● Evaluate introduction of failover and switchover events in benchmarks (chaos)
○ Observe the cluster and always consider your RPO and RTO goals
● Study Postgres, love Postgres!
○ There are so many features you might not know that Postgres already has!
Thank you!
DoK #109 webinar - Benchmarking for PostgreSQL workloads in Kubernetes (part 2)
Gabriele Bartolini - @_GBartolini_
Watch part 1!