Postgres the Hardway!
Dave Pitts
Technical Manager, Blackboard Global DBA Ops
(Oracle and Postgres)
Not a typical Oracle DBA ...
"Your typical Oracle DBA spends their day off downloading and exploring the
latest version of Oracle" - Willem Eenink (my new colleague in the AMS office)
I've been working (as a Java Developer) with Oracle DBs (8i) since 2001 and as a
DBA since 2004 (8i, 9i, 10g, 11g, 12c …)
I've been working with Postgres since 2015 (pg94 & pg96)
… and on my day-off I'm downloading & setting up Postgres projects (pg10)
running in VMs and demoing Postgres gotchas + cool new features!
Python the HardWay
● Not because Postgres is hard or even Postgres adoption is hard .. more that
hiring experienced Postgres DBAs is a challenge.
● Talking is great but doing is better (well, maybe a bit of both is best).
● One of the key points was to get hands-on and explore - don't just cut and
paste
● This book / website has had a significant revamp since I was first learning
python in 2011/12 ...
Source: https://learnpythonthehardway.org/book/intro.html
A little about Bb from AWS Principal Engineer Jim Mlodgenski
Jim (also lead Philly PUG) with Akbar Basha (Senior DBA Bb Chennai)
Some of the challenges of working in AWS/RDS
● aws cli (bash) - rds & cloudwatch automation
● DevOps tooling - CD via Jenkins, Configuration Management (AWS
Cloudformation & Terraform)
● Also using aws boto2 sdk for python (ruby sdk as well)
● Sizing RDS Instances choices - db.m4 vs db.r4 range
● New db.m5 & db.r5 ranges - big network/ebs throughput gains!
● AWS Cost Explorer - detailed analysis of RDS Spend and cost effectiveness
● Monitoring & Alerting - Cloudwatch (regular & enhanced monitoring)
● Other AWS Products: EC2, s3, Cloudtrail, VPC & Security Groups, Aurora ...
● Storage considerations PIOPs vs GP2 - this is a hard question
Storage considerations PIOPs vs GP2
Source: https://aws.amazon.com/blogs/aws/now-available-16-tb-and-20000-iops-elastic-block-store-ebs-volumes/
vs https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html (5334GB >> 16000 PIOPs - to check)
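The "5334GB >> 16000" note can be checked with a little arithmetic. The sketch below assumes the gp2 rules as AWS documented them around the time of this talk (3 IOPS per GiB, a floor of 100 and a ceiling of 16,000); those constants are assumptions to verify against the current AWS documentation, and the function name is purely illustrative.

```python
# Assumed gp2 parameters (check the current AWS docs before relying on them).
GP2_IOPS_PER_GIB = 3      # baseline IOPS earned per GiB of storage
GP2_MIN_IOPS = 100        # floor for very small volumes
GP2_MAX_IOPS = 16000      # ceiling for large volumes


def gp2_baseline_iops(size_gib):
    """Return the baseline IOPS of a gp2 volume of the given size."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


# Show where the ceiling kicks in: 5333 GiB is just below it, 5334 GiB hits it.
for size in (20, 100, 1000, 5333, 5334, 16384):
    print(size, gp2_baseline_iops(size))
```

Under these assumptions, any gp2 volume of 5334 GiB or more sits at the 16,000 IOPS ceiling, which is the point where paying for provisioned IOPS becomes the only way to go higher.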
Some of the challenges of working in AWS/RDS
Source: https://aws.amazon.com/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/
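To get a feel for burst vs baseline, here is a rough sketch of how long a full gp2 credit bucket lasts under sustained maximum load. It assumes the figures AWS published for gp2 (a 5.4 million I/O credit bucket, bursting to 3,000 IOPS, refilled at the baseline rate); treat those numbers and the function name as assumptions, not as a description of how RDS accounts for credits internally.

```python
# Assumed gp2 burst parameters (taken from AWS documentation; verify before use).
BURST_IOPS = 3000            # maximum IOPS while credits remain
CREDIT_BUCKET = 5400000      # I/O credits in a full bucket


def gp2_burst_seconds(size_gib):
    """Seconds a full credit bucket lasts at sustained burst, or None if the
    volume's baseline already meets or exceeds the burst rate."""
    baseline = max(100, min(16000, 3 * size_gib))
    if baseline >= BURST_IOPS:
        return None  # volumes of roughly 1000 GiB and up do not depend on burst
    # Credits drain at the burst rate and refill at the baseline rate.
    return CREDIT_BUCKET / (BURST_IOPS - baseline)


for size in (20, 100, 500, 1000):
    print(size, gp2_burst_seconds(size))
```

With these assumptions a 100 GiB volume can burst for about 2,000 seconds before dropping back to 300 IOPS, which is why monitoring the burst balance matters on small instances.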
Objectives of Postgres the Hardway - hands-on training
● Demo some of the Postgres gotchas - especially for Oracle DBAs
● Get some hands-on experience of what Postgres looks like behind the scenes
● RDS offers no backend access - which takes some getting used to
● Most of my team are very experienced at running Oracle on Linux
(RHEL/CentOS)
● I'm a huge vagrant fan (bizarrely, my most popular github.com project is for
automating the setup of a VM to run angular.js via vagrant)
● As well as automating the build, configuration and destruction of VMs, you
can also do cool stuff like setting up virtual disk arrays (my Oracle ASM
project is one of my favorite github projects)
● I also want to build up my team's git cli skills
Who is using Virtualbox and/or Vagrant for training?
Vagrant is an open-source software product for building and maintaining portable virtual software
development environments, e.g. for VirtualBox, Hyper-V, Docker containers, VMware, and AWS. It tries to
simplify software configuration management of virtualizations in order to increase development
productivity.
Written in: Ruby
Operating system: Linux, FreeBSD, macOS, and Microsoft Windows
License: MIT License
Initial release: March 8, 2010; 8 years ago
Original author(s): Mitchell Hashimoto
https://en.wikipedia.org/wiki/Vagrant_(software)
Vagrant 101:
1) vi Vagrantfile
2) vagrant up
3) vagrant ssh
4) vagrant reload
5) vagrant halt
6) vagrant destroy
..
Let's demo vagrant in 60 seconds...
Source: https://www.softqubes.com/blog/introduction-of-vagrant-development/
Quick demo "hello vagrant world" : cat Vagrantfile
~/projects/vagrant-centos7-helloworld $ cat Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.provider :virtualbox do |vb|
    vb.customize ["modifyvm", :id, "--memory", "128"]
    vb.customize ["modifyvm", :id, "--cpus", 1]
  end
  #config.vm.box = "centos/7"
  config.vm.box = "https://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-1804_02.VirtualBox.box"
  config.vm.hostname = "centos7demo"
  config.vm.network :forwarded_port, guest: 5432, host: 9511
  config.vm.provision :shell, :path => "provision.sh"
end
Source: https://github.com/dgapitts/vagrant-centos7-helloworld
Quick demo "hello vagrant world" : cat provision.sh
~/projects/vagrant-centos7-helloworld# cat provision.sh
#! /bin/bash
if [ ! -f /home/vagrant/already-installed-flag ]
then
echo "Add extra alias to .bash_profile for vagrant and root users"
cat /vagrant/bashprofile.append.txt >> /home/vagrant/.bash_profile
cat /vagrant/bashprofile.append.txt >> /root/.bash_profile
echo 'provision script run at : ' `date` | tee -a /home/vagrant/hello_vagrant_world.txt
else
echo "already installed flag set : /home/vagrant/already-installed-flag"
fi
~/projects/vagrant-centos7-helloworld# head -5 bashprofile.append.txt
PS1='[\h:\u:\w] # '
alias saru30='sar -u | head -3 ; sar -u |tail -30'
alias sarq30='sar -q | head -3 ; sar -q |tail -30'
alias h='history'
alias l40='ls -ltr|tail -40'
Source: https://github.com/dgapitts/vagrant-centos7-helloworld
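The provision.sh above guards its work with an "already installed" flag file so that re-running the provisioner is harmless. The excerpt shown does not include the line that creates the flag, so the sketch below assumes it is written once the steps succeed. It is a Python illustration of the same guard pattern, not the author's script, and `provision_once` is a hypothetical helper name.

```python
import os
import tempfile


def provision_once(flag_path, steps):
    """Run the provisioning steps only if the flag file does not exist yet.

    Returns True when the steps ran, False when they were skipped.
    """
    if os.path.exists(flag_path):
        print("already installed flag set : " + flag_path)
        return False
    for step in steps:
        step()
    # Assumption: the flag is created only after every step has succeeded.
    open(flag_path, "w").close()
    return True


# Demo in a throwaway directory instead of /home/vagrant.
workdir = tempfile.mkdtemp()
flag = os.path.join(workdir, "already-installed-flag")
runs = []
first = provision_once(flag, [lambda: runs.append("aliases appended")])
second = provision_once(flag, [lambda: runs.append("aliases appended")])
print(first, second, runs)
```

Without the flag being written, every provisioning run would append the aliases to .bash_profile again, so it is worth checking that the real script sets it.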
Quick demo "hello vagrant world" : vagrant up
~/projects/vagrant-centos7-helloworld $ time vagrant up
…
==> default: Setting the name of the VM: vagrant-centos7-helloworld_default_1551645270773_30671
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 5432 (guest) => 9511 (host) (adapter 1)
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
default:
default: Vagrant insecure key detected. Vagrant will automatically replace
default: this with a newly generated keypair for better security.
default:
...
Source: https://github.com/dgapitts/vagrant-centos7-helloworld
Quick demo "hello vagrant world" : VirtualBox Manager
==> default: Setting hostname...
==> default: Rsyncing folder: /Users/dpitts/projects/vagrant-centos7-helloworld/ => /vagrant
==> default: Running provisioner: shell…
default: extra alias added to .bash_profile
default: provision script run at : Sun Mar 3 20:34:55 UTC 2019
real 0m33.390s
user 0m5.085s
sys 0m2.377s
Quick demo "hello vagrant world" : vagrant ssh
~/projects/vagrant-centos7-helloworld $ vagrant ssh
[centos7demo:vagrant:~] # cat hello_vagrant_world.txt
provision script run at : Sun Mar 3 20:34:55 UTC 2019
[centos7demo:vagrant:~] # ls -ltr /vagrant/
total 56
-rw-r--r--. 1 vagrant vagrant 537 Mar 3 13:09 Vagrantfile.bak
-rw-r--r--. 1 vagrant vagrant 35149 Mar 3 13:09 LICENSE
-rw-r--r--. 1 vagrant vagrant 279 Mar 3 13:09 bashprofile.append.txt
-rw-r--r--. 1 vagrant vagrant 563 Mar 3 19:55 README.md
-rw-r--r--. 1 vagrant vagrant 536 Mar 3 20:26 Vagrantfile
-rw-r--r--. 1 vagrant vagrant 415 Mar 3 20:33 provision.sh
[centos7demo:vagrant:~] # id
uid=1000(vagrant) gid=1000(vagrant) groups=1000(vagrant) ...
[centos7demo:vagrant:~] # sudo -i
[centos7demo:root:~] # id
uid=0(root) gid=0(root) groups=0(root) ...
Source: https://github.com/dgapitts/vagrant-centos7-helloworld
Vagrantfile - base for my pg10 centos7 VM
~/projects/vagrant-postgres10 $ cat Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.provider :virtualbox do |vb|
    vb.customize ["modifyvm", :id, "--memory", "512"]
    vb.customize ["modifyvm", :id, "--cpus", 2]
  end
  #config.vm.box = "centos/7"
  config.vm.box = "https://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-1804_02.VirtualBox.box"
  config.vm.hostname = "pg10centos7"
  config.vm.network :forwarded_port, guest: 5432, host: 9510
  config.vm.provision :shell, :path => "provision.sh"
end
Source: https://github.com/dgapitts/vagrant-postgres10
Provisioning via a shell script - focus on db specifics
~/projects/vagrant-postgres10 $ grep -A10 sysstat provision.sh
...
# install sysstat - sar is used for db monitoring
yum -y install sysstat
systemctl start sysstat
systemctl enable sysstat
# Regular pg10 install
rpm -Uvh https://yum.postgresql.org/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm
yum -y update
yum -y install postgresql10-server postgresql10
# Extra packages and devtool for install of https://github.com/ossc-db/pg_plan_advsr
yum -y install postgresql-devel
yum -y install gcc
Query Optimization day: Warsaw pgconf.eu 2017
https://www.postgresql.eu/events/pgconfeu2017/sessions/session/1584-postgresql-query-optimization-techniques-step-by-step/
Thanks again Ilya Kosmodemiansky
& Victor Yegorov (Data Egret)!
"Bad Stuff" (from my notes)
- NOT IN … should use NOT EXISTS
- JOIN instead of IN/EXISTS
- Unordered LIMIT
- Order by random() << business logic?
NB You can just see Feike (Steenbergen) talking
about pg_stat_statements and track_io_timing (top right)
NB2 Alexander Kukushkin on AWS gotchas around
Burst Balance monitoring for EC2 and RDS
pg-ora-demo-scripts - postgres-gotcha01-not-in - initial setup
Short and simple pgbench setup, but with one extra index:
pgbench -i -s 9 -h localhost -p 5432 -U bench1 -d bench1
psql -U bench1 -c "create index pgbench_accounts_bid on pgbench_accounts(bid);"
Also, so that we can do a fair Postgres vs Oracle comparison, I export/import the same dataset via .csv files:
copy (SELECT * FROM pgbench_branches) to 'pgbench_branches.csv' with csv
copy (SELECT * FROM pgbench_accounts) to 'pgbench_accounts.csv' with csv
create tables in oracle
create table pgbench_branches(bid number,bbalance number, filler character(88), CONSTRAINT pgbench_branches_pk PRIMARY KEY (bid));
create table pgbench_accounts(aid number,bid number,bbalance number,filler character(88),CONSTRAINT pgbench_accounts_pk PRIMARY KEY(aid));
create index pgbench_accounts_bid on pgbench_accounts(bid);
load data via sqlloader ctl files
$ cat load_accounts.ctl
load data
infile pgbench_accounts.csv
into table pgbench_accounts
fields terminated by ','
(aid, bid, bbalance)
...
Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
pg-ora-demo-scripts - postgres-gotcha01-not-in - pg bad plan
Problem version of the query using NOT IN condition :
-bash-4.2$ psql -U bench1 -f postgres-gotcha01-not-in-pgbench-demo.sql
explain (analyze, buffers) select count(bid) from pgbench_branches
where bid NOT IN (select bid from pgbench_accounts);
Aggregate (cost=151426.55..151426.56 rows=1 width=8) (actual time=1099.608..1099.608 rows=1 loops=1)
  Buffers: shared hit=14 read=2450, temp read=7704 written=1539
  -> Seq Scan on pgbench_branches (cost=0.42..151426.54 rows=4 width=4) (actual time=953.373..1099.602 rows=2 loops=1)
        Filter: (NOT (SubPlan 1))
        Rows Removed by Filter: 9
        Buffers: shared hit=14 read=2450, temp read=7704 written=1539
        SubPlan 1
          -> Materialize (cost=0.42..31400.42 rows=900000 width=4) (actual time=0.014..57.693 rows=490910 loops=11)
                Buffers: shared hit=13 read=2450, temp read=7704 written=1539
                -> Index Only Scan using pgbench_accounts_bid on pgbench_accounts (cost=0.42..23384.42 rows=900000 width=4) (actual time=0.007..109.876 rows=900000 loops=1)
                      Heap Fetches: 0
                      Buffers: shared hit=13 read=2450
Planning time: 0.058 ms
Execution time: 1101.056 ms
Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
pg-ora-demo-scripts - postgres-gotcha01-not-in - pg good plan
Improved version of the query using NOT EXISTS and explicit "where account.bid = branch.bid" clause :
...
explain (analyze, buffers) select count(bid) from pgbench_branches branch
where NOT EXISTS (select * from pgbench_accounts account where account.bid = branch.bid);
Aggregate (cost=5.54..5.55 rows=1 width=8) (actual time=0.211..0.211 rows=1 loops=1)
  Buffers: shared hit=29
  -> Nested Loop Anti Join (cost=0.42..5.54 rows=1 width=4) (actual time=0.208..0.208 rows=0 loops=1)
        Buffers: shared hit=29
        -> Seq Scan on pgbench_branches branch (cost=0.00..1.09 rows=9 width=4) (actual time=0.004..0.006 rows=9 loops=1)
              Buffers: shared hit=1
        -> Index Only Scan using pgbench_accounts_bid on pgbench_accounts account (cost=0.42..2483.76 rows=100000 width=4) (actual time=0.022..0.022 rows=1 loops=9)
              Index Cond: (bid = branch.bid)
              Heap Fetches: 0
              Buffers: shared hit=28
Planning time: 0.735 ms
Execution time: 0.292 ms
Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
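One reason NOT IN and NOT EXISTS cannot always be treated as the same query is NULL handling: if the subquery returns a NULL, NOT IN can never evaluate to true. That is probably why the Oracle plan later in the deck carries an extra "BID IS NULL" filter. The sketch below demonstrates the semantic difference with Python's built-in sqlite3 module purely for portability; the tiny `branches` and `accounts` tables are made up for illustration, and it says nothing about how the Postgres planner costs either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branches (bid integer primary key);
    create table accounts (aid integer primary key, bid integer);
    insert into branches values (1), (2), (3);
    -- one account on branch 1 and one account with an unknown (NULL) branch
    insert into accounts values (10, 1), (11, NULL);
""")

# NOT IN: "2 NOT IN (1, NULL)" is NULL rather than true, so no row qualifies.
not_in = conn.execute(
    "select count(bid) from branches "
    "where bid NOT IN (select bid from accounts)"
).fetchone()[0]

# NOT EXISTS: the correlated equality simply never matches the NULL row.
not_exists = conn.execute(
    "select count(bid) from branches branch "
    "where NOT EXISTS (select * from accounts account "
    "                  where account.bid = branch.bid)"
).fetchone()[0]

print("NOT IN:", not_in, "NOT EXISTS:", not_exists)
```

So before rewriting NOT IN as NOT EXISTS for speed, it is worth confirming that the column is NOT NULL (or that the changed result is what you actually want).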
pg-ora-demo-scripts - oracle slightly convoluted plan but fast
NESTED LOOPS (ANTI) - Performs a nested loops anti join between two row sources
http://www.juliandyke.com/Optimisation/Operations/NestedLoopsAnti.php
The NOT IN version in Oracle uses "NESTED LOOPS ANTI SNA" and a slightly convoluted plan (PGBENCH_ACCOUNTS TABLE ACCESS
FULL plus PGBENCH_ACCOUNTS_BID INDEX RANGE SCAN); it is a bit slower, but it still only takes 20 ms (0.02 secs):
SQL> select count(bid) from pgbench_branches where bid NOT IN (select bid from pgbench_accounts);
Elapsed: 00:00:00.02
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 24 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 26 | | |
|* 2 | FILTER | | | | | |
| 3 | NESTED LOOPS ANTI SNA| | 3 | 78 | 24 (92)| 00:00:01 |
| 4 | INDEX FULL SCAN | PGBENCH_BRANCHES_PK | 3 | 39 | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | PGBENCH_ACCOUNTS_BID | 1 | 13 | 1 (0)| 00:00:01 |
|* 6 | TABLE ACCESS FULL | PGBENCH_ACCOUNTS | 7 | 91 | 22 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
2 - filter( NOT EXISTS (SELECT 0 FROM "PGBENCH_ACCOUNTS" "PGBENCH_ACCOUNTS" WHERE "BID" IS NULL))
5 - access("BID"="BID")
6 - filter("BID" IS NULL)
Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
pg-ora-demo-scripts - oracle simple plan with NOT EXISTS
NESTED LOOPS (ANTI) - Performs a nested loops anti join between two row sources
http://www.juliandyke.com/Optimisation/Operations/NestedLoopsAnti.php
The NOT EXISTS version in Oracle uses "NESTED LOOPS ANTI" again, but with a simpler INDEX RANGE SCAN on
PGBENCH_ACCOUNTS_BID, and takes about 10 ms (i.e. 0.01 secs):
SQL> select count(bid) from pgbench_branches branch where NOT EXISTS (select * from pgbench_accounts account where account.bid = branch.bid);
Elapsed: 00:00:00.01
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 26 | | |
| 2 | NESTED LOOPS ANTI| | 3 | 78 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | PGBENCH_BRANCHES_PK | 3 | 39 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| PGBENCH_ACCOUNTS_BID | 1 | 13 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
4 - access("ACCOUNT"."BID"="BRANCH"."BID")
Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
Postgres worst practices!? Lisbon pgconf.eu 2018
Date: 2018-10-24 Ilya Kosmodemiansky pgconf.eu 2018
Linux file handles & kernel memory management issues
● EAV (entity attribute value) pain
- schemaless 1000+ tables
● max_connections limit of 1000 - add more
● don't use pgbouncer (easier than pgpool)
Hanging transactions & non-removable dead rows
● Postgres likes long transactions
e.g. external service calls for email
● Attend Bruce's MVCC unmasked talk
and start coding xmin, xmax logic for yourself
pg gotcha - too many tables X too many processes
● "In set theory (and, usually, in other parts of mathematics), a Cartesian product is a mathematical
operation that returns a set (or product set or simply product) from multiple sets." Wikipedia
● Fairly simple to replicate with pgbench
● Custom setup and execution pgbench files for 10,000 tables setup
● Keep adding worker processes and watch the open file count climb over time (as each worker process queries more tables)
● Monitor via OS tools like sar and lsof
● I hear this is a common gotcha when migrating to postgres
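The "Cartesian product" here is simply backends multiplied by relation files. As a back-of-the-envelope sketch, assume two files per table (the heap plus its primary key index, which matches the two extra files in the tree diff later in the deck), the 10 pgbench clients and the 10,001 tables used in the test. It is a crude upper bound that ignores system catalogs and any per-process limit such as max_files_per_process, and the function name is hypothetical.

```python
def worst_case_open_files(backends, tables, files_per_table=2):
    """Upper bound on open relation files if every backend touches every table.

    files_per_table=2 is an assumption: one heap file plus one primary key
    index file for each small table.
    """
    return backends * tables * files_per_table


# 10 pgbench clients, each eventually selecting from tab0 .. tab10000
print(worst_case_open_files(backends=10, tables=10001))  # 200020
```

Because the bound grows with the product of the two numbers, a connection pooler helps by shrinking the backend factor.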
Background postgres data_directory & pg_class.relfilenode ...
Starting with first table :
postgres=# create table tab1 (pk integer primary key,
col1 varchar(30));
CREATE TABLE
but where is my table physically stored at the OS level?
Well the base directory is data_directory:
postgres=# show data_directory;
data_directory
------------------------
/var/lib/pgsql/10/data
(1 row)
let's snapshot the data_directory via tree:
[pg10centos7:postgres:~/10/data] # tree > /tmp/temp1
and now create a 2nd table:
postgres=# create table tab2 (pk integer primary key,
col1 varchar(30));
CREATE TABLE
and now compare the before & after 'snapshots of the filesystem':
[pg10centos7:postgres:~/10/data] # tree > /tmp/temp2
[pg10centos7:postgres:~/10/data] # diff /tmp/temp1 /tmp/temp2
648a649,650
> │ ├── 16402
993c995
< 26 directories, 964 files
---
> 26 directories, 966 files
Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/memtest/setup
Background postgres data_directory & pg_class.relfilenode ...
so now we can
less /tmp/temp2
.
├── base
│ ├── 1
│ │ ├── 112
│ │ ├── 113
│ │ ├── 1247
...
└── 12953
│ ├── 112
│ ├── 113
...
│ ├── 16402
and in pg_class
postgres=# SELECT relname, relfilenode FROM pg_class
WHERE relname = 'tab2';
relname | relfilenode
---------+-------------
tab2 | 16402
(1 row)
Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/memtest/setup
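Putting the two slides together, the file behind tab2 can be located by joining data_directory, the database directory under base, and the relfilenode. The sketch below assumes the default tablespace and takes 12953 (the directory containing 16402 in the tree output) to be the database OID. Inside Postgres, the built-in pg_relation_filepath() function returns the same path relative to the data directory.

```python
import posixpath


def relation_path(data_directory, database_oid, relfilenode):
    """Build the on-disk path of a relation's first segment file.

    Assumes the relation lives in the default tablespace (under base/).
    """
    return posixpath.join(
        data_directory, "base", str(database_oid), str(relfilenode)
    )


# Values taken from the slides: data_directory, the 12953 directory, and
# tab2's relfilenode from pg_class.
print(relation_path("/var/lib/pgsql/10/data", 12953, 16402))
```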
10,000 table test … don't do this in production!
I started with the extremely simple python script
~/projects/pg-ora-demo-scripts/memtest $ cat gen_sql.py
# emit one CREATE TABLE statement per table: tab0 .. tab10000
for i in range(10001):
    print("create table tab" + str(i) + " (pk integer primary key, col1 varchar(30));")
and then generate your CREATE SQL statements via
~/projects/pg-ora-demo-scripts/memtest $ python gen_sql.py > create_table.sql
~/projects/pg-ora-demo-scripts/memtest $ head -3 create_table.sql
create table tab0 (pk integer primary key, col1 varchar(30));
create table tab1 (pk integer primary key, col1 varchar(30));
create table tab2 (pk integer primary key, col1 varchar(30));
custom pgbench script (generated via simple .py script)
[pg10centos7:postgres:~/pg-ora-demo-scripts/memtest/test2_no_sleep] # cat custom_bench_nowait.sql
select * from tab0;
select * from tab1;
..
select * from tab10000;
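The generator for this custom pgbench file is not shown on the slide; a minimal sketch in the same spirit as gen_sql.py could look like the following. The function name and the gen_bench.py file name are hypothetical; only the statement format and the tab0 .. tab10000 range come from the output above.

```python
def bench_statements(last_table=10000):
    """Return one 'select * from tabN;' line per table, tab0 .. tab<last_table>."""
    return ["select * from tab%d;" % i for i in range(last_table + 1)]


if __name__ == "__main__":
    # Redirect stdout to create the pgbench script, for example:
    #   python gen_bench.py > custom_bench_nowait.sql
    print("\n".join(bench_statements()))
```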
using custom pgbench script with -f option
# pgbench -c 10 -j 10 -T 600 -f custom_bench_nowait.sql &
Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/memtest/setup
10,000 table test … don't do this in production!
now running some OS monitoring via simple uptime and lsof
[root@pg10centos7 ~]# for i in {1..1000};do uptime;lsof | grep postgres | wc -l;sleep 60;done
22:57:39 up 1:26, 3 users, load average: 0.05, 0.03, 0.06
14
22:58:39 up 1:27, 3 users, load average: 0.65, 0.17, 0.10
11505
...
23:02:48 up 1:31, 3 users, load average: 15.78, 7.47, 3.03
22496
23:03:53 up 1:33, 3 users, load average: 24.25, 11.45, 4.70
33499
23:05:00 up 1:34, 3 users, load average: 33.60, 16.57, 6.92
44470
...
NB I do have another nice demo rerunning the same pgbench scripts and showing the gains via pgbouncer (with 20 & 10
connection pool) : https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/memtest/test3_pgbouncer_poolsize_10_vs_20
SE-Radio 328: Bruce Momjian - Postgres Query Planner
● I'm a big fan of Software Engineering Radio -
been following this podcast for over a
decade
● I was super excited last June when they had
Bruce Momjian talking about Postgres in
general and then focusing on the Postgres
Query Planner.

Ref:: http://www.se-radio.net/2018/06/se-radio-episode-328-bruce-momjian-on-the-postgres-query-planner
SE-Radio 328: Bruce Momjian - Postgres Query Planner
● I was not disappointed … this was an exceptionally good discussion & interesting broadcast, going
from the base principles of Relational DBs, how SQL has become one of the most popular languages
and onto how the Query Planner works
● One comment (47 min) caught my attention:
"An entire team whose sole job was to study query plans … I'm not sure you're going to need to do that ..
and when they did switch [to postgres] those people did not have a useful role in the organization"
● I know exactly what Bruce means and if Postgres could reduce/eliminate this operational overhead,
that is seriously impressive.
Execution Plan Management and Optimizer Hints
● "Q: How do I lock a plan to … A: Why do you want to do that ... you're probably not going to need to do
that in Postgres?" (49 mins).
● Aversion to INDEX HINTs in Postgres Core?
● Very interesting work from Japanese "NTT OSS Center DBMS Development and Support Team"
● https://github.com/ossc-db/pg_hint_plan "Give PostgreSQL ability to manually force some decisions
in execution plans." This extension is already available to RDS users.
● https://github.com/ossc-db/pg_plan_advsr "PostgreSQL extension for automated execution plan
tuning" (Tattsu Yama - presentation in Lisbon - pgconf.eu 2018)
● First challenge for me is just getting this to compile i.e. make >> install >> load
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2132-auto-plan-tuning-using-feedback-loop
Setting up pg_plan_advsr - need devtools?
Tattsu … the next problem is harder:
[pg10centos7:postgres:~/test_install/pg_hint_plan-REL10_1_3_2] # make && make install
gcc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DLINUX_OOM_SCORE_ADJ=0 -Wall
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -fPIC -I. -I. -I/usr/include/pgsql/server -I/usr/include/pgsql/internal
-D_GNU_SOURCE -I/usr/include/libxml2 -c -o pg_hint_plan.o pg_hint_plan.c
pg_hint_plan.c:54:33: fatal error: access/htup_details.h: No such file or directory
#include "access/htup_details.h"
^
compilation terminated.
make: *** [pg_hint_plan.o] Error 1
the .h doesn't appear to exist on my VM:
[pg10centos7:root:/var/log/pgbouncer] # find / -name 'htup_details.h'
[pg10centos7:root:/var/log/pgbouncer] #
Why do I have pg10 server & pg9.2 client tooling?
after checking pg_config it is not quite clear:
[pg10centos7:postgres:~] # pg_config
..
PKGINCLUDEDIR = /usr/include/pgsql
INCLUDEDIR-SERVER = /usr/include/pgsql/server
..
VERSION = PostgreSQL 9.2.24
So I think the access/htup_details.h issue might be because my psql client (and its headers) are pg9.2 and not pg10:
[pg10centos7:postgres:~] # psql
psql (9.2.24, server 10.6)
postgres=# select version();
version
----------------------------------------------------------------------------------------------
PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28),
64-bit
(1 row)
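The mismatch is visible in the psql banner itself, which only prints the "server" part when client and server versions differ. As a small sketch (the helper names are hypothetical, and the banner format is taken from the output above), a provisioning check could parse it and flag the problem before any extension build is attempted:

```python
import re

# Matches banners such as: psql (9.2.24, server 10.6)
BANNER = re.compile(r"psql \((\d+(?:\.\d+)*), server (\d+(?:\.\d+)*)\)")


def major(version):
    """Major version: two components before PostgreSQL 10 (9.2), one from 10 on."""
    parts = version.split(".")
    return ".".join(parts[:2]) if int(parts[0]) < 10 else parts[0]


def client_server_majors(banner):
    """Return (client_major, server_major), or None if no mismatch banner is found."""
    match = BANNER.search(banner)
    if match is None:
        return None
    return major(match.group(1)), major(match.group(2))


client, server = client_server_majors("psql (9.2.24, server 10.6)")
print(client, server, "match" if client == server else "MISMATCH")
```

The same idea applies to the VERSION line of pg_config, which is the one that actually decides which headers an extension build picks up.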
Why do I have pg10 server & pg9.2 client tooling?
(continued)
and checking the (client?) access headers, instead of htup_details.h I appear to have htup.h:
[pg10centos7:postgres:~] # ls -ltr /usr/include/pgsql/server/access/h*.h
-rw-r--r--. 1 root root 32901 Aug 23 2018 /usr/include/pgsql/server/access/htup.h
-rw-r--r--. 1 root root 1280 Aug 23 2018 /usr/include/pgsql/server/access/hio.h
-rw-r--r--. 1 root root 6397 Aug 23 2018 /usr/include/pgsql/server/access/heapam.h
-rw-r--r--. 1 root root 13178 Aug 23 2018 /usr/include/pgsql/server/access/hash.h
[pg10centos7:postgres:~] #
Explicit pg10 version of postgresql10-libs, postgresql10-devel ...
~/projects/vagrant-postgres10v2:DaveP# git diff provision.sh
diff --git a/provision.sh b/provision.sh
index 08dc65d..d0f8c0e 100644
--- a/provision.sh
+++ b/provision.sh
@@ -23,15 +23,16 @@ then
rpm -Uvh https://yum.postgresql.org/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm
yum -y update
yum -y install postgresql10-server postgresql10
+ yum -y install postgresql10-contrib postgresql10-libs postgresql10-devel
# Extra packages and devtool for install of https://github.com/ossc-db/pg_plan_advsr
- yum -y install postgresql-devel
+ #yum -y install postgresql-devel
yum -y install gcc
...
~/projects/vagrant-postgres10v2:DaveP# git commit -m 'fix pg client tools (e.g. psql and pg_config) - these were pg9.2 before and now pg10.7'
...
Next steps
● Complete documentation of pg_hint_plan and sample use cases
● Complete automation of the pg_plan_advsr setup
● Complete pgbouncer setup (current minor issue with the authentication)
● Demo pg10 correlated statistics & pg11 functional stats
● Vagrant - Docker - Postgres demo? As suggested at the PUG Amsterdam, March 2019
Q & A ?

More Related Content

Postgres the hardway

  • 1. Postgres the Hardway! Dave Pitts Technical Manager, Blackboard Global DBA Ops (Oracle and Postgres)
  • 2. Not a typical Oracle DBA ... "You're typical Oracle DBA spends their day-off downloading and exploring the last version of Oracle" - Willem Eenink (my new colleague in AMS office) I've been working (as a Java Developer) with Oracle DBs (8i) since 2001 and as a DBA since 2004 (8i, 9i, 10g, 11g, 12c …) I've been working with Postgres since 2015 (pg94 & pg96) … and on my day-off I'm downloading & setting up Postgres projects (pg10) running in VMs and demoing Postgres gotchas + cool new features!
  • 3. Python the HardWay ● Not because Postgres is hard or even Postgres adoption is hard .. more that hiring experienced Postgres DBAs is a challenge. ● Talking is great but doing is better (well maybe a bit of both is best) . ● One of the key points was too get hands on and explore - don't just cut and paste ● This book / website has had a significant revamp since I was first learning python in 2011/12 ...
  • 4. Python the HardWay : Source: https://learnpythonthehardway.org/book/intro.html
  • 5. A little about Bb from AWS Principal Engineer Jim Mlodgenski
  • 6. Jim (also lead Philly PUG) with Akbar Basha (Senior DBA Bb Chennai)
  • 7. Some of the challenges of working in AWS/RDS ● aws cli (bash) - rds & cloudwatch automation ● DevOps tooling - CD via Jenkins, Configuration Management (AWS Cloudformation & Terraform) ● Also using aws boto2 sdk for python (ruby sdk as well) ● Sizing RDS Instances choices - db.m4 vs db.r4 range ● New db.m5 & db.r5 ranges - big network/ebs throughput gains! ● AWS Cost Explorer - detailed analysis of RDS Spend and cost effectiveness ● Monitoring & Alerting - Cloudwatch (regular & enhanced monitoring) ● Other AWS Products: EC2, s3, Cloudtrail, VPC & Security Groups, Aurora ... ● Storage considerations PIOPs vs GP2 - this is a hard question
  • 8. Storage considerations PIOPs vs GP2 Source: https://aws.amazon.com/blogs/aws/now-available-16-tb-and-20000-iops-elastic-block-store-ebs-volumes/ vs https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html (5334GB >> 16000 PIOPs - to check)
  • 9. Some of the challenges of working in AWS/RDS Source: https://aws.amazon.com/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/
  • 10. Some of the challenges of working in AWS/RDS Source: https://aws.amazon.com/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/
  • 11. Objectives of Postgres the Hardway - handons training ● Demo some of the Postgres Gotcha - especially for Oracle DBAs ● Get some hand-ons experience of what Postgres looks like behind the scenes ● RDS offers no backend access - which takes some getting used ● Most of my team are very experiences on running Oracle on Linux (RHEL/Centos) ● I'm a huge vagrant fan (bizarrely my popular github.com project is for automating the setup a VM to run angular.js via vagrant) ● As well as automating the build, configuration and destruction of VMs, you can also do cool stuff like setting up virtual disk arrays (my Oracle ASM project is one of my favorite github projects) ● I also want to build up my teams git cli skills
  • 12. Who is using Virtualbox and/or Vagrant for training? Vagrant is an open-source software product for building and maintaining portable virtual software development environments, e.g. for VirtualBox, Hyper-V, Docker containers, VMware, and AWS. It tries to simplify software configuration management of virtualizations in order to increase development productivity. Written in: Ruby Operating system: Linux, FreeBSD, macOS, and Microsoft Windows License: MIT License Initial release: March 8, 2010; 8 years ago Original author(s): Mitchell Hashimoto https://en.wikipedia.org/wiki/Vagrant_(software)
  • 13. Vagrant 101: 1) vi Vagrantfile 2) vagrant up 3) vagrant ssh 4) vagrant reload 5) vagrant halt 6) vagrant destroy .. Lets demo vagrant in 60 seconds... Source: https://www.softqubes.com/blog/introduction-of-vagrant-development/
  • 14. Quick demo "hello vagrant world" : cat Vagrantfile ~/projects/vagrant-centos7-helloworld $ cat Vagrantfile # -*- mode: ruby -*- # vi: set ft=ruby : Vagrant.configure("2") do |config| config.vm.provider :virtualbox do |vb| vb.customize ["modifyvm", :id, "--memory", "128"] vb.customize ["modifyvm", :id, "--cpus", 1] end #config.vm.box = "centos/7" config.vm.box = "https://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-1804_02.VirtualBox.box" config.vm.hostname = "centos7demo" config.vm.network :forwarded_port, guest: 5432, host: 9511 config.vm.provision :shell, :path => "provision.sh" end Source: https://github.com/dgapitts/vagrant-centos7-helloworld
  • 15. Quick demo "hello vagrant world" : cat provision.sh ~/projects/vagrant-centos7-helloworld# cat provision.sh #! /bin/bash if [ ! -f /home/vagrant/already-installed-flag ] then echo "Add extra alias to .bash_profile for vagrant and root users" cat /vagrant/bashprofile.append.txt >> /home/vagrant/.bash_profile cat /vagrant/bashprofile.append.txt >> /root/.bash_profile echo 'provision script run at : ' `date` | tee -a /home/vagrant/hello_vagrant_world.txt else echo "already installed flag set : /home/vagrant/already-installed-flag" fi ~/projects/vagrant-centos7-helloworld# head -5 bashprofile.append.txt PS1='[h:u:w] # ' alias saru30='sar -u | head -3 ; sar -u |tail -30' alias sarq30='sar -q | head -3 ; sar -q |tail -30' alias h='history' alias l40='ls -ltr|tail -40' Source: https://github.com/dgapitts/vagrant-centos7-helloworld
  • 16. Quick demo "hello vagrant world" : vagrant up ~/projects/vagrant-centos7-helloworld $ time vagrant up … ==> default: Setting the name of the VM: vagrant-centos7-helloworld_default_1551645270773_30671 ==> default: Clearing any previously set network interfaces... ==> default: Preparing network interfaces based on configuration... default: Adapter 1: nat ==> default: Forwarding ports... default: 5432 (guest) => 9511 (host) (adapter 1) default: 22 (guest) => 2222 (host) (adapter 1) ==> default: Running 'pre-boot' VM customizations... ==> default: Booting VM... ==> default: Waiting for machine to boot. This may take a few minutes... default: SSH address: 127.0.0.1:2222 default: SSH username: vagrant default: SSH auth method: private key default: default: Vagrant insecure key detected. Vagrant will automatically replace default: this with a newly generated keypair for better security. default: ... Source: https://github.com/dgapitts/vagrant-centos7-helloworld
  • 17. Quick demo "hello vagrant world" : VirtualBox Manager ==> default: Setting hostname... ==> default: Rsyncing folder: /Users/dpitts/projects/vagrant-centos7-helloworld/ => /vagrant ==> default: Running provisioner: shell… default: extra alias added to .bash_profile default: provision script run at : Sun Mar 3 20:34:55 UTC 2019 real 0m33.390s user 0m5.085s sys 0m2.377s
  • 18. Quick demo "hello vagrant world" : vagrant ssh ~/projects/vagrant-centos7-helloworld $ vagrant ssh [centos7demo:vagrant:~] # cat hello_vagrant_world.txt provision script run at : Sun Mar 3 20:34:55 UTC 2019 [centos7demo:vagrant:~] # ls -ltr /vagrant/ total 56 -rw-r--r--. 1 vagrant vagrant 537 Mar 3 13:09 Vagrantfile.bak -rw-r--r--. 1 vagrant vagrant 35149 Mar 3 13:09 LICENSE -rw-r--r--. 1 vagrant vagrant 279 Mar 3 13:09 bashprofile.append.txt -rw-r--r--. 1 vagrant vagrant 563 Mar 3 19:55 README.md -rw-r--r--. 1 vagrant vagrant 536 Mar 3 20:26 Vagrantfile -rw-r--r--. 1 vagrant vagrant 415 Mar 3 20:33 provision.sh [centos7demo:vagrant:~] # id uid=1000(vagrant) gid=1000(vagrant) groups=1000(vagrant) ... [centos7demo:vagrant:~] # sudo -i [centos7demo:root:~] # id uid=0(root) gid=0(root) groups=0(root) ... Source: https://github.com/dgapitts/vagrant-centos7-helloworld
  • 19. Vagrantfile - base for my pg10 centos7 VM ~/projects/vagrant-postgres10 $ cat Vagrantfile # -*- mode: ruby -*- # vi: set ft=ruby : Vagrant.configure("2") do |config| config.vm.provider :virtualbox do |vb| vb.customize ["modifyvm", :id, "--memory", "512"] vb.customize ["modifyvm", :id, "--cpus", 2] end #config.vm.box = "centos/7" config.vm.box = "https://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-1804_02.VirtualBox.box" config.vm.hostname = "pg10centos7" config.vm.network :forwarded_port, guest: 5432, host: 9510 config.vm.provision :shell, :path => "provision.sh" end Source: https://github.com/dgapitts/vagrant-postgres10
  • 20. Provisioning via a shell script - focus on db specifics ~/projects/vagrant-postgres10 $ grep -A10 sysstat provision.sh ... # install sysstat - sar is used for db monitoring yum -y install sysstat systemctl start sysstat systemctl enable sysstat # Regular pg10 install rpm -Uvh https://yum.postgresql.org/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm yum -y update yum -y install postgresql10-server postgresql10 # Extra packages and devtool for install of https://github.com/ossc-db/pg_plan_advsr yum -y install postgresql-devel yum -y install gcc
  • 21. Query Optimization day: Warsaw pgconf.eu 2017 https://www.postgresql.eu/events/pgconfeu2017/sessions/session/1584-postgresql-query-optimization-techniques-step-by-step/ Thanks again Ilya Kosmodemiansky & Victor Yegorov (Data Egret)! "Bad Stuff" (from my notes) - NOT IN … should use NOT EXISTS - JOIN instead of IN/EXISTS - Unordered LIMIT - Order by random() << business logic? NB You can just see Feike (Steenbergen)'s talk on pg_stat_statements and track_io_timing (top right) NB2 Alexander Kukushkin on AWS Gotchas around Burstable Balance monitoring for EC2 and RDS
  • 22. pg-ora-demo-scripts - postgres-gotcha01-not-in - initial setup Short and simple pgbench setup, but with one extra index : pgbench -i -s 9 -h localhost -p 5432 -U bench1 -d bench1 psql -U bench1 -c "create index pgbench_accounts_bid on pgbench_accounts(bid);" To allow a fair postgres vs oracle comparison, I also export/import the same dataset via a .csv : copy (SELECT * FROM pgbench_branches) to 'pgbench_branches.csv' with csv copy (SELECT * FROM pgbench_accounts) to 'pgbench_accounts.csv' with csv create tables in oracle create table pgbench_branches(bid number,bbalance number, filler character(88), CONSTRAINT pgbench_branches_pk PRIMARY KEY (bid)); create table pgbench_accounts(aid number,bid number,bbalance number,filler character(88),CONSTRAINT pgbench_accounts_pk PRIMARY KEY(aid)); create index pgbench_accounts_bid on pgbench_accounts(bid); load data via sqlloader ctl files $ cat load_accounts.ctl load data infile pgbench_accounts.csv into table pgbench_accounts fields terminated by ',' (aid, bid, bbalance) ... Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
  • 23. pg-ora-demo-scripts - postgres-gotcha01-not-in - pg bad plan Problem version of the query using NOT IN condition : -bash-4.2$ psql -U bench1 -f postgres-gotcha01-not-in-pgbench-demo.sql explain (analyze, buffers) select count(bid) from pgbench_branches where bid NOT IN (select bid from pgbench_accounts); Aggregate (cost=151426.55..151426.56 rows=1 width=8) (actual time=1099.608..1099.608 rows=1 loops=1) Buffers: shared hit=14 read=2450, temp read=7704 written=1539 -> Seq Scan on pgbench_branches (cost=0.42..151426.54 rows=4 width=4) (actual time=953.373..1099.602 rows=2 loops=1) Filter: (NOT (SubPlan 1)) Rows Removed by Filter: 9 Buffers: shared hit=14 read=2450, temp read=7704 written=1539 SubPlan 1 -> Materialize (cost=0.42..31400.42 rows=900000 width=4) (actual time=0.014..57.693 rows=490910 loops=11) Buffers: shared hit=13 read=2450, temp read=7704 written=1539 -> Index Only Scan using pgbench_accounts_bid on pgbench_accounts (cost=0.42..23384.42 rows=900000 width=4) (actual time=0.007..109.876 rows=900000 loops=1) Heap Fetches: 0 Buffers: shared hit=13 read=2450 Planning time: 0.058 ms Execution time: 1101.056 ms Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
  • 24. pg-ora-demo-scripts - postgres-gotcha01-not-in - pg good plan Improved version of the query using NOT EXISTS and explicit "where account.bid = branch.bid" clause : ... explain (analyze, buffers) select count(bid) from pgbench_branches branch where NOT EXISTS (select * from pgbench_accounts account where account.bid = branch.bid); Aggregate (cost=5.54..5.55 rows=1 width=8) (actual time=0.211..0.211 rows=1 loops=1) Buffers: shared hit=29 -> Nested Loop Anti Join (cost=0.42..5.54 rows=1 width=4) (actual time=0.208..0.208 rows=0 loops=1) Buffers: shared hit=29 -> Seq Scan on pgbench_branches branch (cost=0.00..1.09 rows=9 width=4) (actual time=0.004..0.006 rows=9 loops=1) Buffers: shared hit=1 -> Index Only Scan using pgbench_accounts_bid on pgbench_accounts account (cost=0.42..2483.76 rows=100000 width=4) (actual time=0.022..0.022 rows=1 loops=9) Index Cond: (bid = branch.bid) Heap Fetches: 0 Buffers: shared hit=28 Planning time: 0.735 ms Execution time: 0.292 ms Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
  • 25. pg-ora-demo-scripts - oracle slightly convoluted plan but fast NESTED LOOPS (ANTI) - Performs a nested loops anti join between two row sources http://www.juliandyke.com/Optimisation/Operations/NestedLoopsAnti.php The NOT IN version in Oracle uses "NESTED LOOPS ANTI" and a slightly convoluted plan (PGBENCH_ACCOUNTS FULL TABLE SCAN and PGBENCH_ACCOUNTS_BID INDEX RANGE SCAN). It is a bit slower, but still only takes 20 ms (0.02 secs): SQL> select count(bid) from pgbench_branches where bid NOT IN (select bid from pgbench_accounts); Elapsed: 00:00:00.02 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 26 | 24 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 26 | | | |* 2 | FILTER | | | | | | | 3 | NESTED LOOPS ANTI SNA| | 3 | 78 | 24 (92)| 00:00:01 | | 4 | INDEX FULL SCAN | PGBENCH_BRANCHES_PK | 3 | 39 | 1 (0)| 00:00:01 | |* 5 | INDEX RANGE SCAN | PGBENCH_ACCOUNTS_BID | 1 | 13 | 1 (0)| 00:00:01 | |* 6 | TABLE ACCESS FULL | PGBENCH_ACCOUNTS | 7 | 91 | 22 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): 2 - filter( NOT EXISTS (SELECT 0 FROM "PGBENCH_ACCOUNTS" "PGBENCH_ACCOUNTS" WHERE "BID" IS NULL)) 5 - access("BID"="BID") 6 - filter("BID" IS NULL) Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
  • 26. pg-ora-demo-scripts - oracle simple plan with NOT EXISTS NESTED LOOPS (ANTI) - Performs a nested loops anti join between two row sources http://www.juliandyke.com/Optimisation/Operations/NestedLoopsAnti.php The NOT EXISTS version in Oracle uses "NESTED LOOPS ANTI" again, but with a simpler INDEX RANGE SCAN on PGBENCH_ACCOUNTS_BID, and takes about 10 ms (i.e. 0.01 secs): SQL> select count(bid) from pgbench_branches branch where NOT EXISTS (select * from pgbench_accounts account where account.bid = branch.bid); Elapsed: 00:00:00.01 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 26 | 2 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 26 | | | | 2 | NESTED LOOPS ANTI| | 3 | 78 | 2 (0)| 00:00:01 | | 3 | INDEX FULL SCAN | PGBENCH_BRANCHES_PK | 3 | 39 | 1 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN| PGBENCH_ACCOUNTS_BID | 1 | 13 | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): 4 - access("ACCOUNT"."BID"="BRANCH"."BID") Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/loadtest/postgres-gotcha01-not-in
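One reason NOT IN and NOT EXISTS get different plans is that they are not strictly equivalent: if the subquery can return a NULL, NOT IN evaluates to unknown for every non-matching row. That is what the extra "BID" IS NULL filter in the Oracle NOT IN plan is guarding against. Below is a minimal sketch of the semantic difference, assuming SQLite (from the Python standard library) as a stand-in for Postgres/Oracle and two small hypothetical tables, branches and accounts, in place of the pgbench tables.

```python
import sqlite3

# Hypothetical miniature versions of the two queries from the slides.
NOT_IN = """
    select count(bid) from branches
    where bid NOT IN (select bid from accounts)
"""
NOT_EXISTS = """
    select count(bid) from branches branch
    where NOT EXISTS (select 1 from accounts account
                      where account.bid = branch.bid)
"""


def count_unmatched(account_bids):
    """Return (not_in_count, not_exists_count) for branches 1, 2 and 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table branches (bid integer primary key)")
    conn.execute("create table accounts (aid integer primary key, bid integer)")
    conn.executemany("insert into branches (bid) values (?)", [(1,), (2,), (3,)])
    conn.executemany("insert into accounts (bid) values (?)",
                     [(bid,) for bid in account_bids])
    not_in = conn.execute(NOT_IN).fetchone()[0]
    not_exists = conn.execute(NOT_EXISTS).fetchone()[0]
    conn.close()
    return not_in, not_exists


if __name__ == "__main__":
    # Branch 3 has no accounts: both forms find it.
    print(count_unmatched([1, 2]))
    # A single NULL in accounts.bid makes NOT IN return nothing.
    print(count_unmatched([1, 2, None]))
```

With no NULLs both queries report one unmatched branch; with one NULL in accounts.bid the NOT IN form reports zero while NOT EXISTS still reports one. So the rewrite is only a safe drop-in when the subquery column cannot be NULL.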
  • 27. Postgres worst practices!? Lisbon pgconf.eu 2018 Date: 2018-10-24 Ilya Kosmodemiansky pgconf.eu 2018 Linux file handles & kernel memory management issues ● EAV (entity attribute value) pain - schemaless 1000+ tables ● max_connections limit of 1000 - add more ● don't use pgbouncer (easier than pgpool) Hanging transactions & non-removable dead rows ● Postgres likes long transactions e.g. external service calls for email ● Attend Bruce's MVCC unmasked talk and start coding xmin, xmax logic for yourself
  • 28. pg gotcha - too many tables X too many processes ● "In set theory (and, usually, in other parts of mathematics), a Cartesian product is a mathematical operation that returns a set (or product set or simply product) from multiple sets." Wikipedia ● Fairly simple to replicate with pgbench ● Custom setup and execution pgbench files for a 10,000 table setup ● Keep adding work processes; over time, as each work process queries more tables, the number of open file handles keeps growing ● Monitor via OS tools like sar and lsof ● I hear this is a common gotcha when migrating to postgres
  • 29. Background postgres data_directory & pg_class.relfilenode ... Starting with first table : postgres=# create table tab1 (pk integer primary key, col1 varchar(30)); CREATE TABLE but where is my table physically stored at the OS level? Well the base directory is data_directory: postgres=# show data_directory; data_directory ------------------------ /var/lib/pgsql/10/data (1 row) let's snapshot the data_directory via tree: [pg10centos7:postgres:~/10/data] # tree > /tmp/temp1 and now create a 2nd table: postgres=# create table tab2 (pk integer primary key, col1 varchar(30)); CREATE TABLE and now compare the before & after 'snapshots of the filesystem': [pg10centos7:postgres:~/10/data] # tree > /tmp/temp2 [pg10centos7:postgres:~/10/data] # diff /tmp/temp1 /tmp/temp2 648a649,650 > │ ├── 16402 993c995 < 26 directories, 964 files --- > 26 directories, 966 files Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/memtest/setup
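The tree / diff trick above is a generic "snapshot the filesystem before and after" technique. A minimal Python sketch of the same idea follows, assuming nothing about Postgres itself. The directory name demo_db and the helper names snapshot, new_files and demo are hypothetical, and the demo runs against a throwaway temp directory rather than a real data_directory.

```python
import os
import tempfile


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def new_files(before, after):
    """Files present in the second snapshot but not in the first."""
    return sorted(after - before)


def demo():
    """Simulate a 'create table' by dropping one new file into a temp tree."""
    with tempfile.TemporaryDirectory() as root:
        db_dir = os.path.join(root, "base", "demo_db")  # hypothetical layout
        os.makedirs(db_dir)
        before = snapshot(root)
        open(os.path.join(db_dir, "16402"), "w").close()
        return new_files(before, snapshot(root))


if __name__ == "__main__":
    print(demo())
```

Pointed at a real data_directory (as the postgres OS user), the same two snapshot calls around a CREATE TABLE would list the newly created relation files.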
  • 30. Background postgres data_directory & pg_class.relfilenode ... so now we can less /tmp/temp2 . ├── base │ ├── 1 │ │ ├── 112 │ │ ├── 113 │ │ ├── 1247 ... └── 12953 │ ├── 112 │ ├── 113 ... │ ├── 16402 and in pg_class postgres=# SELECT relname, relfilenode FROM pg_class WHERE relname = 'tab2'; relname | relfilenode ---------+------------- tab2 | 16402 (1 row) Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/memtest/setup
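Putting the two slides together, the file for a table is found by joining data_directory, base, the database's directory and pg_class.relfilenode. Here is a small sketch of that path arithmetic, assuming the default tablespace and taking the database directory 12953 from the tree output above. The helper name relation_path is hypothetical.

```python
import posixpath


def relation_path(data_directory, database_oid, relfilenode):
    """Build the on-disk path of a relation's main file.

    Assumes the default tablespace; larger tables are split into extra
    segment files and have additional fork files next to this one.
    """
    return posixpath.join(data_directory, "base",
                          str(database_oid), str(relfilenode))


if __name__ == "__main__":
    # Values taken from the slides: data_directory, the directory that
    # contains 16402 in the tree output, and tab2's relfilenode.
    print(relation_path("/var/lib/pgsql/10/data", 12953, 16402))
```

Postgres can also report this directly: select pg_relation_filepath('tab2'); returns the path relative to data_directory.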
  • 31. 10,000 table test … don't do this in production! I started with an extremely simple python script ~/projects/pg-ora-demo-scripts/memtest $ cat gen_sql.py for i in range(10001): print("create table tab"+str(i)+" (pk integer primary key, col1 varchar(30)); ") and then generate your CREATE SQL statements via ~/projects/pg-ora-demo-scripts/memtest $ python gen_sql.py > create_table.sql ~/projects/pg-ora-demo-scripts/memtest $ head -3 create_table.sql create table tab0 (pk integer primary key, col1 varchar(30)); create table tab1 (pk integer primary key, col1 varchar(30)); create table tab2 (pk integer primary key, col1 varchar(30)); custom pgbench script (generated via simple .py script) [pg10centos7:postgres:~/pg-ora-demo-scripts/memtest/test2_no_sleep] # cat custom_bench_nowait.sql select * from tab0; select * from tab1; .. select * from tab10000; using custom pgbench script with -f option # pgbench -c 10 -j 10 -T 600 -f custom_bench_nowait.sql & Source: https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/memtest/setup
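The slide shows the generator for the CREATE statements but not the one for the custom pgbench script. The sketch below produces both files. The function names are hypothetical and this is not the script from the repo, just one way to produce the same output shown above.

```python
def create_statements(n_tables):
    """One CREATE TABLE per table: tab0 .. tab<n_tables - 1>."""
    template = "create table tab%d (pk integer primary key, col1 varchar(30));"
    return [template % i for i in range(n_tables)]


def select_statements(n_tables):
    """One full-table SELECT per table, used as the custom pgbench script."""
    return ["select * from tab%d;" % i for i in range(n_tables)]


def write_script(path, statements):
    """Write one statement per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(statements) + "\n")


if __name__ == "__main__":
    n_tables = 10001  # tab0 .. tab10000, matching the slide
    write_script("create_table.sql", create_statements(n_tables))
    write_script("custom_bench_nowait.sql", select_statements(n_tables))
```

Because every pgbench client runs the whole file, each backend ends up touching every table, which is what drives the file-handle growth in this test.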
  • 32. 10,000 table test … don't do this in production! now running some OS monitoring via simple uptime and lsof [root@pg10centos7 ~]# for i in {1..1000};do uptime;lsof | grep postgres | wc -l;sleep 60;done 22:57:39 up 1:26, 3 users, load average: 0.05, 0.03, 0.06 14 22:58:39 up 1:27, 3 users, load average: 0.65, 0.17, 0.10 11505 ... 23:02:48 up 1:31, 3 users, load average: 15.78, 7.47, 3.03 22496 23:03:53 up 1:33, 3 users, load average: 24.25, 11.45, 4.70 33499 23:05:00 up 1:34, 3 users, load average: 33.60, 16.57, 6.92 44470 ... NB I do have another nice demo rerunning the same pgbench scripts and showing the gains via pgbouncer (with 20 & 10 connection pool) : https://github.com/dgapitts/pg-ora-demo-scripts/tree/master/memtest/test3_pgbouncer_poolsize_10_vs_20
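To put a number on the growth, the last three consecutive samples above can be differenced. A small sketch, using only the figures printed on the slide; the counts are lines from lsof | grep postgres, not an exact file-descriptor count, and the helper name growth is hypothetical.

```python
from datetime import datetime

# (time, lsof line count) for the three consecutive samples on the slide.
SAMPLES = [("23:02:48", 22496), ("23:03:53", 33499), ("23:05:00", 44470)]


def growth(samples):
    """Return (time, added_lines, elapsed_seconds) between adjacent samples."""
    rows = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        elapsed = (datetime.strptime(t1, "%H:%M:%S")
                   - datetime.strptime(t0, "%H:%M:%S")).total_seconds()
        rows.append((t1, n1 - n0, elapsed))
    return rows


if __name__ == "__main__":
    for when, added, seconds in growth(SAMPLES):
        print(when, added, seconds)
```

On this run the count climbs by roughly 11,000 lines per sample, while the one-minute load average goes from about 16 to about 34. Note also that the sleep 60 loop drifts to 65 to 67 seconds between samples as the VM becomes overloaded.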
  • 33. SE-Radio 328: Bruce Momjian - Postgres Query Planner ● I'm a big fan of Software Engineering Radio - been following this podcast for over a decade ● I was super excited last June when they had Bruce Momjian talking about Postgres in general and then focusing on the Postgres Query Planner. Ref: http://www.se-radio.net/2018/06/se-radio-episode-328-bruce-momjian-on-the-postgres-query-planner
  • 34. SE-Radio 328: Bruce Momjian - Postgres Query Planner ● I was not disappointed … this was an exceptionally good discussion & interesting broadcast, going from the base principles of Relational DBs, to how SQL has become one of the most popular languages, and on to how the Query Planner works ● One comment (47 min) caught my attention: "An entire team whose sole job was to study query plans … I'm not sure you're going to need to do that .. and when they did switch [to postgres] those people did not have a useful role in the organization" ● I know exactly what Bruce means and if Postgres could reduce/eliminate this operational overhead, that is seriously impressive. Ref: http://www.se-radio.net/2018/06/se-radio-episode-328-bruce-momjian-on-the-postgres-query-planner
  • 35. Execution Plan Management and Optimizer Hints ● "Q: How do I lock a plan to … A: Why do you want to do that ... you're probably not going to need to do that in Postgres?" (49 mins). ● Aversion to INDEX HINTs in Postgres Core? ● Very interesting work from the Japanese "NTT OSS Center DBMS Development and Support Team" ● https://github.com/ossc-db/pg_hint_plan "Give PostgreSQL ability to manually force some decisions in execution plans." This extension is already available to RDS users. ● https://github.com/ossc-db/pg_plan_advsr "PostgreSQL extension for automated execution plan tuning" (Tattsu Yama - presentation in Lisbon - pgconf.eu 2018) ● First challenge for me is just getting this to compile i.e. make >> install >> load https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2132-auto-plan-tuning-using-feedback-loop
  • 36. Setting up pg_plan_advsr - need devtools? Tattsu … the next problem is harder: [pg10centos7:postgres:~/test_install/pg_hint_plan-REL10_1_3_2] # make && make install gcc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DLINUX_OOM_SCORE_ADJ=0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -fPIC -I. -I. -I/usr/include/pgsql/server -I/usr/include/pgsql/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pg_hint_plan.o pg_hint_plan.c pg_hint_plan.c:54:33: fatal error: access/htup_details.h: No such file or directory #include "access/htup_details.h" ^ compilation terminated. make: *** [pg_hint_plan.o] Error 1 the .h doesn't appear to exist on my VM: [pg10centos7:root:/var/log/pgbouncer] # find / -name 'htup_details.h' [pg10centos7:root:/var/log/pgbouncer] #
  • 37. Why do I have pg10 server & pg9.2 client tooling? After checking pg_config it is still not quite clear: [pg10centos7:postgres:~] # pg_config .. PKGINCLUDEDIR = /usr/include/pgsql INCLUDEDIR-SERVER = /usr/include/pgsql/server .. VERSION = PostgreSQL 9.2.24 So I think the missing access/htup_details.h issue might relate to my psql client/access version being pg9.2 and not pg10: [pg10centos7:postgres:~] # psql psql (9.2.24, server 10.6) postgres=# select version(); version ---------------------------------------------------------------------------------------------- PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit (1 row)
  • 38. Why do I have pg10 server & pg9.2 client tooling? (continued) Checking the (client?) access headers: instead of htup_details.h I appear to have only htup.h: [pg10centos7:postgres:~] # ls -ltr /usr/include/pgsql/server/access/h*.h -rw-r--r--. 1 root root 32901 Aug 23 2018 /usr/include/pgsql/server/access/htup.h -rw-r--r--. 1 root root 1280 Aug 23 2018 /usr/include/pgsql/server/access/hio.h -rw-r--r--. 1 root root 6397 Aug 23 2018 /usr/include/pgsql/server/access/heapam.h -rw-r--r--. 1 root root 13178 Aug 23 2018 /usr/include/pgsql/server/access/hash.h [pg10centos7:postgres:~] #
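The psql banner already carries the clue: when client and server differ it prints both versions, e.g. psql (9.2.24, server 10.6). Below is a small sketch of a check that could be dropped into a provisioning smoke test, assuming only that banner format. The helper names are hypothetical. The major-version rule is that from PostgreSQL 10 onwards the major version is the first number, while before 10 it is the first two.

```python
import re


def major_version(version):
    """'9.2.24' -> '9.2', '10.6' -> '10' (numbering changed at version 10)."""
    parts = version.split(".")
    if int(parts[0]) >= 10:
        return parts[0]
    return ".".join(parts[:2])


def parse_psql_banner(banner):
    """Return (client, server) versions, or None if no server version is shown."""
    match = re.search(r"psql \(([\d.]+), server ([\d.]+)\)", banner)
    if match is None:
        return None
    return match.group(1), match.group(2)


def versions_mismatch(banner):
    """True when the banner reports different client and server major versions."""
    parsed = parse_psql_banner(banner)
    if parsed is None:
        return False
    client, server = parsed
    return major_version(client) != major_version(server)


if __name__ == "__main__":
    print(versions_mismatch("psql (9.2.24, server 10.6)"))
```

The same major_version helper could be applied to the VERSION line of pg_config, which is the value that decides which server headers an extension build such as pg_hint_plan picks up.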
  • 39. Explicit pg10 version of postgresql10-libs, postgresql10-devel ... ~/projects/vagrant-postgres10v2:DaveP# git diff provision.sh diff --git a/provision.sh b/provision.sh index 08dc65d..d0f8c0e 100644 --- a/provision.sh +++ b/provision.sh @@ -23,15 +23,16 @@ then rpm -Uvh https://yum.postgresql.org/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm yum -y update yum -y install postgresql10-server postgresql10 + yum -y install postgresql10-contrib postgresql10-libs postgresql10-devel # Extra packages and devtool for install of https://github.com/ossc-db/pg_plan_advsr - yum -y install postgresql-devel + #yum -y install postgresql-devel yum -y install gcc ... ~/projects/vagrant-postgres10v2:DaveP# git commit -m 'fix pg client tools (e.g. psql and pg_config) - these were pg9.2 before and now pg10.7' ...
  • 40. Next steps ● Complete documentation of pg_hint_plan and sample use cases ● Complete automation of the pg_plan_advsr setup ● Complete pgbouncer setup (current minor issue with the authentication) ● Demo pg10 correlated statistics & pg11 functional stats ● Vagrant - Docker - Postgres demo? As suggested at the PUG Amsterdam, March 2019
  • 41. Q & A ?