I currently run a high-traffic nginx server. Generally, one of the requests is very slow or delayed while the others go through fine.
When I run
$ watch -d --no-title -n4 'netstat -s | egrep -i "lock|socket\ buf"'
I get the following output:
118804 packets pruned from receive queue because of socket buffer overrun
1 ICMP packets dropped because socket was locked
10648 delayed acks further delayed because of locked socket
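Because watch -d only highlights digits that changed, it can be easier to judge how fast these counters grow by printing the increase per interval. Below is a minimal Python sketch, assuming a Linux host with /proc/net/netstat. The mapping of the three netstat -s lines above to the TcpExt fields PruneCalled, LockDroppedIcmps and DelayedACKLocked is my understanding of how net-tools labels them, and the helper names are hypothetical.

```python
"""Show how fast the locked-socket / buffer-prune counters grow.

Sketch, assuming Linux's /proc/net/netstat; the TcpExt field names below
are assumed to be the ones netstat -s prints as the three lines shown.
"""
import time

# TcpExt field name -> sentence netstat -s prints for it (assumed mapping)
WATCHED = {
    "PruneCalled": "packets pruned from receive queue because of socket buffer overrun",
    "LockDroppedIcmps": "ICMP packets dropped because socket was locked",
    "DelayedACKLocked": "delayed acks further delayed because of locked socket",
}


def parse_netstat(text):
    """Parse /proc/net/netstat into {"TcpExt": {"PruneCalled": 118804, ...}, ...}.

    The file alternates a line of field names and a line of values,
    both starting with the same prefix (TcpExt:, IpExt:, ...).
    """
    lines = text.strip().splitlines()
    stats = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        stats[prefix] = dict(zip(names.split(), map(int, numbers.split())))
    return stats


def watched_counters(text):
    """Pick out only the counters we care about (0 if a field is absent)."""
    tcp_ext = parse_netstat(text).get("TcpExt", {})
    return {name: tcp_ext.get(name, 0) for name in WATCHED}


def deltas(before, after):
    """Increase of each counter between two samples."""
    return {name: after[name] - before[name] for name in before}


def watch(interval=4, path="/proc/net/netstat"):
    """Print per-interval increases forever, like watch -d -n4 but as numbers."""
    with open(path) as f:
        prev = watched_counters(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            cur = watched_counters(f.read())
        for name, diff in deltas(prev, cur).items():
            print(f"{diff:8d} new {WATCHED[name]}")
        print()
        prev = cur
```

On the server, calling watch() prints how many new events of each kind occurred in every 4-second window (stop it with Ctrl+C), which makes it easier to correlate bursts with the slow requests.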
In my /etc/sysctl.conf I have the following:
fs.file-max = 70000
vm.overcommit_memory=1
net.core.somaxconn=165535
net.core.wmem_default=2129920
I also tried this config, but the results are the same:
fs.file-max = 70000
vm.overcommit_memory=1
net.core.somaxconn=165535
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_fin_timeout = 10
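Since both configs gave the same results, it may be worth confirming that the values were actually loaded by the kernel (for example after sysctl -p). Here is a minimal Python sketch, assuming the usual /proc/sys layout where a key such as net.ipv4.tcp_rmem maps to /proc/sys/net/ipv4/tcp_rmem; the function names are hypothetical, and keys containing literal dots (such as VLAN interface names) are not handled.

```python
"""Compare sysctl.conf settings with the values the kernel is actually using.

Sketch, assuming the standard /proc/sys layout (dots in a key become
path separators). Function names are illustrative only.
"""
from pathlib import Path


def parse_sysctl_conf(text):
    """Return {key: value}, skipping comments and normalizing whitespace in values."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = " ".join(value.split())
    return settings


def read_live(key, root="/proc/sys"):
    """Current kernel value for key, or None if the knob does not exist."""
    path = Path(root, *key.split("."))
    try:
        # Multi-value knobs such as tcp_rmem are tab-separated in /proc.
        return " ".join(path.read_text().split())
    except FileNotFoundError:
        return None


def mismatches(wanted, read=read_live):
    """{key: (configured, live)} for every setting that is not in effect."""
    result = {}
    for key, value in wanted.items():
        live = read(key)
        if live != value:
            result[key] = (value, live)
    return result
```

For example, print(mismatches(parse_sysctl_conf(open("/etc/sysctl.conf").read()))) lists every setting whose live value differs; a live value of None means the running kernel does not expose that knob at all.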
The box is a 40-core machine running Ubuntu 20 with nginx 1.18, and the relevant nginx config looks like this:
user user;
worker_processes auto;
pid /run/nginx.pid;
worker_rlimit_nofile 25000;
events {
    worker_connections 1024;
    # multi_accept on;
}
http {
    ##
    # Basic Settings
    ##
    access_log off;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 0;
    types_hash_max_size 2048;
    # server_tokens off;
    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    ##
    # Logging Settings
    ##
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    #limit_req_zone $binary_remote_addr zone=mylimit:100m rate=10r/m;
    ##
    # Gzip Settings
    ##
    gzip on;
    gzip_disable "msie6";
    ##
    # Virtual Host Configs
    ##
    upstream backend {
        least_conn;
        server 1.2.3.4:3292 fail_timeout=0 weight=1;
        server 1.2.3.5:3292 fail_timeout=0 weight=1;
        server 1.2.3.6:3292 fail_timeout=0 weight=1;
        server 1.2.3.7:3292 fail_timeout=0 weight=1;
    }
    server {
        listen 80;
        server_name dvr.example.com;
        location / {
            return 301 https://$server_name$request_uri;
        }
    }
    server {
        listen 443 ssl http2 default_server;
        server_name dvr.example.com;
        ssl on;
        ssl_certificate /etc/letsencrypt/live/dvr.example.com-0001/fullchain.pem; # managed by Certbot
        ssl_certificate_key /etc/letsencrypt/live/dvr.example.com-0001/privkey.pem; # managed by Certbot
        location = / {
            return 301 https://example.com;
        }
        location / {
            #limit_req zone=mylimit burst=20;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_read_timeout 3600;
            proxy_request_buffering off;
            proxy_buffering off;
            proxy_pass http://backend;
        }
        location /nginx_status {
            # Turn on stats
            stub_status on;
            access_log off;
            # only allow access from 192.168.1.5 #
            #allow 192.168.1.5;
            #deny all;
        }
    }
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
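For reference, here is the rough connection ceiling this config implies, as a small Python sketch. It assumes worker_processes auto starts one worker per core (40 here) and that every proxied request holds one client-side and one upstream connection, both of which count against worker_connections according to the nginx documentation. It ignores descriptors used for log files and listening sockets, so treat the result as an upper bound, not an exact figure.

```python
"""Rough connection ceiling implied by the nginx config above.

Assumptions: one worker per core (worker_processes auto on 40 cores) and
two connections per proxied request (client side + upstream side), since
no upstream keepalive pool is configured.
"""


def connection_ceiling(cores, worker_connections, worker_rlimit_nofile):
    # A worker cannot hold more connections than it has file descriptors.
    per_worker = min(worker_connections, worker_rlimit_nofile)
    total = cores * per_worker
    return {
        "per_worker": per_worker,
        "total_connections": total,
        # Each proxied client request also opens one upstream connection.
        "proxied_clients": total // 2,
    }


limits = connection_ceiling(cores=40, worker_connections=1024,
                            worker_rlimit_nofile=25000)
print(limits)
# → {'per_worker': 1024, 'total_connections': 40960, 'proxied_clients': 20480}
```

This figure can be compared with the Active connections value that stub_status reports.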
Any help on how to fix this would be appreciated. As a side note, according to nload this server generally carries 700-900 Mbit/s of traffic.
EDIT: Things asked for in the comments:
$ sudo cat /proc/net/protocols
protocol size sockets memory press maxhdr slab module cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
AF_VSOCK 1136 0 -1 NI 0 yes vmw_vsock_vmci_transport n n n n n n n n n n n n n n n n n n n
PACKET 1344 1 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
PINGv6 1112 0 -1 NI 0 yes kernel y y y n n y y n y y y y n y y y y y n
RAWv6 1112 1 -1 NI 0 yes kernel y y y n y y y n y y y y n y y y y n n
UDPLITEv6 1216 0 2 NI 0 yes kernel y y y n y y y n y y y y n n n y y y n
UDPv6 1216 1 2 NI 0 yes kernel y y y n y y y n y y y y n n n y y y n
TCPv6 2160 233 77165 no 320 yes kernel y y y y y y y y y y y y y n y y y y y
UNIX 1024 209 -1 NI 0 yes kernel n n n n n n n n n n n n n n n n n n n
UDP-Lite 1024 0 2 NI 0 yes kernel y y y n y y y n y y y y y n n y y y n
PING 904 0 -1 NI 0 yes kernel y y y n n y n n y y y y n y y y y y n
RAW 912 0 -1 NI 0 yes kernel y y y n y y y n y y y y n y y y y n n
UDP 1024 1 2 NI 0 yes kernel y y y n y y y n y y y y y n n y y y n
TCP 2000 1658 77168 no 320 yes kernel y y y y y y y y y y y y y n y y y y y
NETLINK 1040 16 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
The next one:
$ ip -s link show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:3f:af:19 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
9479594446622 8365661139 0 110 0 0
TX: bytes packets errors dropped carrier collsns
9300082049967 4894324603 0 0 0 0
And the last one:
$ sudo netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:https 0.0.0.0:* LISTEN
tcp 0 0 localhost.localdom:8000 0.0.0.0:* LISTEN
tcp 0 0 localhost.localdom:6380 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:http 0.0.0.0:* LISTEN
tcp 0 0 localhost:domain 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:ssh 0.0.0.0:* LISTEN
tcp 0 0 localhost.lo:postgresql 0.0.0.0:* LISTEN
tcp6 0 0 [::]:3193 [::]:* LISTEN
tcp6 0 0 [::]:3290 [::]:* LISTEN
tcp6 0 0 [::]:3180 [::]:* LISTEN
tcp6 0 0 localhost6.localdo:6380 [::]:* LISTEN
tcp6 0 0 [::]:http [::]:* LISTEN
tcp6 0 0 [::]:3187 [::]:* LISTEN
tcp6 0 0 [::]:3188 [::]:* LISTEN
tcp6 0 0 [::]:3189 [::]:* LISTEN
tcp6 0 0 [::]:3190 [::]:* LISTEN
tcp6 0 0 [::]:ssh [::]:* LISTEN
tcp6 0 0 [::]:3191 [::]:* LISTEN
tcp6 0 0 [::]:3192 [::]:* LISTEN
tcp6 0 0 localhost6.l:postgresql [::]:* LISTEN
udp 0 0 localhost:domain 0.0.0.0:*
raw6 0 0 [::]:ipv6-icmp [::]:* 7
Active UNIX domain sockets (only servers)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ACC ] SEQPACKET LISTENING 50111 /run/udev/control
unix 2 [ ACC ] STREAM LISTENING 12835653 /run/user/1000/systemd/private
unix 2 [ ACC ] STREAM LISTENING 12835657 /run/user/1000/snapd-session-agent.socket
unix 2 [ ACC ] STREAM LISTENING 12835658 /run/user/1000/gnupg/S.gpg-agent.browser
unix 2 [ ACC ] STREAM LISTENING 12835659 /run/user/1000/gnupg/S.gpg-agent
unix 2 [ ACC ] STREAM LISTENING 12835660 /run/user/1000/bus
unix 2 [ ACC ] STREAM LISTENING 12835661 /run/user/1000/gnupg/S.gpg-agent.ssh
unix 2 [ ACC ] STREAM LISTENING 12835662 /run/user/1000/gnupg/S.gpg-agent.extra
unix 2 [ ACC ] STREAM LISTENING 12835663 /run/user/1000/gnupg/S.dirmngr
unix 2 [ ACC ] STREAM LISTENING 51487 @irqbalance1397.sock
unix 2 [ ACC ] STREAM LISTENING 50082 /run/systemd/private
unix 2 [ ACC ] STREAM LISTENING 50099 /run/lvm/lvmetad.socket
unix 2 [ ACC ] STREAM LISTENING 50101 /run/systemd/journal/stdout
unix 2 [ ACC ] STREAM LISTENING 26678 /run/lvm/lvmpolld.socket
unix 2 [ ACC ] STREAM LISTENING 39064 /var/lib/lxd/unix.socket
unix 2 [ ACC ] STREAM LISTENING 13067 /var/run/vmware/guestServicePipe
unix 2 [ ACC ] STREAM LISTENING 39055 /run/snapd.socket
unix 2 [ ACC ] STREAM LISTENING 39057 /run/snapd-snap.socket
unix 2 [ ACC ] STREAM LISTENING 39060 /run/acpid.socket
unix 2 [ ACC ] STREAM LISTENING 39062 /run/uuidd/request
unix 2 [ ACC ] STREAM LISTENING 39066 /var/run/dbus/system_bus_socket
unix 2 [ ACC ] STREAM LISTENING 36990 /var/run/postgresql/.s.PGSQL.5432
unix 2 [ ACC ] STREAM LISTENING 40264 /var/run/supervisor.sock.1365
unix 2 [ ACC ] STREAM LISTENING 39059 @ISCSIADM_ABSTRACT_NAMESPACE
And the nginx status page (/nginx_status) shows:
Active connections: 947
server accepts handled requests
826649 826649 1261546
Reading: 0 Writing: 640 Waiting: 352
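The stub_status counters can be reduced to two quick checks. Per the nginx documentation, handled normally equals accepts unless a resource limit such as worker_connections was reached, and requests divided by handled gives the average number of requests per connection. A minimal Python sketch with hypothetical helper names, using the numbers above:

```python
import re

# Order in which the integers appear in stub_status output.
FIELDS = ("active", "accepts", "handled", "requests",
          "reading", "writing", "waiting")


def parse_stub_status(text):
    """Map stub_status output to named integers."""
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numbers, got {len(numbers)}")
    return dict(zip(FIELDS, numbers))


def summarize(status):
    return {
        # Non-zero means connections were accepted but not handled,
        # e.g. because the worker_connections limit was reached.
        "unhandled": status["accepts"] - status["handled"],
        "requests_per_connection": round(status["requests"] / status["handled"], 2),
    }


sample = """Active connections: 947
server accepts handled requests
 826649 826649 1261546
Reading: 0 Writing: 640 Waiting: 352
"""
print(summarize(parse_stub_status(sample)))
# → {'unhandled': 0, 'requests_per_connection': 1.53}
```

With these numbers nothing was left unhandled, so the connection limits do not appear to have been hit at the time of this snapshot.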
Could you post the output of cat /proc/net/protocols, ip -s link show {interface} and netstat -Lan? Questions: (1) If this server is a VPS, do you know its bandwidth and whether it is shared with other VPSs? (2) Do you use FastCGI or PHP? (3) What size are a typical received message and a typical response?