
I currently have a high-traffic nginx server. It seems that, every so often, one request goes very slowly or is delayed while the others go through fine.

When I run watch -d --no-title -n4 'netstat -s | egrep -i "lock|socket\ buf"', I get the following output:

    118804 packets pruned from receive queue because of socket buffer overrun
    1 ICMP packets dropped because socket was locked
    10648 delayed acks further delayed because of locked socket
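
For reference, one way to check whether these counters are still climbing (rather than being left over from an old incident) is to diff two snapshots taken a few seconds apart:

    # take two snapshots 10 s apart and show only the counters that changed
    netstat -s > /tmp/ns1; sleep 10; netstat -s > /tmp/ns2
    diff /tmp/ns1 /tmp/ns2 | grep -Ei "lock|socket buf"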

In my /etc/sysctl.conf I have the following:

fs.file-max = 70000
vm.overcommit_memory=1
net.core.somaxconn=165535
net.core.wmem_default=2129920

I also tried this config, but the results are the same:

fs.file-max = 70000
vm.overcommit_memory=1
net.core.somaxconn=165535
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_fin_timeout = 10
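
For completeness, this is roughly how I apply and verify the settings after editing the file:

    sudo sysctl -p                                # reload /etc/sysctl.conf
    sysctl net.core.rmem_max net.ipv4.tcp_rmem    # confirm the values the kernel actually uses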

The box is a 40-core box running Ubuntu 20 with nginx 1.18, and the relevant nginx conf looks like so:

user user;
worker_processes auto;
pid /run/nginx.pid;
worker_rlimit_nofile 25000;
events {
        worker_connections 1024;
        # multi_accept on;
}

http {

        ##
        # Basic Settings
        ##
        access_log off;
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 0;
        types_hash_max_size 2048;
        # server_tokens off;

        # server_names_hash_bucket_size 64;
        # server_name_in_redirect off;

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # Logging Settings
        ##
        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;
        #limit_req_zone $binary_remote_addr zone=mylimit:100m rate=10r/m;
        ##
        # Gzip Settings
        ##

        gzip on;
        gzip_disable "msie6";
        ##
        # Virtual Host Configs
        ##
        upstream backend {
            least_conn;
            server 1.2.3.4:3292 fail_timeout=0 weight=1;
            server 1.2.3.5:3292 fail_timeout=0 weight=1;
            server 1.2.3.6:3292 fail_timeout=0 weight=1;    
            server 1.2.3.7:3292 fail_timeout=0 weight=1;


        }
        server {
            listen      80;
            server_name dvr.example.com;
            location / {
                return 301 https://$server_name$request_uri;
            }
        }
        server {
            listen 443 ssl http2 default_server;
            server_name dvr.example.com;
            ssl on;
            ssl_certificate /etc/letsencrypt/live/dvr.example.com-0001/fullchain.pem; # managed by Certbot
            ssl_certificate_key /etc/letsencrypt/live/dvr.example.com-0001/privkey.pem; # managed by Certbot

            location = / {
                 return 301 https://example.com;
            }

            location / {
                #limit_req zone=mylimit burst=20;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_read_timeout 3600;
                proxy_request_buffering off;
                proxy_buffering off;
                proxy_pass http://backend;

             }

            location /nginx_status {
                 # Turn on stats
                 stub_status on;
                 access_log   off;
                 # only allow access from 192.168.1.5 #
                 #allow 192.168.1.5;
                 #deny all;
             }
        }
        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;
}

Any help on how to fix this would be appreciated. As a side note, this server generally has 700-900 Mbit/s of throughput running through it, according to nload.

EDIT: Things asked for in the comments

$ sudo cat /proc/net/protocols
protocol  size sockets  memory press maxhdr  slab module     cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
AF_VSOCK  1136      0      -1   NI       0   yes  vmw_vsock_vmci_transport  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
PACKET    1344      1      -1   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
PINGv6    1112      0      -1   NI       0   yes  kernel      y  y  y  n  n  y  y  n  y  y  y  y  n  y  y  y  y  y  n
RAWv6     1112      1      -1   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
UDPLITEv6 1216      0       2   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  n  n  n  y  y  y  n
UDPv6     1216      1       2   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  n  n  n  y  y  y  n
TCPv6     2160    233   77165   no     320   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
UNIX      1024    209      -1   NI       0   yes  kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UDP-Lite  1024      0       2   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  y  n  n  y  y  y  n
PING       904      0      -1   NI       0   yes  kernel      y  y  y  n  n  y  n  n  y  y  y  y  n  y  y  y  y  y  n
RAW        912      0      -1   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
UDP       1024      1       2   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  y  n  n  y  y  y  n
TCP       2000   1658   77168   no     320   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
NETLINK   1040     16      -1   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n

And the next

$ ip -s link show ens160
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:3f:af:19 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    9479594446622 8365661139 0       110     0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    9300082049967 4894324603 0       0       0       0   

last one

$ sudo netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:https           0.0.0.0:*               LISTEN     
tcp        0      0 localhost.localdom:8000 0.0.0.0:*               LISTEN     
tcp        0      0 localhost.localdom:6380 0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:http            0.0.0.0:*               LISTEN     
tcp        0      0 localhost:domain        0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN     
tcp        0      0 localhost.lo:postgresql 0.0.0.0:*               LISTEN     
tcp6       0      0 [::]:3193               [::]:*                  LISTEN     
tcp6       0      0 [::]:3290               [::]:*                  LISTEN     
tcp6       0      0 [::]:3180               [::]:*                  LISTEN     
tcp6       0      0 localhost6.localdo:6380 [::]:*                  LISTEN     
tcp6       0      0 [::]:http               [::]:*                  LISTEN     
tcp6       0      0 [::]:3187               [::]:*                  LISTEN     
tcp6       0      0 [::]:3188               [::]:*                  LISTEN     
tcp6       0      0 [::]:3189               [::]:*                  LISTEN     
tcp6       0      0 [::]:3190               [::]:*                  LISTEN     
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN     
tcp6       0      0 [::]:3191               [::]:*                  LISTEN     
tcp6       0      0 [::]:3192               [::]:*                  LISTEN     
tcp6       0      0 localhost6.l:postgresql [::]:*                  LISTEN     
udp        0      0 localhost:domain        0.0.0.0:*                          
raw6       0      0 [::]:ipv6-icmp          [::]:*                  7          
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ACC ]     SEQPACKET  LISTENING     50111    /run/udev/control
unix  2      [ ACC ]     STREAM     LISTENING     12835653 /run/user/1000/systemd/private
unix  2      [ ACC ]     STREAM     LISTENING     12835657 /run/user/1000/snapd-session-agent.socket
unix  2      [ ACC ]     STREAM     LISTENING     12835658 /run/user/1000/gnupg/S.gpg-agent.browser
unix  2      [ ACC ]     STREAM     LISTENING     12835659 /run/user/1000/gnupg/S.gpg-agent
unix  2      [ ACC ]     STREAM     LISTENING     12835660 /run/user/1000/bus
unix  2      [ ACC ]     STREAM     LISTENING     12835661 /run/user/1000/gnupg/S.gpg-agent.ssh
unix  2      [ ACC ]     STREAM     LISTENING     12835662 /run/user/1000/gnupg/S.gpg-agent.extra
unix  2      [ ACC ]     STREAM     LISTENING     12835663 /run/user/1000/gnupg/S.dirmngr
unix  2      [ ACC ]     STREAM     LISTENING     51487    @irqbalance1397.sock
unix  2      [ ACC ]     STREAM     LISTENING     50082    /run/systemd/private
unix  2      [ ACC ]     STREAM     LISTENING     50099    /run/lvm/lvmetad.socket
unix  2      [ ACC ]     STREAM     LISTENING     50101    /run/systemd/journal/stdout
unix  2      [ ACC ]     STREAM     LISTENING     26678    /run/lvm/lvmpolld.socket
unix  2      [ ACC ]     STREAM     LISTENING     39064    /var/lib/lxd/unix.socket
unix  2      [ ACC ]     STREAM     LISTENING     13067    /var/run/vmware/guestServicePipe
unix  2      [ ACC ]     STREAM     LISTENING     39055    /run/snapd.socket
unix  2      [ ACC ]     STREAM     LISTENING     39057    /run/snapd-snap.socket
unix  2      [ ACC ]     STREAM     LISTENING     39060    /run/acpid.socket
unix  2      [ ACC ]     STREAM     LISTENING     39062    /run/uuidd/request
unix  2      [ ACC ]     STREAM     LISTENING     39066    /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     36990    /var/run/postgresql/.s.PGSQL.5432
unix  2      [ ACC ]     STREAM     LISTENING     40264    /var/run/supervisor.sock.1365
unix  2      [ ACC ]     STREAM     LISTENING     39059    @ISCSIADM_ABSTRACT_NAMESPACE

And the nginx status page shows:

Active connections: 947 
server accepts handled requests
 826649 826649 1261546 
Reading: 0 Writing: 640 Waiting: 352 
  • Please add to your post the outputs of cat /proc/net/protocols and ip -s link show {interface} and netstat -Lan. Questions: (1) If this server is in a VPS, do you know the bandwidth for the VPS and whether it's shared with other VPSs? (2) Do you use FastCGI or PHP? (3) What size are a typical received message and a typical answer?
    – harrymc, Apr 30, 2023 at 16:58
  • How much RAM is available? Any idea how many requests max are handled in parallel? Do you have an indication of which requests are dropped (if any)? See also the answers in this post.
    – harrymc, Apr 30, 2023 at 17:31
  • You mean that 1500 users are downloading very big video files in parallel and you're handling it with only one server? You might have 40 cores, but some components are not unlimited, such as the motherboard bus, RAM, disk, network adapter, etc.
    – harrymc, Apr 30, 2023 at 18:36
  • You can gain some small additional efficiency by tuning nginx better, but you perhaps need an improvement of an order of magnitude. You should evolve your website into a cluster instead of a single server. A larger number of more limited (and cheaper) servers might be much more effective than one monster server. Depending on your VPS supplier, there might exist a simple method of replicating your simpler server at need according to your parameters, if this server is essentially read-only.
    – harrymc, Apr 30, 2023 at 18:54
  • The number of load balancers (which I take to mean nginx servers) can be determined empirically.
    – harrymc, Apr 30, 2023 at 19:16

2 Answers

This topic is quite interesting to me, but unfortunately I don't have any proxies that push that amount of bandwidth, so I can't quite test this myself. I'd be very interested in knowing whether any of these helped you.

Let's assume that the nginx server is the actual bottleneck, and that the bandwidth, any network hardware between the servers, and the backend are not the problem.

  1. Make sure you enable HTTP/2 (I see you've got this one) and TLSv1.3. You can get small but measurable improvements when using TLSv1.3 over TLSv1.2. Check this blog post by Netflix.
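
A minimal sketch of the relevant directive (assuming your nginx is built against OpenSSL 1.1.1 or newer, which Ubuntu 20 ships):

    ssl_protocols TLSv1.2 TLSv1.3;    # enable TLSv1.3, drop the legacy protocols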

  2. Use the default proxy_buffering ("on"). More info here.

  3. Access log: I imagine your disk is getting bombarded. Change the format of the access log to reduce the writes (or set it to "off").
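
If you want to keep the log, a middle ground is buffered logging, so nginx batches writes instead of touching the disk on every request (buffer and flush are standard access_log parameters; the sizes here are assumptions to tune):

    access_log /var/log/nginx/access.log combined buffer=64k flush=5s;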

  4. Enable keepalive to the upstream servers; a sketch follows. More info here.
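
Roughly, this needs a keepalive connection pool in the upstream block plus HTTP/1.1 and a cleared Connection header on the proxied requests; a sketch against the config in the question (the pool size of 64 is an assumption to tune):

    upstream backend {
        least_conn;
        server 1.2.3.4:3292 fail_timeout=0 weight=1;
        # ... remaining backends ...
        keepalive 64;                     # idle connections kept open to the backends
    }

    location / {
        proxy_http_version 1.1;           # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";   # don't forward "Connection: close"
        proxy_pass http://backend;
    }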

  5. Setting (increasing) the Rx/Tx ring buffer values for your network interface with ethtool - absolutely check this one. Serverfault question here. The maximum on my hardware was 4096, and my default values were at 256; I could only change the value for one of my 2 interfaces. Change it like this (my interface's name in this case is enp6s0):

    ethtool -G enp6s0 rx 4096 tx 4096
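
Before changing anything, you can compare what the NIC supports against what is currently set (using ens160 from the question):

    ethtool -g ens160    # compare "Pre-set maximums" with "Current hardware settings"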

  6. Kernel parameters:

  • Try setting net.ipv4.tcp_mem for total TCP memory (note that it is measured in memory pages, not bytes). I believe this should be bigger than net.ipv4.tcp_wmem and net.ipv4.tcp_rmem; see the combined sketch after this list.

  • net.core.rmem/wmem - try setting the minimum and default values as well, not just max

    net.core.rmem_default=262144

    net.core.wmem_default=262144

  • Check this post for setting the optimal rmem_max value (you can try with 262144).
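
A combined sketch of the sysctl lines discussed in this point (every number is a placeholder to size against your RAM, and the tcp_mem triple is counted in pages, not bytes):

    # /etc/sysctl.conf -- example values, not a recommendation
    net.core.rmem_default = 262144
    net.core.wmem_default = 262144
    net.core.rmem_max = 33554432
    net.core.wmem_max = 33554432
    # tcp_mem: low / pressure / high thresholds, in pages
    net.ipv4.tcp_mem = 786432 1048576 1572864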

Edit: forgot about the transmit queue length.

  7. Transmit queue length (txqueuelen) - increase your default of 1000 to 5000 or 10000 with ifconfig ${interface} txqueuelen ${size}. Recommended for all high-throughput servers. Put it into /etc/rc.local to persist across reboots.
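
Since ifconfig is deprecated on modern Ubuntu, the iproute2 equivalent (using ens160 from the question) would be:

    ip link set dev ens160 txqueuelen 10000
    ip link show ens160    # the first output line should now report "qlen 10000"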

Edit2: more!

  8. Additional nginx parameters:
  • Decrease timeouts:

    client_body_timeout 30;
    client_header_timeout 30;
    keepalive_timeout 30;
    send_timeout 30;

  • Increase buffer sizes:

    client_body_buffer_size 32K;
    client_header_buffer_size 2k;
    large_client_header_buffers 8 16k;
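
After any of these edits, a quick sanity check before reloading (standard nginx/systemd commands):

    sudo nginx -t                  # validate the configuration syntax
    sudo systemctl reload nginx    # apply without dropping active connections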

  9. Compression-related points (more applicable to websites with lots of new users every day, serving static content):

9.1. gzip compression parameters:

  • Add gzip_min_length to avoid compressing small files. I set it up around 2048 (value in bytes).
  • gzip_comp_level - defaults to 6. If you're worried about CPU more than bandwidth, decrease this to level 1 or 2. You'll still get most of the decreased bandwidth benefit with much less CPU work.
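
A sketch of those two directives together (2048 and level 2 are just the example values from the bullets above):

    gzip_min_length 2048;    # skip compressing responses under ~2 KB
    gzip_comp_level 2;       # far cheaper on CPU than the default 6, keeps most of the size win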

9.2. brotli compression:

  • Consider adding brotli compression (either replacing gzip with brotli, or you can have both active at the same time). I'm not sure whether decompression and compression are actually faster than gzip, but it does compress the files better.

9.3. Static compression:

  • It's absolutely criminal that people don't know of this / don't use it. You can serve pre-compressed static files instead of compressing them on the fly; a sketch follows.
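
In nginx this is the gzip_static directive from ngx_http_gzip_static_module (usually compiled into distribution builds; check with nginx -V); a sketch, with the pre-compression done at deploy time (the /var/www path and file types are assumptions):

    # nginx: if e.g. /var/www/app.js.gz exists, serve it as-is instead of compressing per request
    gzip_static on;

    # deploy step: pre-compress assets, keeping the originals for clients without gzip support
    find /var/www -type f \( -name '*.js' -o -name '*.css' -o -name '*.html' \) -exec gzip -k -9 {} \;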

9.4. Additional point:

  • If you're interested in compressing audio files, I want to mention the Opus lossy compression codec, used by Discord for example. Opus should replace .mp3, but I don't see it happening yet ... Check it out.
  • @chuf I have implemented everything I could from that, and it seems to have brought it down to just this: "43 delayed acks further delayed because of locked socket". If you have any other ideas, I will check tomorrow morning to see if more errors show up on the watch.
    – nadermx, May 2, 2023 at 0:14
  • I have not been able to understand exactly how to implement step 5 or how to calculate the amount for step 6. When I follow the link in step 5, it shows the hardware had 10x less than the queue size.
    – nadermx, May 2, 2023 at 3:56
  • That sounds great! I've also added some additional explanation for steps 5 and 6, and added some more info at the end.
    – GChuf, May 2, 2023 at 8:01
  • Okay, I implemented everything, although the files aren't on the load balancer itself, but that's another thing. Anyway, I applied everything and let it sit; it seems I still get "1 ICMP packets dropped because socket was locked" and "157 delayed acks further delayed because of locked socket", so I do think there is a bit of a hardware limit being hit here as well. So I think you and the other answer are both right.
    – nadermx, May 2, 2023 at 17:04
  • If you're talking about static compression, you can still copy the files to the nginx server and compress them there, but that's another topic. Are you sure you have enough open file descriptors? Check how many of them are used with lsof piped to wc -l: lsof | wc -l. Otherwise, I'd look for disk/memory/CPU bottlenecks.
    – GChuf, May 2, 2023 at 17:25

Using a server at 100%+ of its resources is in general not a good idea. You have 1500 users downloading very big video files in parallel, and you're handling it with only one nginx server.

Although your monster server has 40 cores, some components are not unlimited, such as the motherboard bus, RAM, disk, network adapter, etc. These will all cause bottlenecks that affect how responsive your server is. It seems to me that at the moment you have reached the limits of your hardware.

You can gain some small additional efficiency by tuning nginx better, but you perhaps need a much bigger improvement than that.

As the nginx server does nothing but load-balance connections to the backend servers, you can easily scale your website by duplicating this nginx server.

You might find that a number of more limited (and cheaper) nginx servers might be more effective than one monster server. Computer clusters on the internet tend to use multiple smaller nodes rather than a few very large ones. Some VPS suppliers will automatically increase the number of servers based on the current load.
