docker build is very slow here, in what I would call a standard setup. The environment is as follows:

  • Virtual Machine
  • Docker CE 18.09.0 (git commit 4d60db4) running on Linux amd64 (Ubuntu 16.04.5 LTS)
  • Backing file system is ext4
  • Storage driver is overlay2
  • The machine has 4 GB of RAM, 2 vCPUs (Xeon E7-8880 v4)

I use the following Dockerfile:

FROM ubuntu:18.04

ARG uid=1000
ARG gid=1000
RUN echo "${gid} ${uid}"

# Some further ENV stuff

RUN DEBIAN_FRONTEND=noninteractive && \
   apt-get update && apt-get -y install -y tzdata && \
   ln -fs /usr/share/zoneinfo/UTC /etc/localtime && \
   dpkg-reconfigure --frontend noninteractive tzdata && \
   apt-get -y install -y apt-transport-https ca-certificates \
     wget curl unzip vim git less mc jed \
     python python-openstackclient  && \
   apt-get -q autoremove && \
   apt-get -q clean -y && \
   rm -f /var/lib/apt/lists/*

# There are further RUN commands below, but they don't matter

I then build the Docker image by calling:

$ docker build -t test:latest .

Installing Python (and the other packages) obviously creates a large number of small files in the build container/image. Still, what happens is the following:

  • The FROM/ARG instructions and the first RUN command go through almost instantly (as expected).
  • The second RUN command is expected to take a while, as it installs quite a large number of applications via the package manager. apt and all the tools need roughly two minutes to complete (a runtime I would be totally okay with).

After having written the last line to the console

0 upgraded, 0 newly installed, 0 to remove and 14 not upgraded

the build takes ages to complete. I have let it run for 4+ hours, but the next layer (the next RUN command) had still not been reached.

In that state, the system load shows that the entire machine is idling (loadavg < 0.1), and there is more than 1 GB of free RAM. Once in a while dockerd briefly does something, but with less than 2% CPU load. Checking the I/O load with iotop, I see both Total DISK READ and Total DISK WRITE at around 30 K/s, which is next to nothing. The disk beneath the backing file system is known to be an SSD, so an I/O throughput of 80+ MB/s should safely be possible. Indeed, when I manually test reading from and writing to /var/lib/docker (where the backing file system is mounted), I get write speeds of 200+ MB/s. The backing file system still has more than 10 GB of free space.
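The manual test described above measures sequential throughput, whereas the stalled step writes many small files. As a minimal sketch (not part of the original post), per-file overhead could be timed separately by creating a batch of small files. The function name `small_file_bench`, the 4 KiB file size, and the default count are assumptions chosen for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: create COUNT small files in DIR and print the
# elapsed wall-clock time in whole seconds.
small_file_bench() {
    dir=$1
    count=${2:-1000}
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        # Write one 4 KiB file per iteration; dd's status output is discarded.
        dd if=/dev/zero of="$dir/bench_$i.tmp" bs=4096 count=1 2>/dev/null
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# Example (commented out): run against a scratch directory that lives on
# the same file system as /var/lib/docker, then remove it afterwards.
#   scratch=$(mktemp -d /var/lib/docker/bench.XXXXXX)
#   small_file_bench "$scratch" 5000
#   rm -rf "$scratch"
```

If this takes far longer than the same number of bytes written as a single file, the bottleneck is likely per-file work (open, close, metadata) rather than raw disk bandwidth.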

Checking ps auxw | grep docker, I also see that docker build is in a blocked state and that the sub-process

docker-untar /var/lib/docker/overlay2/0475df....07/diff

is running, but in state "stopped". In total, after 2+ hours of wall-clock time, this process has consumed less than 2 CPU-seconds. Looking into the diff directory mentioned by docker-untar above and taking several snapshots of du -sh ., I can see that the amount of data in that directory increases very slowly (about 1 MB per 3 minutes, which is roughly 5.5 KB/s; even ancient floppy disks were faster than this!). At the time of writing, it held 103 MB. Running find . | wc -l returned 5157, a count that is likewise going up only very slowly (162 files were written during the 3 minutes it took to generate 1 MB of data).
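The snapshot measurements described above can be scripted. The following is a rough sketch under the assumption that two samples taken a known number of seconds apart are enough; the helper names `dir_kib`, `dir_entries`, and `rate_kib_per_s` are made up for this example. Note that du -sk reports KiB, so 1 MiB (1024 KiB) in 180 seconds comes out at about 5.69 KiB/s, consistent with the rough figure quoted above.

```shell
#!/bin/sh
# Size of a directory tree in KiB, as reported by du.
dir_kib() {
    du -sk "$1" | cut -f1
}

# Number of entries below a directory (the directory itself is counted too).
dir_entries() {
    find "$1" | wc -l | tr -d ' '
}

# Growth rate in KiB/s between two size samples taken SECONDS apart.
# Arguments: BEFORE_KIB AFTER_KIB SECONDS
rate_kib_per_s() {
    LC_ALL=C awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (b - a) / s }'
}

# Example (commented out). DIFF_DIR stands for the overlay2 diff directory
# shown by docker-untar; its full path is elided in the post.
#   before=$(dir_kib "$DIFF_DIR"); sleep 180; after=$(dir_kib "$DIFF_DIR")
#   rate_kib_per_s "$before" "$after" 180
```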

Running containers on the same machine and reading from or writing to the container's attached image works without problems.

I also have access to a very similar virtual machine (with even more CPU and RAM). However, the same behavior can be observed there as well.

Does anyone have an idea what could be misconfigured here, such that Docker is not using the free CPU and I/O capacity of the machine?

  If you manually run each command, does it behave the same?
    – Ramhound
    Commented Jun 1, 2019 at 14:26
  Yes, it does. Apparently, it only gets stuck when committing the layer, after all parts of the second RUN command have completed. I can also adjust the Dockerfile (e.g. by adding or removing packages), but I end up with the same problem. Only if I make the RUN command trivial is the commit of the layer fast as well. Commented Jun 1, 2019 at 14:29
  Have you considered just reporting the issue as a bug?
    – Ramhound
    Commented Jun 1, 2019 at 14:32
  No, I have not considered raising a bug yet. Commented Jun 1, 2019 at 14:34
    FWIW, it runs in about 3 minutes on my (native) Linux machine (Ubuntu 16.04, Docker 18.06.0-ce), after fixing the final rm, which fails because there are directories. The generated layer is 380 MB.
    – xenoid
    Commented Jun 1, 2019 at 21:03

Just for the sake of completeness: after endless hours of debugging, it turned out that the McAfee virus scanner used on the server had a resource leak, which made it slow down in general. I did not notice that in the first place, as there were no other I/O-intensive operations running on the server. The workaround I applied was to set up a cron job that restarts the virus scanner every night. Since then, this issue has never surfaced again.
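For illustration only, such a nightly restart could look like the cron fragment below. The post names neither the service nor the schedule, so the unit name `virus-scanner.service`, the file name, and the 03:00 run time are placeholders and would need to be replaced with whatever the installed scanner actually uses.

```crontab
# /etc/cron.d/restart-virus-scanner   (hypothetical file name)
# m  h  dom mon dow  user  command
  0  3  *   *   *    root  systemctl restart virus-scanner.service   # placeholder unit name
```

This treats the symptom rather than the leak itself, which matches the workaround character described above.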

