
We are currently developing a system that does automated BIOS updates for our computers.

The system runs in Windows 10 PE booted via PXE-Boot. One of the first steps is connecting to a SMB-Share where all the needed data is stored.

The share itself is hosted on Debian 4.9.88-1+deb9u1 with smbd Version 4.5.12-Debian.

The problem

The procedure needs multiple reboots and booting Windows PE again.

We discovered that each system runs fine on the first boot, but with every reboot into Windows PE the connection to the SMB share takes longer, and sometimes it even times out with Windows error 63.

Since Windows PE is loaded fresh into RAM on each boot and keeps no state between reboots, we believe the problem has to be on the server side.

The log files from Samba did not contain any errors regarding the connection of the hosts.

Mount Command in PE

NET USE B: \\SERVER\buffet /user:user password

Since Windows 10 does not allow anonymous guest logins or shares without a password, omitting the login credentials is not an option. That said, I personally don't believe the problem is related to authentication.

Samba-Config

As you can see we already tried many timeout-configurations:

[global]
    workgroup = WORKGROUP
    security = user
    server role = standalone
    disable netbios = no
    server string = %h
    map to guest = Bad User
    obey pam restrictions = Yes
    passdb backend = tdbsam
    pam password change = Yes
    passwd program = /usr/bin/passwd %u
    passwd chat = *Enter\snew\s*\spassword:* %n\n *Retype\snew\s*\spassword:* %n\n *password\supdated\ssuccessfully* .
    unix password sync = Yes
    log file = /var/log/samba/log.%m
    log level = 10 winbind:5 auth:3
    max log size = 1000
    dns proxy = No
    usershare allow guests = Yes
    panic action = /usr/share/samba/panic-action %d
    create mask = 0777
    directory mask = 0777
    map hidden = no
    map system = no
    map archive = no
    store dos attributes = yes
    dos filemode = Yes
    acl map full control = Yes
    server min protocol = SMB3_10
    socket options = TCP_NODELAY IPTOS_LOWDELAY
    read raw = yes
    write raw = yes
    server signing = no
    strict locking = no
    min receivefile size = 16384
    use sendfile = Yes
    aio max threads = 250
    aio read size = 1
    aio write size = 1
    ldap timeout = 3
    deadtime = 1
    winbind request timeout = 10
    name cache timeout = 0
    winbind max domain connections = 50
    winbind max clients = 750
    inherit owner = No

[buffet]
    path = /share/buffet
    public = yes
    writeable = no
    browseable = yes
    guest ok = yes
    force user = nobody
    force group = nogroup
    read only = yes
    case sensitive = no
    default case = lower
    preserve case = yes
    short preserve case = yes

How can we fix the config so that the hosts can reboot any number of times and still connect to the share instantly?

Attempts to fix it

We've tried several timeout settings, as you can see in the smb.conf above, but none of them had any effect on our problem, so we conclude that they are not the cause.

We've also pulled and analysed a large number of Samba log files, looking for any "failed", "timeout" or similar entries that might point to a misconfiguration. We resolved every one we found, but the issue still persists.

We also tried different DHCP server settings, such as setting the default lease time to a very small value like ten seconds, but again this had no effect on our issue.

We also tried forcing Samba to use SMB2 or SMB3_11, but pinning the SMB protocol to a fixed version did not fix the problem either: after three or four reboots of our WinPE client, the SMB share no longer connects immediately.

While a mount attempt is hanging (after the ping has already succeeded quickly), there is no high CPU or memory load on the server. We also monitored the network bandwidth using speedometer and saw only very low traffic (a few KB/s up and down) on the connection. The ping times themselves do not increase during these slow attempts to mount the share.
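Since ping only exercises ICMP, one further measurement that might help is timing the bare TCP handshake to port 445 (the standard port for SMB over TCP) separately from the mount itself. The following is only a minimal sketch of that idea: it assumes a Python interpreter is available on a test client (a stock Windows PE image does not include one), and the function name `time_tcp_connect` is made up for illustration. If the handshake stays fast while NET USE hangs, the delay would sit in the SMB negotiation or session setup rather than in the network path.

```python
import socket
import time


def time_tcp_connect(host, port=445, timeout=5.0):
    """Return the seconds needed to complete a TCP handshake with host:port.

    Returns None if the connection is refused, unreachable, or times out.
    Port 445 is the standard port for SMB over TCP.
    """
    start = time.monotonic()
    try:
        # create_connection performs name resolution plus the TCP handshake;
        # the socket is closed again as soon as the handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
```

Calling `time_tcp_connect("SERVER")` right after the ping loop on every boot, and logging the result next to the duration of the NET USE command, would show whether the two numbers grow together.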

  • Have you ruled out lower-level latency issues? Can you set up some ping tests? Here in U&L, you might scare fewer people off if you lead off with "clients of the SMB-Share takes more time and sometimes even times out" instead of all the "Windows" terms. Not that we know where the problem is, yet!
    – Jeff Schaller
    Commented Sep 3, 2018 at 12:11
  • @JeffSchaller: Thank you for your recommendation. What would you recommend to clarify lower-level latency issues? Before each attempt to mount the share, a loop pings the server; the ping succeeds after at most 5 attempts. Commented Sep 3, 2018 at 14:13
  • Do the ping times increase with the reboots? Is the Debian system becoming overloaded (CPU, memory/swapping)?
    – Jeff Schaller
    Commented Sep 3, 2018 at 15:03
  • @JeffSchaller No, the ping times stay the same. Also, the system is not overloaded; it is under moderate load. Commented Sep 3, 2018 at 18:50

1 Answer


As a further follow-up: today we moved our file transfer away from Samba to HTTP, to check whether the delay comes from the service (Samba) or from the host or network. We used apache2 to host the files required by our application, and then spent half a day trying to reproduce the issue we had with Samba, to no avail. In over four hours of testing (with automated client reboots after each successful payload download from the Apache server), we didn't see a single unsuccessful connection attempt!

Because of this, we decided to reduce our Samba config to a bare-bones config, to rule out interference between the different settings we had made. It currently looks like this:

[global]
workgroup = WORKGROUP
security = user
server role = standalone
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes
map to guest = Bad Password
log file = /var/log/samba/log.%m

But even with this very limited configuration, we still see the issue: after a small number of reboots, the delay between a successful ping to the server (to check its availability) and the connection to the SMB share starts to increase, until the connection finally times out with a "network path was not found" error.

An interesting fact is that we can reproduce this behavior with any client we have available. We just boot our WinPE, let it ping the server, and once the ping succeeds, connect to the Samba share, download some required files (only a few MB) and reboot the client. Even when we test in parallel on multiple clients whose starts are staggered by, say, two reboots, we can see that when one client "hangs", the other, fresh client connects immediately. Our conclusion is that the problem must be caused by some interaction between an individual client and Samba, or by something along the lines of too many simultaneous connections from a particular client to the Samba share.
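One way to test the "too many connections from one client" hypothesis would be to watch, on the Debian server, how many TCP sockets on port 445 exist per client address and in which state, before and after each reboot of a client. The sketch below is only an illustration under some assumptions: Python 3 is installed on the server, `ss -tan` prints its usual five columns (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port), and the helper names `count_smb_connections` and `snapshot` are made up for this example.

```python
import subprocess
from collections import Counter

SMB_PORT = 445  # standard port for SMB over TCP


def count_smb_connections(ss_output, port=SMB_PORT):
    """Count TCP sockets on the local SMB port, grouped by (peer address, state).

    ss_output is the text printed by `ss -tan`; listening sockets are skipped.
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed columns: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or fields[0] in ("State", "LISTEN"):
            continue
        state, local, peer = fields[0], fields[3], fields[4]
        # rsplit keeps bracketed IPv6 addresses such as [::1]:445 intact.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        counts[(peer.rsplit(":", 1)[0], state)] += 1
    return counts


def snapshot():
    """Run `ss -tan` on this machine and print one line per (peer, state)."""
    out = subprocess.run(["ss", "-tan"], capture_output=True,
                         text=True, check=True).stdout
    for (peer, state), n in sorted(count_smb_connections(out).items()):
        print(f"{peer:<40} {state:<12} {n}")
```

Running `snapshot()` once per client reboot would show whether sockets from a rebooted client pile up on the server (for example in ESTAB, even though the client that opened them no longer exists) at the same rate as the mount delay grows.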

We think we have narrowed the problem down completely to Samba, since with Apache we could not make the problem appear in half a day of testing. Moreover, because everything works as expected with Apache, we think that a server or network misconfiguration cannot be the cause.

Would you think the same thing and say it must be samba?

We would appreciate any idea that may be helpful. Although we have solved this particular case (our solution is to not use Samba), we would still like to know the root cause, because we have a similar issue in another factory department where we cannot avoid using Samba to share files over the network. For that reason, we are willing to continue debugging this issue if any of you has more input for us.

To avoid any confusion, I'd like to point out that Marcel Kohlmeyer and I are colleagues working on the same problem together.
