My Debian desktop is configured to start systemd-networkd, dhcpcd, pppoe and related services at boot so that it can provide internet access for several devices. The problem is that it occasionally fails to complete the boot. I checked the log with journalctl, and it turns out that the systemd-networkd service was never started. The boot process simply got stuck at some point. systemd-networkd and every service that wants network.target or network-online.target never ran. As a comparison of the logs below shows, an incomplete boot stops before systemd-networkd is started. I set up a shell script to reboot the system when the network is malfunctioning, but many other services such as cron were not started either, so this does not solve the problem.

What could prevent Debian from continuing to boot?

I'd like to stress that the systemd-networkd service did NOT fail to start; it was never started at all, since there is not a single log line from systemd-networkd, neither info nor error. The glitch must have happened before it was started and prevented the system from continuing to boot.

Edit: The system had been running smoothly for weeks. The problem began with two unexpected reboots. Both of these boots were incomplete, and an MCE event was logged during the second one. This is said to be related to a microcode update, but I'm not sure whether one is necessary, since the error has only occurred once. The relevant log says:

Nov 03 16:57:14 server kernel: mce: [Hardware Error]: Machine check events logged
Nov 03 16:57:14 server kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 27: baa000000000080b
Nov 03 16:57:14 server kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 5d000000 IPID 1002e00000500
Nov 03 16:57:14 server kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1667465832 SOCKET 0 APIC 0 microcode 8701021
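As a rough aid for reading the status word in the log above, here is a minimal Python sketch that pulls out the high-order flag bits shared by the x86 machine-check status registers (VAL, OVER, UC, EN, MISCV, ADDRV, PCC) and the low 16-bit error code. The function name `decode_mci_status` is made up for illustration, and vendor-specific bits are deliberately ignored, so this is not a full decoder.

```python
# Flag bits common to x86 machine-check status registers.
# Vendor-specific bits (anything below bit 57) are not interpreted here.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled for this bank
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status):
    """Return the set flag names and the low 16-bit error code."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {"flags": flags, "error_code": status & 0xFFFF}


# Status value taken from the "Bank 27" line of the log.
decoded = decode_mci_status(0xBAA000000000080B)
print(decoded["flags"], hex(decoded["error_code"]))
```

For this value the UC and PCC bits are set, which would mean an uncorrected error with possibly corrupt processor context. That seems consistent with an unexpected reset, though it does not say which component is at fault.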

I'm sure the reboots were not caused by a power failure; their cause is still unknown. I also wonder whether the kernel can recognise an incomplete boot and reset automatically. A software watchdog should help recover from a crash, and a crontab task that checks the network can ensure connectivity. Combining the two should do what I want, but it didn't. I don't understand what happens when a boot gets stuck somewhere. If there is a crash, the watchdog should trigger a reset. If not, other units that are not bound by dependencies should continue to initialise, yet they did not. This is really confusing... end edit
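For reference, the network check I mean is roughly equivalent to the Python sketch below (my actual script is a shell script). The target address `192.0.2.1` is a placeholder, and the function names and the `run` parameter are only there for illustration. It pings a host a few times and asks systemd to reboot if every attempt fails. Since it is launched from cron, it can only help when cron itself gets started, which is exactly what does not happen in an incomplete boot.

```python
import subprocess


def network_is_up(host, attempts=3, run=subprocess.run):
    """Return True as soon as one ping to `host` succeeds."""
    for _ in range(attempts):
        # -c 1: send one echo request, -W 2: wait at most 2 seconds for a reply
        result = run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return True
    return False


def check_and_reboot(host, run=subprocess.run):
    """Reboot through systemd if `host` is unreachable; return True if a reboot was requested."""
    if network_is_up(host, run=run):
        return False
    run(["systemctl", "reboot"])
    return True


# Example call from a cron job (placeholder address, needs root to reboot):
# check_and_reboot("192.0.2.1")
```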

I've tried to analyse the log but found nothing useful. I would like to find the cause of this occasional failure and a solution. Any ideas will be appreciated. Thanks in advance, everyone!

My platform

Linux 5.10.0-18-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64 GNU/Linux

This is the log of an incomplete boot

This is the log of a normal boot
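One way to compare the two boots mechanically is to export each with `journalctl -b <offset> -o json` and list the units that appear in the normal boot but never in the incomplete one. The sketch below assumes one JSON object per line, as that output format produces, and reads the `_SYSTEMD_UNIT` and `UNIT` journal fields; the function names are made up for illustration.

```python
import json


def units_in_boot(json_lines):
    """Collect the unit names mentioned in one boot's JSON journal export."""
    units = set()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # _SYSTEMD_UNIT: unit of the process that logged the message.
        # UNIT: unit that a message from systemd itself refers to.
        for field in ("_SYSTEMD_UNIT", "UNIT"):
            unit = entry.get(field)
            if isinstance(unit, str):
                units.add(unit)
    return units


def missing_units(normal_lines, incomplete_lines):
    """Return units seen in the normal boot but not in the incomplete one."""
    return sorted(units_in_boot(normal_lines) - units_in_boot(incomplete_lines))
```

With the two exports saved to files, `missing_units(open("normal.json"), open("incomplete.json"))` would list the units that never logged anything in the incomplete boot. The file names here are placeholders.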

Edit2:

The problem has become catastrophic, because I'm losing the last possible way to configure the computer over SSH on the LAN. In this struggle I have confirmed that none of the network interfaces is brought up, since none of them responds to ping. This supports my theory that networkd was never started during boot. When possible, I'll try to gather more information.

  • "Couldn't write '1' to 'net/ipv6/conf/ppp0/forwarding', ignoring: No such file or directory" seems relevant. There are quite a few more. Commented Nov 4, 2022 at 8:05
  • @mashuptwice Since this error occurs in both the incomplete and the normal boot, it might not be the root cause. Thank you for responding anyway!
    – cby120
    Commented Nov 4, 2022 at 11:09
  • Is there any output from systemctl list-jobs during an incomplete boot? How did you obtain the log of the incomplete boot and did it really stop at that point? What status does netfilter-persistent.service report? Commented Nov 4, 2022 at 12:15
  • @user1686 I only have remote access through ssh. The journal was obtained with journalctl after a manual reset, so I cannot be completely sure that the boot really stopped at this point. I will try to get a live log while the fault is occurring. As for netfilter, it didn't report any error in the system journal, and its job is to import iptables rules from a file, which I guess should not block the entire system from continuing to boot.
    – cby120
    Commented Nov 4, 2022 at 12:31
  • @cby120 It might be a clue to a misconfiguration anyway. It is worth checking the cause. Commented Nov 5, 2022 at 11:17

1 Answer

It turned out to be a hardware fault, probably the RAM slot. When booting, the machine could not even start its self-check or enter the BIOS. After replacing the motherboard, it works normally. Thanks to everyone who helped.
