I have a Fedora 23 Server edition installation on which I have done pretty much nothing admin-wise besides setting up ssh, fail2ban, samba, and users.

The other day I was completely locked out of my system with the message ssh_exchange_identification: read: Connection reset by peer, which, after some googling, led me to an existing question with that same title.

So I assumed that fail2ban had mucked up the banning. No biggie, just log in to the machine 'physically' and reset the bans. When I tried to do this, my console was flooded with a log message (which I can't remember exactly, "systemd.service something something read-only something"). This flooding made it impossible to log in: I would start typing my credentials, only to have the prompt reset whenever another log message was printed.

I was forced to do a hard reboot, i.e. press the reset button / hold the power button on the machine.

As I want to get to the bottom of the problem and find out what went wrong, I did some log-digging. I don't seem to be able to find the cause, however.

If I do journalctl --since 2016-04-10 I get the following

Apr 10 12:19:32 Server smartd[620]: Device: /dev/sdc [SAT], 47 Currently unreadable (pending) sectors
-- Reboot --
Apr 11 20:27:53 Server systemd-journal[148]: Runtime journal is using 8.0M (max allowed 81.0M, trying to leave 121.5M free of 802.5M available → current limit 81.0M).

I dropped all logs from when I assume the troubles started until I did the hard reboot. (The /dev/sdc drive has been giving that error since I got it.)

I also read on https://freedesktop.org/wiki/Software/systemd/Debugging/ that you should check /var/log/messages for logs.

The relevant parts here that I can see are

Mar  6 18:52:49 Server audit: USER_AUTH pid=3435 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=? acct="root" exe="/usr/sbin/sshd" hostname=183.3.202.108 addr=183.3.202.108 terminal=ssh res=failed'
Apr 11 20:28:46 Server rsyslogd-2177: imjournal: begin to drop messages due to rate-limiting
Apr 11 20:39:15 Server rsyslogd-2177: imjournal: 91404 messages lost due to rate-limiting
Apr 11 20:39:15 Server dnf: repo: using cache for: rpmfusion-free

Which tells me that there were 91404 log messages. I can't find a single one of them however. All were dropped as far as I can tell.

Since I want to know what actually 'broke' my system, is there anywhere else that the logs might be stored? Or do I just have to pray that the stars don't align to cause the same problem again?

1 Answer

NOTE: without further evidence to confirm my suspicion, this is just speculation.

You mentioned seeing something about 'read-only'. It's possible that your system encountered unrecoverable corruption or a hardware fault on the disk (or the cable, the SATA interface, etc.) and remounted the root fs read-only in order to prevent further damage/corruption.

The most important things you need to do ASAP are 1. make sure your backups are up-to-date, and 2. test your disk(s) thoroughly (e.g. with a smartctl test), as it/they may be about to fail.
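As a rough sketch of the disk check, assuming smartmontools is installed and the commands are run as root: the helper name pending_sectors is just something I made up for illustration, and it assumes the usual ATA attribute table layout printed by smartctl -A, where the raw value is the tenth column.

```shell
# Print the raw Current_Pending_Sector count from `smartctl -A` output
# supplied on stdin. Hypothetical helper; assumes the raw value is the
# tenth whitespace-separated column of the attribute table.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Typical use (as root; /dev/sdc is the drive named in the question):
#   smartctl -t long /dev/sdc        # start an extended self-test
#   smartctl -l selftest /dev/sdc    # read the self-test log once it has finished
#   smartctl -A /dev/sdc | pending_sectors
```

A non-zero pending count that keeps growing between runs is a stronger warning sign than one that stays constant.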

But, back to the logging issue:

If the root fs (or /var/log) is read-only then no logs will get written to disk.
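If it happens again and you can still get a shell, you can check whether a filesystem is currently mounted read-only by looking at its mount options. A minimal sketch, where is_readonly_mount is a hypothetical helper name and /proc/mounts is the assumed default source:

```shell
# Exit 0 if the given mount point is mounted read-only, 1 otherwise.
#   $1 = mount point (e.g. / or /var/log)
#   $2 = mounts table to read (defaults to /proc/mounts)
# If a mount point is listed more than once, the last entry wins.
is_readonly_mount() {
    awk -v mp="$1" '
        $2 == mp {
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") ro = 1
        }
        END { exit(ro ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Typical use:
#   if is_readonly_mount /; then echo "root fs is read-only"; fi
```

The option list is split on commas so that an option such as errors=remount-ro is not mistaken for ro.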

If you have another system you can forward log messages to, I'd recommend configuring rsyslog to log both to local files and to a remote system. This machine can also provide the same service to the remote system.

e.g. on two of my systems (ganesh and kali), I have the following in /etc/rsyslog.d/00remote.conf:

$ModLoad imudp   # provides UDP syslog reception
$UDPServerAddress 0.0.0.0
$UDPServerRun 514 

if $fromhost-ip == '127.0.0.1' and $syslogfacility-text == 'kern' then @kali

That's the file as it is on ganesh. On kali, it's identical except that @kali is replaced with @ganesh.

This forwards all facility kern log entries that originate from localhost to the remote logging host. The IP address check prevents a remote logging loop meltdown.

There are other ways to configure remote logging with rsyslog, including tcp based logging rather than udp. This is just the first method I tried when I needed it years ago - it worked, and was simple, and I haven't had any need to change it yet. Search on this site for other methods and more details.
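For comparison, a TCP variant of the same rule might look like the sketch below. I have not run this setup myself, so treat it as an assumption: in rsyslog's legacy syntax a double @@ selects TCP instead of UDP, and the receiving host would need the imtcp module rather than imudp. The function name tcp_forward_rule is made up for illustration.

```shell
# Print a forwarding rule like the one above, but using TCP (@@) instead
# of UDP (@). $1 = remote log host. Hypothetical helper.
tcp_forward_rule() {
    printf "if \$fromhost-ip == '127.0.0.1' and \$syslogfacility-text == 'kern' then @@%s:514\n" "$1"
}

# The receiving side would load the TCP input instead of imudp, e.g.:
#   $ModLoad imtcp
#   $InputTCPServerRun 514
#
# Typical use, to inspect the rule before adding it to a file in /etc/rsyslog.d/:
#   tcp_forward_rule kali
```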

BTW, make sure that incoming traffic for udp port 514 is blocked at your firewall.

  • I did a # smartctl -t long /dev/sda, which is my system drive, and it came out clean (or rather, Passed). I'm currently doing the same for all drives. I don't have another machine to use, unfortunately, so currently I can't store my logs on a different machine. Other suggestions or ideas?
    – qwelyt
    Commented Apr 13, 2016 at 9:48
  • Another question arises: if the drive where the logs are stored is read-only, how could journalctl write -- Reboot -- to the logs? I didn't enter it myself. It was just there when I looked at them. So I assume that it's written on shutdown.
    – qwelyt
    Commented Apr 13, 2016 at 9:50
  • no idea...most of my systems are sysvinit, and the only one I have that runs systemd (plus rsyslog) doesn't write REBOOT to any log file...even journalctl -a | grep REBOOT returns nothing.
    – cas
    Commented Apr 13, 2016 at 10:02
  • BTW, if /dev/sda passed its test, it's possible that /dev/sdc had some kind of error that messed up the motherboard's sata interface (or the kernel's sata/AHCI driver - I've seen that kind of thing happen with IDE and SCSI drives but IIRC not with SATA so far). Given that all the logs are missing from the time of the crash until the reboot, I still think that your rootfs was probably re-mounted read-only.
    – cas
    Commented Apr 13, 2016 at 10:06
  • I have 3 drives on my server, sd{a,b,c}, and I ran smartctl -t long on all of them. All have reported that they passed the test. This makes the sdc error message even more cryptic to me. Regarding -- Reboot --: it seems to be some journalctl magic that is inserted at each boot, perhaps? digitalocean.com/community/tutorials/…
    – qwelyt
    Commented Apr 14, 2016 at 5:11