
I ran into a problem this morning. I have a computer that is used for machine learning and nothing else. I use Python to run TensorFlow to train some models that I made.

The problem is that I couldn't log into the machine over VNC this morning. Upon investigating I found that GNOME Shell had restarted and I had to log in again. Unfortunately my whole session was gone. I suspected the OOM killer, since these machine learning programs tend to consume a lot of memory (the uptime is still over two months, so it was not a full reboot). However, according to dmesg the last "Out of memory" message was logged two months ago.

From my machine learning session I know that the crash must have happened about one hour after midnight since this is the last time that something was written into the directory where the model is periodically saved.
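For reference, this is roughly how the time of the last write can be read off the save directory. It is only a minimal sketch: the function name `last_write_time` and the path `checkpoints` are placeholders I made up, not the real names on my machine.

```python
from datetime import datetime
from pathlib import Path


def last_write_time(directory):
    """Return the newest modification time of any regular file under
    `directory` (searched recursively), or None if there are no files."""
    mtimes = [p.stat().st_mtime for p in Path(directory).rglob("*") if p.is_file()]
    if not mtimes:
        return None
    # Convert the newest POSIX timestamp to local time.
    return datetime.fromtimestamp(max(mtimes))


if __name__ == "__main__":
    # "checkpoints" is a hypothetical path standing in for the model directory.
    print(last_write_time("checkpoints"))
```

The result is in local time, so it can be compared directly with the timestamps in the system logs.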

I just cannot find any more logs about what has caused the crash.

In /var/log/kern.log there is an "rfkill: input handler enabled" message at around the time of the crash. As far as I know this machine doesn't even have any wireless devices. The last message before that was some AppArmor output four days earlier.

I tried to use journalctl to look at the logs for gdm.service and, despite it claiming that the logs begin at the end of July, there is an entry at around the time of the crash saying that a session was opened for user gdm by (uid=0).
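To narrow the journal down to the time of the crash, something like the following could build the journalctl call. This is a sketch under assumptions: `journalctl_window` is a helper name I invented, and the timestamp in the example is a placeholder rather than the real crash time. The `--since`, `--until` and `--no-pager` options are standard journalctl options.

```python
import shlex
from datetime import datetime, timedelta


def journalctl_window(center, minutes=30):
    """Build a journalctl command covering `minutes` before and after `center`."""
    fmt = "%Y-%m-%d %H:%M:%S"  # timestamp format accepted by --since/--until
    delta = timedelta(minutes=minutes)
    return [
        "journalctl",
        "--no-pager",
        "--since", (center - delta).strftime(fmt),
        "--until", (center + delta).strftime(fmt),
    ]


if __name__ == "__main__":
    # Placeholder crash time; replace with the time of the last checkpoint write.
    cmd = journalctl_window(datetime(2019, 8, 30, 1, 0, 0))
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to `subprocess.run` as is. Without a unit filter it shows messages from all services in that window, which seems more useful here than looking at gdm.service alone.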

My question is: how can I find out what really happened and what killed the process?

[edit] I just looked at the directory containing the machine learning code. I had a terminal with vim in which all the Python code was open so I could quickly edit it when I needed to. However, there are no swap files in that directory. Normally when a crash occurs the swap files stay behind and you have to recover your vim session manually, but that is not the case here. Could the crash actually have been a more or less graceful restart of the display manager? If so, are there any logs anywhere about that?
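The swap file check can be reproduced with a short script. This is a sketch that assumes vim keeps its swap files next to the edited file and names them like `.train.py.swp`, `.train.py.swo` and so on; `find_vim_swap_files` is a name I made up, and a custom `directory` setting in vim would put the files somewhere else.

```python
import fnmatch
import os


def find_vim_swap_files(directory):
    """Return the sorted names of files in `directory` that look like vim
    swap files (hidden files ending in .swp, .swo, ... down to .swa)."""
    pattern = ".*.sw[a-p]"
    return sorted(
        name for name in os.listdir(directory) if fnmatch.fnmatch(name, pattern)
    )


if __name__ == "__main__":
    # Check the current working directory.
    print(find_vim_swap_files("."))
```

An empty list for the code directory matches what I see, which is why I suspect vim was closed cleanly rather than killed.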

  • Are you sure about the vim swapfile? It might be placed in ~/.vim/backup/. session opened for user gdm would be the GDM restarting after the crash, IMO, so that would be exactly the time of the crash. You could check /var/log/Xorg.0.log for a possible crash report.
    – nyov
    Commented Aug 30, 2019 at 8:26
  • Oh yes, sorry - I forgot to mention that I was told to look in /var/log/Xorg.0.log but, unfortunately, the last entry in there just reads: [122.573] (II) Server terminated successfully (0). Closing log file. As for ~/.vim/backup/, unfortunately that folder doesn't exist, and there are definitely no swap files in the working directory itself.
    – Randryn0
    Commented Aug 30, 2019 at 8:45
