
A customer of mine has been using a Lenovo Thinkpad T450s for about a year. The machine runs Debian jessie with a kernel from jessie-backports, currently 4.5+73~bpo8+1. The latest UEFI firmware is flashed. The operating system was installed in UEFI mode, with a dedicated EFI system partition. Until about four weeks ago, this setup was rock solid: Lenovo Thinkpad and Debian, what could possibly go wrong?

For the past four weeks, the machine has shown the following error at every boot:

Error: The non-volatile system UEFI variable storage is nearly full.

Pressing Esc continues the boot process, which worked "well"... for two more weeks. Then the message changed from

Press Esc to continue or F1 to enter setup.

to

Clean up YES or NO

(Sadly, I don't have an image of this.)

My client chose "YES", which, as far as I can tell, erased the variable storage and left the machine unbootable. I then restored the "debian" boot entry, and the machine booted fine again. That lasted a few more days; for about a week now, the message has been appearing again at every boot.

I've tried to contact Lenovo support on four different days and gave up each time after spending ~30 minutes in the phone queue.

I've used all my $(your-favorite-search-engine-here) skills over the last few days and found next to nothing about where this comes from, how to debug it and, most importantly, how to fix it. As things stand, I expect the machine will soon be unbootable again.

Any pointers highly appreciated!

  • How large is the EFI partition? What percentage of the EFI filesystem is in use? Find out with: df -hT /boot/efi
    – Deltik
    Commented May 27, 2016 at 13:40
  • @Deltik It's ~500 MB; usage is currently around 1%. (I had checked this in advance, sorry for not putting it into the question. As far as I understand the error message, this isn't about /boot/efi but about "something different", right?) [I'm in no way a UEFI expert, so it's quite possible that I'm completely wrong on this.]
    – gxx
    Commented May 27, 2016 at 13:44
  • Ah, that would be too easy, wouldn't it? The next place to look is /sys/firmware/efi/efivars/. The variables in there are probably what the UEFI firmware is complaining about.
    – Deltik
    Commented May 27, 2016 at 13:51
  • @Deltik Yeah... that would have been too easy. :( About /sys/firmware/efi/vars/: there is a lot of stuff inside. Any idea what I should check for?
    – gxx
    Commented May 27, 2016 at 13:52
  • I'm taking a guess here, but this should sort the variables by size in ascending order: ls -lh /sys/firmware/efi/efivars | sort -k5 -h
    – Deltik
    Commented May 27, 2016 at 13:56



Since version 3.8, the Linux kernel exposes UEFI variable storage as a filesystem, efivarfs.

Mounting efivarfs

If mount | grep '^efivarfs' doesn't return anything, you can mount efivarfs using this command:

mount -t efivarfs efivarfs /sys/firmware/efi/efivars
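The check-then-mount step above can be wrapped in a small shell sketch. It reads /proc/mounts (field 3 is the filesystem type) instead of parsing the output of mount, assuming /proc is mounted as usual on Linux; efivarfs_is_mounted and ensure_efivarfs_mounted are hypothetical helper names, not existing tools.

```shell
#!/bin/sh
# Sketch: mount efivarfs only if it is not already mounted.
# Assumption: the mount table is read from /proc/mounts (field 3 = fstype).
# Both function names are hypothetical helpers.

# Succeeds if an efivarfs filesystem appears in the given mount table.
efivarfs_is_mounted() {
    awk '$3 == "efivarfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Mounts efivarfs at its usual location when it is missing (needs root).
ensure_efivarfs_mounted() {
    if ! efivarfs_is_mounted; then
        mount -t efivarfs efivarfs /sys/firmware/efi/efivars
    fi
}
```

Run ensure_efivarfs_mounted as root; it does nothing when efivarfs is already mounted.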

Now, you can browse /sys/firmware/efi/efivars to see if any variables stand out.

Sorting UEFI variables by size

efivarfs has no concept of disk usage, but it does report the size of each variable. This command sorts the variables by size, in ascending order:

ls -lh /sys/firmware/efi/efivars | sort -k5 -h
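To get a rough total instead of a sorted list, the sizes can be summed. Treat the result only as an approximation: as far as I know, each efivarfs file also includes a 4-byte attribute header, and the firmware's own bookkeeping (for example, space that may still be held by deleted variables until the firmware reclaims it) is not visible from Linux. efivars_total_bytes is a hypothetical name.

```shell
#!/bin/sh
# Sketch: approximate the space used by the variables visible in efivarfs.
# efivars_total_bytes is a hypothetical helper; the directory defaults to
# the standard efivarfs mount point.
efivars_total_bytes() {
    dir=${1:-/sys/firmware/efi/efivars}
    total=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue               # skip a non-matching glob
        size=$(wc -c < "$f" | tr -d ' ')      # bytes actually read from the file
        total=$((total + size))
    done
    echo "$total"
}
```

Run it as root (some variables are readable by root only), e.g. efivars_total_bytes, and compare the number over several days to see whether it keeps growing.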

Next steps

This is as far as I can take you using the information that you've provided. Next up, you need to figure out what is taking up so much space in the UEFI variable NVRAM.

The Arch Linux wiki suggests deleting /sys/firmware/efi/efivars/dump-* files/variables if they exist, though it doesn't mention what creates those variables.
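A minimal sketch of the Arch wiki suggestion: list the dump-* variables first, and delete them only after checking the list. As far as I know, these variables are written by the kernel's efi-pstore backend, which saves kernel crash logs to UEFI variables. Two assumptions are worth flagging: newer kernels (4.6 and later, I believe) mark most efivarfs files immutable, so the sketch tries chattr -i first; and it deliberately touches nothing but dump-* entries, since deleting other variables can leave some machines unbootable. list_dump_vars and delete_dump_vars are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: inspect and remove dump-* UEFI variables (assumed to be efi-pstore
# crash dumps). list_dump_vars and delete_dump_vars are hypothetical helpers.

# Prints the path of every dump-* variable in the given efivarfs directory.
list_dump_vars() {
    for f in "${1:-/sys/firmware/efi/efivars}"/dump-*; do
        if [ -e "$f" ]; then    # the glob stays literal when nothing matches
            echo "$f"
        fi
    done
}

# Deletes only the dump-* variables printed by list_dump_vars (needs root).
delete_dump_vars() {
    list_dump_vars "$1" | while IFS= read -r f; do
        # Assumption: newer kernels mark efivarfs files immutable; clearing
        # the flag may fail harmlessly where the flag is not supported.
        chattr -i "$f" 2>/dev/null || true
        rm -f -- "$f"
    done
}
```

As root, run list_dump_vars, review the output, and only then run delete_dump_vars.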

As discussed in chat, one approach would be to take a snapshot of the UEFI variables, flush them as proposed by the Lenovo firmware, reinstall Debian's EFI boot, take a snapshot again, wait for the UEFI variables to fill up again, and take one more snapshot. Then, you'll be able to compare the snapshots to see what changed and hopefully identify what is causing the problematic variable or variables to take so much space.
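The snapshot-and-compare approach could look roughly like this. Reading each variable with cat copies its raw contents (attributes plus data) into an ordinary directory, and diff -rq then reports variables that appeared, disappeared, or changed between two snapshots. snapshot_efivars, compare_snapshots and the snapshot paths in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: snapshot UEFI variables into a plain directory and compare snapshots.
# snapshot_efivars and compare_snapshots are hypothetical helpers.

# Copies every variable from SRC (an efivarfs directory) into DEST.
snapshot_efivars() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for f in "$src"/*; do
        [ -f "$f" ] || continue               # skip a non-matching glob
        cat "$f" > "$dest/${f##*/}"           # raw bytes, same file name
    done
}

# Lists variables added, removed, or changed between two snapshots.
compare_snapshots() {
    diff -rq "$1" "$2"
}
```

For example, as root: snapshot_efivars /sys/firmware/efi/efivars /root/efivars-1 before the cleanup, again into /root/efivars-2 after reinstalling the boot entry, and into /root/efivars-3 once the warning returns; then compare_snapshots /root/efivars-2 /root/efivars-3.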

If all else fails, you could go back to legacy booting.

  • As suggested by the Arch wiki, there were indeed many dump-* files inside /sys/firmware/efi/efivars. For now, I've removed them, which made the UEFI firmware happy again. I'll post a follow-up tomorrow. (Thanks for this pointer, Deltik, highly appreciated; I didn't find this on my own.)
    – gxx
    Commented May 30, 2016 at 21:10
  • If I'm not mistaken, the dump-* files contain kernel crash information. I don't know much about them, but I'd expect only kernels configured for debugging to create them, so I recommend checking the kernel compilation options (if the kernels are locally compiled) and the boot options to see whether something is misconfigured and creating these files unnecessarily.
    – Rod Smith
    Commented May 31, 2016 at 22:14
  • @RodSmith Thanks. The setup is pretty basic, nothing fancy: GRUB2, UEFI, vanilla Debian kernels from jessie-backports, not self-compiled. Any idea what I should look for specifically?
    – gxx
    Commented Jun 2, 2016 at 15:56
  • In that case, you might want to post a bug report with Debian, since it sounds like it may be the default Debian configuration that's filling the limited storage available in NVRAM.
    – Rod Smith
    Commented Jun 2, 2016 at 20:41
  • Thanks, deleting the dump* files worked for me.
    – Boiethios
    Commented Dec 18, 2016 at 14:10
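Following Rod Smith's suggestion to check the kernel configuration, a minimal sketch that prints the efi-pstore options from a Debian-style /boot/config-* file and checks whether the efi_pstore module is loaded. CONFIG_EFI_VARS_PSTORE enables pstore's efivars backend (crash dumps written to UEFI variables), and CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE turns it off by default; whether these options explain this particular case is an assumption, not something established in the thread. show_efi_pstore_config and efi_pstore_loaded are hypothetical names.

```shell
#!/bin/sh
# Sketch: inspect the kernel's efivars pstore backend.
# Both helpers are hypothetical; the default config path assumes Debian's
# /boot/config-<version> layout.

# Prints the CONFIG_EFI_VARS_PSTORE* lines (set or "is not set").
show_efi_pstore_config() {
    grep -E '^(# )?CONFIG_EFI_VARS_PSTORE' "${1:-/boot/config-$(uname -r)}"
}

# Succeeds if the efi_pstore module is loaded. A backend built into the
# kernel (=y) does not appear in /proc/modules, so a failure is not proof
# that the backend is inactive.
efi_pstore_loaded() {
    grep -q '^efi_pstore ' "${1:-/proc/modules}"
}
```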

I've hit the same error on a Thinkpad T430 with kernel 5.13. In my case the cause was an outdated, incompatible version of the acpi_call module (which TLP calls through the tpacpi-bat script), causing kernel oopses, as explained here. Although the bug in acpi_call has already been fixed, most distributions still ship an outdated version.

Here is a proposed fix, in case someone stumbles upon this thread while googling for a solution.
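To check whether the acpi_call situation described above applies, a rough sketch that tests whether the module is loaded and counts oops lines in the kernel log. module_loaded and count_oops are hypothetical helpers, and matching the word "oops" is only a heuristic.

```shell
#!/bin/sh
# Sketch: look for the acpi_call / kernel-oops combination.
# module_loaded and count_oops are hypothetical helpers.

# Succeeds if module NAME appears in a /proc/modules-style file.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Counts lines mentioning "oops" (case-insensitive) in a log file or stdin.
count_oops() {
    grep -ci 'oops' "${1:--}"
}
```

For example: module_loaded acpi_call && echo "acpi_call is loaded", and dmesg | count_oops (dmesg may need root).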

