
I have a script that dumps the current load into a screen window on my server so I can glance at it without switching to a browser to look at Zabbix, etc.

The script basically just loops: while :; do awk '{print $1,$2,$3}' < /proc/loadavg; sleep 20s; done (but with fancier output, using /usr/bin/rev, that makes it easy to read). (Script here: https://sizone.org/m/hacks/loadlog )

While running this (on a server with a customer thrashing 18 simultaneous tar backup jobs...), I got this output:

0113 081430 22.04 22.68 23.82
22.38 22.73 23.82
26.80 23.73 24.12
29.77 24.65 24.42
31.43 25.33 24.65
/usr/local/bin/loadlog: line 23: /usr/bin/rev: cannot execute binary file: Exec format error
27.64 25.06 24.58
26.99 25.09 24.60

What can cause a TEMPORARY exec format error?

I also did this:

$ sudo md5sum `which rev`
1ebd9cc77b09f907767d39d5b4746c4e  /usr/bin/rev
    
$ sudo apt-get install --reinstall util-linux
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 198 not upgraded.
Need to get 0 B/1141 kB of archives.
After this operation, 0 B of additional disk space will be used.
(Reading database ... 50992 files and directories currently installed.)
Preparing to unpack .../util-linux_2.36.1-8+deb11u1_amd64.deb ...
Unpacking util-linux (2.36.1-8+deb11u1) over (2.36.1-8+deb11u1) ...
Setting up util-linux (2.36.1-8+deb11u1) ...
fstrim.service is a disabled or a static unit not running, not starting it.
Processing triggers for mailcap (3.69) ...
Processing triggers for man-db (2.9.4-2) ...
    
$ sudo md5sum `which rev`
1ebd9cc77b09f907767d39d5b4746c4e  /usr/bin/rev

So rev is the same (or md5sum is cleverly hacked to have me believe so...). Notably, this is ECC RAM on an HP ProLiant Gen9.

What gives? Cosmic ray (vs ECC?)? What do I do to verify things further? Nothing else has given me temporary exec format errors anywhere on the box that I've noticed.
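One way to gather more evidence next time is to record the state of the binary at the moment the failure happens. The following is only a sketch: has_elf_magic and run_logged are hypothetical helper names, the log path is whatever you choose, and it assumes bash (which reports status 126 when a command is found but cannot be executed) plus GNU coreutils. Reading the file right after the failure may or may not hit the same bad data that execve saw, so treat a clean result as inconclusive.

```shell
#!/bin/bash
# has_elf_magic FILE: succeed if FILE starts with the 4-byte ELF magic
# (0x7f 'E' 'L' 'F'). A binary without it is rejected with ENOEXEC.
has_elf_magic() {
    local magic
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# run_logged LOGFILE CMD [ARGS...]: run CMD unchanged; if it returns 126
# ("found but could not be executed"), append a timestamp, the magic check
# and an md5sum of the binary to LOGFILE.
run_logged() {
    local log=$1
    shift
    "$@"
    local status=$?
    if [ "$status" -eq 126 ]; then
        local bin
        bin=$(command -v "$1")
        {
            date '+%F %T'
            if has_elf_magic "$bin"; then
                echo "ELF magic ok: $bin"
            else
                echo "ELF magic BAD: $bin"
            fi
            md5sum "$bin"
        } >> "$log" 2>&1
    fi
    return "$status"
}
```

In the loop, the rev call would then become something like `... | run_logged /var/tmp/rev-fail.log /usr/bin/rev`. Checking `dmesg` and the machine's hardware event log around the logged timestamp would be the natural follow-up.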

  • ECC memory can also produce failures. There are other reasons why data in RAM can change, too: hardware failures (especially power supply problems), software bugs, manipulation.
    – paladin
    Commented Jan 14 at 0:42
  • Yeah, but error-correcting RAM (and the ECC on board in the CPU caches) is highly resistant to errors. If this is the cause, I should buy a lottery ticket next.
    – math
    Commented Jan 15 at 17:36
  • Memory errors are more frequent than you think, even for ECC memory. hectronic.se/solutions/development/technologies/memory-storage/….
    – paladin
    Commented Jan 16 at 5:06

1 Answer


A truncated /usr/bin/rev file could cause this, but I'm not sure what could cause that to happen.

Some sort of temporary damage to ld.so is another option.

  • good point, but there was no apt-anything going on during that time. However, checking historical snapshots to see if something changed: $ md5sum /.zfs/snapshot/202401*/usr/bin/rev | cut -d ' ' -f1 | uniq -c prints a single line, 15 1ebd9cc77b09f907767d39d5b4746c4e, so nope.
    – math
    Commented Jan 15 at 17:39
  • also, rev wouldn't be truncated during an apt update/install; it'd be atomically replaced, methinks.
    – math
    Commented Jan 17 at 0:20
  • that's a good point
    – Jasen
    Commented Jan 17 at 0:22
