When a memory module in one of my servers fails, the event log will often report the wrong DIMM slot, or a nonexistent one altogether. The best way we have found to identify the failed DIMM is to check which one is missing.
I have a command that produces the following output:
Location Tag: P1-DIMMA1 Size: 34359738368 bytes
Location Tag: P1-DIMMA2
Location Tag: P1-DIMMB1
Location Tag: P1-DIMMC1
Location Tag: P1-DIMMD1 Size: 34359738368 bytes
Location Tag: P1-DIMMD2
Location Tag: P1-DIMME1 Size: 34359738368 bytes
Location Tag: P1-DIMMF1
Location Tag: P2-DIMMA1 Size: 34359738368 bytes
Location Tag: P2-DIMMA2
Location Tag: P2-DIMMB1 Size: 34359738368 bytes
Location Tag: P2-DIMMC1
Location Tag: P2-DIMMD1 Size: 34359738368 bytes
Location Tag: P2-DIMMD2
Location Tag: P2-DIMME1 Size: 34359738368 bytes
Location Tag: P2-DIMMF1
In this example P1-DIMMB1 has failed (that DIMM slot is populated in P2 but not in P1).
I am looking for a programmatic way to determine which DIMM slot(s) are empty on one CPU but populated on the other. I have come up with the following bash monstrosity, but I am sure there is a simpler way to do this with awk.
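cpu1_dimms=()
cpu2_dimms=()
missing=()
# Collect the populated slots (lines that carry a Size field) for each CPU
while read -r line; do
    dimm=$(awk '{print $3}' <<<"$line")
    cpu=${dimm:1:1}
    size=$(awk '{print $5}' <<<"$line")
    if [[ -n "$size" ]]; then
        case $cpu in
            1) cpu1_dimms+=( "${dimm:3}" );;
            2) cpu2_dimms+=( "${dimm:3}" );;
        esac
    fi
done <<<"$var"
# Populated on CPU 1 but not on CPU 2; the padded match compares whole
# slot names, so DIMMA1 cannot match inside a longer name such as DIMMA10
for dimm in "${cpu1_dimms[@]}"; do
    if [[ " ${cpu2_dimms[*]} " != *" $dimm "* ]]; then
        missing+=( "P2-$dimm" )
    fi
done
# Populated on CPU 2 but not on CPU 1
for dimm in "${cpu2_dimms[@]}"; do
    if [[ " ${cpu1_dimms[*]} " != *" $dimm "* ]]; then
        missing+=( "P1-$dimm" )
    fi
done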
This assumes that the output of the aforementioned command is stored in the variable var, and it leaves the result in the array missing.
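For reference, the comparison I am describing could be sketched as a single awk pass along the following lines. This is only a rough sketch and not something I have run against real hardware: the function name find_missing_dimms is made up for illustration, and it assumes every line has exactly the "Location Tag: Pn-SLOT [Size: N bytes]" shape shown above, with the CPU prefix separated from the slot name by the first hyphen.

```shell
# Hypothetical helper: reads the "Location Tag" listing on stdin and prints
# every slot that is populated on at least one CPU but empty on another.
find_missing_dimms() {
    awk '
        $1 == "Location" && $2 == "Tag:" {
            # Split "P1-DIMMA1" at the first hyphen into CPU and slot name
            sep  = index($3, "-")
            cpu  = substr($3, 1, sep - 1)
            slot = substr($3, sep + 1)
            cpus[cpu] = 1
            # Remember slot names in first-seen order for stable output
            if (!(slot in seen)) { seen[slot] = 1; order[++n] = slot }
            # A slot counts as populated when a size value is present
            if ($4 == "Size:" && $5 != "") populated[cpu, slot] = 1
        }
        END {
            for (i = 1; i <= n; i++) {
                slot = order[i]
                count = 0
                for (cpu in cpus) if ((cpu, slot) in populated) count++
                # Empty on every CPU: assumed to be intentionally unpopulated
                if (count == 0) continue
                for (cpu in cpus)
                    if (!((cpu, slot) in populated)) print cpu "-" slot
            }
        }
    '
}
```

With the listing above stored in var, running printf '%s\n' "$var" | find_missing_dimms should print P1-DIMMB1. Like the bash version, this treats a slot that is empty on every CPU as deliberately unpopulated, so it cannot detect the same slot failing on both CPUs at once.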