
When a memory module in one of my servers fails, the event log often reports the wrong DIMM slot, or a nonexistent DIMM slot altogether. The best way we have found to identify the failed DIMM is to check which one is missing.

I have a command that produces the following output:

  Location Tag: P1-DIMMA1   Size: 34359738368 bytes
  Location Tag: P1-DIMMA2
  Location Tag: P1-DIMMB1   
  Location Tag: P1-DIMMC1
  Location Tag: P1-DIMMD1   Size: 34359738368 bytes
  Location Tag: P1-DIMMD2
  Location Tag: P1-DIMME1   Size: 34359738368 bytes
  Location Tag: P1-DIMMF1
  Location Tag: P2-DIMMA1   Size: 34359738368 bytes
  Location Tag: P2-DIMMA2
  Location Tag: P2-DIMMB1   Size: 34359738368 bytes
  Location Tag: P2-DIMMC1
  Location Tag: P2-DIMMD1   Size: 34359738368 bytes
  Location Tag: P2-DIMMD2
  Location Tag: P2-DIMME1   Size: 34359738368 bytes
  Location Tag: P2-DIMMF1

In this example P1-DIMMB1 has failed (that DIMM slot is populated on P2 but not on P1).

I am looking for a programmatic method to determine which DIMM slots are empty on one CPU but not the other. I have come up with the following bash monstrosity, but I am sure there is a simpler way to do this with awk.


cpu1_dimms=()
cpu2_dimms=()
missing=()

while read -r line; do
    dimm=$(awk '{print $3}' <<<"$line")
    cpu=${dimm:1:1}
    size=$(awk '{print $5}' <<<"$line")
    if [[ -n "$size" ]]; then
        case $cpu in
            1)  cpu1_dimms+=( "${dimm:3}" );;
            2)  cpu2_dimms+=( "${dimm:3}" );;
        esac
    fi
done < <(echo "$var")

for dimm in "${cpu1_dimms[@]}"; do
    if ! [[ "${cpu2_dimms[@]}" =~ "$dimm" ]]; then
        missing+=( "P2-$dimm" )
    fi
done
for dimm in "${cpu2_dimms[@]}"; do
    if ! [[ "${cpu1_dimms[@]}" =~ "$dimm" ]]; then
        missing+=( "P1-$dimm" )
    fi
done

This assumes that the output of the aforementioned command is stored in the variable var.

1 Answer

This AWK script finds the missing DIMMs; it reads the command's output from standard input or from a file named on the command line:

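# Lines without "Size:" are empty slots; collect them in one string per CPU.
!/Size:/ {
    cpu = substr($3, 1, 2)    # e.g. "P1"
    dimm = substr($3, 4)      # e.g. "DIMMA2"
    missing[cpu] = missing[cpu] " " dimm
}

END {
    for (cpu in missing) {
        split(missing[cpu], dimms, " ")
        for (key in dimms) {
            for (cmpcpu in missing) {
                # Report a slot that is empty on this CPU but not on another one.
                if (cpu != cmpcpu && missing[cmpcpu] !~ dimms[key]) {
                    print cpu "-" dimms[key]
                }
            }
        }
    }
}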

It outputs the missing DIMMs to its standard output.

The script works by listing lines with no “Size”, building up a string of missing DIMMs, per CPU. It then processes each CPU, splitting the string of missing DIMMs up, and looking for each individual DIMM in the other CPUs’ list of missing DIMMs; if it fails to match (for at least one other CPU), it outputs the DIMM as missing.
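A possible variant of the same idea is sketched below. It is not the script above but an alternative that tracks populated slots instead of empty ones, on the assumption that this may be easier to reason about in two cases: a CPU with no empty slots at all, and slot names where one is a prefix of another (for example a hypothetical DIMMA1 and DIMMA10). The function name find_missing_dimms is made up for illustration, and the sketch assumes every relevant line starts with "Location Tag:" followed by a CPU-SLOT tag, as in the question. Slots that are empty on every CPU are treated as intentionally unused and are not reported.

```shell
# Hypothetical helper: reads "Location Tag:" lines from stdin (or from files
# given as arguments) and prints each slot that is not populated on one CPU
# but is populated on at least one other CPU.
find_missing_dimms() {
    awk '
        $1 == "Location" && $2 == "Tag:" {
            split($3, part, "-")    # "P1-DIMMA1" -> part[1]="P1", part[2]="DIMMA1"
            cpus[part[1]] = 1
            slots[part[2]] = 1
            if (/Size:/) populated[part[1], part[2]] = 1
        }
        END {
            for (slot in slots) {
                n = 0
                for (cpu in cpus) if ((cpu, slot) in populated) n++
                if (n == 0) continue    # empty on every CPU: assumed unused
                for (cpu in cpus)
                    if (!((cpu, slot) in populated)) print cpu "-" slot
            }
        }
    ' "$@" | sort    # awk iteration order is unspecified, so sort for stable output
}
```

With the command output stored in var, as in the question, it could be called as printf '%s\n' "$var" | find_missing_dimms. Because slot names are compared as exact array keys rather than as regular expressions, a name that is a substring of another does not cause a false match.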
