amd64_edac_mod.ko (Error Detection and Correction driver) not loading with the latest kernel, so ECC is not enabled to check RAS

I am trying to enable ECC in Ubuntu 18.04 to confirm the RAS feature.

In the GIGABYTE BIOS setup I looked for an ECC option, but it is not present. The system details are:


naveenk@naveenk-X399-AORUS-PRO:~$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  32
Socket(s):           1
NUMA node(s):        4
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               8
Model name:          AMD Ryzen Threadripper 2990WX 32-Core Processor
Stepping:            2
CPU MHz:             1715.339
...
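
Since the BIOS offers no ECC option, it may help to check whether the installed DIMMs report ECC capability at all, using the SMBIOS memory tables printed by "sudo dmidecode -t memory". The sketch below is a minimal helper under that assumption: it reads the dmidecode text on stdin, prints the distinct "Error Correction Type" values (SMBIOS type 16), and counts memory devices whose Total Width exceeds their Data Width (for example 72 vs 64 bits), which usually indicates extra ECC bits. The function names are hypothetical, and consumer-board firmware does not always fill these fields accurately.

```shell
# Sketch: summarize ECC-related SMBIOS fields from `dmidecode -t memory`.
# Reads dmidecode's text on stdin so it also works on a saved copy.
# Assumption: the firmware reports these fields accurately (not guaranteed).
# Usage: sudo dmidecode -t memory | ecc_correction_types
#        sudo dmidecode -t memory | ecc_width_dimms

# Print the distinct "Error Correction Type" values (SMBIOS type 16),
# e.g. "None" or "Multi-bit ECC".
ecc_correction_types() {
    sed -n 's/^[[:space:]]*Error Correction Type:[[:space:]]*//p' | sort -u
}

# Count memory devices (SMBIOS type 17) whose Total Width is larger than
# their Data Width, e.g. 72 vs 64 bits; the extra bits usually carry ECC.
# Empty slots report "Unknown", which awk treats as 0, so they never count.
ecc_width_dimms() {
    awk '
        /^[ \t]*Total Width:/ { total = $3 }
        /^[ \t]*Data Width:/ {
            if (total + 0 > $3 + 0) n++
            total = ""
        }
        END { print n + 0 }
    '
}
```

A result of "None" and a count of 0 would suggest the memory itself is not ECC-capable, in which case no driver setting can enable ECC reporting.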

To get the AMD64 EDAC driver for ECC checking, I cloned the latest kernel ("5.3.0-rc1"), enabled the EDAC-related config options, compiled it, and generated Debian packages.

I installed the kernel image and headers packages on Ubuntu 18.04.
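
To confirm that the EDAC options actually made it into the installed kernel, its config file can be checked. This is a small sketch assuming the packages installed the config at /boot/config-<version> (the usual location, but verify for your build); the helper name edac_config_status is hypothetical. It prints the values of CONFIG_EDAC, CONFIG_EDAC_DECODE_MCE (which builds edac_mce_amd) and CONFIG_EDAC_AMD64 (which builds amd64_edac_mod).

```shell
# Sketch: show how the EDAC options ended up in an installed kernel config.
# Assumption: the packages put the config at /boot/config-<version>.
# Usage: edac_config_status "/boot/config-$(uname -r)"
edac_config_status() {
    config_file=$1
    # CONFIG_EDAC_DECODE_MCE builds edac_mce_amd; CONFIG_EDAC_AMD64 builds
    # amd64_edac_mod. A value of "m" means built as a loadable module.
    for opt in CONFIG_EDAC CONFIG_EDAC_DECODE_MCE CONFIG_EDAC_AMD64; do
        # Match "OPT=" exactly, so CONFIG_EDAC does not match CONFIG_EDAC_AMD64.
        val=$(sed -n "s/^$opt=//p" "$config_file")
        printf '%s=%s\n' "$opt" "${val:-not set}"
    done
}
```

Given that amd64_edac_mod.ko exists on disk, CONFIG_EDAC_AMD64 would be expected to show "m" here.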

After boot, the EDAC module amd64_edac_mod.ko is not listed in "lsmod"; only edac_mce_amd shows up:

:~$ lsmod | grep edac
edac_mce_amd           32768  0
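
lsmod formats the contents of /proc/modules, so the same check can be scripted with an exact name match instead of grep's substring match. A minimal sketch; the helper name modules_loaded is hypothetical.

```shell
# Sketch: report whether modules are loaded, using /proc/modules (the data
# lsmod formats). The module name is the first field of each line.
# Usage: modules_loaded /proc/modules edac_mce_amd amd64_edac_mod
modules_loaded() {
    modules_file=$1
    shift
    for m in "$@"; do
        # Exact match on the first column avoids partial matches like grep's.
        if awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$m: loaded"
        else
            echo "$m: not loaded"
        fi
    done
}
```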

Checking the dmesg logs, I see the following error messages:

[   17.489578] EDAC amd64: Node 0: DRAM ECC disabled.
[   17.489580] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
                Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
                (Note that use of the override may cause unknown side effects.)
[   17.489584] EDAC amd64: Node 1: DRAM ECC disabled.
[   17.489585] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
                Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
                (Note that use of the override may cause unknown side effects.)
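
Both nodes shown in this log report "DRAM ECC disabled", and the driver states it will not load in that case. To pull the per-node status out of a longer log, a sed expression over the "Node N: DRAM ECC ..." lines is enough. This sketch assumes the message format shown above; the helper name ecc_node_status is hypothetical.

```shell
# Sketch: extract per-node ECC status from amd64 EDAC kernel messages.
# Assumes the "EDAC amd64: Node N: DRAM ECC enabled/disabled." format above.
# Usage: dmesg | ecc_node_status
# Prints one "<node> <status>" line per matching message, e.g. "0 disabled".
ecc_node_status() {
    sed -n 's/.*EDAC amd64: Node \([0-9][0-9]*\): DRAM ECC \([a-z][a-z]*\)\..*/\1 \2/p'
}
```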

Other than these, there are no log messages related to AMD64_EDAC driver initialization.

I also checked whether the drivers were compiled; amd64_edac_mod.ko is present:

:~$ ls /lib/modules/5.3.0-rc1-test/kernel/drivers/edac/
amd64_edac_mod.ko  edac_mce_amd.ko  i3200_edac.ko  i5100_edac.ko  i7300_edac.ko   i82975x_edac.ko  pnd2_edac.ko  skx_edac.ko
e752x_edac.ko      i3000_edac.ko    i5000_edac.ko  i5400_edac.ko  i7core_edac.ko  ie31200_edac.ko  sb_edac.ko    x38_edac.ko

I tried inserting the module manually, but it fails:

/lib/modules/5.3.0-rc1-test/kernel/drivers/edac$ sudo modprobe -v amd64_edac_mod
insmod /lib/modules/5.3.0-rc1-test/kernel/drivers/edac/amd64_edac_mod.ko 
modprobe: ERROR: could not insert 'amd64_edac_mod': No such device
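
"No such device" is the standard message for ENODEV, which is consistent with the driver declining to load after finding ECC disabled, as the dmesg lines say. Those lines also name the 'ecc_enable_override' module parameter. The sketch below only shows how that parameter could be passed, either once on the modprobe command line (in the comment) or persistently through a modprobe.d options file written from a root shell. The helper name and the file name are hypothetical. The override only forces the driver to load: it cannot give non-ECC DIMMs ECC, the log itself warns about unknown side effects, and newer kernels may refuse to force ECC on family 17h (Zen) parts such as this Threadripper.

```shell
# Sketch: pass the ecc_enable_override parameter named in the dmesg warning.
# One-off, not persistent across reboots:
#   sudo modprobe -v amd64_edac_mod ecc_enable_override=1
# Persistent: write a modprobe.d options file. The file name is a
# hypothetical choice; modprobe reads any *.conf file in that directory.
# Warning: this only forces the driver; it cannot make non-ECC memory ECC.
write_edac_override() {
    conf_dir=${1:-/etc/modprobe.d}
    printf 'options amd64_edac_mod ecc_enable_override=1\n' \
        > "$conf_dir/amd64_edac_override.conf"
}
```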

Because the driver is not loaded, the mc0 and mc1 memory controllers are not listed under mc:

:~$ ls /sys/devices/system/edac/mc/
power  subsystem  uevent
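
Once a memory controller is registered, each mcN directory under /sys/devices/system/edac/mc exposes corrected and uncorrected error counters (ce_count and ue_count), which is typically where RAS testing looks. A minimal sketch, with a hypothetical helper name, that lists those counters or reports that no controllers are registered, as in the listing above.

```shell
# Sketch: list EDAC memory controllers and their error counters.
# ce_count = corrected errors, ue_count = uncorrected errors.
# Usage: edac_mc_summary   (defaults to /sys/devices/system/edac/mc)
edac_mc_summary() {
    dir=${1:-/sys/devices/system/edac/mc}
    found=0
    for mc in "$dir"/mc[0-9]*; do
        # If nothing matches, the glob stays literal and is skipped here.
        [ -d "$mc" ] || continue
        found=$((found + 1))
        printf '%s ce=%s ue=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count" 2>/dev/null || echo '?')" \
            "$(cat "$mc/ue_count" 2>/dev/null || echo '?')"
    done
    [ "$found" -gt 0 ] || echo "no memory controllers registered under $dir"
}
```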

Could you please help me understand why the driver is not loading?