
Just after one of the power supplies failed, the following error was recorded in syslog (the OS is Ubuntu 14.04). I am running a JVM (Java virtual machine) with a 64 GB heap on a server with 128 GB of RAM. Do you think the power failure affected RAM allocation? Could it have had any effect on the OS or the running application?

Jul 25 14:14:37 ubuntu-132 kernel: [14872493.166347] divide error: 0000 [#1] SMP
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.166489] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c intel_rapl x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm ipmi_devintf irqbypass crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper input_leds mxm_wmi ablk_helper dcdbas joydev cryptd sb_edac 8250_fintek ipmi_si edac_core mei_me shpchp ipmi_msghandler mei acpi_power_meter wmi mac_hid lpc_ich nls_iso8859_1 lp parport hid_generic usbhid uas hid usb_storage tg3 ptp ahci libahci pps_core megaraid_sas fjes
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.168579] CPU: 30 PID: 158701 Comm: java Not tainted 4.4.0-31-generic #50~14.04.1-Ubuntu
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.168846] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.4.2 01/09/2017
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.169097] task: ffff88025e24d280 ti: ffff88011bb40000 task.ti: ffff88011bb40000
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.169347] RIP: 0010:[] [] task_numa_find_cpu+0x238/0x700
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.169635] RSP: 0000:ffff88011bb43bb0 EFLAGS: 00010257
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.169806] RAX: 0000000000000000 RBX: ffff88011bb43c50 RCX: 0000000000000000
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.170036] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88018579c400
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.170267] RBP: ffff88011bb43c18 R08: 00000001dd9a90d7 R09: 000000000007f981
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.170503] R10: 000000000006a5cb R11: fffffffffffffd86 R12: ffff8802534444c0
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.170741] R13: 0000000000000013 R14: 00000000000002c8 R15: fffffffffffffdad
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.170980] FS: 00007fc16b7f7700(0000) GS:ffff88103e9c0000(0000) knlGS:0000000000000000
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.171249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.171443] CR2: 00007fc16b7f5ef8 CR3: 00000001c6f71000 CR4: 00000000003406e0
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.171680] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.171919] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.172157] Stack:
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.172229] 0000000000002e04 00000000000002f9 000000000000030f ffff88025e24d280
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.172492] 00000000000002c9 0000000000000021 0000000000016d00 00000000000002c9
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.172755] ffff88025e24d280 000000000000008f ffff88011bb43c50 00000000000001e3
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.173017] Call Trace:
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.173105] [] task_numa_migrate+0x4a0/0x930
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.173300] [] ? update_curr+0x80/0x170
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.182774] [] numa_migrate_preferred+0x79/0x80
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.192289] [] task_numa_fault+0x91d/0xcc0
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.201866] [] ? mpol_misplaced+0x14e/0x190
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.211507] [] handle_pte_fault+0x5a6/0x1470
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.221002] [] ? schedule_hrtimeout_range_clock+0xb9/0x130
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.230440] [] ? schedule_hrtimeout_range_clock+0xa0/0x130
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.239693] [] handle_mm_fault+0x250/0x540
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.248859] [] __do_page_fault+0x19a/0x430
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.257910] [] do_page_fault+0x22/0x30
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.266879] [] page_fault+0x28/0x30
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.275686] Code: 4d b0 4c 89 f7 e8 29 d5 ff ff 48 8b 4d b0 49 8b 86 b0 00 00 00 31 d2 48 0f af 81 d8 01 00 00 49 8b 4e 78 4c 8b 73 78 48 83 c1 01 <48> f7 f1 48 8b 4b 20 49 89 c1 48 29 c1 4c 03 4b 48 4c 39 7d d0
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.294167] RIP [] task_numa_find_cpu+0x238/0x700
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.303258] RSP
Jul 25 14:14:37 ubuntu-132 kernel: [14872493.329766] ---[ end trace b138563aaca724d4 ]

  • You are asking for speculation; there can be no concrete answer here with the given information. But if this is a one-time fluke that occurred around the time of the power supply failure, why would you think anything other than a power fluctuation caused it?
    – acejavelin
    Commented Jul 28, 2018 at 14:12
  • @acejavelin Because it happened right after the redundant power supply failed. My question is about the effect of a power supply failure on RAM allocation, the OS, and system performance in general, not just this case. Thanks for responding.
    Commented Jul 28, 2018 at 14:20

1 Answer


Redundant power is exactly that: redundant. Under any circumstances, one of the two power supplies should be able to fail or be disconnected without affecting the operation of the server. In most cases the kernel is not even aware that a power supply has failed; if it is aware, it only logs the event or notifies the system administrator.

However, a failing power supply can momentarily cause a short or an over- or under-voltage condition, which could make almost any component in the server malfunction. Most motherboards have protections against this built into the redundant power circuitry, but they are not perfect.

To answer your specific questions, which I have rephrased for clarity:

Do you think a single power supply failure in a redundant power supply server affects RAM allocations?

No, the power supply failure itself likely did not cause a failure in RAM allocation. The more likely cause is a momentary electrical disturbance of the kind described above.

Would a power supply failure like the one above have any effect on the OS or a running application?

In a perfect world it should not affect the OS or any application, but we do not live in a perfect world. If it did have an effect, the more likely cause is the momentary electrical disturbance described above.
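Whether this was a one-time event can be checked against the log itself. The `[#1]` in the first line of the trace is the kernel's oops counter, so this was the first oops since boot. The following is a minimal sketch, not a definitive tool, that pulls oops header lines out of syslog so their timestamps can be compared with the time of the power supply failure. The function name `find_oops_headers` and the regular expression are illustrative assumptions modeled on the excerpt in the question; other kernels or syslog configurations may format these lines differently.

```python
import re

# Pattern modeled on the first line of the trace in the question, e.g.
# "Jul 25 14:14:37 ubuntu-132 kernel: [14872493.166347] divide error: 0000 [#1] SMP"
# This layout is an assumption based on that one excerpt.
OOPS_HEADER = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+"
    r"(?P<error>[^:\[\]]+):\s+[0-9a-f]{4}\s+\[#(?P<count>\d+)\]"
)


def find_oops_headers(lines):
    """Return one dict per oops header found in an iterable of syslog lines."""
    hits = []
    for line in lines:
        match = OOPS_HEADER.match(line)
        if match:
            hits.append(match.groupdict())
    return hits


# Two lines copied from the trace in the question; only the first is a header.
sample = [
    "Jul 25 14:14:37 ubuntu-132 kernel: [14872493.166347] divide error: 0000 [#1] SMP",
    "Jul 25 14:14:37 ubuntu-132 kernel: [14872493.168846] Hardware name: "
    "Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.4.2 01/09/2017",
]

for hit in find_oops_headers(sample):
    print(hit["stamp"], hit["error"], "oops #" + hit["count"])
```

To scan a real log, pass an open file object instead of `sample`, for example `open("/var/log/syslog", errors="replace")`; that path is the usual Ubuntu default and is an assumption here. A single hit with counter 1 near the time of the failure is consistent with a one-off glitch, while repeated hits at unrelated times would point elsewhere.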

  • It could also be that the same thing that caused one of the power supplies to fail caused the other one to glitch.
    – Jasen
    Commented Jul 28, 2018 at 23:59
