ewwhite

Usually, it's an I/O or disk subsystem issue. Often, this is coupled with an extremely high system load average. For example, the system detailed in the graph below became unresponsive (yet was still pingable) when a script went awry and locked a bunch of files, and the load rose to 36... on a 4-CPU system.

enter image description here
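
To put that number in perspective, here is a rough sketch of the arithmetic in Python. The function names are mine, purely for illustration. On Linux the load average also counts tasks stuck in uninterruptible (disk) wait, which is why stalled I/O inflates it even when the CPUs are mostly idle.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Return the load average divided by the number of CPUs.

    Sustained values well above 1.0 mean more tasks are waiting
    (for CPU or, on Linux, for disk) than the machine can service.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def current_load_per_cpu():
    """Same ratio for the local machine, using the 1-minute load average."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_cpu(one_minute, os.cpu_count() or 1)
```

For the system above, `load_per_cpu(36, 4)` works out to 9 tasks queued per CPU.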

Services that are running in RAM and do not require disk access continue to run... Thus, the network stack (ping) is up, but other services stall when disk access is required... SSH, for instance, when a key is referenced or a password lookup is needed. SMTP tends to shut down when the load average hits 30 or so...

When the system is in this state, try a remote nmap against the server's IP to see what's up.
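
If nmap isn't available on the machine you're probing from, a minimal stand-in could look like the sketch below. It is not a replacement for nmap, and the `probe` function and its return strings are my own invention. One caveat worth knowing: the kernel completes the TCP handshake for a listening socket on its own, so a port can look "open" even when the daemon behind it is stalled. Waiting for a greeting banner (sshd and most SMTP servers send one right after connect) is a better hint that the process is actually alive.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP port as 'unreachable', 'open (no banner)', or return its banner.

    'open (no banner)' means the connection was accepted at the TCP level
    but nothing was sent within the timeout. That is normal for services
    that wait for the client to speak first (HTTP, for example).
    """
    try:
        # The timeout also applies to the recv() call below.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, filtered, or timed out while connecting.
        return "unreachable"
    with sock:
        try:
            data = sock.recv(256)
        except OSError:
            # Includes socket timeouts: connected, but no greeting arrived.
            return "open (no banner)"
    if not data:
        return "open (no banner)"
    return data.decode("ascii", "replace").strip()
```

Something like `probe("192.0.2.10", 22)` (the address is a placeholder) should return the SSH identification string on a healthy box; getting `open (no banner)` on port 22 or 25 would be consistent with the daemon being stuck on disk.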

Your logging probably doesn't work if this is a disk or storage issue...

Can you describe the hardware setup? Is this a virtual machine? What is the storage layout?

More than logging, you want to graph the system performance and understand when this is happening. See if it correlates with a specific activity.
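
If no monitoring tool is in place yet, even a crude sampler gives you something to graph. The sketch below is only an illustration (`record_load` and the CSV layout are my own choices): it appends a timestamp and the three load averages to a file at a fixed interval. Keep in mind the caveat above... if the disk is the problem, local writes may stall too, so a real setup would ideally ship the samples off the box.

```python
import csv
import os
import time


def record_load(path, interval_s=60.0, samples=60):
    """Append `samples` rows of (unix_time, load1, load5, load15) to a CSV file."""
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        for i in range(samples):
            one, five, fifteen = os.getloadavg()
            writer.writerow([int(time.time()), one, five, fifteen])
            fh.flush()  # push each sample out so a later hang loses little
            if i < samples - 1:
                time.sleep(interval_s)
```

Plot the resulting file with whatever you like, then line the spikes up against cron jobs, backups, or other scheduled activity.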
