What is a kernel panic and what to do about it

I found an interesting post on the cPanel forum and thought it might be of use to some. This is a summary of the content:

What is a kernel panic on a Linux server?
Andy explains: A kernel panic is a message displayed by an operating system upon detecting an internal system error from which it cannot recover. Kernel panics often provide cryptic debugging information that is useful only to the developers of the operating system.
Attempts by the operating system to read an invalid or unpermitted memory address are a common source of kernel panics. A panic may also occur as a result of a hardware failure.

What to do about a kernel panic?

BrianOz writes:
There’s a flag you can set to restart the server automatically after a kernel panic. Without this, the box will panic and just sit there hoping you or someone else will come along and look at the console messages and restart it. This could be a long wait overnight!

To set up the panic reboot timer:

Shell command:
echo kernel.panic = 120 >> /etc/sysctl.conf

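This will ensure that the system reboots 120 seconds after a panic rather than just sitting there halted. Note that /etc/sysctl.conf is only read at boot or when you run “sysctl -p”, so the appended line does not change the running kernel by itself.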
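One catch with the echo command above is that running it twice leaves two kernel.panic lines in the file. As a rough sketch only, the hypothetical helper below (set_panic_timeout is my own name, not something from the forum post) updates an existing line instead of appending a duplicate. It assumes a non-negative number of seconds and a sed that supports -E, and it takes the file path as an optional second argument so it can be tried on a scratch copy first.

```shell
#!/bin/sh
# Hypothetical helper, not part of any distribution: write "kernel.panic = N"
# to a sysctl-style file, replacing an existing kernel.panic line if present.
# Usage: set_panic_timeout SECONDS [FILE]   (FILE defaults to /etc/sysctl.conf)
set_panic_timeout() {
    timeout="$1"
    conf="${2:-/etc/sysctl.conf}"

    # This sketch only accepts non-negative whole numbers of seconds.
    case "$timeout" in
        ''|*[!0-9]*) echo "timeout must be a non-negative integer" >&2; return 1 ;;
    esac

    touch "$conf" || return 1

    if grep -Eq '^[[:space:]]*kernel\.panic[[:space:]]*=' "$conf"; then
        # A kernel.panic line already exists: rewrite it in place.
        # The pattern does not match related keys such as kernel.panic_on_oops.
        tmp="$(mktemp)" || return 1
        sed -E "s/^[[:space:]]*kernel\.panic[[:space:]]*=.*/kernel.panic = $timeout/" \
            "$conf" > "$tmp" && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        # No existing line: append one, as the original echo command does.
        echo "kernel.panic = $timeout" >> "$conf"
    fi
}
```

For example, running “set_panic_timeout 120” as root would leave exactly one kernel.panic line in /etc/sysctl.conf, however many times it is called.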

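You can read or change the live kernel setting on the fly through /proc/sys/kernel/panic. For instance, “cat /proc/sys/kernel/panic” will usually display 0 (no automatic reboot), and “echo 120 > /proc/sys/kernel/panic”, run as root, will modify the current value. A value written this way is lost at the next reboot unless it is also in /etc/sysctl.conf.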
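To make the number in /proc/sys/kernel/panic a bit easier to interpret, here is a small sketch that reads it and prints what it means. The function name describe_panic_setting is made up for this example, and the optional path argument exists only so the function can be tried against an ordinary file. The interpretation follows the kernel's documented behaviour for this setting: 0 means wait forever, a positive number is a delay in seconds, and a negative number means reboot immediately.

```shell
#!/bin/sh
# Hypothetical helper: explain the current panic timeout in plain words.
# Usage: describe_panic_setting [FILE]   (FILE defaults to /proc/sys/kernel/panic)
describe_panic_setting() {
    path="${1:-/proc/sys/kernel/panic}"

    # Read the value; fail with a message if the file is missing or unreadable.
    value="$(cat "$path" 2>/dev/null)" || {
        echo "cannot read $path" >&2
        return 1
    }

    case "$value" in
        0)  echo "no automatic reboot: the system will halt and wait" ;;
        -*) echo "reboot immediately after a panic" ;;
        *)  echo "reboot $value seconds after a panic" ;;
    esac
}
```

Calling “describe_panic_setting” with no arguments reports on the running kernel and does not need root, since it only reads the file.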

The above works for CentOS. I haven’t tried it anywhere else, and unfortunately, since setting it up a month ago, I haven’t had a system crash.

Also, there’s a tool called netdump that should be able to show why a server crashed by logging the crash to another server when it occurs. I don’t know the details of setting it up yet; my initial half-arsed attempt failed and I haven’t been able to try again. It would work well coupled with the above panic timer, as you should then be able to see why the crash occurred and recover from it automatically.

Posted by Web Monkey