Xen Crashdump Analyser

A tool for analysing Xen crashes

Latest Changes:

Source:

Building:

The Xen Crashdump Analyser is written in (a subset of) C++, and has no dependencies. Grab g++ from your standard distro repositories, type make, and it should work. To run successfully, the analyser needs to know the layout of some of Xen's internal structures, such as struct vcpu and struct domain. This is covered by the above patch, which adds some new offsets using the standard system and appends them to the hypervisor symbol table.

Environment:

The Xen Crashdump Analyser is designed to be run with access to /proc/vmcore of a crashed system. Typically, this means running in a kdump environment, but it is possible to copy /proc/vmcore off a crashed system for later analysis. XenServer uses a 64MB crash region, so the analyser has been designed from scratch with this in mind. Furthermore, in the case of a hypervisor crash, none of the pointers, data or pagetables left over can be trusted, which leads to a very defensive coding style when walking Xen memory to pull out state. As an upper stress test, I have verified correct functionality when running in a 64MB crash region (including kernel and regular root filesystem), analysing a crash with 40 domains, each with 4095 vcpus. (CentOS 6.2 wouldn't boot with 4096, but I was too busy to investigate why. Also, the server ran like treacle.)

Setup:

To play with the crashdump analyser, you will need a 64-bit Xen, compiled with the patch from above, running a classic-xen dom0 kernel. The kexec functionality is not currently present in pvops, but should be appearing soon. Specify crashkernel=<size>@<location> on Xen's command line. Load a crash kernel in the normal manner using kexec from the kexec-tools package. To crash the server, use echo c > /proc/sysrq-trigger or xl debug-keys c.

The analysis:

See ./xen-crashdump-analyser --help for full information, but in short:

./xen-crashdump-analyser -x /path/to/xen.symtab -d /path/to/dom0.symtab -o /path/to/output/directory

This will write a set of logs to the output directory:

Example analysis:

Here is an example analysis which is the basis of my investigation into race conditions in the scheduler (http://lists.xen.org/archives/html/xen-devel/2013-02/msg01411.html).

Todo list:

I will freely admit that it is far from bug-free, and the code is somewhat organic in places. So far, bugs get fixed and new features get added on a 'when I am not doing something more urgent' basis. In the majority of software crashes, the Xen and dom0 state are sufficient to start working on the problem. The plan is to upstream the analyser into the main Xen repository, given its close ABI links with the hypervisor. It is presented here in the hope that you will try it out and find it useful. I think the following features would be nice, and will see about implementing them (in my copious free time):

Feedback/queries/questions/suggestions welcome