How to debug your initramfs init

Posted on February 18, 2024 by Linus Heckemann

So you find yourself in an awkward situation: you want to debug the startup of your initramfs’s PID1. My first piece of advice is not to get into this situation. But should you find yourself there anyway, here’s how Julien Malka, Ryan Lahfa, and I dealt with it after making some questionable life choices at Ocean Sprint.

See also our talk at FOSDEM, where Ryan and Julien provide more context and talk about the awesome NixOS test framework.

The problem

The systemd instance we were using as the PID1 of our initramfs was crashing early in its startup. This presented two challenges:

  • Firstly, because it was PID1, the crash caused a kernel panic, which meant we couldn’t run any more code in user space to debug it;
  • Secondly, because the crash was early during boot, we didn’t have any persistent filesystems available in which we could store a core dump for later analysis.

We decided to try debugging the running process, so we could catch the error before it killed the process. The tool we would usually use for debugging running processes is gdb, but there were several reasons why this was not suitable:

  • gdb is fairly large, weighing in at 40M itself and bringing with it 126M of transitive dependencies through Python, which we don’t want to pull into our initrd;
  • At this early stage of boot, we don’t have advanced terminal capabilities, which would make using gdb directly somewhat uncomfortable;
  • We may want to use a graphical frontend to gdb, which won’t run in the minimal environment available early during boot.

Thankfully, there’s a component of gdb that helps with this!

gdbserver

gdbserver is a component of gdb that exposes a process for debugging over the network or a serial interface.

It has two modes: one where it launches the to-be-debugged child process and waits for a client to connect:

$ gdbserver localhost:3333 sleep infinity
Process sleep created; pid = 7780
Listening on port 3333

(meanwhile, in another terminal)

(gdb) target remote localhost:3333
Remote debugging using localhost:3333
Reading /nix/store/w8vm09hri2zz7yacryzzzxvsapik4ps4-coreutils-9.1/bin/coreutils from remote target...
[...]
(gdb) bt
#0  0x00007f2c0715ed24 in pause ()
   from target:/nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib/libc.so.6
#1  0x00000000004d96c5 in xnanosleep ()
#2  0x0000000000486858 in single_binary_main_sleep ()
#3  0x00000000004094ef in launch_program ()
#4  0x000000000040882b in main ()
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007f2c0715ed24 in pause ()
   from target:/nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib/libc.so.6
(gdb)

The other mode attaches to an already running process:

$ sleep infinity &
[1] 3945492

$ gdbserver --attach localhost:3333 3945492
Attached; pid = 3945492
Listening on port 3333

Then we connect from gdb:

(gdb) target remote localhost:3333
Remote debugging using localhost:3333
Reading /nix/store/jmy11m3c935yyvs4njz3s52p9azgvg6f-coreutils-full-9.3/bin/coreutils from remote target...
[...]
0x00007f5b562fd6c4 in pause () from target:/nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libc.so.6
(gdb) bt
#0  0x00007f5b562fd6c4 in pause () from target:/nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libc.so.6
#1  0x00000000004d3225 in xnanosleep ()
#2  0x0000000000488ab8 in single_binary_main_sleep ()
#3  0x000000000040a6cf in launch_program ()
#4  0x0000000000409a0b in main ()

Since PID1 is the root of the process tree and can’t be the child of another process, we need this second mode. But where do we even launch the gdbserver process from? And, as the backtrace shows, attaching to a running process puts us in a state where the process has already done some work. We don’t control exactly when we attach, so we might miss the point in execution that we want to debug!

It turns out we can solve both problems in one go. If we write our own script to replace the “real” init (whatever that may be), that script can launch gdbserver attached to the script itself, then replace itself with the real init using exec. The script would look something like this:

  1. Launch gdbserver, attaching to the current process:

    gdbserver --attach localhost:3333 $$ &

    $$ is the PID of the shell running the script; the & at the end ensures that the gdbserver process runs in the background, so the script’s execution continues without waiting for gdbserver to exit.

  2. Give the debugger some time to start up and attach, so that we don’t miss the beginning of what the real init does.

    sleep 1

  3. Once the script’s execution has resumed, replace the shell with the real init, e.g. systemd:

    exec systemd
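The fixed sleep 1 in step 2 is a guess. A slightly more robust variant polls the TracerPid field that Linux reports in /proc/<pid>/status until it becomes non-zero, meaning some tracer such as gdbserver has actually attached. This is our own sketch, not what the post’s script does. The helper name wait_for_tracer and the 30-second default are made up for illustration, and the helper needs /proc to be mounted first. Because gdbserver stops the process when it attaches, the loop only observes the non-zero value after the gdb client has resumed execution.

```shell
#!/bin/sh
# Sketch, not the post's code: wait until some tracer (e.g. gdbserver) has
# attached to process PID, instead of sleeping for a fixed time.
# Usage: wait_for_tracer PID [SECONDS]   (hypothetical helper name)
# Needs /proc mounted. Returns 0 once traced, 1 on timeout or missing PID.
wait_for_tracer() {
    pid=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Bail out if the process (or /proc itself) is not there.
        [ -r "/proc/$pid/status" ] || return 1
        tracer=0
        # Lines look like "TracerPid:<TAB>123"; 0 means "not traced".
        while read -r key value; do
            if [ "$key" = "TracerPid:" ]; then
                tracer=$value
                break
            fi
        done < "/proc/$pid/status"
        if [ "$tracer" != 0 ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Possible use in the init script, replacing "sleep 1":
#   gdbserver --attach /dev/ttyS1 $$ &
#   wait_for_tracer $$ 30 || echo "init: no debugger attached" >&2
#   exec /lib/systemd/systemd
```

If nothing attaches within the timeout, this version still execs the real init, so a forgotten debugger delays the boot instead of blocking it forever.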

However, since the script is the first userspace process to run, we have an extremely barebones environment: the filesystem contains nothing but the unpacked initramfs, and only the drivers built into the kernel image are loaded. /dev, which contains files representing devices (including serial ports), and /proc, which exposes process information as files and is needed by gdbserver, aren’t set up yet. Additionally, for gdbserver to talk to the world outside the machine, we’ll need to set up a serial console or a network interface.

Serial

Code for this approach is in the accompanying repo.

Let’s start with a serial console, since it’s easier and sufficient for many use cases.

Drivers for serial consoles are often built into the kernel image to allow logging as early as possible, so we don’t need to take any extra steps there. However, the files representing the serial consoles (device nodes) don’t exist when our script starts, so we need to create them ourselves. The easiest way to do this is to mount devtmpfs, a virtual filesystem provided by the kernel in which device nodes are created automatically.
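If you aren’t sure the serial driver is actually built into your kernel, a small defensive check can confirm that the device node exists before gdbserver is pointed at it. A missing driver then produces a clear message instead of a confusing failure. This is our own sketch, not part of the post’s script, and wait_for_chardev is a hypothetical helper name. The [ -c ] test checks that a path is a character device node.

```shell
#!/bin/sh
# Sketch (not from the post): after mounting devtmpfs, check that a
# character device node such as /dev/ttyS1 exists before using it.
# Usage: wait_for_chardev DEVICE [SECONDS]   (hypothetical helper name)
wait_for_chardev() {
    dev=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        # -c: the path exists and is a character device node.
        if [ -c "$dev" ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    echo "init: $dev did not appear" >&2
    return 1
}

# Possible use, after "mount -t devtmpfs devtmpfs /dev":
#   if wait_for_chardev /dev/ttyS1 5; then
#       gdbserver --attach /dev/ttyS1 $$ &
#       sleep 1
#   fi
```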

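#!/bin/sh
export PATH=/bin
# Nothing is mounted yet: create mount points for device nodes and procfs.
mkdir -p /dev /proc
# devtmpfs provides the device nodes, including the serial ports.
mount -t devtmpfs devtmpfs /dev
# gdbserver needs /proc to inspect the process it attaches to.
mount -t proc proc /proc
# Expose this shell (PID 1) for debugging on the second serial port.
gdbserver --attach /dev/ttyS1 $$ &
# Give gdbserver time to attach before we exec.
sleep 1
# Replace the shell with the real init; the debugger stays attached.
exec /lib/systemd/systemd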
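While iterating, it can be handy to keep a single initrd that only starts gdbserver when asked to. The sketch below is our own and is not taken from the post or its repository. It reads /proc/cmdline, where the kernel exposes its full command line, and looks up a parameter. init_debug_tty= is a made-up parameter name and cmdline_value is a hypothetical helper. It assumes /proc has already been mounted, as in the script above.

```shell
#!/bin/sh
# Sketch: print the value of KEY=... from the kernel command line.
# Usage: cmdline_value KEY [FILE]   (hypothetical helper name)
# FILE defaults to /proc/cmdline. Returns 1 if KEY is absent.
cmdline_value() {
    key=$1
    file=${2:-/proc/cmdline}
    [ -r "$file" ] || return 1
    read -r cmdline < "$file"
    found=
    set -f   # don't let words like "*" expand as globs while splitting
    for word in $cmdline; do
        case $word in
            # Later occurrences override earlier ones.
            "$key"=*) found=${word#"$key"=} ;;
        esac
    done
    set +f
    [ -n "$found" ] || return 1
    printf '%s\n' "$found"
}

# Possible use in the init script, after /proc is mounted:
#   if tty=$(cmdline_value init_debug_tty); then
#       gdbserver --attach "$tty" $$ &
#       sleep 1
#   fi
#   exec /lib/systemd/systemd
```

Booting with init_debug_tty=/dev/ttyS1 on the kernel command line would then enable the debug server, and omitting it boots straight into the real init.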

We can put together an initramfs with nixpkgs facilities (see repository), and test it in a QEMU VM. In order to talk to the VM’s serial console from outside, we have QEMU attach it to a Unix socket on the host by passing -chardev socket,id=debugsock,path=./debug.sock,server=on -serial chardev:debugsock.
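For reference, a full QEMU invocation might look like the sketch below. This is an assumption on our part, since the post’s repository builds the VM with nixpkgs and its exact command may differ. The kernel console goes to the first serial port (guest ttyS0, connected to the terminal), and the debug socket is attached as the second serial port, which the guest sees as ttyS1, the device the init script uses. run_vm, the QEMU override variable, and the kernel/initrd paths are placeholders.

```shell
#!/bin/sh
# Sketch only: boot a kernel + initrd in QEMU with two serial ports.
#   serial port 0 (guest ttyS0): kernel console on this terminal
#   serial port 1 (guest ttyS1): the debug socket gdb connects to
# Usage: run_vm KERNEL INITRD   (hypothetical helper; set QEMU to override)
run_vm() {
    kernel=${1:?usage: run_vm KERNEL INITRD}
    initrd=${2:?usage: run_vm KERNEL INITRD}
    # server=on: QEMU listens on ./debug.sock and, by default, waits for
    # a client to connect before starting the guest.
    "${QEMU:-qemu-system-x86_64}" \
        -m 1024 \
        -kernel "$kernel" \
        -initrd "$initrd" \
        -append "console=ttyS0" \
        -display none \
        -serial mon:stdio \
        -chardev socket,id=debugsock,path=./debug.sock,server=on \
        -serial chardev:debugsock
}
```

For example, run_vm ./bzImage ./initrd.img starts QEMU, which then blocks until gdb connects to ./debug.sock.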

With server=on, QEMU waits for a client to connect to the socket before starting the VM, so we attach gdb first:

(gdb) target remote ./debug.sock
Remote debugging using ./debug.sock
Reading /nix/store/2ishn1q3c9qk9p5ax4j2y4rk0yqh09gj-busybox-1.36.1/bin/busybox from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
[...]
0x00007fe63bbc9ab7 in clock_nanosleep () from target:/nix/store/9y8pmvk8gdwwznmkzxa6pwyah52xy3nk-glibc-2.38-27/lib/libc.so.6
(gdb)

If you’re following along, you’ll notice that the warning is appropriate: loading all the files takes several seconds. All the relevant pieces live in the Nix store, at the same paths as on the host where we built the initrd. So we can run set sysroot / before attaching to the target, and gdb will load everything directly from our own filesystem instead of pulling it through the serial console. To apply this technique on a different system, keep the initrd build directory around and set the sysroot to the path where the initrd files were collected before being packed into an initrd archive.
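To avoid typing this every time, a small wrapper can pass the commands on gdb’s command line. gdb runs -ex commands in the order given, so set sysroot takes effect before the connection is made. This is a sketch: debug_initrd and the GDB override variable are hypothetical names. The optional second argument is where you would pass the initrd build directory on a non-Nix system.

```shell
#!/bin/sh
# Sketch: connect gdb to the VM's debug serial socket, with the sysroot
# set first so files are read from the local filesystem.
# Usage: debug_initrd SOCKET [SYSROOT]   (hypothetical helper name)
# SYSROOT defaults to / (Nix store paths match the host); on other
# systems, pass the directory the initrd was assembled from.
debug_initrd() {
    sock=${1:?usage: debug_initrd SOCKET [SYSROOT]}
    sysroot=${2:-/}
    # -q: skip the banner; -ex: run a command, in the order given.
    "${GDB:-gdb}" -q \
        -ex "set sysroot $sysroot" \
        -ex "target remote $sock"
}
```

Running debug_initrd ./debug.sock should drop you at the same (gdb) prompt as above, without the slow file transfers.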

Now we’re attached to the shell, which is still executing sleep 1. After that, it will use the execve system call to replace itself with systemd. Using a catchpoint, we can have gdb resume execution until execve is called:

(gdb) catch syscall execve
Catchpoint 1 (syscall 'execve' [59])
(gdb) c
Continuing.

Catchpoint 1 (call to syscall execve), 0x00007f876e4e7e0b in ?? ()
(gdb) si
process 1 is executing new program: /nix/store/iidxwcyp8pqhrq3iji17shs4m6gin0kv-systemd-254.6/lib/systemd/systemd

Catchpoint 1 (returned from syscall execve), 0x00007fa5b091ff20 in _start ()
   from /nix/store/9y8pmvk8gdwwznmkzxa6pwyah52xy3nk-glibc-2.38-27/lib/ld-linux-x86-64.so.2
(gdb)
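The catchpoint steps can also be scripted. catch syscall stops both when execve is entered and when it returns, so two continue commands in a gdb command file should be enough to land in the new program, much like the c and si above. The sketch below writes such a file. write_stop_at_exec and the default file name are hypothetical. You would load the file with source stop-at-exec.gdb after target remote, or pass it with -x after the -ex options on gdb’s command line.

```shell
#!/bin/sh
# Sketch: write a gdb command file that runs until the traced process has
# exec'd into a new program (e.g. the real init), then print its path.
# Usage: write_stop_at_exec [FILE]   (hypothetical helper name)
write_stop_at_exec() {
    out=${1:-stop-at-exec.gdb}
    cat > "$out" <<'EOF'
# Stop on entry to and return from execve.
catch syscall execve
# First stop: the init script calls execve.
continue
# Second stop: execve has returned inside the new program.
continue
EOF
    printf '%s\n' "$out"
}
```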

Now we’re attached to the systemd process and can set breakpoints, control execution, and inspect memory and variables (to the extent that debugging symbols are available) to work out what’s going wrong. Since the debugging client runs outside the VM and initramfs, we can also use fancy graphical debugger clients or editor/IDE-integrated debugging tools.

Conclusion

Debugging PID1 in the initrd is a little tricky and requires some extra contortions, but it ends up being perfectly manageable thanks to gdbserver and its ability to expose a debugging target via a serial console. The same technique can be applied over the network. That is useful for hardware that doesn’t conveniently expose a serial console, which matters when a problem can’t be reproduced in a virtual machine like the one we used here. The network variant requires some further setup to bring up the network interface before the debug server can be reached. I might cover this in a future blog post!