When a diskless device boots Linux over PXE, TFTP and NFS, we sometimes need to understand or fix problems in the boot process, and part of that process is the initramfs.
The procedure to unpack an initrd.img is simple:
    # create a new directory
    root@mainserver:/tftpboot/system# mkdir test
    # copy the image into it
    root@mainserver:/tftpboot/system# cp initrd_12.04 test/
    # change to that directory
    root@mainserver:/tftpboot/system# cd test/
    # list its content
    root@mainserver:/tftpboot/system/test# ls -ltr
    total 11136
    -rw-r--r-- 1 root root 11383089 jul 5 16:24 initrd_12.04
    # extract the image
    root@mainserver:/tftpboot/system/test# zcat initrd_12.04 | cpio --extract
    47645 blocks
    # and list again
    root@mainserver:/tftpboot/system/test# ls -ltr
    total 11172
    -rw-r--r-- 1 root root 11383089 jul 5 16:24 initrd_12.04
    -rwxr-xr-x 1 root root     7237 jul 5 16:25 init
    drwxr-xr-x 8 root root     4096 jul 5 16:25 scripts
    drwxr-xr-x 2 root root     4096 jul 5 16:25 sbin
    drwxr-xr-x 2 root root     4096 jul 5 16:25 run
    drwxr-xr-x 6 root root     4096 jul 5 16:25 lib
    drwxr-xr-x 7 root root     4096 jul 5 16:25 etc
    drwxr-xr-x 3 root root     4096 jul 5 16:25 conf
    drwxr-xr-x 2 root root     4096 jul 5 16:25 bin
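The same steps can be wrapped in a small function that extracts into a separate working directory, so the original image is not mixed with its own contents. This is only a minimal sketch, assuming a gzip-compressed cpio image like the one in the listing above; the function name unpack_initrd and the example paths in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch: unpack a gzip-compressed cpio initrd into its own directory.
# unpack_initrd is a hypothetical helper name, not part of any package.
unpack_initrd() {
    image=$1      # path to the image, e.g. /tftpboot/system/initrd_12.04
    dest=$2       # directory that will receive the extracted tree

    # Make the image path absolute before changing directory.
    case $image in
        /*) ;;
        *)  image=$PWD/$image ;;
    esac

    mkdir -p "$dest" || return 1
    # Subshell so the caller's working directory is left unchanged.
    (
        cd "$dest" || exit 1
        # -i extract, -d create leading directories, -m keep modification times
        gzip -dc "$image" | cpio -idm --quiet
    )
}
```

It could be called as `unpack_initrd /tftpboot/system/initrd_12.04 /tftpboot/system/test`, after which init, scripts, conf and the rest appear under the destination directory.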
Here we have access to the initramfs scripts and configuration. I modified the init script, adding echo and sleep calls wherever needed to display the different stages of execution. I also added the "debug" parameter to the kernel command line when booting Linux, so that this branch of the case statement in the init script is executed:
    debug)
        debug=y
        quiet=n
        exec >/run/initramfs/initramfs.debug 2>&1
        set -x
        ;;
As you can see, all output (stdout and stderr) is redirected to /run/initramfs/initramfs.debug. I wanted to see the set -x trace of every command directly on the console instead, so I commented out the exec line.
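The instrumentation described above can be sketched in isolation. This is not the actual init script: parse_cmdline and stage are hypothetical names, the default values of debug and quiet are assumptions, and the command line is passed in as an argument (in an initramfs it would typically come from /proc/cmdline).

```shell
#!/bin/sh
# Sketch of "debug" handling and stage markers for an initramfs init script.
# parse_cmdline and stage are hypothetical names used for illustration.

debug=n   # assumed default
quiet=n   # assumed default

parse_cmdline() {
    # $1: the kernel command line; left unquoted below so it splits into words
    for x in $1; do
        case $x in
            debug)
                debug=y
                quiet=n
                # Redirect left commented out so the trace stays on the
                # console instead of going to a file:
                # exec >/run/initramfs/initramfs.debug 2>&1
                set -x
                ;;
            quiet)
                quiet=y
                ;;
        esac
    done
}

# Print a marker and optionally pause so each stage can be read on screen.
stage() {
    echo "initramfs stage: $1"
    sleep "${2:-0}"
}
```

A call such as `stage "mounting root" 5` placed before each major step gives five seconds to read the preceding trace before the screen scrolls on.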
After making the changes, it is time to pack everything again:
    root@mainserver:/tftpboot/system/test# find . 2>/dev/null | cpio --quiet --dereference -o -H newc | gzip -9 > initrd_12.04
    cpio: ./etc/ld.so.conf.d/i386-linux-gnu_GL.conf: Cannot stat: No such file or directory
    cpio: ./etc/modprobe.d/blacklist-oss.conf: Cannot stat: No such file or directory
    cpio: File ./initrd_12.04 grew, 11763712 new bytes not copied
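I safely ignored those messages. The first two correspond to broken symlinks, which cpio cannot follow when --dereference is used. The last one appears because the output file initrd_12.04 is written inside the directory being archived, so cpio picks it up while gzip is still writing to it; writing the image to a path outside the tree avoids this. The modified initrd_12.04 was generated and it worked perfectly.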
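A variant of the repack step that writes the image outside the unpacked tree avoids the "grew" message from the output above. This is a minimal sketch that keeps the same find, cpio and gzip options as the command shown earlier; the function name repack_initrd and the example file names are hypothetical.

```shell
#!/bin/sh
# Sketch: repack an unpacked initramfs tree into a gzip-compressed newc archive.
# repack_initrd is a hypothetical helper name.
repack_initrd() {
    tree=$1       # directory holding the unpacked initramfs
    output=$2     # image to create; should live outside $tree

    # Make the output path absolute before changing directory.
    case $output in
        /*) ;;
        *)  output=$PWD/$output ;;
    esac

    (
        cd "$tree" || exit 1
        # Same pipeline as the manual command; newc is the cpio format
        # used for initramfs images.
        find . 2>/dev/null | cpio --quiet --dereference -o -H newc | gzip -9 > "$output"
    )
}
```

For example, `repack_initrd /tftpboot/system/test /tftpboot/system/initrd_12.04.new` leaves the original image untouched until the new one has been tested. Broken symlinks in the tree would still produce "Cannot stat" messages, since --dereference is kept.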
Note that the last step of init is to execute:
exec run-init ${rootmnt} ${init} "$@" ${recovery:+--startup-event=recovery} <${rootmnt}/dev/console >${rootmnt}/dev/console 2>&1
Here run-init, which is executed from /usr/lib/klibc/bin/run-init, does not show errors so easily: because of the exec, it replaces the shell running the init script, so nothing else in that script runs afterwards. We therefore have to split debugging into "before run-init" and "after run-init". If there is no panic, the startup scripts, certain udev rules and other processes are then usually executed through SystemV or Upstart.
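For the "before run-init" half, one option is a last checkpoint placed just ahead of the exec line. The sketch below is an assumption about what is worth checking, not something taken from the init script itself; check_handoff is a hypothetical name, and its two arguments correspond to ${rootmnt} and ${init} in the line above.

```shell
#!/bin/sh
# Sketch: sanity checks to run in the initramfs just before "exec run-init".
# check_handoff is a hypothetical helper name.
check_handoff() {
    root_dir=$1    # where the real root is mounted (the value of rootmnt)
    init_path=$2   # init inside the real root (the value of init)
    status=0

    if [ ! -x "${root_dir}${init_path}" ]; then
        echo "handoff: ${root_dir}${init_path} is missing or not executable"
        status=1
    fi
    if [ ! -e "${root_dir}/dev/console" ]; then
        echo "handoff: ${root_dir}/dev/console does not exist"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "handoff: checks passed, run-init comes next"
    fi
    return "$status"
}
```

Calling `check_handoff "${rootmnt}" "${init}"` right before the exec prints one final message on the console, which helps tell a failure inside the initramfs apart from one that happens after the real init has taken over.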
If the system hangs at "Stopping Userspace bootsplash" and there is no text console to interact with, run-init has somehow finished, and we should look for problems in the SystemV or Upstart startup processes, located in /etc/init.d and /etc/init respectively. Disabling heavy services such as X or PulseAudio is a reasonable first attempt.
In my case I was stuck at "Stopping Userspace bootsplash" with no X server and no text console to interact with the device. Using another machine to disable X (in init.d) led me to a text console. Then, looking through the services that used X, I found the one causing the problem: a custom binary that displayed a splash image during boot. After disabling it I had X back and everything worked.
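The search for services that depend on X can be done from another machine that has the client's root filesystem available. The following is a rough sketch under several assumptions: the function names are hypothetical, the keyword list is only a guess at what an X-dependent script mentions, and the disable step assumes an Upstart version that supports override files. It only covers Upstart jobs; SysV scripts in /etc/init.d have to be disabled by other means.

```shell
#!/bin/sh
# Sketch: inspect a client's root filesystem from another machine.
# list_x_services and disable_upstart_job are hypothetical helper names.

# Print startup scripts and jobs under $1 that mention X-related keywords.
list_x_services() {
    root=$1       # path to the client's root filesystem (assumption)
    for dir in "$root/etc/init.d" "$root/etc/init"; do
        [ -d "$dir" ] || continue
        # No match in a directory is not an error here.
        grep -rlE 'Xorg|xinit|startx|DISPLAY' "$dir" || true
    done
}

# Keep an Upstart job from starting automatically, without deleting it.
disable_upstart_job() {
    root=$1
    job=$2        # job name without the .conf suffix
    echo manual > "$root/etc/init/$job.override"
}
```

Removing the .override file re-enables the job, which makes it easy to disable suspects one at a time and reboot the device between attempts.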