
Recovering from an Unclean Shutdown

I'm sure all of us have experienced power outages many times in our lives. It's just something that happens every now and then. But sometimes, the consequences are worse than usual.

This time, after power returned, nothing seemed out of the ordinary. The light turned on, the phone started charging, a few minutes later the router finished booting, and I remembered I needed to connect to my server's initrd via SSH to enter the disk encryption password.

I couldn't connect to it! Why? Ah, it makes sense. udhcpc, which is used for DHCP in initrd, isn't a daemon; it runs as a normal program. So, after trying about 3 times and not seeing the router, it simply stopped trying and continued booting. I will need to fix this at some point, or at least make it retry something like 50 times, as my server is headless and can't boot without a network connection (which is needed for entering the FDE password).
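A minimal sketch of what such a retry could look like, not something that was actually deployed here. It assumes a POSIX shell in the initrd and that udhcpc is invoked so that it exits with a non-zero status when it gets no lease (BusyBox udhcpc does this with -n). The retry helper and the eth0 interface name are hypothetical.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 0 on the first success,
# 1 if every attempt failed.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    "$@" && return 0
    attempt=$((attempt + 1))
    # Only sleep if another attempt is coming
    if [ "$attempt" -le "$max" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical usage in the initrd network setup (eth0 is an assumption):
#   retry 50 5 udhcpc -i eth0 -n -q
```

BusyBox udhcpc also has a -t option for the number of discover packets it sends, so raising that may be enough on its own; the wrapper is just the more general form.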

After turning the server off and on again, it booted into initrd. I connected to it and entered the password. Then I waited. And waited. But it didn't continue booting; it stayed in initrd. This is normal, as the filesystem needs some time for fsck to finish. But even after some time had passed, it didn't boot. Checking dmesg, I saw a lot of bch2_btree_update_start(): error journal_reclaim_would_deadlock. That makes sense. If it deadlocks, an indefinite delay is to be expected. Looking on the internet, I found a Phoronix article that mentioned a deadlock in recovery being fixed in Linux 6.9. Well, the server is currently on Linux 6.8.4, so what can I do? Not much.

Let's start with simple things. Let's try the userspace bcachefs-tools utility. Maybe bcachefs fsck can help us?

It deadlocks as well. That actually makes sense, it probably uses the same code. What about a newer version of bcachefs-tools? The current version is 1.7.0, but I'm using 1.4.1. How do I update it? It's pretty simple: download it from the Nix binary cache, upload it to the server (I can't use SCP because the initrd has neither the binaries for the new SFTP-based protocol nor the ones for the old SCP protocol, but I can use FTP instead: tcpsvd -vE 21 ftpd -wA /path/to/ftp/dir), then run it via /nix/store/*-extra-utils/lib/ld-linux-aarch64.so.1 ./bcachefs.

...is what I expected, but ld-linux-aarch64.so.1 simply segfaulted. Even when I didn't actually run the binary and simply used --list to print some info about the binary, it still segfaulted. Maybe I need to fix the interpreter in the binary to point to the actual ld-linux path? I'd be happy to test it, however patchelf for some reason refuses to work on this binary. It happily returns code 0, but the interpreter path ends up being XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. The resulting binary runs fine via QEMU, but I don't think it should run on an actual ARM machine. This is getting quite ridiculous, but fine. What other options do I have?

  1. Hoping it resolves itself after leaving it overnight
  2. Somehow mounting the drives on a different device with Linux 6.9 (there's a total of 3 drives and they have to be mounted simultaneously, so this is no easy task) and fscking them
  3. Uploading a new initrd+kernel to the server. This sounds doable, but considering the above could be an issue that only happens because I'm building everything on an x86 laptop, I'm not sure whether doing that is a good idea right now.
  4. Booting a Linux 6.9 system on the server via USB. There are still a few weeks until the NixOS 24.05 release, so I'd have to use some other system, and I'm not sure which ones use Linux 6.9. Perhaps I could use Arch Linux ARM here. Worst of all, I'd have to either partially disassemble the server to get serial access, or make the live USB start SSH on boot, and both options sound like a major pain.

Let's try the options in that order.

...option 1 was a no go.

For option 2, I'm thinking of exposing the drives over the network. That can be achieved using this script.

I immediately ran into an issue - my initrd doesn't have the target_core_mod module! No big deal, I can upload it using FTP. If I upload just the .ko file, modprobe still complains about not seeing the file, so let's just upload the entire modules directory.
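For context, modprobe doesn't look for .ko files directly; it resolves module names through the index files (such as modules.dep) under /lib/modules/$(uname -r). That would explain why dropping in a lone .ko file isn't enough. Here's a rough sketch of how one could check whether a module is indexed, assuming a plain-text modules.dep is present. The module_indexed helper is hypothetical.

```shell
# module_indexed NAME [DEPFILE]: succeed if NAME has its own entry in
# modules.dep. DEPFILE defaults to the running kernel's index (an assumed
# default location).
module_indexed() {
  name=$1
  depfile=${2:-/lib/modules/$(uname -r)/modules.dep}
  # Entries look like "kernel/drivers/target/target_core_mod.ko.xz: deps..."
  # so match the file name, an optional compression suffix, then a colon.
  grep -Eq "(^|/)$name\.ko[^:/]*:" "$depfile"
}

# Hypothetical usage:
#   module_indexed target_core_mod || echo "target_core_mod is not indexed"
```

If the module isn't indexed, insmod with an explicit path to the .ko file is the other route, but it doesn't resolve dependencies, so uploading the whole modules directory is the less fiddly option.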

And finally, let's set iSCSI up:

modprobe configfs
modprobe target_core_mod
mount -t configfs none /sys/kernel/config
mkdir -p /sys/kernel/config/target/iscsi

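for i in 0 1 2; do
  # NOTE: the definitions of DATA, DIR and the portal address are missing from
  # the script as shown here. The three values below are assumptions that
  # follow the usual LIO configfs layout; the backstore name, the IQN and the
  # portal are placeholders, not the original values.
  DATA="/sys/kernel/config/target/core/iblock_$i/bch$i"
  DIR="/sys/kernel/config/target/iscsi/iqn.2003-01.org.example.initrd:bch$i"
  PORTAL="0.0.0.0:3260"

  # Create a block backstore on top of the unlocked /dev/mapper device
  mkdir -p "$DATA"
  echo "udev_path=/dev/mapper/bch$i" > "$DATA/control"
  echo 1 > "$DATA/enable"

  # Create the target with one portal group and export the backstore as LUN 0
  mkdir -p "$DIR/tpgt_1"
  mkdir "$DIR/tpgt_1/lun/lun_0"
  ln -s "$DATA" "$DIR/tpgt_1/lun/lun_0/data"
  echo 1 > "$DIR/tpgt_1/enable"
  mkdir "$DIR/tpgt_1/np/$PORTAL"
  # No authentication, accept any initiator, allow writes
  echo 0 > "$DIR/tpgt_1/attrib/authentication"
  echo 1 > "$DIR/tpgt_1/attrib/generate_node_acls"
  echo 0 > "$DIR/tpgt_1/attrib/demo_mode_write_protect"
done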

And now on my laptop (which I just updated to Linux 6.9 and bcachefs-tools 1.7), let's connect these drives:

services.openiscsi.enable = true;
services.openiscsi.name = "iqn.2020-08.org.linux-iscsi.initiatorhost:workstation";
iscsiadm --mode discovery --portal --type sendtargets
iscsiadm -m node -L all

and finally:

bcachefs fsck /dev/sda:/dev/sdb:/dev/sdc

This works!!!
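One caveat about the device names: /dev/sda, /dev/sdb and /dev/sdc depend on the order in which the laptop discovers the disks. Here's a small sketch for building the colon-separated device list from stable paths instead. It assumes udev creates the usual /dev/disk/by-path links for iSCSI disks; the join_colon helper is hypothetical.

```shell
# join_colon ARGS...: print the arguments joined with ":".
join_colon() {
  # "$*" joins the arguments with the first character of IFS; the
  # parentheses run this in a subshell so the caller's IFS is untouched.
  (IFS=:; printf '%s\n' "$*")
}

# Hypothetical usage (the glob assumes udev's by-path naming for iSCSI LUNs):
#   bcachefs fsck "$(join_colon /dev/disk/by-path/*-iscsi-*-lun-0)"
```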

Right away I see Doing compatible version upgrade from 1.4: member_seq to 1.7: mi_btree_bitmap, but considering it says compatible version upgrade, I'm sure this is nothing to worry about, right?

...but oh god this is super slow. I'm glad it worked at all. I'll get back to you in a few hours, or tens of hours at this rate, when this finishes (or doesn't finish) running.

...This is the next day, and it's still doing something (or not doing anything). Let's try the kernel implementation of fsck instead - bcachefs fsck -kp /dev/sda:/dev/sdb:/dev/sdc.

Wait, no, Ctrl+C, let's also add -v so it's more verbose...

...Ctrl+C is sure taking a while. Makes sense, the kernel is probably doing something right now.

Okay, it's stopped. Run bcachefs fsck -kpv /dev/sda:/dev/sdb:/dev/sdc... recovering from clean shutdown? Huh? So it stopped considering the filesystem unclean? The journal seems to be fully replayed too, it now says journal read done, replaying entries 126137108-126137108 (as opposed to the journal read done, replaying entries 126132517-126136248 it printed before, with the max journal entry number increasing after each failed fsck for some reason).
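To put the two messages side by side, here's a small sketch that turns such a log line into the number of journal entries it covers. It assumes the line ends with the FIRST-LAST range, exactly as in the messages quoted above; the replay_count helper is hypothetical.

```shell
# replay_count LINE: print how many journal entries the range at the end
# of a "replaying entries FIRST-LAST" message covers (inclusive).
replay_count() {
  range=${1##*replaying entries }   # strip everything up to the range
  first=${range%%-*}                # part before the dash
  last=${range##*-}                 # part after the dash
  echo $((last - first + 1))
}

replay_count "journal read done, replaying entries 126132517-126136248"   # prints 3732
replay_count "journal read done, replaying entries 126137108-126137108"   # prints 1
```

So the earlier attempts wanted to replay 3732 entries, while the latest run had a single entry left.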

This is mildly concerning, it might mean the journal isn't replayed properly. But you know what, there's nothing I can do about it. Let's just boot now. echo b > /proc/sysrq-trigger on the server to reboot it (because that's the easiest way to shut the iSCSI target down), connect to it via SSH again, enter the password - this time it continues booting, but I'm not sure whether it actually works - I can't ping the server! Let's wait for like 10 minutes, and if that doesn't work out, hard reboot the server and try again... oh, here it is, it's pingable now, and I've successfully connected over SSH - phew.

I'm quite worried about my Postgres DB, as Postgres is quite fragile when faced with random DB corruption, so let's check it using pg_dumpall... looks like it's alright! If this is alright, I'll consider the operation a success, even though it took three days. I'll update to Linux 6.9 ASAP, and will hopefully never have to do this again.

By the way, if I didn't have initrd SSH access, I wouldn't have been able to do all of this remotely, and would have had to connect via serial every time my server didn't boot for some reason. So, I'm super happy I have it; it's proven to be extremely useful multiple times already. I definitely recommend it, and the ability to do full disk encryption is a nice bonus.

This may not have been possible if I had used any distro other than NixOS, as NixOS allows me to easily fetch this "old" kernel version to get the modules to upload to initrd.

It also wouldn't have been possible if the bcachefs code were unreliable to the extent that an interrupted fsck broke the filesystem.

And, I guess it would've been much harder if the Linux kernel didn't have so many features, to the point of natively supporting iSCSI.

So, all's well that ends well!
