I'm sure all of us have experienced power outages many times in our lives. It's just something that happens every now and then. But sometimes the consequences are more dire than usual.
This time, after the power came back, nothing seemed out of the ordinary. The lights turned on, my phone started charging, and a few minutes later the router finished booting. Then I remembered that I needed to connect to my server's initrd via SSH to enter the disk encryption password.
I couldn't connect to it! Why? Ah, it makes sense. udhcpc, which is used for DHCP in initrd, isn't a daemon; it runs as a normal program. So, after trying about 3 times without seeing the router, it simply gave up, and the initrd carried on booting without a network. I will need to fix this at some point, at the very least by making it retry something like 50 times, as my server is headless and can't boot without a network connection (which is needed for entering the FDE password).
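For the retry part, BusyBox udhcpc already has a knob (-t sets how many discover packets it sends per run), but a generic wrapper loop works regardless of how the initrd invokes it. Here's a minimal POSIX sh sketch; the helper name retry_until_ok, the interface name and the retry numbers are my own assumptions, not what the NixOS initrd actually does:

```shell
#!/bin/sh
# retry_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits with status 0 or
# MAX_TRIES attempts have been made. Returns 0 on success, 1 otherwise.
retry_until_ok() {
    _max=$1
    _delay=$2
    shift 2
    _n=1
    while [ "$_n" -le "$_max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $_n/$_max failed" >&2
        # Only wait if another attempt is coming.
        if [ "$_n" -lt "$_max" ]; then
            sleep "$_delay"
        fi
        _n=$((_n + 1))
    done
    return 1
}

# Assumed usage in the initrd (BusyBox udhcpc: -i interface, -n exit if no
# lease, -q quit once a lease is obtained, -t discover packets per run,
# -T seconds between them); eth0 is a placeholder interface name:
# retry_until_ok 50 5 udhcpc -i eth0 -n -q -t 3 -T 3
```

The wrapper only relies on the exit status, so the same loop could guard any other step that needs the network to be up.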
After turning the server off and on again, it booted into initrd. I connected to it and entered the password. Then I waited. And waited. But it didn't continue booting; it stayed in initrd. This is normal at first, as the filesystem needs some time for fsck to finish. But even after a long while, it still didn't boot. Checking dmesg, I found it full of bch2_btree_update_start(): error journal_reclaim_would_deadlock. That makes sense: if it deadlocks, an indefinite delay is to be expected.
Searching online, I found a Phoronix article that mentioned a deadlock in recovery being fixed in Linux 6.9. Well, the server is currently on Linux 6.8.4, so what can I do? Not much. Let's start with the simple things and try the userspace bcachefs-tools utility. Maybe bcachefs fsck can help?
It deadlocks as well. That actually makes sense; it probably uses the same code. What about a newer version of bcachefs-tools? The current version is 1.7.0, but I'm using 1.4.1. How do I update it? It's pretty simple: download it from the Nix binary cache, upload it to the server (I can't use SCP, because neither the binaries for the newer SFTP protocol nor those for the old SCP protocol are available in initrd, but I can use FTP instead: tcpsvd -vE 0.0.0.0 21 ftpd -wA /path/to/ftp/dir), then run it via /nix/store/*-extra-utils/lib/ld-linux-aarch64.so.1 ./bcachefs.
...is what I expected, but ld-linux-aarch64.so.1 simply segfaulted. Even when I didn't actually run the binary and only used --list to print some information about it, it still segfaulted. Maybe I need to patch the interpreter in the binary to point to the actual ld-linux path? I'd be happy to test that, but patchelf for some reason refuses to work on this binary. It happily returns exit code 0, but the interpreter path ends up being XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. The resulting binary runs fine via QEMU, but I doubt it would run on an actual ARM machine. This is getting quite ridiculous, but fine. What other options do I have?
Let's try the options in that order.
...option 1 was a no go.
For option 2, I'm thinking of exposing the drives over the network. That can be achieved using this script.
I immediately ran into an issue: my initrd doesn't have the target_core_mod module! No big deal, I can upload it using FTP. If I upload just the .ko file, modprobe still complains that it can't find the module, so let's just upload the entire modules directory.
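As far as I know, modprobe doesn't scan for loose .ko files: it maps module names to files through the index that depmod writes into /lib/modules/<kernel release>/ (modules.dep, plus binary variants such as modules.dep.bin for kmod), which would explain why a lone .ko was invisible to it. insmod takes a file path directly, but then every dependency has to be loaded by hand. A hypothetical preflight check for the uploaded tree, assuming it includes the text modules.dep:

```shell
#!/bin/sh
# check_module_tree ROOT RELEASE MODULE
# Hypothetical preflight check: verifies that ROOT/lib/modules/RELEASE contains
# a modules.dep index and that MODULE has its own entry in it.
# Limitations: only looks at the text modules.dep, and does not treat '-' and
# '_' in module names as equivalent the way modprobe does.
check_module_tree() {
    _dir="$1/lib/modules/$2"
    _mod=$3
    if [ ! -f "$_dir/modules.dep" ]; then
        echo "missing $_dir/modules.dep" >&2
        return 1
    fi
    # Entries look like "kernel/drivers/target/target_core_mod.ko.xz: deps...",
    # where the .ko suffix may carry a compression extension.
    if ! grep -Eq "(^|/)${_mod}\.ko(\.[a-z]+)?:" "$_dir/modules.dep"; then
        echo "$_mod has no entry in $_dir/modules.dep" >&2
        return 1
    fi
    echo "ok: $_mod is listed in $_dir/modules.dep"
}

# In the initrd (an empty ROOT means the real /lib/modules):
# check_module_tree "" "$(uname -r)" target_core_mod
```

The RELEASE argument matters because the directory name has to match the running kernel's release string exactly, which is why the modules must come from the same 6.8.4 build.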
And finally, let's set up iSCSI:
modprobe configfs
modprobe target_core_mod
mount -t configfs none /sys/kernel/config
mkdir -p /sys/kernel/config/target/iscsi
for i in 0 1 2; do
    # Block (iblock) backstore for /dev/mapper/bch$i
    DATA="/sys/kernel/config/target/core/iblock_$i/data"
    mkdir -p "$DATA"
    echo "udev_path=/dev/mapper/bch$i" > "$DATA/control"
    echo 1 > "$DATA/enable"

    # One iSCSI target per drive, exported as LUN 0 of TPG 1
    DIR="/sys/kernel/config/target/iscsi/iqn.2003-01.org.linux-iscsi.server:bch$i"
    # -p: the target's IQN directory has to exist before tpgt_1 can be created
    mkdir -p "$DIR/tpgt_1"
    mkdir "$DIR/tpgt_1/lun/lun_0"
    ln -s "$DATA" "$DIR/tpgt_1/lun/lun_0/data"
    echo 1 > "$DIR/tpgt_1/enable"
    # Listen on all addresses on the default iSCSI port
    mkdir "$DIR/tpgt_1/np/0.0.0.0:3260"
    # No CHAP authentication, accept any initiator, allow writes
    echo 0 > "$DIR/tpgt_1/attrib/authentication"
    echo 1 > "$DIR/tpgt_1/attrib/generate_node_acls"
    echo 0 > "$DIR/tpgt_1/attrib/demo_mode_write_protect"
done
And now on my laptop (which I just updated to Linux 6.9 and bcachefs-tools 1.7), let's connect these drives:
services.openiscsi.enable = true;
services.openiscsi.name = "iqn.2020-08.org.linux-iscsi.initiatorhost:workstation";
iscsiadm --mode discovery --portal 192.168.1.6 --type sendtargets
iscsiadm -m node -L all
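After login, the three LUNs show up as ordinary SCSI disks (sda, sdb and sdc here), but those letters depend on detection order. udev also creates stable /dev/disk/by-path links for iSCSI disks, yet they presumably can't go into bcachefs's colon-separated device list as-is, because the names themselves contain colons (the portal's :3260 and the IQN). A sketch that resolves such links to the real device nodes first; the helper name join_resolved_devices is hypothetical:

```shell
#!/bin/sh
# join_resolved_devices PATH...
# Hypothetical helper: resolves each device path or symlink (for example a
# /dev/disk/by-path/... link) to the real device node and prints the results
# joined with ':', the separator used for multi-device bcachefs filesystems.
# Returns 1 if any path does not resolve to an existing node.
join_resolved_devices() {
    _out=""
    for _link in "$@"; do
        # readlink -f (GNU coreutils and BusyBox) follows every symlink level.
        _real=$(readlink -f "$_link") || return 1
        if [ ! -e "$_real" ]; then
            echo "not found: $_link" >&2
            return 1
        fi
        # Add a ':' separator only when the list is already non-empty.
        _out="${_out:+$_out:}$_real"
    done
    printf '%s\n' "$_out"
}

# Assumed usage on the laptop once the iSCSI sessions are logged in:
# bcachefs fsck "$(join_resolved_devices /dev/disk/by-path/ip-192.168.1.6:3260-iscsi-*-lun-0)"
```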
and finally:
bcachefs fsck /dev/sda:/dev/sdb:/dev/sdc
This works!!!
Right away I see Doing compatible version upgrade from 1.4: member_seq to 1.7: mi_btree_bitmap, but considering it says compatible version upgrade, I'm sure this is nothing to worry about, right?
...but oh god this is super slow. I'm glad it worked at all. I'll get back to you in a few hours, or tens of hours at this rate, when this finishes (or doesn't finish) running.
...It's the next day, and it's still doing something (or not doing anything). Let's try the kernel implementation of fsck instead: bcachefs fsck -kp /dev/sda:/dev/sdb:/dev/sdc. Wait, no, Ctrl+C, let's also add -v so it's more verbose...
...Ctrl+C is sure taking a while. Makes sense, the kernel is probably doing something right now.
Okay, it's stopped. Running bcachefs fsck -kpv /dev/sda:/dev/sdb:/dev/sdc... recovering from clean shutdown? Huh? So it no longer considers the filesystem unclean? The journal seems to be fully replayed too: it now says journal read done, replaying entries 126137108-126137108 (as opposed to journal read done, replaying entries 126132517-126136248, which it printed before, with the max journal entry number increasing after each failed fsck for some reason).
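To put numbers on the difference: assuming the range is inclusive, the earlier runs had 126136248 - 126132517 + 1 = 3732 journal entries to replay, while this run has just one. A tiny hypothetical helper that does this arithmetic on such a log line:

```shell
#!/bin/sh
# journal_replay_count LINE
# Hypothetical helper: extracts A and B from a bcachefs log line containing
# "replaying entries A-B" and prints B - A + 1 (treating the range as
# inclusive, which is an assumption). Returns 1 if the line doesn't match.
journal_replay_count() {
    _range=$(printf '%s\n' "$1" |
        sed -n 's/.*replaying entries \([0-9][0-9]*\)-\([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$_range" ] || return 1
    # Split "A B" into the positional parameters $1 and $2.
    set -- $_range
    echo $(($2 - $1 + 1))
}

journal_replay_count "journal read done, replaying entries 126132517-126136248"  # prints 3732
journal_replay_count "journal read done, replaying entries 126137108-126137108"  # prints 1
```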
This is mildly concerning, as it might mean the journal wasn't replayed properly. But you know what, there's nothing I can do about it. Let's just boot now. I run echo b > /proc/sysrq-trigger on the server to reboot it (because that's the easiest way to shut the iSCSI target down), connect to it via SSH again, and enter the password. This time it continues booting, but I'm not sure whether it actually works: I can't ping the server! Let's wait about 10 minutes, and if that doesn't work out, hard reboot the server and try again... oh, here it is, it's pingable now, and I've successfully connected over SSH. Phew.
I'm quite worried about my Postgres DB, as Postgres is quite fragile when faced with random corruption, so let's check it using pg_dumpall... looks like it's alright! Since it is, I'll consider the operation a success, even though it took three days. I'll update to Linux 6.9 ASAP, and will hopefully never have to do this again.
By the way, if I didn't have SSH access in initrd, I wouldn't have been able to do any of this remotely, and would have had to connect via serial every time my server failed to boot for some reason. So I'm super happy I have it; it has already proven extremely useful multiple times. I definitely recommend it, and the ability to do full disk encryption is a nice bonus.
This might not have been possible on any distro other than NixOS, as NixOS lets me easily fetch this "old" kernel version to get the modules to upload to initrd.
It wouldn't have been possible if bcachefs were so unreliable that an interrupted fsck broke the filesystem.
And I guess it would've been much harder if the Linux kernel weren't so feature-packed that it can natively act as an iSCSI target.
So, all's well that ends well!