I was transferring data from one disk to a new one with a btrfs replace. However the new one, a Seagate IronWolf 12TB, had trouble (it may be too sensitive to the supply voltage). Nevertheless, the replace operation stopped with a message I did not take the time to note.
So I had to reboot to remove the Seagate disk. I performed a btrfs check on the original disk, which ended with no errors, stopped the server, removed the 12TB disk and rebooted...
...straight into a boot failure, as my btrfs device would not mount:
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
So I performed (as you should do) the suggested dmesg | tail and got:
[ 2833.182505] BTRFS info (device sdd1): disk space caching is enabled
[ 2833.182515] BTRFS info (device sdd1): has skinny extents
[ 2833.321953] BTRFS warning (device sdd1): cannot mount because device replace operation is ongoing and
[ 2833.321962] BTRFS warning (device sdd1): tgtdev (devid 0) is missing, need to run 'btrfs dev scan'?
[ 2833.321969] BTRFS error (device sdd1): failed to init dev_replace: -5
[ 2833.339466] BTRFS: open_ctree failed
Well, I agree with the diagnosis; however, "btrfs replace cancel" requires a mount point, and the system refuses to mount... the dog chasing its own tail.
usage: btrfs replace cancel <mount_point>
I made many searches and did not find any viable solution. Searching for "replace operation is ongoing", I luckily found a page with the source code of dev-replace.c, where I found this block of code:
	/*
	 * allow 'btrfs dev replace_cancel' if src/tgt device is
	 * missing
	 */
	if (!dev_replace->srcdev &&
	    !btrfs_test_opt(dev_root, DEGRADED)) {
		ret = -EIO;
		pr_warn("btrfs: cannot mount because device replace operation is ongoing and\n"
			"srcdev (devid %llu) is missing, need to run 'btrfs dev scan'?\n",
			(unsigned long long)src_devid);
	}
	if (!dev_replace->tgtdev &&
	    !btrfs_test_opt(dev_root, DEGRADED)) {
		ret = -EIO;
		pr_warn("btrfs: cannot mount because device replace operation is ongoing and\n"
			"tgtdev (devid %llu) is missing, need to run 'btrfs dev scan'?\n",
			(unsigned long long)BTRFS_DEV_REPLACE_DEVID);
	}
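The interesting part is the btrfs_test_opt(dev_root, DEGRADED) condition: the error is raised only when the DEGRADED mount option is not set. That suggests the following (a sketch; /dev/sdd1 is my device and /mnt an assumed mount point, adapt both to your setup):

```shell
# -o degraded sets the DEGRADED mount option tested in the code above,
# which should let the mount get past "failed to init dev_replace: -5"
sudo mount -o degraded /dev/sdd1 /mnt
```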
Now, it's a small hint that the "official" reason for the error is that the btrfs volume is degraded. Luckily I was reading this page at the same time: Using Btrfs with Multiple Devices, where I read:
Replacing failed devices
Using btrfs replace
When you have a device that's in the process of failing or has failed in a RAID array, you should use the btrfs replace command rather than adding a new device and removing the failed one. This is a newer technique that worked for me when adding and deleting devices didn't; however, it may be helpful to consult the mailing list or IRC channel before attempting recovery.
First list the devices in the filesystem, in this example we have one missing device that we will replace with a new drive of the same size. In the following output we see that the final device number (which is missing) is device 6:
user@host:~$ sudo btrfs filesystem show
Label: none uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
Total devices 6 FS bytes used 5.47TiB
devid 1 size 1.81TiB used 1.71TiB path /dev/sda3
devid 2 size 1.81TiB used 1.71TiB path /dev/sdb3
devid 3 size 1.82TiB used 1.72TiB path /dev/sdc1
devid 4 size 1.82TiB used 1.72TiB path /dev/sdd1
devid 5 size 2.73TiB used 2.62TiB path /dev/sde1
*** Some devices missing
This is not my exact situation, as I do not have "*** Some devices missing"; however, it's quite close. I read further:
If the device is present then it's easier to determine the numeric device ID required.
Before replacing the device you will need to mount the array; if you have a missing device, you will need to use the following command:
sudo mount -o degraded /dev/sda1 /mnt
There it was: the way to mount a degraded btrfs in order to cancel an interrupted replace operation.
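Put together, the recovery sequence in my case looks like this (the device /dev/sdd1 and the mount point /mnt are from my setup; adapt them to yours):

```shell
# Mount degraded to get past the dev_replace check
sudo mount -o degraded /dev/sdd1 /mnt

# Optionally inspect the state of the interrupted replace
sudo btrfs replace status /mnt

# Cancel the interrupted replace operation
sudo btrfs replace cancel /mnt

# Unmount and remount normally
sudo umount /mnt
sudo mount /dev/sdd1 /mnt
```

After that the filesystem should mount normally again, and the replace can be retried later on healthier hardware.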