For the last fourteen years I have been running Linux from USB pen drives, using an install of Debian Live running in persistent mode. I am irrationally in love with running Linux this way.

I recently purchased a 64GB HP pen drive and set it up the way I normally have been setting up such systems. Here are the details of the system and the kernel, as reported by uname -a:

Linux debian 6.3.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.3.7-1 (2023-06-12) x86_64 GNU/Linux

The problem is that everything will be chugging along just fine - often for hours - and then suddenly my home directory and storage will change to "read only." Checking with dmesg reveals a string of errors just before the failure, of which the most relevant seems to be repeated errors saying tag#0 device offline or changed.

I don't understand why this is happening. Most of the similar posts on kernel mailing lists and elsewhere haven't received a reply (such as this), but the few that have, such as this, indicate that this may be a hardware error. It is unlikely to be an error in the laptop's USB reader, since the same system has been running with a Sandisk pen drive for over a year without my ever experiencing these problems. Before I just go out and buy a new pen drive, though, I wanted to ask whether this might be a kernel-related error, or some other error I am not aware of.

Edit: I have added some logs of error messages in these two Pastebins: here - about twenty minutes before the file system is reset - and then here. The tag#0 messages actually seem to be about twenty minutes before the failure, apologies if they are not actually relevant.

Rebooting restores everything to normal.

Edit 2: Adding the output of lsusb -tv, which is confusing, as my drive is a 64GB one, not a 5000 MB one...

/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 2: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        ID 03f0:2003 HP, Inc 
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 480M
    ID 1d6b:0002 Linux Foundation 2.0 root hub
    |__ Port 5: Dev 2, If 0, Class=Video, Driver=uvcvideo, 480M
        ID 5986:9102 Bison Electronics Inc. 
    |__ Port 5: Dev 2, If 1, Class=Video, Driver=uvcvideo, 480M
        ID 5986:9102 Bison Electronics Inc. 
    |__ Port 6: Dev 3, If 0, Class=Wireless, Driver=btusb, 12M
        ID 0bda:c824 Realtek Semiconductor Corp. 
    |__ Port 6: Dev 3, If 1, Class=Wireless, Driver=btusb, 12M
        ID 0bda:c824 Realtek Semiconductor Corp. 
  • And after reboot all's fine again? Commented Jul 3, 2023 at 13:38
  • What does the entire error message look like? Commented Jul 3, 2023 at 13:56
  • @JoepvanSteen yes!
    – ShankarG
    Commented Jul 3, 2023 at 16:06
  • My first suspect would be the pendrive itself. Commented Jul 3, 2023 at 17:40
  • Lines 1-7 of the first pastebin are the relevant part here. (Most of the first pastebin starting with the I/O error at line 8, and everything from the second pastebin, is just fallout of the storage device being already gone at that point.) Was there anything happening right above the USB 'reset' message? Out of curiosity, what devices are those according to lsusb -tv? ("1-3" corresponds to "Bus 01 Port 3", and so on.) It looks almost like the entire USB host controller is resetting and the USB pendrive is reporting its own failures, although it's not clear which one really failed first. Commented Jul 3, 2023 at 17:48

1 Answer

It seems that the USB drive stops responding because it cannot keep up with the heavy volume of writes, and it does so in a way that causes your whole USB host controller to stop responding as well.

Overall it's really not uncommon even for USB 3.x flash drives to have really poor write performance when overwhelmed. AFAIK they just have some amount of fast "cache" to make file copies look fast, much like QLC SSDs, but they're designed for large linear writes (i.e. drop a few large files into the USB drive), not for many random writes like what Debian is doing.

For a full "desktop" Linux installation I would recommend an actual SSD connected via USB; there are enclosures for M.2 modules (either SATA or NVMe) that aren't much bigger than a typical USB stick.

As for your logs:

  • Jul 04 10:46:20.762006 debian kernel: usb usb1: root hub lost power or was reset
    Jul 04 10:46:20.762366 debian kernel: usb usb2: root hub lost power or was reset
    

    That's the kernel saying that your USB host controller has crashed.

  • Jul 04 11:09:01.974689 debian kernel: usb 1-5: reset high-speed USB device number 2 using xhci_hcd
    Jul 03 17:56:33.944054 debian kernel: usb 1-6: reset full-speed USB device number 5 using xhci_hcd
    Jul 03 17:56:33.944317 debian kernel: usb 1-3: reset high-speed USB device number 2 using xhci_hcd
    Jul 03 17:56:33.944576 debian kernel: usb 2-2: reset SuperSpeed USB device number 2 using xhci_hcd
    

    That's the kernel saying it has lost connection to everything that was connected to the USB host controller in question (including your USB stick and various internal USB devices – 1-5 is your webcam, 1-6 is Bluetooth, 1-3 is your external USB hub with Logitech mouse and keyboard, 2-2 is your USB stick.)

  • Jul 03 17:56:33.944817 debian kernel: sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
    Jul 03 17:56:33.945075 debian kernel: sd 1:0:0:0: [sdb] tag#0 Sense Key : Unit Attention [current] 
    Jul 03 17:56:33.945343 debian kernel: sd 1:0:0:0: [sdb] tag#0 Add. Sense: Not ready to ready change, medium may have changed
    Jul 03 17:56:33.945580 debian kernel: sd 1:0:0:0: [sdb] tag#0 CDB: Write(10) 2a 00 00 47 3c d0 00 00 08 00 
    

    If I'm reading this correctly, that's the USB flash drive itself reporting a failed SCSI "Write(10)" command; interestingly though, not because of the previous USB resets, but because of an internal error – the USB drive reports that its storage medium has been removed. (That's the error code that e.g. a CD-ROM drive would use if there was no CD to read/write to. Although USB flash drives don't have removable media, it's still fairly common for them to report "no media" in situations when their internal flash storage goes bad.)
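To see what that failed command was actually trying to do, the CDB bytes from the log can be decoded by hand. The following is a minimal Python sketch, assuming the standard 10-byte WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte transfer length, control); the function name `decode_write10` is made up for illustration.

```python
import struct

def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    # Big-endian fields: opcode, flags, LBA (4 bytes), group number,
    # transfer length in blocks (2 bytes), control byte.
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return {"opcode": opcode, "lba": lba, "blocks": blocks}

# CDB copied from the log line above.
print(decode_write10("2a 00 00 47 3c d0 00 00 08 00"))
```

So the command that failed was a small write of 8 blocks starting at LBA 4668624. If the drive uses 512-byte logical blocks (an assumption, the logs don't say), that is a single 4 KiB write, i.e. ordinary filesystem activity rather than anything exotic.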

Everything that follows is just fallout of the sdb device being already gone at that point – resetting the USB storage device 2-2 causes all queued SCSI commands to be cancelled, the underlying SCSI blockdev failures cause all queued Ext4 I/Os to fail, the I/O failures cause Ext4 to mark the filesystem read-only, and the filesystem being read-only makes systemd-journald complain.
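If you want to pick the three kinds of events discussed above out of a long journal, a small filter helps. This is only a rough Python sketch: the regular expressions are written against the exact message wording quoted in this answer, and the labels and the `classify` helper are hypothetical names, not anything the kernel provides.

```python
import re

# Patterns match the message wording quoted in the logs above; other kernel
# versions may phrase these messages differently.
PATTERNS = [
    ("root_hub_reset", re.compile(r"usb (usb\d+): root hub lost power or was reset")),
    ("device_reset", re.compile(r"usb (\d+-[\d.]+): reset \S+ USB device number \d+")),
    ("scsi_failure", re.compile(r"sd [\d:]+: \[(\w+)\] tag#\d+ FAILED")),
]

def classify(line):
    """Return (label, device) for a recognised log line, else None."""
    for label, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return label, match.group(1)
    return None

sample = [
    "Jul 04 10:46:20.762006 debian kernel: usb usb1: root hub lost power or was reset",
    "Jul 03 17:56:33.944576 debian kernel: usb 2-2: reset SuperSpeed USB device number 2 using xhci_hcd",
    "Jul 03 17:56:33.944817 debian kernel: sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s",
]
for line in sample:
    print(classify(line))
```

Running this over the saved journal and looking at the order in which the labels appear is one way to check which event came first: the root hub reset, the device resets, or the SCSI failure.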

It is unlikely to be an error in the laptop's USB reader, since the same system has been running with a Sandisk pen drive for over a year

There's no "USB reader" here. There's a USB host controller that handles the underlying connection, but all of the "drive" or "reader" parts are inside the flash drive itself.

Which is what makes the "root hub" resets even stranger, at first glance. But, to be honest, it is actually not unusual at all for software and firmware to have latent bugs that might not show up for many years, until they are tickled by another bug or unusual behavior from another device.

For example, a USB host controller might have severe bugs in its "device not responding" recovery code that might never show up for decades as long as you never connect such a device, but as soon as you have a slightly malfunctioning device it triggers the host controller bug as well.

So never jump to conclusions about some component "working perfectly fine"; it could always be that it works perfectly fine under certain conditions and something happened to change those conditions.

Edit 2: Adding the output of lsusb -tv, which is confusing, as my drive is a 64GB one, not a 5000 MB one...

The lsusb command does not show drive capacity – it doesn't concern itself with what kind of device is attached, only with the attachment itself. So that's "5000M" as in 5000 Mbps, the standard link speed of a USB 3.0 connection (later renamed USB 3.1 Gen 1; what dmesg also calls "SuperSpeed").

The host controller itself (and the "root hub" representing it) is apparently a Gen2 one supporting 10 Gbps. Similarly, the other entries are USB 2.0 (480 Mbps "high-speed") and USB 1.1 (12 Mbps "full-speed").
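As a quick illustration of how to read that last column, here is a small Python sketch that maps the trailing speed field of an `lsusb -tv` line to the usual name for that link speed. The table and the `link_speed` helper are my own illustrative names, and the parsing assumes the speed is the last comma-separated field, as in the output you posted.

```python
# Link speed in Mbps (as printed by lsusb -t) -> common name for that speed.
SPEED_NAMES = {
    "1.5M": "low-speed",
    "12M": "full-speed",
    "480M": "high-speed",
    "5000M": "SuperSpeed (5 Gbps)",
    "10000M": "SuperSpeed+ (10 Gbps)",
}

def link_speed(lsusb_line):
    """Return (speed_field, name) for one device line of `lsusb -t` output."""
    field = lsusb_line.rsplit(",", 1)[-1].strip()
    return field, SPEED_NAMES.get(field, "unknown")

line = "    |__ Port 2: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M"
print(link_speed(line))
```

For your pen drive's line this reports the 5 Gbps SuperSpeed link, which says nothing about the 64 GB capacity.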
