1

I'm running Linux on a Seagate FreeAgent DockStar, a very limited machine but more than able to do what I need, at least most of the time...

I have the operating system on a flash drive and use an external USB 2 "classic magnetic" Western Digital 1.5 TB hard disk for mass storage.

Quite often the I/O wait percentage suddenly goes up to almost 100% and the system is brought to its knees, to the point that it is very difficult even to ssh into it; a typical 'iostat -x' in those situations gives output like:

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await    svctm   %util
sdb        0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00      0.00     0.00      0.00     0.00    0.00
sda        0.00    0.00  0.00  0.50   0.00   2.00      8.00     14.80  91400.00     0.00  91400.00  2000.00  100.00

where sdb is the flash drive and sda is the USB disk. This tells me that the USB drive is 100% busy, yet almost nothing is writing to it or reading from it.
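The pattern in that output (a device pinned at 100% utilisation while completing almost no requests) can be spotted automatically. The following is a minimal sketch, not something from the original post: `flag_stalled` is a hypothetical helper name, and the default thresholds of 90% utilisation and 5 requests per second are arbitrary assumptions.

```shell
# Flag devices that look stalled: nearly 100% busy while doing almost no I/O.
# Reads `iostat -x` output on stdin. Column positions are taken from the
# "Device" header line because they differ between sysstat versions.
# Arguments (both optional, both assumptions): minimum %util, maximum r/s + w/s.
flag_stalled() {
    awk -v min_util="${1:-90}" -v max_iops="${2:-5}" '
        $1 ~ /^Device/ {
            # Remember which field number each column name lives in.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("%util" in col) && ("r/s" in col) && ("w/s" in col) && NF >= col["%util"] {
            iops = $(col["r/s"]) + $(col["w/s"])
            util = $(col["%util"])
            # Newer sysstat versions have no combined "await" column.
            aw = ("await" in col) ? $(col["await"]) "ms" : "n/a"
            if (util + 0 >= min_util && iops < max_iops)
                printf "%s stalled: util=%s%% iops=%.2f await=%s\n", $1, util, iops, aw
        }
    '
}
```

Run as `iostat -x | flag_stalled`. For the sda line above it would report util=100.00%, iops=0.50 and await=91400.00ms, that is, requests waiting roughly 91 seconds on average.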

I also ran 'lsof +D ' during "normal" use and found nothing suspicious: a fair number of files are open, but nothing strange.

How can I debug this further? Keep in mind that the machine uses an ARM processor, has only 128 MB of RAM and has no screen or local console, but, given these limits, I can install almost anything if needed.

Edit: I also tried to run smartctl which says the drive is fit:

SMART overall-health self-assessment test result: PASSED

... there is a lot of output but none of that seems useful
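One way to narrow that output down, offered as a sketch rather than something tried in this thread: the overall PASSED verdict only says that no attribute has crossed its vendor threshold, so it can be worth looking at the raw values of a few attributes directly. `suspect_attrs` is a hypothetical helper name, and the choice of attributes 5, 197 and 198 (reallocated, pending and offline-uncorrectable sectors) is an assumption about what matters for this drive.

```shell
# Print SMART attributes that often point to media trouble when their raw
# value is non-zero. Reads `smartctl -A` output on stdin.
# Assumes the usual ATA attribute table layout: field 1 is the ID,
# field 2 the attribute name and field 10 the raw value.
suspect_attrs() {
    awk '$1 ~ /^(5|197|198)$/ && $10 + 0 > 0 { print $2 " raw=" $10 }'
}
```

Usage would be `smartctl -A /dev/sdX | suspect_attrs`, run as root; some USB enclosures may need `-d sat` added to the smartctl call. No output means none of the three attributes has a non-zero raw value.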

Edit2:

I really think that the drive has hardware issues; I noticed that when it is 100% busy without any load it makes a repetitive noise which reminds me of old Amiga floppy drives: they made a similar noise, as if they were spinning around aimlessly...

For this reason, the suggestion to move /var and /tmp to the magnetic disk only made the situation worse.

I guess the only way to solve this is to buy a new hard drive and back up valuable data ASAP. :-(

1
  • I think this can be closed... there is nowhere on stackexchange I could get any help for this. Bad luck with the disk, it happens! Commented May 18, 2011 at 13:53

2 Answers

1

Double-check that the WD drive is OK. I have just discovered that one of mine has read errors. It was taking enormous amounts of time (minutes) to read some bad sectors. Unfortunately SMART may not work over USB, which makes checking the drive's condition hard.

One way to check that is reading the whole disk using dd:

dd if=/dev/sdX of=/dev/null bs=1M

That will take quite some time over USB, but if dd reports an error then you know the disk is broken. You can read the disk while it is mounted, but be careful with if= and of=!
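A single dd over the whole disk gives no hint about where it slows down. As a rough sketch (not part of the original answer), the read can be split into chunks that are timed individually; `scan_chunks` is a hypothetical helper, and the 64 MiB chunk size and 10 second threshold are arbitrary assumptions. A read that blocks inside the kernel will still block here; the sketch only reports the chunk once dd returns.

```shell
# Read a device (or file) in fixed-size chunks and report chunks that are
# slow or that fail to read. Nothing is written to the source.
# Arguments: source, chunk size in MiB, number of chunks,
#            slow threshold in whole seconds.
scan_chunks() {
    src=$1; chunk_mib=${2:-64}; chunks=$3; slow_s=${4:-10}
    i=0
    while [ "$i" -lt "$chunks" ]; do
        start=$(date +%s)
        # skip= is counted in bs-sized blocks, so chunk i starts at i * chunk_mib MiB.
        if ! dd if="$src" of=/dev/null bs=1M skip=$((i * chunk_mib)) count="$chunk_mib" 2>/dev/null; then
            echo "chunk $i: read error"
        else
            elapsed=$(( $(date +%s) - start ))
            [ "$elapsed" -ge "$slow_s" ] && echo "chunk $i: slow (${elapsed}s)"
        fi
        i=$((i + 1))
    done
}
```

For example, `scan_chunks /dev/sdX 64 100 10` would read the first 100 chunks of 64 MiB each and print only those that took 10 seconds or more or returned an error. The number of chunks for a whole disk can be derived from the size in bytes reported by `blockdev --getsize64 /dev/sdX`.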

3
  • I tried but it did not give much info... at some point the drive was stuck but nothing was displayed by dd (not even sending it a SIGUSR1). Thanks for the heads up, but now I'm trying with e2fsck -c Commented May 7, 2011 at 11:29
  • No result from e2fsck -c either... I'm trying with e2fsck -cc but it will take a few days... Commented May 9, 2011 at 5:43
  • In my search for info I learned that all reasonably recent drives handle bad sectors internally (to a given extent); so, as long as the hard disk is able to do this, tools like e2fsck will always report 0 bad sectors. When those tools start to show bad sectors, it is time to back up and get a new disk ASAP. Luckily my USB drive is supported by smartmontools, and that should have been the way to go from the start: in a few hours it would have told me that the drive is perfectly fit. At this point I'm really stuck... Commented May 11, 2011 at 8:56
0

Flash drives typically have poor write performance. So you need to minimize OS writes to the flash drive.

  • Use the "noatime" option in fstab for your / and /usr partitions.
  • Disable swap.
  • Put /var on something else if possible (that "massive storage" if you can), if not, disable all the logging you don't absolutely need.
  • Same thing with /tmp.
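To see which filesystems would be affected by the first point, the current mount options can be checked. This is only an illustrative sketch: `missing_noatime` is a hypothetical helper, and it assumes input in the /proc/mounts layout (device, mount point, type, options).

```shell
# List mount points of /dev-backed filesystems whose mount options do not
# include noatime. Reads /proc/mounts-style lines on stdin:
#   device  mountpoint  fstype  options  dump  pass
missing_noatime() {
    awk '$1 ~ /^\/dev\// && $4 !~ /(^|,)noatime(,|$)/ { print $2 }'
}
```

Running `missing_noatime < /proc/mounts` prints the mount points that still update access times; adding noatime to the options field of the matching /etc/fstab entries makes the change permanent, and `swapoff -a` turns swap off until the next reboot.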
1
  • thanks for your advice, but the freezing drive is a "regular" magnetic disk; I updated the text to make it clearer Commented May 6, 2011 at 14:36
