
Apologies in advance: this will be a long post, as I figure explaining everything relevant is better than being vague and having a long back-and-forth. I'm asking here because I've tried to research the problem myself, but there are too many factors and it's very specific. It's hard to explain to Google exactly what I'm after, and most searches bring up generic advice about hard drive failure: run chkdsk, check the SMART status of the drives, and so on. Overall, unhelpful stuff. Basically, I'm here to see if someone knows or has experience with this specific type of corruption, and whether there are other ways I can identify the real issue.

So, this is now the third time this has happened to me.

I had a 3TB external drive that was used for storing and seeding a reasonably large number of torrents. It started to slow down and many files appeared to be corrupted, so I backed up what I could before it caused a few full-blown crashes, then swapped it for a new one. I thought maybe I had just used it too much and all the constant reading and writing had worn it out quicker than the others.

Then, more recently, a 3TB WD Green internal drive that I had bought for extra storage did a similar thing. This drive was for storing my photos and large video files (I'm a photographer/filmmaker) and started to give real trouble being accessed, and some files that I had other saved copies of became corrupted, so I bought a new 4TB Blue drive and managed to save pretty much everything that I still needed onto that. I ran CrystalDiskInfo and checked what I could, formatted it, ran chkdsk, and I think some Seagate repair tool before it started freezing things completely and I gave up. It was under warranty so I sent it to WD in Vietnam and they sent me a replacement when they got it and tested that it was definitely faulty.

So, naive me - with my new 4TB Blue and my replacement 3TB Green from WD, just used them to store different things (no redundancy). It's been about a year since I got these two new drives now - and I noticed a few weeks ago that some of my video files were playing back but with spots of glitchiness and corruption. At first I thought it was just lagging or something, but on rewinding it was clear that the same frames were damaged every playthrough, so I immediately started copying everything off to other drives.

I now have a bunch of damaged files (some of which I have no good backup version of, sadly). They're mostly video, but also include photos, of which I'd say 1 in every 6 or so JPG files has some sort of corruption (shifted parts of the image and missing colour information in random large blocks). I've saved all of these damaged files but have learned a big lesson about redundancy, and during the process I had quite a few fully frozen crashes.

Eventually, on one particular boot up the drive worked (albeit very slowly and inconsistently, often stopping for 30 seconds to a minute and then starting again, or speeding up and slowing down) so I was able to copy what I could off it. It is now running so slowly that a full HDD Regenerator scan is estimated to take around one week, chkdsk doesn't even work, and just browsing through what's still left is arduous. It's under warranty and I have the receipt, so I can easily get it replaced where I bought it.

My question is, could there be some other issue causing this failure or am I just very unlucky?

My system is quite old, but I haven't needed to upgrade for a long time because, honestly, for my usage buying a new motherboard, CPU and RAM just hasn't been worthwhile. I'm running an i5 2500K on an ASUS P8Z68-V with 16GB of G.Skill 1600MHz RAM (all bought around 2011, from memory, though I later added 8GB of the same RAM and have swapped the GPU out a few times; I'm currently using a GTX 970).

The CPU used to be overclocked to around 4.5GHz (I do have a custom cooler), but over time I got system instability (BSODs, freezes, etc.) and had to keep lowering the clock before completely removing the OC and just running the CPU at stock speeds. Maybe the old hardware is what is damaging my drives.

The motherboard has felt a little temperamental over the last few years, sometimes randomly I get repeated "You just unplugged(/plugged) a device from the audio jack" notifications or repeated "A USB device has malfunctioned" notifications when I haven't connected or disconnected anything.

Another thing I should mention is that I have 7 of 8 internal SATA ports in use on my system (two SSDs, four HDDs and a Blu-ray drive), plus a few externals connected via USB. I added a PCIe SATA card with two ports when I got the new Blue drive a year ago.

A secondary question I have (for which the answer may be similar to the first) is: What could cause this type of data corruption?

...where the video files still play back but only select frames are damaged? Some files are fine, others have mild corruption and it's widespread, pretty much every single folder on the drive is affected.

I've had drives crash before where they become hard to read or slow, then throw errors and eventually start clicking (full hardware failure), or drives that freeze Windows but can be accessed via a bootable disk and have their files recovered with command line tools, but never widespread mild corruption.

I've included screenshots of CrystalDiskInfo:

CrystalDiskInfo

and HDD Regenerator's SMART info:

HDD Regenerator

The drive in question is B: (Blue). I barely trust this SMART stuff as HDD Regenerator was showing "BACKUP RECOMMENDED" for the O: drive earlier, which is the new replacement from WD, and now the error is gone. On the screenshot I took it's showing some issues for my old 60GB Corsair SSD now, which wasn't there earlier and also makes no sense as that drive just sits there and does barely anything. Temperatures all seem fine, but I did consider that as a potential issue as there are lots of drives crammed into my case.

I'm worried that if this does have something to do with my main hardware (CPU, mobo, RAM) that it will keep happening to otherwise perfectly fine hard drives.

Sorry for the giant wall of text - obviously I'll clarify if there's something pertinent I've missed. Appreciate any help I can get!


Edit #1: So this morning the faulty drive got to where it appeared to not even be mounted in Windows. Trying to do anything with it fully froze the system. I removed it, put it in an external enclosure, and connected it to my MacBook Air. At first it told me the disk needs initializing blah blah... I used Disk Utility and was able to quick format it (what is even going on???) to exFAT. I copied some files to it to test - fast, no issues. Moved it back to the PC internally (SATA), and Disk Management couldn't even interact with it. It kept saying it was out of date and needed refreshing (which took ages and did nothing when I tried it), and the disk had a little red symbol next to it. Again, long pauses even trying to look at properties.

I put it back in the external enclosure and connected it via USB - it mounted perfectly. I quick formatted it to NTFS for giggles, that worked fine. Tried to use chkdsk again, it threw the old error it was throwing before saying the drive can't be unlocked, but ran a lot faster and quickly asked me if it could be unmounted to continue. I let it do this, and now it's running a full chkdsk. Anyone with more HDD failure experience... Maybe this will make more clear what the issue with the drive is/was? I'll report back in 7 hours or so when it's finished checking.

I should also add that I just figured out that my Premiere Media Cache was present on that drive as I just booted Premiere back up to try and get some work done on the drive that's still working, and it told me the Media Cache was missing. Could this have anything to do with it?


Edit #2: Adding images to show the corruption I'm talking about. All files still "work", they're just damaged, but not to the point of complete data loss.

Corrupted Photo File:

Corrupted Photo File

Corrupted Video File Still (these glitches only last for a few frames, some are worse than others):

Corrupted Video File Still


Edit #3: Okay, so this is super weird. Chkdsk finally finished its thing via the external enclosure. No errors whatsoever. Everything is fine, apparently. Just to test, I moved the drive back into the PC and connected it via SATA again. I had it set to offline from before for some reason, so I switched it back to online, and it shows up as a 465.75GB RAW partition (interestingly, this is about the same size you get with an NTFS-formatted 500GB drive) plus two other partitions that are unallocated. I can't use the first one, so I tried formatting it. It failed because of a device I/O error (???). I tried the second partition, an unallocated one of 1582.25GB: "The format did not complete successfully", and it now shows as a 1582.25GB RAW partition, like the first one. The third one I can't even start any process with; everything is greyed out.
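For reference, the partition sizes quoted in Edit #3 can be sanity-checked with a little arithmetic. The sketch below assumes Windows is reporting sizes in binary gigabytes (2^30 bytes) and that the drive uses 512-byte logical sectors; both are assumptions, not something confirmed in the screenshots. It shows that a nominal 500GB drive comes out close to the 465.75GB figure, and that the first two partitions add up to exactly 2048GB, which is also the capacity addressable with 32-bit sector numbers. Whether that coincidence has anything to do with the fault is not established here.

```python
# Sanity check of the sizes from Edit #3.
# Assumption: Windows labels binary gigabytes (2**30 bytes) as "GB".
GIB = 2**30

def bytes_to_windows_gb(n_bytes: int) -> float:
    """Convert a byte count to the 'GB' figure Windows would display."""
    return n_bytes / GIB

# A nominal (decimal) 500 GB drive as Windows would report it.
nominal_500gb = bytes_to_windows_gb(500 * 10**9)
print(round(nominal_500gb, 2))  # 465.66

# The two partitions that Disk Management showed with a size.
first_two_partitions = 465.75 + 1582.25
print(first_two_partitions)  # 2048.0

# Capacity reachable with 32-bit sector addresses.
# Assumption: 512-byte logical sectors.
limit_32bit_lba = (2**32 * 512) / GIB
print(limit_32bit_lba)  # 2048.0
```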

I feel like this is the drive at fault but I don't understand it at all. If I had clear indication that the drive actually was fine (it seemed to be in the external housing, even down to the chkdsk output), I'd know my other hardware was at fault, not the drive. This is just confusing.

Does anyone know of software that isn't for S.M.A.R.T. or standard disk checking that I can run on this thing to determine that it's actually dying or dead? Chkdsk literally gave me the all green for a drive that by all accounts when connected internally does not work.
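One check that relies on neither S.M.A.R.T. nor chkdsk is a plain write-then-read-back comparison: write a file of known pseudo-random data to the suspect drive, read it back, and compare checksums. The following is only a minimal sketch of that idea, not a substitute for a vendor diagnostic. The function names (`write_test_file`, `hash_file`, `roundtrip_ok`) are made up for illustration, and the operating system's file cache may serve the read from RAM, so a pass is weaker evidence than a fail unless the file is re-hashed after a reboot or remount.

```python
import hashlib
import os
import random

CHUNK = 1024 * 1024  # write and read in 1 MiB blocks

def write_test_file(path: str, size_mib: int, seed: int = 0) -> str:
    """Write size_mib MiB of reproducible pseudo-random data; return its SHA-256."""
    rng = random.Random(seed)
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mib):
            block = rng.getrandbits(8 * CHUNK).to_bytes(CHUNK, "little")
            f.write(block)
            digest.update(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the device
    return digest.hexdigest()

def hash_file(path: str) -> str:
    """Return the SHA-256 of the file's current contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()

def roundtrip_ok(path: str, size_mib: int, seed: int = 0) -> bool:
    """Write test data, read it back, and report whether the checksums match."""
    expected = write_test_file(path, size_mib, seed)
    return hash_file(path) == expected
```

A call such as `roundtrip_ok(r"B:\verify.bin", 1024)` (the path is hypothetical) could be run once with the drive on internal SATA and once in the USB enclosure. A mismatch on only one of the two connections would point away from the drive itself and towards the controller, cabling or power.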


Edit #4: Memtest86 results from last night...

Memtest86


Edit #5: Got a full pass on Intel's Processor Diagnostic Tool, now running Prime95.

  • I won't answer with pure speculation, but my purest speculation is that there may be power issues with so many drives. If all 4 HDDs spin up at once, you could pull, short term, 8 additional amps (2 amps per HDD x 4 = 8 amps, x 12V = 96 watts?) to get the platters moving, above and beyond their "normal" operating requirements. So you might wind up with voltage sag. This might also explain the ghost device connectivity. I'd expect the computer to simply shut down without warning, but stranger things have happened. Don't buy a new PSU on the weight of this comment though; maybe use it as a Google starting point.
    – Yorik
    Commented Sep 20, 2017 at 19:00
  • This old question has some interesting answers and comments that focus on power and grounding.
    Commented Sep 20, 2017 at 19:06
  • Also, based on your post, I wouldn't discount the possibility that the glitches are actually injected code (aka "executable image exploit")
    – Yorik
    Commented Sep 20, 2017 at 19:13
  • Have you run Memtest86 or similar overnight to test for memory corruption?
    – cybernard
    Commented Sep 20, 2017 at 23:43
  • Hmm, I didn't think of power as being the issue. Like I said, my thought was maybe the motherboard, but that's a good point. I'll look into that a bit.
    Commented Sep 21, 2017 at 3:19
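The spin-up estimate from the first comment can be written out as a quick calculation. This is just the commenter's arithmetic in code form: the figure of 2 amps per drive on the 12V rail is the commenter's rough guess, not a measured or datasheet value, and `spinup_load` is a name made up for this sketch.

```python
def spinup_load(n_drives: int, amps_per_drive: float, rail_volts: float = 12.0):
    """Return (total amps, total watts) if every drive spins up at the same time."""
    total_amps = n_drives * amps_per_drive
    return total_amps, total_amps * rail_volts

# 4 HDDs at an assumed 2 A each on the 12 V rail.
amps, watts = spinup_load(4, 2.0)
print(amps, watts)  # 8.0 96.0
```

The real per-drive spin-up current is usually listed in the drive's datasheet, so that number could be substituted for the guess and compared against the PSU's 12V rail rating.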
