I have a Windows installation on a 1TB SanDisk SDSSDH3-1T00-G25 disk, and have had it for about two years. I had noticed a slowdown, but thought little of it (it's not a computer I use intensively), then a week ago I noticed that nightly full backups were taking an ungodly long time.

According to WinSAT, the disk is really slow in random access, but a more thorough test showed that it is also very slow in very long sequential reads.

This is a hardware issue, but I cannot diagnose it and SanDisk Dashboard says the SSD is perfect and has 99% life left. The 1TB size is because I upgraded a 256GB disk and got a good deal on the SSD, but I never filled it more than perhaps 300GB, so it's not the "SSD almost full" effect. The onboard firmware is up-to-date (no updates available). Windows TRIM status is "enabled" (running fsutil behavior query DisableDeleteNotify from an elevated prompt returns "0" as it should).

How do I know it's a hardware issue? Because, both to solve the problem radically and to check it reliably, I bought another SSD (a Samsung 870 EVO) and cloned the old one with an external dock, running with no computer attached. A 1TB clone usually takes ten to fifteen minutes. This one took fourteen hours.

In an external USB3 enclosure, the old disk is happily running without errors at 2-5 MB/s of throughput - I think I have owned some old IDE disks that were this slow, but that was a long time ago and I'm not sure :-).

Has anyone encountered this... thing? But the real question is: is there a way of restoring - or at least increasing - the SSD's speed? I'd quite like to be able to put that 99% life left to use on a 1TB external disk, if at all possible. If it's a deterioration of the drive, or a sign of impending failure, or if I can't be sure I have reliable storage, then of course I'll just trash it.

Tests I'm planning to perform in my idle time:

  • running a SMART test (a smartctl sketch follows this list). My bad, I didn't think of doing that before cloning the disk.
  • running the external unit with a 1m USB cable, from inside a fridge. I'm aware very high temperatures can erase SSDs, but how about simple summer heat waves? What if they can slow down a device? I started noticing the slowdown at the beginning of June, so maybe...?
  • re-reading and re-writing the whole surface. Maybe I've been getting problems from sectors that have become "difficult to read reliably". I'll use dd from Linux to refresh the unit.
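
For the SMART test, smartmontools from Linux covers both the report and the self-test; a minimal sketch, where /dev/sdX is a placeholder for the actual device:

    # Dump all SMART attributes and the overall health assessment.
    sudo smartctl -a /dev/sdX

    # Start the drive's extended self-test (it runs in the background)...
    sudo smartctl -t long /dev/sdX

    # ...and read back the result once it has finished.
    sudo smartctl -l selftest /dev/sdX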

Epilogue

Rewriting the cloned image from the new Samsung back to the old SanDisk SSD restored the latter's read speed to full.

Therefore, periodic rejuvenation of the SSD is doable.

I'm playing with the idea of writing something similar to ddrescue -- save the last offset, then read sectors and, if read speed is below a threshold, perform a low-level rewrite. When arriving at the end of the disk, save the timestamps of the beginning and end of the operation. Run whenever one month has elapsed since the previous run, as long as system load is reasonably low, with the nicest possible I/O and CPU priority. This ought to write as little as possible, impact performance as little as possible, and still ensure that at the very worst the disk's performance loss is never more than one month old.
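
For what it's worth, here is a minimal shell sketch of that idea (not an actual ddrescue derivative). The device path, chunk size and 50 MB/s threshold are all assumptions, there is no error handling, and a mistake in the skip/seek arithmetic would destroy data, so treat it strictly as pseudocode:

    #!/bin/bash
    DEV=/dev/sdX                        # hypothetical target device
    CHUNK_MB=64                         # test/rewrite granularity
    THRESHOLD_MBS=50                    # below this read speed, rewrite
    STATE=/var/lib/ssd-refresh.offset   # resume point between runs

    SIZE_MB=$(( $(blockdev --getsize64 "$DEV") / 1024 / 1024 ))
    OFFSET_MB=$(cat "$STATE" 2>/dev/null || echo 0)

    while [ "$OFFSET_MB" -lt "$SIZE_MB" ]; do
        START=$(date +%s.%N)
        # Time the read of one chunk, bypassing the page cache, with the
        # nicest possible CPU and I/O priority.
        ionice -c3 nice -n19 dd if="$DEV" of=/dev/null bs=1M \
            skip="$OFFSET_MB" count="$CHUNK_MB" iflag=direct status=none
        ELAPSED=$(echo "$(date +%s.%N) - $START" | bc)
        SPEED=$(echo "$CHUNK_MB / $ELAPSED" | bc)      # integer MB/s

        if [ "$SPEED" -lt "$THRESHOLD_MBS" ]; then
            # Slow chunk: read it out and write the same data back, which
            # lands it in freshly-programmed NAND.
            ionice -c3 nice -n19 dd if="$DEV" of=/tmp/chunk bs=1M \
                skip="$OFFSET_MB" count="$CHUNK_MB" iflag=direct status=none
            ionice -c3 nice -n19 dd if=/tmp/chunk of="$DEV" bs=1M \
                seek="$OFFSET_MB" oflag=direct status=none
        fi

        OFFSET_MB=$(( OFFSET_MB + CHUNK_MB ))
        echo "$OFFSET_MB" > "$STATE"    # save the last offset
    done

    rm -f "$STATE"    # pass complete; a timestamp file could gate the
                      # next run to once a month, as described above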

2 Answers

Old data will be slower to read

(it's not a computer I use intensively), then a week ago I noticed that nightly full backups were taking an ungodly long time

In general, 'hot' data (data that frequently changes and so gets re-written) will read faster than 'cold' data (data that you write to the drive and that then sits unchanged for a long time).

The SSD needs to put more effort into reading the colder data. The charge in the NAND cells drops over time; the more it does, the more the drive needs to rely on error correction or even on so-called RR (read retry) registers to read the data.

The latter means the SSD will experiment with different read voltage thresholds, the values that decide whether the data inside a cell represents a zero or a one.

See how changing these values dramatically increases read success when we address individual NAND chips using specialized data recovery software:

https://youtu.be/pSIJKt_uBfc

Note that the software in the video tries to emulate what SSD controllers (and other flash controllers) do to recover data from NAND during normal operations.

This is of little practical value to an SSD user, but it is meant to demonstrate a process that SSDs will apply on the fly to 'recover' data. All these error recovery procedures take time and will make the SSD appear slow.

TRIM enabled does not mean Windows will TRIM

Windows TRIM status is "enabled" (running fsutil behavior query DisableDeleteNotify from an elevated prompt returns "0" as it should).

Even if so, Windows will only TRIM NTFS drives. So if you, for whatever strange reason, decided to go with exFAT, Windows will not send TRIM commands.

I have also noticed certain file system errors will cause Windows not to send TRIM commands.

These are just examples to illustrate that even if you expect Windows to send TRIM commands because it's configured to, that doesn't mean it always will.

Also, we need to understand what TRIM does and doesn't do, and that it mainly affects write speeds. TRIM by itself does nothing: it's simply a command that allows an OS or utility to notify the SSD about LBA addresses it is free to erase.

So now the SSD can set these addresses aside for the 'garbage collector', which will then rearrange them so that it has whole erase blocks it can erase. At that point the pages inside the erase block are ready for use again.

These processes take time, so even with an SSD at 50% free space, if we constantly rewrite the 50% occupied space there's still a chance the garbage collector cannot keep up and we have to wait for the SSD to free up pages for the data we want to write.
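
To make the OS side of that concrete: on Linux, fstrim issues exactly these notifications for every block a mounted filesystem no longer uses (the mount point here is just an example):

    # Report all currently unused blocks to the SSD as discardable;
    # -v prints how many bytes were flagged.
    sudo fstrim -v /mnt/ssd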

SMART data could be useful

running a SMART test. My bad, I didn't think of doing that before cloning the disk.

I feel that with this type of question, rather than telling us SMART is OK, you should post an actual screenshot or text report.

Effect of temperature

running the external unit with a 1m USB cable, from inside a fridge. I'm aware very high temperatures can erase SSDs, but how about simple summer heat waves? What if they can slow down a device? I started noticing the slowdown at the beginning of June, so maybe...?

Heat does not erase SSDs, but it may indeed promote 'charge bleed', the phenomenon where the charge level in individual cells drops. Lots of things influence charge levels inside cells, though; even reading and writing neighboring cells can 'inject' charge. Writing while 'cold' and reading while warm is also potentially bad.

Going back to the special RR registers I mentioned earlier:

Colleagues of mine, much more proficient than I am in SSD and NAND data recovery, have observed effects of temperature on the quality of reads, even to the degree that it determined whether they needed to employ RR or not. IOW, by either cooling or heating the NAND chips the need for RR disappeared, and it was hypothesized that the cooling or heating brought the NAND closer to its temperature at the time it was programmed, i.e. written to.

Rather than freezing it, taking care of a steady ambient temperature might be the best way to go.

Read / write surface scans

re-reading and re-writing the whole surface. Maybe I've been getting problems from sectors that have become "difficult to read reliably". I'll use dd from Linux to refresh the unit.

To detect read problems there's no need to waste an entire P/E cycle: you can just read the drive. A read/write scan using dd does not write just once; it will induce a great number of 'write amplification' events.
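
A read-only pass from Linux could look like this (/dev/sdX is a placeholder; this writes nothing and costs no P/E cycles):

    # Read every sector once, bypassing the page cache. A sharp drop in
    # the reported rate marks a region of slow, "cold" data.
    sudo dd if=/dev/sdX of=/dev/null bs=1M iflag=direct status=progress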

Also, once you have written to every LBA sector, the entire LBA space will be 'mapped' in the FTL, leaving only the overprovisioned space for the garbage collector to use when erasing stale pages.

It's like utilizing the file system at 100%, which, as we know, is not a great idea on an SSD. It should be able to handle it, but it's far from ideal IMO.

If you erase the entire drive, or read/write to it using dd, at least make sure to TRIM the entire space when putting the drive back to use, for example using https://github.com/tenox7/disktrim.
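
disktrim is a Windows tool; on Linux, the equivalent whole-device TRIM is blkdiscard from util-linux (again, /dev/sdX is a placeholder):

    # DESTROYS ALL DATA: sends a discard (TRIM) for the entire device.
    # Only run this before restoring from a backup.
    sudo blkdiscard /dev/sdX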

Closing words

Comparing a 'virgin state' SSD to a 2-year-old SSD, assuming comparable specs, will always favor the brand-new SSD.

If the specs differ, the comparison is unfair. A more modern or better equipped SSD (more efficient ARM CPU, better firmware, more pseudo-SLC NAND and whatnot) may make the older SSD seem 'sick' while it's doing what it can under a specific set of circumstances.

Backing up the SSD, TRIMming the entire 'surface' and then restoring the data from backup should largely restore the expected speeds, although this will never counteract 'wear'.

  • Thank you. This is more or less what I was suspecting. And yes, I did expect the new SSD to be better, what didn't look right was for the new drive to get 500MB/s and the old drive to only reach two to five. A 20% difference I can get over, but two orders of magnitude is a bit much! I'll run a test by issuing a TRIM on the SSD (which is only 90% provisioned: the cloner dock recognizes partitions), then cloning the new Samsung drive onto it.
    – LSerni
    Commented Jun 30, 2023 at 10:26
  • Yes, I cannot guarantee that 5 MB/s is normal under the circumstances; what I wrote is largely 'the theory'. But if charge bleed is a factor and lots of error recovery is required, it will be considerably slower. So the outcome of your experiments would be interesting. Note that in such an experiment an improvement in read speed would be due to the refreshed data (so less error recovery required). Commented Jun 30, 2023 at 10:37
  • I could swear I had reported on this but apparently I didn't. Anyway, cloning the new SSD onto the old SSD using the duplicator dock took the expected short time, after which data read speed on the old SSD was magically restored.
    – LSerni
    Commented Jul 7 at 21:24
  • So just reading improved the speed, is that what you're saying? If so, I have seen indications of that happening before: benchmark (slow) -> read-only scan -> speed restored. The only explanation I can think of is that the SSD, being forced to read, discovered it needed to put a lot of effort into reading the data (ECC correction, RR reads) and then decided to refresh it. Commented Jul 7 at 23:36
  • No, sorry, I probably misspoke: what I meant is, after I copied the slow SSD onto a new SSD, and wrote about it, I then wrote the new SSD back onto the old SSD (I did a second "clone" in the reverse direction). The data on the old SSD remained the same, obviously, but the old, slow SSD, once rewritten, became fast again.
    – LSerni
    Commented Jul 8 at 8:03

Finally, someone with the same problem as me.

https://www.reddit.com/r/DataHoarder/comments/1dw1jk4/cheap_ssd_slower_read_speed_after_time_in_cold/

My TeamGroup CX2 does that too. Freshly-written data can be read back at over 400 MB/s. After half a year, I get read speeds of around 80 MB/s. After one year, I get read speeds of 1-2 MB/s.

Any new data you write on the drive will be "fresh", so it can be read at "normal" speed. So simply duplicating the files and then deleting the old copies is enough; a sketch follows.
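
A minimal sketch of that duplicate-then-delete idea from a Linux shell; the mount point and the one-liner are my own illustration, not something from the post:

    # For every file: copy it (forcing the data into freshly written
    # NAND), then atomically replace the original with the fresh copy.
    find /mnt/ssd -type f -exec sh -c \
        'cp -p "$1" "$1.fresh" && mv "$1.fresh" "$1"' _ {} \;

Note this refreshes file contents only; filesystem metadata stays where it is.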

It is sometimes recommended to simply leave the drive plugged in for 24 hours (or even longer, since 24 hours would not be enough to read 2 TB at 5 MB/s). The idea is that the controller should use the idle time to check for stale data and refresh it. I assumed most controllers would do that, but apparently not.

  • No, they do not. I often keep my laptop powered on for up to two days (but on CPU-bound tasks, so the disk has little to do) and even so, I had the problem.
    – LSerni
    Commented Jul 7 at 21:18
