1

Say I write zeros to a certain range of LBAs of a drive, and then I TRIM those logical blocks (say, with blkdiscard or even hdparm). Given that the drive has the Read Zero After TRIM (RZAT) behavior, is there a way for someone to tell whether the logical blocks were TRIM'd or not?

(One way I could think of, which might work, would be a timed read to detect a faster response on a trimmed data block than on a non-trimmed data block containing all zeros, but this question is more about an API to just ask the disk.)
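For reference, this is roughly how one can check whether a drive advertises RZAT or a defined deallocate read behavior; the device names are placeholders:

# SATA: check the TRIM capabilities reported by IDENTIFY DEVICE;
# RZAT drives report "Deterministic read ZEROs after TRIM"
hdparm -I /dev/sdX | grep -i trim

# NVMe: the DLFEAT field of the namespace data describes what
# deallocated (trimmed) blocks read back as (001b = all zeros)
nvme id-ns /dev/nvme0n1 | grep -i dlfeat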

8
  • 1
    I don't think it's possible without directly accessing the SSD hardware.
    – Keltari
    Commented Jun 1, 2021 at 2:17
  • Are you asking if a block in an SSD is trimmed and considered free by the SSD, or are you asking if a block in a file is a hole with no disk block associated with it? The former is only relevant to block devices and to free blocks in filesystems not associated with files. The latter is relevant to files, but not to SSDs or trim. You seem to be conflating the two in your question.
    – user10489
    Commented Jun 1, 2021 at 5:04
  • 1
    AFAIK there's no command in the common protocols / command sets that allows you to check the logical-block-to-physical-block mapping. There might be a "forensic" approach that communicates with the controller directly, though, which I don't know of.
    – Tom Yan
    Commented Jun 1, 2021 at 7:50
  • @user10489: I did ask exactly the question I intended to ask. I have now clarified it. You assume that trimming is irrelevant for allocated disk space. That assumption is wrong, which leads you to a wrong conclusion. Commented Jun 1, 2021 at 14:58
  • 1
    There are plenty of good reasons (and bad ones) for writing zeros instead of a hole, or for creating a hole instead of writing zeros. There is a long history in Unix surrounding holes in files (usually involving random seeks and databases stored as sparse files), and I'm not aware of any Unix that will create a hole if you explicitly write zeros.
    – user10489
    Commented Jun 2, 2021 at 4:57

4 Answers

0

First, trim is not used for zeroing blocks. Trim is used to tell an SSD that a block is no longer needed and its contents can be discarded. The intent of trim is to help an SSD do wear leveling by telling it that it can freely reuse a block without first preserving its contents. As such, it doesn't necessarily make sense to take the extra step of zeroing the block first.
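To make the distinction concrete, here is a rough sketch of the two operations with blkdiscard (the device and range are placeholders, and both commands destroy any data in that range):

# Plain discard: tell the device the 1 GiB range is no longer needed
blkdiscard --offset 0 --length $((1024*1024*1024)) /dev/sdX

# The same range, but explicitly zeroed instead of discarded
# (the "extra step" mentioned above)
blkdiscard --zeroout --offset 0 --length $((1024*1024*1024)) /dev/sdX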

The SSD trim standard states that reading a trimmed block will produce undefined results, rather than a zero block, although some versions of the standard do include a "read zero after trim" variant. Since reading a trimmed block has undefined behavior, even ignoring cache issues, it is unlikely you will be able to tell whether a block is trimmed by timing a read of it. Writing to a trimmed block might not show a timing difference either, as wear leveling might cause a different physical block to be written anyway.

Since trim is intended to discard blocks, a filesystem would not use trim on a block in a file, only on a block freed from a file. So asking whether a block in a file has been trimmed makes no sense, because a filesystem would not do this to a file. If you do have a filesystem with trimmed blocks in files, the filesystem is likely corrupt. Even if you do have such a corrupt filesystem, the trim standard does not include any way to query whether a block is trimmed, or even how many blocks on the SSD are trimmed. A better question would be whether there is a way to determine if a block in a file is corrupt. Some filesystems (e.g., ZFS) do have this ability, but it may not be directly accessible outside of the filesystem internals. In a RAID, reading a corrupt block might cause the event to be logged, but the RAID will also reconstruct the block and likely rewrite a good copy in its place. If that fails, presuming the RAID doesn't just go offline, it might return an I/O error.

However, if your "drive" is actually a filesystem image stored as a file in another filesystem, fstrim can tell the operating system to release free blocks, which causes the underlying filesystem to punch holes in the image file. Unlike trim on SSDs, the behavior of holes in files is very well defined: reading them always returns zeros. There are also (somewhat unportable) system calls that allow a program to ask the filesystem where the holes in a file are.
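As a rough illustration of the image-file case (the file name is a placeholder; filefrag uses the FIEMAP ioctl, and a program could instead use lseek() with SEEK_HOLE/SEEK_DATA to locate holes directly):

# Allocated size vs. apparent size: a large difference means the
# image file contains holes
du --block-size=1 fs.img
du --block-size=1 --apparent-size fs.img

# Extent map of the file; byte ranges not covered by any extent are holes
filefrag -v fs.img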

2
  • 1
    For the record, even though quite a few SATA SSDs do not advertise RZAT, the actual behavior is often at least partially RZAT. (For example, only the remainder smaller than a certain granularity will be ignored; in some older cases I've seen, each TRIM range even needed to be aligned to that granularity.) This seems to have changed with NVMe, by the way, as that spec is apparently "deallocation"-aware by nature. The point is, the RZAT behavior does not (necessarily) rely on zeroing the actual memory; rather, the drive "deallocates" by "unmapping" the LBA from it.
    – Tom Yan
    Commented Jun 2, 2021 at 5:27
  • This really is an implementation detail and the standard leaves it up to the manufacturer to decide. As such, behavior of these features has drifted over the years where the standard allows it.
    – user10489
    Commented May 21, 2023 at 13:17
1

I think you should ask the question in a better way, by not involving files or filesystems at all, since that seems to be your actual interest:

Say I write zeros to a certain range of LBAs of a drive, and then I TRIM those logical blocks (say, with blkdiscard or even hdparm). Given that the drive has the Read Zero After TRIM (RZAT) behavior, is there a way for someone to tell whether the logical blocks were TRIM'd or not?

And the answer (from me) is: unless the vendor provides some vendor-specific way to check the mapping of logical blocks to the actual storage, or you manage to hack the controller by some means to do the equivalent, no, there's no standard way in the common protocols / command sets to tell reliably.

1
  • By the way, there's even a thing called WRITE SAME in SCSI and recent ACS, which can be used to "write zeros". Here I'm referring to a "normal" write, e.g., with dd. Speaking of that, some SSD controllers were known to "optimize" normal zero writes, IIRC...
    – Tom Yan
    Commented Jun 2, 2021 at 6:37
1

For NVMe disks, there's a feature called DULBE, or Deallocated or Unwritten Logical Block Error. Here's a quote from section 3.2.3.2.1, Deallocated or Unwritten Logical Blocks, in the NVM Express® NVM Command Set Specification, revision 1.0c, October 3rd, 2022:

Using the Error Recovery feature (refer to section 4.1.3.2), host software may select the behavior of the controller when reading deallocated or unwritten blocks. The controller shall abort Copy, Read, Verify, or Compare commands that include deallocated or unwritten blocks with a status of Deallocated or Unwritten Logical Block if that error has been enabled using the DULBE bit in the Error Recovery feature.

For NVMe disk models that support this feature, it seems you can use it to determine whether a block is trimmed/unwritten or not.

But I don't know of any model that supports this feature; at least the Samsung SSD 980 PRO 1TB and the Intel P4510 1TB don't:

# nvme id-ns /dev/nvme0n1 -H | grep "Deallocated or Unwritten Logical Block error"
  [2:2] : 0     Deallocated or Unwritten Logical Block error Not Supported
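If you do find a drive that reports support, a rough sketch with nvme-cli might look like this (feature 0x05 is Error Recovery, whose bit 16 is the DULBE enable per the spec; the LBA, device, and 512-byte block size are assumptions):

# Show the current Error Recovery setting (feature id 0x05)
nvme get-feature /dev/nvme0n1 -f 0x05 -H

# Set bit 16 (DULBE), so reads of deallocated blocks are aborted with a
# "Deallocated or Unwritten Logical Block" status
nvme set-feature /dev/nvme0n1 -f 0x05 -v 0x10000

# Probe one logical block: an error here (instead of a block of zeros)
# would indicate that the block is deallocated/trimmed
nvme read /dev/nvme0n1 --start-block=1234 --block-count=0 --data-size=512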
0

I have asked myself the same thing. My hypothesis was that since zeros that are the result of TRIM are merely, let's say, "placeholders", and are fundamentally different from sectors actually read from the NAND, the difference may be measurable:

  1. You can observe a difference in read speed between, say, a 100 GB volume that was trimmed and one that was zero-filled: the trimmed volume's read speed exceeds that of the zero-filled one. I don't have a benchmark at hand to demonstrate this, but I have seen plenty that do: a trimmed drive returns blocks of zeros on read, but dramatically faster than when reading actual zeros that were written to the SSD. (A rough way to try this yourself is sketched at the end of this answer.)

  2. Since reading from NAND requires 'increased power' to pump the NAND to the required levels, and this is not needed to return 'trimmed LBA space' (the controller simply returns placeholder sectors), this may be measurable too.

I have run some tests to confirm this, sort of as a proof of concept, and they seem to support the idea; I need to repeat more refined tests some day. I was able to measure a difference in power consumption when reading trimmed drives vs. zero-filled drives, although the difference was marginal:

[Figure: measured power consumption while reading a trimmed SSD vs. a zero-filled SSD]

Note that modern SSDs may apply compression at the firmware level, and in that case all bets are off again!
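A crude way to reproduce the timing comparison from point 1 (a sketch: the device and offsets are placeholders, one range having been zero-filled with real writes and the other having been blkdiscard'ed; O_DIRECT bypasses the page cache):

# Read 1 GiB from the zero-filled range
dd if=/dev/nvme0n1 of=/dev/null bs=1M count=1024 skip=0 iflag=direct

# Read 1 GiB from the trimmed range; on many drives this completes
# noticeably faster even though both reads return all zeros
dd if=/dev/nvme0n1 of=/dev/null bs=1M count=1024 skip=1024 iflag=direct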
