3

I read about ext4 checksums (https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums), but I am not sure what they actually checksum. Does the name imply that only metadata is covered, and not the actual data in the files? What kinds of errors can be detected?


2 Answers

1

I cannot see a real benefit to checksumming the file data itself. It is an expensive operation on a block of data of unknown size, and it duplicates work that the disk already does.

The checksumming here provides filesystem-level integrity checking; the actual data on disk relies on the disk's own internal checksums. By checksumming metadata you protect the critical filesystem structures from software bugs and gain an extra layer of defence.

Essentially, if something in the filesystem is corrupted, the checksum data tells you what you can ignore and what you need to validate and recheck. There is little benefit (and potentially a big overhead) in checksumming large files when the disk itself already does it.

Checksumming the actual file contents is also something that could easily be done by the application that wrote the data in the first place. Archive formats do this, and many applications check data integrity to be sure they are not loading garbage. Doing it at the filesystem level as well as in the application and on the disk would be redundant and almost certainly unnecessary.
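
As a rough illustration of doing this at the application level, here is a minimal sketch that records a SHA-256 digest for every file under a directory and later reports the files whose contents no longer match. The function names (`file_digest`, `build_manifest`, `verify_manifest`) are made up for this example, and SHA-256 is just one reasonable choice of hash; how and where you store the manifest is up to you.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != expected:
            bad.append(rel)
    return bad
```

You would build the manifest once while the files are known to be good, keep it alongside the backups, and run the verification periodically. A file that shows up in the returned list either changed on purpose or was silently corrupted.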

  • Thanks! This sounds like a good thing to have, but I will need to keep looking for a way to keep my old archived data safer. I do backups, but if I do not know when an old file becomes corrupted, I do not know when it is time to restore it from the backup.
    – pelle
    Commented Oct 18, 2017 at 8:07
  • For backup purposes you should always be thinking about redundancy. Depending on how important the data is, you would have a second duplicate copy on your current machine, a local on-premises copy on a USB hard drive or memory stick, and another off-site copy. It's up to you to be certain your data is safe; don't trust the dumb box on the floor to do it for you.
    – Mokubai
    Commented Oct 18, 2017 at 8:24
  • It is for long-term storage. I have files I have kept around since downloading them to floppies 30 years ago. I have them well covered by on- and off-site backups, but if a few bytes in a file are corrupt and I do not notice for many years, it can be difficult to find a good copy (depending on how well the various backup media have survived), so I would prefer to notice broken files as soon as possible. There is already a long list of bad files because I kept them on CD-ROM for a decade or two before gathering them on the current disk.
    – pelle
    Commented Oct 18, 2017 at 10:31
1

The patch author summarized what is covered here:

  • The superblock stores a crc32c of itself.
  • Each inode stores crc32c(fs_uuid + inode_num + inode_gen + inode + slack_space_after_inode)
  • Block and inode bitmaps each get their own crc32c(fs_uuid + group_num + bitmap), stored in the block group descriptor.
  • Each extent tree block stores a crc32c(fs_uuid + inode_num + inode_gen + extent_entries) in unused space at the end of the block.
  • Each directory leaf block has an unused-looking directory entry big enough to store a crc32c(fs_uuid + inode_num + inode_gen + block) at the end of the block.
  • Each directory htree block is shortened to contain a crc32c(fs_uuid + inode_num + inode_gen + block) at the end of the block.
  • Extended attribute blocks store crc32c(fs_uuid + id + ea_block) in the header, where id is, depending on the refcount, either the inode_num and inode_gen; or the block number.
  • MMP blocks store crc32c(fs_uuid + mmpblock) at the end of the MMP block.
  • Block groups can now use crc32c instead of crc16.
  • The journal now has a v2 checksum feature flag.
  • crc32c(j_uuid + block) checksums have been inserted into descriptor blocks, commit blocks, revoke blocks, and the journal superblock.
  • Each block tag in a descriptor block has a checksum of the related data block.
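
To make the pattern in that list more concrete, here is a small sketch of a crc32c computed over a filesystem UUID, an inode number, an inode generation, and a block. It only illustrates the idea that mixing identity fields into the checksum lets a block written to (or read from) the wrong place fail verification. It does not reproduce ext4's exact on-disk field layout or seeding, and `metadata_checksum` is a hypothetical helper name, not a kernel function.

```python
import struct

CRC32C_POLY = 0x82F63B78  # reflected form of the Castagnoli polynomial


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C; pass a previous result as `crc` to continue a checksum."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def metadata_checksum(fs_uuid: bytes, inode_num: int, inode_gen: int, block: bytes) -> int:
    """Illustrative crc32c(fs_uuid + inode_num + inode_gen + block).

    Assumption: the two integers are packed as little-endian 32-bit values;
    the real ext4 layout may differ.
    """
    header = fs_uuid + struct.pack("<II", inode_num, inode_gen)
    return crc32c(header + block)


if __name__ == "__main__":
    fs_uuid = bytes(range(16))   # made-up filesystem UUID
    block = b"\x00" * 4096       # made-up block contents
    stored = metadata_checksum(fs_uuid, 12, 1, block)
    # The same bytes attributed to a different inode no longer verify.
    print(stored == metadata_checksum(fs_uuid, 12, 1, block))
    print(stored == metadata_checksum(fs_uuid, 13, 1, block))
```

Because the UUID and inode number are part of the checksummed input, a perfectly intact block that belongs to another inode or another filesystem is still rejected, which a plain checksum of the block contents alone would not catch.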
