
What happens physically if I want to copy some files to an external hard drive and do cp -a [file] /dev/sdX instead of cp -a [file] /dev/sdX1? Or when wiping a drive by overwriting it with zeros, is there a difference between dd if=/dev/zero of=/dev/sdX and dd if=/dev/zero of=/dev/sdX1? AFAIK, sdX means "the disk" and sdX1 the 1st partition on it. From a computer science perspective, both sdX and sdX1 seem to represent files (or objects in OOP lingo), but I don't really understand what the difference is from a physical perspective.

PS.: The question is maybe analogous to the question how a drive formatted in FAT is different to one formatted in Ext4. My guess here would be that 99,99% is identical, and that it's just a tiny file (the partition table?) which is different (formatting a drive usually takes only a few seconds). So the OS (?) will read the formatting information and then proceed to write the data according to a protocol specific to that format.

2 Answers


What happens physically if I want to copy some files to an external hard drive and do cp -a [file] /dev/sdX instead of cp -a [file] /dev/sdX1?

Assuming that you have proper permissions, neither command would write the file in a manner that is readily retrievable.
Both commands would cause the contents of file to be directly written to LBAs (aka sectors) and bypass any filesystem conventions.
No filename, no file size, no owner, no create time, no permissions, no attributes of any kind would be associated with the data that was written.

The first command, with destination /dev/sdX, would sequentially write the file contents starting at the first LBA of the physical drive. This would clobber the MBR (Master Boot Record) and the partition table, and make the contents of all filesystems inaccessible.
The second command, with destination /dev/sdX1, would sequentially write the file contents starting at the first LBA of partition number 1. This would clobber the boot sector of the partition, and likely corrupt the filesystem of this partition, making the filesystem (and all of its files) inaccessible.

In other words the primary difference between these two commands is the LBA (aka sector number) of where the writing starts.
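
To make the difference in starting LBA concrete, here is a minimal Python sketch that uses an in-memory bytearray as a stand-in for the drive. The 512-byte sector size, the partition start at LBA 2048, the tiny drive size, and the helper name raw_write are all assumptions made for illustration; nothing here touches a real device.

```python
SECTOR_SIZE = 512            # assumed logical sector size
PARTITION_START_LBA = 2048   # assumed first LBA of partition 1
DISK_SECTORS = 4096          # assumed (tiny) drive size in sectors


def raw_write(disk: bytearray, start_lba: int, payload: bytes) -> None:
    """Copy payload into the image starting at start_lba, with no filesystem involved."""
    offset = start_lba * SECTOR_SIZE
    if offset + len(payload) > len(disk):
        raise ValueError("write would run past the end of the device")
    disk[offset:offset + len(payload)] = payload


payload = b"contents of the copied file"

# Comparable to `cp file /dev/sdX`: writing starts at LBA 0,
# on top of the MBR and the partition table.
whole_drive = bytearray(DISK_SECTORS * SECTOR_SIZE)
raw_write(whole_drive, 0, payload)

# Comparable to `cp file /dev/sdX1`: writing starts at the first LBA
# of partition 1, so sector 0 of the drive is left alone.
partition_target = bytearray(DISK_SECTORS * SECTOR_SIZE)
raw_write(partition_target, PARTITION_START_LBA, payload)
```

In both cases only the raw bytes land on the medium; no filename or other metadata is recorded anywhere.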


Or when wiping a drive by overwriting it with zeros, is there a difference between dd if=/dev/zero of=/dev/sdX and dd if=/dev/zero of=/dev/sdX1?

The first command, with destination /dev/sdX, would completely zero-out all LBAs of the drive. No information would be readable any more (without some allegedly extraordinary techniques).
The second command, with destination /dev/sdX1, would zero-out all the LBAs of partition number 1. Although the definition of the partition still exists, the filesystem of that partition (and its files) has been overwritten and is no longer accessible. The LBAs outside this partition are not affected.
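
The scope of the two wipes can be illustrated with a small simulation, again on an in-memory image rather than a real device. The layout (a 4096-sector drive with partition 1 occupying 1024 sectors from LBA 2048) and the helper names are hypothetical.

```python
SECTOR_SIZE = 512
DISK_SECTORS = 4096
PART_START, PART_SECTORS = 2048, 1024   # hypothetical partition 1


def fresh_disk() -> bytearray:
    """Return an image in which every byte is non-zero, so zeroed areas stand out."""
    return bytearray(b"\xAA" * (DISK_SECTORS * SECTOR_SIZE))


def zero_range(disk: bytearray, start_lba: int, sector_count: int) -> None:
    """Overwrite sector_count sectors with zeros, starting at start_lba."""
    start = start_lba * SECTOR_SIZE
    end = start + sector_count * SECTOR_SIZE
    disk[start:end] = bytes(end - start)


def nonzero_sectors(disk: bytearray) -> int:
    """Count the sectors that still contain at least one non-zero byte."""
    return sum(
        1
        for lba in range(DISK_SECTORS)
        if any(disk[lba * SECTOR_SIZE:(lba + 1) * SECTOR_SIZE])
    )


# Comparable to `dd if=/dev/zero of=/dev/sdX`: every LBA of the drive is zeroed.
whole = fresh_disk()
zero_range(whole, 0, DISK_SECTORS)

# Comparable to `dd if=/dev/zero of=/dev/sdX1`: only the LBAs of partition 1 are zeroed.
part_only = fresh_disk()
zero_range(part_only, PART_START, PART_SECTORS)

print(nonzero_sectors(whole), nonzero_sectors(part_only))
```

In the second image, sector 0 (where the partition table would live) and everything outside the partition's LBA range keep their original contents.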


AFAIK, sdX means "the disk" ...

It refers to a drive.
SSDs and flash drives do not have "disks".

From a computer science perspective, both sdX and sdX1 seem to represent files

Not all operating systems treat devices as files (e.g. MS Windows).

...but I don't really understand what the difference is from a physical perspective.

These device names (actually device nodes) represent the top-level logical layout of the storage device. The convention established by the IBM PC and MS DOS is that a PC mass-storage device is divided into logical partitions. (Flexible disks, aka floppies, are exempt from this convention.)

Each partition is defined by a starting LBA (aka sector number) and an ending LBA, and therefore has a size (number of LBAs or sectors). Each partition is also assigned an identification code that indicates the intended filesystem type; this code is normally set by the partitioning tool and is not necessarily updated when the partition is formatted.
Depending on the partitioning scheme (e.g. MSDOS or GPT), there are optional flags and (volume) names.
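
As a rough illustration of how little data defines a partition, the following Python sketch builds and then parses the first entry of a classic MSDOS (MBR) partition table: 16 bytes at offset 446 of sector 0, followed by the 0x55 0xAA signature at offset 510. The function names and the sample values (type 0x83, start LBA 2048, 1024 sectors) are made up for the example, and GPT uses a different layout that is not covered here.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446           # first of the four 16-byte entries in an MBR
ENTRY_FORMAT = "<B3sB3sII"   # status, CHS start, type, CHS end, start LBA, sector count


def build_mbr(type_code: int, start_lba: int, sector_count: int) -> bytes:
    """Build a sector 0 that holds a single partition entry (CHS fields left at zero)."""
    mbr = bytearray(SECTOR_SIZE)
    entry = struct.pack(ENTRY_FORMAT, 0x00, b"\0\0\0", type_code,
                        b"\0\0\0", start_lba, sector_count)
    mbr[TABLE_OFFSET:TABLE_OFFSET + len(entry)] = entry
    mbr[510:512] = b"\x55\xAA"   # boot signature
    return bytes(mbr)


def parse_first_entry(mbr: bytes) -> dict:
    """Extract type code, start LBA, and size from the first partition entry."""
    if mbr[510:512] != b"\x55\xAA":
        raise ValueError("missing MBR signature")
    _status, _chs_first, type_code, _chs_last, start_lba, count = struct.unpack_from(
        ENTRY_FORMAT, mbr, TABLE_OFFSET)
    return {
        "type_code": type_code,
        "start_lba": start_lba,
        "sector_count": count,
        "end_lba": start_lba + count - 1,
    }


entry = parse_first_entry(build_mbr(0x83, 2048, 1024))
print(entry)
```

Note that the entry stores a start and a sector count; the ending LBA is derived from the two.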

The entire drive is represented by a /dev/sdX device node. Other than the drive size (i.e. number of LBAs) and a partition table (if any), this device does not have any other salient properties that are user concerns.


PS.: The question is maybe analogous to the question how a drive formatted in FAT is different to one formatted in Ext4.

No, those are not analogous.
Filesystems exist at an abstraction layer above drive partitioning.
The complexity of a filesystem is several orders of magnitude greater than a partitioning scheme. You have also selected two filesystems that are extreme opposites in complexity and features.

My guess here would be that 99,99% is identical, and that it's just a tiny file (the partition table?) which is different (formatting a drive usually takes only a few seconds).

Your guess is incorrect. The duration that the computer expends to perform a procedure is not a reliable gauge of the complexity of that operation.

The ext4 filesystem has file ownership and permissions. FAT does not.

The ext4 filesystem has journaling for robustness from power cuts. FAT does not.

There are more differences.


Addendum

No, those are not analogous. Filesystems exist at an abstraction layer above drive partitioning.

But aren't both the partition tables and the file system specification saved on a very small portion of the drive?

Insisting that comparing different filesystems is analogous to drive versus partition is inane. A partition is contained within the drive; the partition cannot exist without a drive.

Whereas filesystems are mutually exclusive, especially FAT versus ext4 (although the relationship between ext2/ext3/ext4 filesystems is sort of the exception to the rule). Only one filesystem can exist in a partition. You can install one of these filesystems, and then forget about and never use the other.

Your attempt to divert focus to static information stored on the medium (i.e. the partition table and some vague filesystem data that you assume exists somewhere) makes no sense.
The definition of where a partition starts and ends (or what filesystem is to be installed) is formulated first.
Then that information is stored on the medium somewhere and somehow.
The real work begins when the OS has to perform I/O to the partition(s) as files are read and written.

In other words, the partition table (which doesn't change while partitions are mounted) and the filesystem structures (which will change/grow from their initialized form installed during the format) are not the salient components of partitions and filesystems (respectively).
Focusing on such data is akin to having a strategy of running a marathon that relies solely on your starting position (i.e. at the front of the crowd), and neglects the next 42 kilometers.

(FYI a long time ago for a job at a UNIX company, I wrote a FAT format utility. So at one point in time I knew exactly what disk formatting entailed.)


Addendum 2

Rather, they're like a rule set for the OS ...

No.
The partition table merely contains (parametric) data that is used by the OS.
The "rules" are implemented by the code that comprises the algorithms of the OS.
These algorithms use data.
Data are not algorithms (or "rules").

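The filesystem type of a partition (indicated by its ID or, on Linux, typically detected from the on-disk filesystem signature at mount time) selects a filesystem handler, which will process all subsequent open, read, write, lseek, and close system calls for that mountpoint.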
The initialized structures written during format (e.g. empty allocation tables, empty root directory) are updated as the filesystem creates, writes, and deletes files.
The "rules" for performing open, read, write, lseek, and close file operations are implemented in the code of the filesystem handler.
All filesystem I/O operations will pass through the partition layer (to perform the LBA translation).

The pseudocode for an LBA translator (i.e. convert a given LBA (lba) of a filesystem "device" (device_mounted) to the LBA of the actual/physical device) could be:

if (device_mounted == drive)
    then
        if (lba >= drive_size)
            then  reject translation
            else  actual_lba = lba
    else if (device_mounted == valid_partition_number)
        then
            if (lba >= partition_size)
                then  reject translation
                else  actual_lba = lba + partition_start
    else  reject translation

return (actual_lba)
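
The pseudocode above could be sketched as runnable Python along the following lines. The function name translate_lba, the (start, size) tuple for a partition, and the use of an exception to "reject" a translation are choices made for this example, not how any particular kernel implements it.

```python
from typing import Optional, Tuple


def translate_lba(lba: int, drive_size: int,
                  partition: Optional[Tuple[int, int]] = None) -> int:
    """Map an LBA relative to the opened device onto an LBA of the physical drive.

    partition is None when the whole drive (e.g. /dev/sdX) is the device,
    otherwise a hypothetical (partition_start, partition_size) pair in sectors.
    """
    if lba < 0:
        raise ValueError("LBA must not be negative")
    if partition is None:
        # Whole drive: the mapping is the identity, bounded by the drive size.
        if lba >= drive_size:
            raise ValueError("LBA beyond end of drive")
        return lba
    partition_start, partition_size = partition
    # Partition: bounded by the partition size, then shifted by its start.
    if lba >= partition_size:
        raise ValueError("LBA beyond end of partition")
    return lba + partition_start


print(translate_lba(10, 4096))                 # whole drive
print(translate_lba(10, 4096, (2048, 1024)))   # partition starting at LBA 2048
```

With a partition that starts at LBA 2048, LBA 10 of the partition maps to LBA 2058 of the drive, while the same LBA on the whole-drive device maps to itself.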

Handling partitions is just a simple mapping.
Filesystems are subsystems that have a user interface, require resources, and perform I/O.
Using some minute commonality is not a sensible basis for comparison.

  • Thx for the detailed answer! "No, those are not analogous. Filesystems exist at an abstraction layer above drive partitioning." => But aren't both the partition tables and the file system specification saved on a very small portion of the drive? I mean, the physical drive does not change its structure depending on the file system, rather, it's a specific way (a rule set, a program) the host system follows to store information (like a low-level data structure), right?
    – david
    Commented Jul 24, 2020 at 21:45
  • "The duration that the computer expends to perform a procedure is not a reliable gauge of the complexity of that operation." => Yes, but it would take much longer to write over an entire drive (in order to format it to a specific file system) than to just place a few files somewhere which tells the host which rule set to follow when writing information to the drive.
    – david
    Commented Jul 24, 2020 at 21:45
  • Like, choosing a file system is a convention on how to read and write (flip bits) in order to save and retrieve a certain information, right? => The physical memory being the canvas and the file system etc. the (logical shape of) my brush...
    – david
    Commented Jul 24, 2020 at 21:50
  • "Both commands would cause the contents of file to be directly written to LBAs (aka sectors) and bypass any filesystem conventions." => Sorry, I just realized there is a difference between writing to /dev/... and /mnt/.... AFAIK, in the latter case, a file system is mounted, whereas in the former case, the host will "talk" through a different program/file (maybe something closer to the drive's hardware controller)
    – david
    Commented Jul 24, 2020 at 21:57
  • @david -- See addendum to answer. "I just realized there is a difference between writing to /dev/... and /mnt/..." -- That is not the original question that was posted and answered. BTW I'm a (retired) professional software engineer with several decades of experience, including writing firmware for disk controllers, and drivers & software for filesystems and network applications. I am not guessing or only read about how these systems work.
    – sawdust
    Commented Jul 25, 2020 at 1:11
1

A disk like /dev/sdX can be seen as a sequence of bytes whose length matches the disk size. You can place some kind of partition table on /dev/sdX, but you don't have to: you can simply access /dev/sdX like an ordinary file if you wish, or you can place a file system directly on it.

With a partition table, the beginning of /dev/sdX is used to describe how different parts of /dev/sdX have been allocated to the partitions /dev/sdX1, /dev/sdX2, and so on. Since the partition table itself needs some space, each partition will be smaller than the entire disk /dev/sdX.

Again, the partitions may contain some file system or be accessed like an ordinary file.

Example:

|---/dev/sdX-------------------------------------------------------------|
| partition table |--/dev/sdX1-------|--/dev/sdX2------------------------|

So if you wipe /dev/sdX you will wipe all partitions and also the partition table. If you wipe /dev/sdX1 you will only wipe that partition.

A file system is a way to use a contiguous device like a disk or partition so that you can dynamically allocate parts of that space to different files in a directory structure. As you can imagine, there are many ways to implement a file system; implementations typically rely on on-disk structures such as linked lists, tables, or trees that describe where the data lives.
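
As a toy illustration of the linked-list idea, the Python sketch below follows a chain of clusters through an allocation table, loosely in the spirit of FAT. It is not the real FAT on-disk format: the 4-byte cluster size, the marker values, and all names are invented for the example.

```python
END_OF_CHAIN = -1   # hypothetical marker: last cluster of a file
FREE = -2           # hypothetical marker: cluster not allocated


def read_file(clusters, table, first_cluster, length):
    """Collect a file's bytes by following its cluster chain through the table."""
    data = bytearray()
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen or table[cluster] == FREE:
            raise ValueError("corrupt cluster chain")
        seen.add(cluster)
        data += clusters[cluster]
        cluster = table[cluster]   # entry holds the number of the next cluster
    return bytes(data[:length])


# Five 4-byte clusters; the file's pieces are deliberately stored out of order.
clusters = [b"\0\0\0\0", b"ORLD", b"HELL", b"\0\0\0\0", b"O, W"]
# The file starts at cluster 2, continues at cluster 4, and ends at cluster 1.
table = [FREE, END_OF_CHAIN, 4, FREE, 1]

print(read_file(clusters, table, first_cluster=2, length=12))
```

The directory entry for a file would then only need to record the first cluster and the length; the table supplies the rest of the chain.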
