What happens physically if I want to copy some files to an external hard drive and do cp -a [file] /dev/sdX instead of cp -a [file] /dev/sdX1?
Assuming that you have proper permissions, neither command would write the file in a manner that is readily retrievable.
Both commands would cause the contents of file to be directly written to LBAs (aka sectors) and bypass any filesystem conventions.
No filename, no file size, no owner, no create time, no permissions, no attributes of any kind would be associated with the data that was written.
The first command, with destination /dev/sdX, would sequentially write the file contents starting at the first LBA of the physical drive. This would clobber the MBR (Master Boot Record) and the partition table, and make the contents of all filesystems inaccessible.
The second command, with destination /dev/sdX1, would sequentially write the file contents starting at the first LBA of partition number 1. This would clobber the boot sector of the partition, and likely corrupt the filesystem of this partition, making the filesystem (and all of its files) inaccessible.
In other words, the primary difference between these two commands is the LBA (aka sector number) at which the writing starts.
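To make the "starting LBA" difference concrete, here is a minimal Python sketch that simulates a drive as an in-memory byte array. It assumes 512-byte sectors and a hypothetical partition 1 starting at LBA 2048 (a common alignment, but only an assumption here). The helper `write_to_node` is hypothetical and stands in for what `cp` does to a device node: write the raw file bytes beginning at the node's first byte.

```python
SECTOR_SIZE = 512      # assumed logical sector size
DRIVE_SECTORS = 4096   # tiny simulated drive (assumption)
PART1_START = 2048     # hypothetical first LBA of partition 1


def new_drive() -> bytearray:
    """Simulated drive whose sector 0 ends with the MBR boot signature 0x55AA."""
    drive = bytearray(DRIVE_SECTORS * SECTOR_SIZE)
    drive[510:512] = b"\x55\xaa"
    return drive


def write_to_node(drive: bytearray, node_start_lba: int, data: bytes) -> None:
    """Write raw bytes starting at the node's first LBA; no filesystem is involved."""
    offset = node_start_lba * SECTOR_SIZE
    if offset + len(data) > len(drive):
        raise ValueError("write would run past the end of the device")
    drive[offset:offset + len(data)] = data


file_contents = b"hello" * 200  # 1000 bytes, so it spans two sectors

whole = new_drive()
write_to_node(whole, 0, file_contents)            # like: cp -a file /dev/sdX
part = new_drive()
write_to_node(part, PART1_START, file_contents)   # like: cp -a file /dev/sdX1

print(whole[510:512] == b"\x55\xaa")  # False: the MBR (and its signature) is clobbered
print(part[510:512] == b"\x55\xaa")   # True: MBR intact; partition 1's first sectors were overwritten
```

The only thing that changes between the two calls is `node_start_lba`; the bytes written are identical, and in neither case is any filename or metadata recorded.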
Or when wiping a drive by overwriting it with zeros, is there a difference between dd if=/dev/zero of=/dev/sdX and dd if=/dev/zero of=/dev/sdX1?
The first command, with destination /dev/sdX, would zero out every LBA of the drive. No information would be readable any more (short of some allegedly extraordinary techniques).
The second command, with destination /dev/sdX1, would zero out all the LBAs of partition number 1. Although the definition of the partition would still exist in the partition table, the filesystem of that partition (and its files) would be overwritten and no longer accessible. The LBAs outside this partition are not affected.
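The same kind of simulated drive can illustrate the dd case. This minimal sketch (again assuming 512-byte sectors and a hypothetical partition 1 occupying LBAs 2048 to 4095) zero-fills either the whole drive or only the partition's LBA range, which is effectively what `dd if=/dev/zero` does to the respective device node. The helper name `zero_fill` is hypothetical.

```python
SECTOR_SIZE = 512
DRIVE_SECTORS = 4096
PART1_START, PART1_SECTORS = 2048, 2048   # hypothetical partition 1 layout


def zero_fill(drive: bytearray, start_lba: int, num_lbas: int) -> None:
    """Overwrite num_lbas sectors with zeros, beginning at start_lba."""
    begin = start_lba * SECTOR_SIZE
    end = begin + num_lbas * SECTOR_SIZE
    drive[begin:end] = bytes(end - begin)


def filled_drive() -> bytearray:
    """A drive where every byte is 0xFF, so zeroed regions are easy to spot."""
    return bytearray(b"\xff" * (DRIVE_SECTORS * SECTOR_SIZE))


whole = filled_drive()
zero_fill(whole, 0, DRIVE_SECTORS)             # like: dd if=/dev/zero of=/dev/sdX
part = filled_drive()
zero_fill(part, PART1_START, PART1_SECTORS)    # like: dd if=/dev/zero of=/dev/sdX1

print(any(whole))                              # False: every byte of the drive is zero
print(all(part[:PART1_START * SECTOR_SIZE]))   # True: LBAs before the partition (incl. the MBR) untouched
```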
AFAIK, sdX means "the disk" ...
It refers to a drive.
SSDs and flash drives do not have "disks".
From a computer science perspective, both sdX and sdX1 seem to represent files
Not all operating systems treat devices as files (e.g. MS Windows).
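On Linux, where the sdX names come from, both nodes are block special files, and the kernel tells them apart by their device numbers (major:minor). A minimal sketch using Python's standard `os` and `stat` modules (Unix-only; the helper name `describe` and the example paths are assumptions) to check this. On a typical system /dev/sda reports 8:0 and /dev/sda1 reports 8:1, though device names vary between machines.

```python
import os
import stat


def describe(path: str) -> str:
    """Say whether path is a block special file and, if so, give its major:minor numbers."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        return f"{path}: block device {os.major(st.st_rdev)}:{os.minor(st.st_rdev)}"
    return f"{path}: not a block device"


# Example paths; they may not exist on every machine, so check first.
for p in ("/dev/sda", "/dev/sda1"):
    if os.path.exists(p):
        print(describe(p))
```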
...but I don't really understand what the difference is from a physical perspective.
These device names (actually device nodes) represent the top-level logical layout of the storage device. The convention established by the IBM PC and MS-DOS is that a PC mass-storage device is divided into logical partitions. (Flexible disks, aka floppies, are exempt from this convention.)
Each partition is defined by a starting LBA (aka sector number) and an ending LBA, and therefore has a size (number of LBAs or sectors). Each partition is also assigned an identification code for the filesystem type (or other use) it is intended for; on Linux this code is typically set by the partitioning tool (e.g. fdisk), since mkfs does not change it.
Depending on the partitioning scheme (e.g. MSDOS or GPT), there are optional flags and (volume) names.
The entire drive is represented by a /dev/sdX device node. Other than the drive size (i.e. number of LBAs) and a partition table (if any), this device does not have any other salient properties that are user concerns.
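As an illustration of how little a partition definition actually contains, here is a minimal Python sketch that parses the four primary entries of a classic MBR (the "MSDOS" scheme; GPT is laid out differently and is not handled). In the MBR, each 16-byte entry at offset 446 holds a boot flag, legacy CHS fields, the type ID, the starting LBA, and a sector count, so the ending LBA is derived from the start and the count. The names `MbrPartition` and `parse_mbr` are hypothetical.

```python
import struct
from typing import List, NamedTuple

ENTRY_FORMAT = "<B3sB3sII"   # status, CHS first, type, CHS last, start LBA, sector count


class MbrPartition(NamedTuple):
    bootable: bool
    type_id: int        # e.g. 0x83 = Linux, 0x0C = FAT32 (LBA)
    start_lba: int
    num_sectors: int

    @property
    def end_lba(self) -> int:
        return self.start_lba + self.num_sectors - 1


def parse_mbr(sector0: bytes) -> List[MbrPartition]:
    """Parse the four primary entries of a classic MBR partition table."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    parts = []
    for i in range(4):
        entry = sector0[446 + 16 * i:446 + 16 * (i + 1)]
        status, _chs_first, type_id, _chs_last, start, count = struct.unpack(ENTRY_FORMAT, entry)
        if type_id != 0:   # type 0 marks an unused slot
            parts.append(MbrPartition(status == 0x80, type_id, start, count))
    return parts


# Build a sector 0 with one hypothetical Linux partition: 4096 sectors starting at LBA 2048.
sector = bytearray(512)
sector[446:462] = struct.pack(ENTRY_FORMAT, 0x00, b"\0\0\0", 0x83, b"\0\0\0", 2048, 4096)
sector[510:512] = b"\x55\xaa"

p = parse_mbr(bytes(sector))[0]
print(p.start_lba, p.end_lba, hex(p.type_id))   # 2048 6143 0x83
```

Everything the OS needs about the partition fits in those 16 bytes; there are no files, names, or permissions in it.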
PS.: The question is maybe analogous to the question how a drive formatted in FAT is different to one formatted in Ext4.
No, those are not analogous.
Filesystems exist at an abstraction layer above drive partitioning.
The complexity of a filesystem is several orders of magnitude greater than a partitioning scheme.
You have also selected two filesystems that are extreme opposites in complexity and features.
My guess here would be that 99,99% is identical, and that it's just a tiny file (the partition table?) which is different (formatting a drive usually takes only a few seconds).
Your guess is incorrect. The duration that the computer expends to perform a procedure is not a reliable gauge of the complexity of that operation.
The ext4 filesystem has file ownership and permissions. FAT does not.
The ext4 filesystem has journaling for robustness against power loss. FAT does not.
There are more differences.
Addendum
No, those are not analogous. Filesystems exist at an abstraction layer above drive partitioning.
But aren't both the partition tables and the file system specification saved on a very small portion of the drive?
Insisting that comparing different filesystems is analogous to drive versus partition is inane. A partition is contained within the drive; the partition cannot exist without a drive.
Whereas filesystems are mutually exclusive, especially FAT versus ext4 (although the relationship between ext2/ext3/ext4 filesystems is sort of the exception to the rule). Only one filesystem can exist in a partition. You can install one of these filesystems, and then forget about the other and never use it.
Your attempt to divert focus to static information stored on the medium (i.e. the partition table and some vague filesystem data that you assume exists somewhere) makes no sense.
The definition of where a partition starts and ends (or what filesystem is to be installed) is formulated first.
Then that information is stored on the medium somewhere and somehow.
The real work begins when the OS has to perform I/O to the partition(s) as files are read and written.
In other words, the partition table (which doesn't change while partitions are mounted) and the filesystem structures (which will change/grow from their initialized form installed during the format) are not the salient components of partitions and filesystems (respectively).
Focusing on such data is akin to a marathon strategy that relies solely on your starting position (i.e. at the front of the crowd) and neglects the next 42 kilometers.
(FYI a long time ago for a job at a UNIX company, I wrote a FAT format utility. So at one point in time I knew exactly what disk formatting entailed.)
Addendum 2
Rather, they're like a rule set for the OS ...
No.
The partition table merely contains (parametric) data that is used by the OS.
The "rules" are implemented by the code that comprises the algorithms of the OS.
These algorithms use data.
Data are not algorithms (or "rules").
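Mounting a partition installs the filesystem handler for its filesystem type (which may be indicated by the partition's type ID, although Linux typically identifies the type by probing the filesystem's superblock or from the type given to mount). That handler will process all subsequent open, read, write, lseek, and close system calls for that mount point.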
The initialized structures written during format (e.g. empty allocation tables, empty root directory) are updated as the filesystem creates, writes, and deletes files.
The "rules" for performing open, read, write, lseek, and close file operations are implemented in the code of the filesystem handler.
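The split between data and "rules" can be shown with a toy model. This Python sketch is loosely in the spirit of a kernel's virtual filesystem layer, but it is not Linux's actual VFS API; every class and function name (`FilesystemHandler`, `mount`, `sys_open`, and so on) is hypothetical, and only open is modeled. The filesystem type is just a key (data); the behavior lives entirely in the handler code it selects.

```python
from typing import Dict, Type


class FilesystemHandler:
    """Hypothetical base class: the code that implements a filesystem's rules."""

    def open(self, path: str) -> str:
        raise NotImplementedError


class FatHandler(FilesystemHandler):
    def open(self, path: str) -> str:
        return f"FAT: find {path} in the directory entries, then follow the allocation table"


class Ext4Handler(FilesystemHandler):
    def open(self, path: str) -> str:
        return f"ext4: resolve {path} through inodes, then check owner and permission bits"


# Hypothetical registry keyed by filesystem type name (plain data).
HANDLERS: Dict[str, Type[FilesystemHandler]] = {"vfat": FatHandler, "ext4": Ext4Handler}

mount_table: Dict[str, FilesystemHandler] = {}


def mount(mount_point: str, fs_type: str) -> None:
    """Install a handler instance; later calls on this mount point are routed to it."""
    mount_table[mount_point] = HANDLERS[fs_type]()


def sys_open(mount_point: str, path: str) -> str:
    """Route an open() on a mount point to that mount's handler."""
    return mount_table[mount_point].open(path)


mount("/mnt/usb", "vfat")
mount("/home", "ext4")
print(sys_open("/mnt/usb", "/notes.txt"))
print(sys_open("/home", "/alice/notes.txt"))
```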
All filesystem I/O operations will pass through the partition layer (to perform the LBA translation).
The pseudocode for an LBA translator (i.e. converting a given LBA (lba) of a filesystem "device" (device_mounted) to the LBA of the actual/physical device) could be:
if (device_mounted == drive)
then
    if (lba >= drive_size)
    then reject translation
    else actual_lba = lba
else if (device_mounted == valid_partition_number)
then
    if (lba >= partition_size)
    then reject translation
    else actual_lba = lba + partition_start
else reject translation
return (actual_lba)
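A runnable Python rendering of the LBA-translation pseudocode, as a minimal sketch. The function and exception names, and the representation of the partition table as a list of (start_lba, num_sectors) pairs, are assumptions. It also rejects negative LBAs and unknown partition numbers explicitly, which the pseudocode leaves implicit.

```python
from typing import Optional, Sequence, Tuple


class TranslationError(ValueError):
    """Raised when an LBA lies outside the mounted drive or partition."""


def translate_lba(lba: int, partition: Optional[int], drive_size: int,
                  partitions: Sequence[Tuple[int, int]]) -> int:
    """Map an LBA on a drive or partition node to an LBA on the physical drive.

    partition is None for the whole drive (/dev/sdX) or a 1-based partition
    number (/dev/sdX1 -> 1); partitions holds (start_lba, num_sectors) pairs.
    """
    if lba < 0:
        raise TranslationError(f"negative LBA {lba}")
    if partition is None:                      # whole drive: identity mapping
        if lba >= drive_size:
            raise TranslationError(f"LBA {lba} beyond drive size {drive_size}")
        return lba
    if not 1 <= partition <= len(partitions):
        raise TranslationError(f"no partition {partition}")
    start, size = partitions[partition - 1]
    if lba >= size:
        raise TranslationError(f"LBA {lba} beyond partition size {size}")
    return start + lba                         # partition: offset by its starting LBA


layout = [(2048, 1_000_000), (1_002_048, 500_000)]   # hypothetical partition table
print(translate_lba(0, None, 2_000_000, layout))     # 0
print(translate_lba(0, 1, 2_000_000, layout))        # 2048
print(translate_lba(10, 2, 2_000_000, layout))       # 1002058
```

The partition case is a bounds check plus one addition, which is the whole of the mapping.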
Handling partitions is just a simple mapping.
Filesystems are subsystems that have a user interface, require resources, and perform I/O.
Using some minute commonality is not a sensible basis for comparison.