
I tried to test the write speed of some SSDs, and writing to the disk directly is somehow slower than writing to the disk when it is formatted as ext4. How does this work? Is this correct, or am I measuring something wrong?

for i in {1..5}; do dd if=/dev/zero of=/dev/sda1 bs=1G count=1 oflag=dsync; done
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.18148 s, 150 MB/s
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.18312 s, 149 MB/s
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.1938 s, 149 MB/s
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.15976 s, 150 MB/s
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.2125 s, 149 MB/s
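
(Side note: oflag=dsync still routes the writes through the page cache and merely waits for them to be flushed; as a cross-check, the same raw-device test can be repeated with oflag=direct, which bypasses the cache so that dd itself submits bs-sized writes. This is only a sketch and, like the test above, overwrites the device:)

for i in {1..5}; do dd if=/dev/zero of=/dev/sda1 bs=1M count=1024 oflag=direct; done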

If I now format the disk as ext4:

mkfs.ext4 /dev/sda1
mount /dev/sda1 /tmp/test
mount -ls
/dev/sda1 on /tmp/test type ext4 (rw,relatime,data=ordered)

for i in {1..5}; do dd if=/dev/zero of=/tmp/test/test.txt bs=1G count=1 oflag=dsync; done
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.66437 s, 230 MB/s
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.60112 s, 233 MB/s
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.58899 s, 234 MB/s
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.61334 s, 233 MB/s
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.60241 s, 233 MB/s

Thanks

Johannes

Edit: When activating /proc/sys/vm/block_dump as frostschutz suggested and then copying to the ext4 drive, it becomes obvious that the kernel splits the data up differently.
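
For reference, block_dump is toggled via procfs and the messages show up in the kernel log, roughly like this (the sysctl has been removed in newer kernels in favor of tracing, so this assumes an older kernel):

echo 1 > /proc/sys/vm/block_dump
dmesg --follow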

for i in {1..5}; do dd if=/dev/zero of=/tmp/test/test.txt bs=1G count=1 oflag=dsync; done
[  922.895200] dd(2571): READ block 74112 on unknown-block(8,0) (8 sectors)
[  922.903712] dd(2571): READ block 8448 on unknown-block(8,0) (8 sectors)
[  923.724470] dd(2571): dirtied inode 12 (test.txt) on sda
[  923.729762] dd(2571): dirtied inode 12 (test.txt) on sda
[  923.735005] dd(2571): dirtied inode 12 (test.txt) on sda
[  924.543323] kworker/u8:0(2560): READ block 8320 on unknown-block(8,0) (8 sectors)
[  924.553112] kworker/u8:0(2560): WRITE block 278528 on unknown-block(8,0) (2048 sectors)
[  924.561496] kworker/u8:0(2560): WRITE block 280576 on unknown-block(8,0) (2048 sectors)
[  924.570013] kworker/u8:0(2560): WRITE block 282624 on unknown-block(8,0) (2048 sectors)
[  924.578534] kworker/u8:0(2560): WRITE block 284672 on unknown-block(8,0) (2048 sectors)

for i in {1..5}; do dd if=/dev/zero of=/dev/sda bs=1G count=1 oflag=dsync; done
[ 1504.428021] kworker/u8:0(2560): WRITE block 0 on unknown-block(8,0) (8 sectors)
[ 1504.435320] kworker/u8:0(2560): WRITE block 8 on unknown-block(8,0) (8 sectors)
[ 1504.442589] kworker/u8:0(2560): WRITE block 16 on unknown-block(8,0) (8 sectors)
[ 1504.449955] kworker/u8:0(2560): WRITE block 24 on unknown-block(8,0) (8 sectors)
[ 1504.457342] kworker/u8:0(2560): WRITE block 32 on unknown-block(8,0) (8 sectors)
[ 1504.464720] kworker/u8:0(2560): WRITE block 40 on unknown-block(8,0) (8 sectors)
  • The SSD in question is a Plextor M6e NVMe SSD
    – JWoeber
    Commented May 5, 2020 at 13:09
  • It seems someone was already wondering about the same phenomenon: unix.stackexchange.com/questions/123594/… but there seem to be no answers there that apply to SSDs or that could explain the results when using oflag=dsync
    – JWoeber
    Commented May 5, 2020 at 13:14
  • mkfs TRIMs / discards the entire device, thus providing optimal benchmark conditions. Also, with /proc/sys/vm/block_dump enabled (warning: TONS of output), I'm seeing writes of 8 sectors (dd on the raw block device) vs. writes of 16384 sectors (dd on ext4), so it might be due to how the kernel decides to split things up?
    – frostschutz
    Commented May 5, 2020 at 14:15

1 Answer


mkfs TRIMs / discards the entire device, thus providing optimal benchmark conditions.
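
If that is the factor, the raw-device run could presumably be put on an equal footing by discarding the device manually before the test, along these lines (this destroys all data and assumes the device supports discard):

blkdiscard /dev/sda1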

Also, with /proc/sys/vm/block_dump enabled (warning: TONS of output), I'm seeing writes of 8 sectors (dd on the raw block device) vs. writes of 16384 sectors (dd on ext4), so it might be down to how the kernel decides to split things up, since you can't literally send a 1G write out as a single request.
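
How large those split requests may get is bounded by the block queue limits, which can be inspected in sysfs; assuming the device is sda, something like this shows the per-request cap in KiB (the 16384-sector writes below work out to 8192 KiB):

cat /sys/block/sda/queue/max_sectors_kb
cat /sys/block/sda/queue/max_hw_sectors_kb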

dd on ext4:

dd(12080): dirtied inode 12 (test.txt) on loop0
dd(12080): dirtied inode 12 (test.txt) on loop0
dd(12080): dirtied inode 12 (test.txt) on loop0
kworker/u8:4(10318): READ block 2056 on loop0 (8 sectors)
kworker/u8:4(10318): WRITE block 278528 on loop0 (16384 sectors)
kworker/u8:4(10318): WRITE block 294912 on loop0 (16384 sectors)
kworker/u8:4(10318): WRITE block 311296 on loop0 (16384 sectors)
kworker/u8:4(10318): WRITE block 327680 on loop0 (16384 sectors)
kworker/u8:4(10318): WRITE block 344064 on loop0 (16384 sectors)
kworker/u8:4(10318): WRITE block 360448 on loop0 (16384 sectors)
...

dd directly:

dd(12116): WRITE block 0 on loop0 (8 sectors)
dd(12116): WRITE block 8 on loop0 (8 sectors)
dd(12116): WRITE block 16 on loop0 (8 sectors)
dd(12116): WRITE block 24 on loop0 (8 sectors)
dd(12116): WRITE block 32 on loop0 (8 sectors)
dd(12116): WRITE block 40 on loop0 (8 sectors)
dd(12116): WRITE block 48 on loop0 (8 sectors)
dd(12116): WRITE block 56 on loop0 (8 sectors)
dd(12116): WRITE block 64 on loop0 (8 sectors)
dd(12116): WRITE block 72 on loop0 (8 sectors)
dd(12116): WRITE block 80 on loop0 (8 sectors)
dd(12116): WRITE block 88 on loop0 (8 sectors)
dd(12116): WRITE block 96 on loop0 (8 sectors)
dd(12116): WRITE block 104 on loop0 (8 sectors)
dd(12116): WRITE block 112 on loop0 (8 sectors)
dd(12116): WRITE block 120 on loop0 (8 sectors)
dd(12116): WRITE block 128 on loop0 (8 sectors)
...

Now, I only tested a loop device, not a real SSD, so it might not be accurate.
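
For anyone wanting to reproduce the loop-device experiment, a rough setup could look like this (file name, size, and mount point are arbitrary):

truncate -s 2G /tmp/disk.img
losetup --find --show /tmp/disk.img   # prints the device name, e.g. /dev/loop0
mkfs.ext4 /dev/loop0
mount /dev/loop0 /mnt
echo 1 > /proc/sys/vm/block_dump
dd if=/dev/zero of=/mnt/test.txt bs=1G count=1 oflag=dsync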

