Timeline for Large ext4 partition ridiculously slow when writing
Current License: CC BY-SA 3.0
20 events
when | what | | by | license | comment |
---|---|---|---|---|---|
Dec 1, 2023 at 16:16 | comment | added | Mikko Rantalainen | Are you sure the disks are not actually SMR HDDs? Those are notoriously bad if you write too much in a "short" time period. You should expect the performance to drop if you write more than ~20 GB/hour, or maybe 200K write operations per hour, per disk. The only way to restore the performance of an SMR HDD is to let it idle powered on, and maybe do one disk operation every minute to make sure it doesn't enter any sleep mode that would prevent cleaning of its internal cache area. See unix.stackexchange.com/a/489530/20336 for details. | |
Aug 31, 2014 at 8:22 | vote | accept | seba | ||
Jun 9, 2014 at 17:23 | history | edited | Braiam | | edited tags |
Apr 22, 2014 at 17:13 | answer | added | seba | timeline score: 4 | |
Apr 21, 2014 at 20:31 | history | tweeted | twitter.com/#!/StackUnix/status/458342383818932225 | | |
Apr 21, 2014 at 17:09 | answer | added | Thorsten Staerk | timeline score: 0 | |
Apr 21, 2014 at 17:01 | comment | added | Thorsten Staerk | how are the disks connected? SAN, USB, ATA or SCSI? | |
Apr 21, 2014 at 14:45 | comment | added | psusi | @HaukeLaging, you got that backwards; -t flushes the kernel cache and actually measures the disk's real read speed, -T just reads what is already in the kernel cache. | |
Apr 21, 2014 at 9:58 | history | edited | seba | CC BY-SA 3.0 | added 3986 characters in body |
Apr 21, 2014 at 9:44 | history | edited | seba | CC BY-SA 3.0 | added 4701 characters in body |
Apr 21, 2014 at 8:40 | comment | added | seba | Partition alignment is OK. I ran the crawler again and slowdowns appeared even without a journal. I've gathered the data requested by @ThorstenStaerk here: vigna.di.unimi.it/slow.txt (sorry, but it wouldn't let me paste it here: too long). Note that for some reason (maybe the journal disabling) the server is now just twice as slow. It is also possible that I didn't catch it at the worst possible moment. I'll try to run the crawler again to see whether I can again slow down writes by a factor of 100 instead of 2. If I really can't, I'll try to put the journal back in. | |
Apr 20, 2014 at 21:33 | comment | added | ek9 | Also check that your partitions are aligned correctly; you can do this with gdisk, which will complain on startup if they are misaligned. | |
Apr 20, 2014 at 20:43 | comment | added | Thorsten Staerk | Please paste the output of iostat -x on the slow server and on a normal one. Please paste vmstat 10 while writing, so we can see how much I/O wait is on the system. Please also run an strace on the slow and normal servers to compare performance, like this: strace -c ls -R | |
Apr 20, 2014 at 18:59 | comment | added | seba | For the time being, I'm trying to disable journaling altogether. | |
Apr 20, 2014 at 18:58 | comment | added | seba | cfq. Thanks, really, for pointing this out; I didn't even know there was a settable scheduler. I think deadline would be more appropriate for what we are doing. We need the system to be as dumb as possible, as we alternate phases in which we make large writes to different files. | |
Apr 20, 2014 at 18:31 | comment | added | UnX | What scheduler do you use? none, cfq, deadline? | |
Apr 20, 2014 at 17:15 | comment | added | seba | bs=1M did not change the test (it was indeed running faster on the other servers), but thanks for the pointer. UPDATE: after leaving the disk doing nothing for 10 hours, THE SPEED IS AGAIN HIGH. The dd test copies hundreds of megabytes per second, like on all the other servers. So it appears that something "happens" incrementally to the file system. After a few hours, things get slower and slower. But if I stop all activity and wait for a few hours, things get back to normal. I guess this has something to do with delayed writes, but frankly I don't know what I should change. | |
Apr 20, 2014 at 15:43 | review | First posts | | | Apr 20, 2014 at 16:41 |
Apr 20, 2014 at 15:30 | comment | added | frostschutz | Always add bs=1M or similar to dd, otherwise you're measuring syscall overhead. smartctl -H is useless. Check kernel logs, full SMART data, partition alignment, tune2fs, benchmark disks individually/directly without a filesystem, ... If nothing comes up, replace the cables, and the disk anyway. | |
Apr 20, 2014 at 15:24 | history | asked | seba | CC BY-SA 3.0 |
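
frostschutz's point about dd block size can be sketched as follows. This is a hedged example, not the exact command from the question: /mnt/data is a hypothetical mount point standing in for the slow filesystem, and conv=fdatasync forces the data to disk so the page cache doesn't inflate the measured rate.

```shell
# Hypothetical path: replace /mnt/data with the mount point of the slow filesystem.
# With dd's default 512-byte block size you mostly measure syscall overhead;
# bs=1M issues large writes and reflects actual sequential throughput.
dd if=/dev/zero of=/mnt/data/ddtest bs=1M count=1024 conv=fdatasync
rm -f /mnt/data/ddtest
```

Comparing the reported MB/s on the slow server and a healthy one makes the 100x-vs-2x slowdown seba describes directly measurable.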
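
The scheduler exchange between UnX and seba (cfq vs deadline) maps to a sysfs tuning knob. A minimal sketch, assuming a hypothetical disk named sda; the change is a runtime setting only and does not survive a reboot:

```shell
# Show the schedulers available for sda; the active one is in brackets,
# e.g. "noop [cfq] deadline".
cat /sys/block/sda/queue/scheduler

# Switch to deadline (run as root). Reverts on reboot unless made
# persistent, e.g. via the elevator=deadline boot parameter or a udev rule.
echo deadline > /sys/block/sda/queue/scheduler
```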
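
Disabling the ext4 journal, as seba tried, is done offline with tune2fs. A sketch under assumptions: /dev/sdb1 is a hypothetical partition standing in for the one holding the slow filesystem, and it must be unmounted first.

```shell
# Hypothetical device: substitute the partition with the slow ext4 filesystem.
umount /dev/sdb1

# Drop the journal (turns the filesystem into effectively ext2-style writes):
tune2fs -O ^has_journal /dev/sdb1
fsck.ext4 -f /dev/sdb1    # recommended after changing filesystem features

# Put the journal back later, as seba considered doing:
tune2fs -O has_journal /dev/sdb1
```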