20 events
Dec 1, 2023 at 16:16 comment added Mikko Rantalainen Are you sure the disks are not actually SMR HDDs? Those are notoriously bad if you write too much in a "short" time period. You should expect the performance to drop if you write more than ~20 GB/hour, or maybe 200K write operations per hour, per disk. The only way to restore the performance of an SMR HDD is to let it idle while powered on, and maybe do one disk operation every minute to make sure it doesn't enter any sleep mode that would prevent it from cleaning its internal cache area. See unix.stackexchange.com/a/489530/20336 for details.
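A hedged sketch of how one might check whether a drive is SMR (device name /dev/sda is a placeholder; the sysfs "zoned" attribute only reveals host-aware/host-managed zoning, so drive-managed SMR usually has to be identified from the model number against the vendor's lists):

    # Report the drive model so it can be checked against the manufacturer's SMR lists
    lsblk -d -o NAME,MODEL,SIZE /dev/sda
    # "host-aware" or "host-managed" indicates zoned (SMR) storage on reasonably recent kernels;
    # drive-managed SMR still reports "none", so this check is not conclusive
    cat /sys/block/sda/queue/zoned
    # Full SMART identity block; some drives expose zoned/SMR capability here
    smartctl -i /dev/sda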
Aug 31, 2014 at 8:22 vote accept seba
Jun 9, 2014 at 17:23 history edited Braiam
edited tags
Apr 22, 2014 at 17:13 answer added seba timeline score: 4
Apr 21, 2014 at 20:31 history tweeted twitter.com/#!/StackUnix/status/458342383818932225
Apr 21, 2014 at 17:09 answer added Thorsten Staerk timeline score: 0
Apr 21, 2014 at 17:01 comment added Thorsten Staerk How are the disks connected? SAN, USB, ATA or SCSI?
Apr 21, 2014 at 14:45 comment added psusi @HaukeLaging, you got that backwards; -t flushes the kernel cache and actually measures the disk's real read speed, -T just reads what is already in the kernel cache.
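For reference, a minimal sketch of the two hdparm tests being contrasted above (run as root; /dev/sda is a placeholder device):

    # -T: cached reads, i.e. essentially kernel-cache/memory throughput
    # -t: buffered device reads with the cache flushed first, i.e. the disk's real sequential read speed
    hdparm -tT /dev/sda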
Apr 21, 2014 at 9:58 history edited seba CC BY-SA 3.0
added 3986 characters in body
Apr 21, 2014 at 9:44 history edited seba CC BY-SA 3.0
added 4701 characters in body
Apr 21, 2014 at 8:40 comment added seba Partition alignment is OK. I ran the crawler again and the slowdowns appeared even without a journal. I've gathered the data requested by @ThorstenStaerk here: vigna.di.unimi.it/slow.txt (sorry, it wouldn't let me paste it here: too long). Note that for some reason (maybe the journal disabling) the server is now just twice as slow. It is also possible that I didn't catch it at the worst possible moment. I'll try running the crawler again to see whether I can once more slow writes down by a factor of 100 instead of 2. If I really can't, I'll try putting the journal back in.
Apr 20, 2014 at 21:33 comment added ek9 Also check that your partitions are aligned correctly; you can do it with gdisk, which will complain on startup if they are misaligned.
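A sketch of the alignment checks along those lines (device and partition number are placeholders):

    # Per the comment above, gdisk reports problems, including misalignment, when it reads the disk
    gdisk -l /dev/sda
    # parted can also check a specific partition explicitly (here partition 1)
    parted /dev/sda align-check optimal 1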
Apr 20, 2014 at 20:43 comment added Thorsten Staerk Please paste the output of iostat -x on the slow and on a normal server. Please paste vmstat 10 while writing so we see how much I/O wait is on the system. Please also run an strace on the slow and on a normal server to compare performance, like this: strace -c ls -R
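The requested diagnostics collected in one place, as a sketch to be run on both the slow and a healthy server for comparison:

    # Extended per-device I/O statistics (utilization, await, queue sizes)
    iostat -x
    # One report every 10 seconds; watch the 'wa' (I/O wait) and 'b' (blocked) columns while the crawler writes
    vmstat 10
    # Count and time the syscalls of a directory walk; the -c summary goes to stderr,
    # so the listing itself can be discarded
    strace -c ls -R >/dev/null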
Apr 20, 2014 at 18:59 comment added seba For the time being, I'm trying to disable journalling altogether.
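A hedged sketch of disabling the ext4 journal with tune2fs (the filesystem must be unmounted while the feature is removed; /dev/sdb1 and /mnt/data are placeholders):

    umount /mnt/data                      # placeholder mount point
    tune2fs -O ^has_journal /dev/sdb1     # drop the has_journal feature
    e2fsck -f /dev/sdb1                   # force a check after changing filesystem features
    mount /dev/sdb1 /mnt/data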
Apr 20, 2014 at 18:58 comment added seba cfq. Thanks, really, for pointing this out; I didn't even know there was a settable scheduler. I think deadline would be more appropriate for what we are doing. We need the system to be as dumb as possible, as we alternate phases in which we make large writes to different files.
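For reference, a sketch of inspecting and switching the per-disk I/O scheduler discussed in the two comments around this point (sda is a placeholder; on kernels of that era the usual choices were noop, deadline and cfq):

    # The scheduler shown in brackets is the active one
    cat /sys/block/sda/queue/scheduler
    # Switch this disk to deadline (takes effect immediately, not persistent across reboots)
    echo deadline > /sys/block/sda/queue/scheduler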
Apr 20, 2014 at 18:31 comment added UnX What scheduler do you use? none, cfq, deadline?
Apr 20, 2014 at 17:15 comment added seba bs=1M did not change the test (it was indeed running faster on the other servers), but thanks for the pointer. UPDATE: after leaving the disk doing nothing for 10 hours, THE SPEED IS HIGH AGAIN. The dd test copies hundreds of megabytes per second, like on all the other servers. So it appears that something "happens" incrementally to the file system: after a few hours things get slower and slower, but if I stop all activity and wait for a few hours, things get back to normal. I guess this has something to do with delayed writes, but frankly I don't know what I should change.
Apr 20, 2014 at 15:43 review First posts (completed Apr 20, 2014 at 16:41)
Apr 20, 2014 at 15:30 comment added frostschutz Always add bs=1M or something to dd, otherwise you're measuring syscall overhead. smartctl -H is useless. Check the kernel logs, the full SMART data, partition alignment, tune2fs, benchmark the disks individually/directly without a filesystem, ... If nothing comes up, replace the cables, and the disk anyway.
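A sketch of a dd throughput test along those lines (the file path and size are placeholders; conv=fdatasync makes dd include the final flush in the timing, and dropping the page cache before the read-back requires root):

    # Write 1 GiB in 1 MiB blocks and include the flush to disk in the measurement
    dd if=/dev/zero of=/path/to/testfile bs=1M count=1024 conv=fdatasync
    # Drop the page cache so the read-back actually hits the disk
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/path/to/testfile of=/dev/null bs=1M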
Apr 20, 2014 at 15:24 history asked seba CC BY-SA 3.0