
I have a 2005 vintage server (dual 3GHz Xeons, LSI53C1030T RAID/SCSI controller with 256MB cache, 8GB RAM) and I'm re-purposing it for some light VM storage duty.

My first attempt was to put 4x300GB drives into a hardware RAID5, and then install Openfiler's LVM and iSCSI on top of it. That resulted in very inconsistent read speeds (20MB/sec to 2GB/sec, but that's probably caching), and a horrible but consistent 8MB/sec write. All these results were measured with both local dd and an actual big file transfer over the network, and both yielded similar results.

So after much reading I found that the aforementioned LSI controller isn't that great for hardware RAID, so I turned off the RAID functionality on the channel with the 4x300GB drives, made the RAID array with mdadm software RAID, and put LVM on top of it. I ran more tests, and the results improved (20MB/sec writes), but that's still rather horrible. I spent another day aligning partitions; optimizing chunk, stripe-width, and stride sizes; and playing with ext4 options, different journaling options, etc., without much observable improvement.
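For reference, the stride/stripe-width tuning mentioned above follows directly from the chunk size and the number of data disks. A minimal sketch, assuming a 64 KiB chunk and a 4-drive RAID5 (both figures are illustrative, not taken from the question; the mdadm/mkfs lines are left commented since device names vary):

```shell
CHUNK_KB=64        # md chunk size in KiB (assumed for illustration)
DATA_DISKS=3       # 4-drive RAID5 = 3 data disks + 1 parity
BLOCK_KB=4         # ext4 block size in KiB

# ext4 wants stride = chunk / block size, and
# stripe-width = stride * number of data-bearing disks
STRIDE=$((CHUNK_KB / BLOCK_KB))
STRIPE_WIDTH=$((STRIDE * DATA_DISKS))
echo "stride=$STRIDE stripe-width=$STRIPE_WIDTH"

# mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=$CHUNK_KB /dev/sd[bcde]1
# mkfs.ext4 -b 4096 -E stride=$STRIDE,stripe-width=$STRIPE_WIDTH /dev/md0
```

With these numbers it prints `stride=16 stripe-width=48`. Getting stripe-width wrong mostly hurts full-stripe write efficiency, so it cannot explain a drop all the way to 8MB/sec on its own.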

Another experiment I did was running hdparm -tT on /dev/md0 vs /dev/mapper/vg0-lv0 (which was simply a mapping of the entire md0), and I got a 2x slowdown when going through LVM. I've read that LVM can introduce some speed penalty, but cutting the speed in half is not acceptable.
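One commonly reported cause of exactly this md-vs-LVM gap in sequential benchmarks like hdparm -tT is a read-ahead mismatch: the dm device does not inherit the md device's read-ahead setting. A sketch of how to check and fix it, assuming the device names from the question (the actual values will differ per system, so the device commands are shown commented):

```shell
# Read-ahead is set per block device, in 512-byte sectors, and the
# LVM mapping often gets a much smaller default than the md device:
#   blockdev --getra /dev/md0
#   blockdev --getra /dev/mapper/vg0-lv0
# If the LV's value is smaller, raise it to match, then re-run hdparm:
#   blockdev --setra 4096 /dev/mapper/vg0-lv0
#   hdparm -tT /dev/md0 /dev/mapper/vg0-lv0

RA_SECTORS=4096
echo "read-ahead = $((RA_SECTORS * 512 / 1024)) KiB"
```

4096 sectors works out to 2048 KiB of read-ahead; a small LV read-ahead throttles exactly the kind of large sequential reads hdparm measures, while barely affecting random I/O.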

Since none of this was making sense, I went back to basics, made a single partition on a single drive, no LVM, RAID, just plain old SCSI320 and ran some tests on it. I got ~75MB/sec read and ~55MB/sec write with multiple runs and multiple programs.
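When benchmarking through a filesystem like this, plain dd mostly measures the page cache unless it is forced to flush; one way to get honest numbers is conv=fdatasync. A minimal sketch (file path and sizes are arbitrary):

```shell
# Write 64 MiB in 64 KiB blocks; conv=fdatasync makes dd flush to disk
# before reporting, so the figure reflects the drive, not the page cache.
dd if=/dev/zero of=/tmp/ddtest bs=64k count=1024 conv=fdatasync

# For the read test, drop the cache first (needs root) or the data
# simply comes back from RAM:
# echo 3 > /proc/sys/vm/drop_caches
dd if=/tmp/ddtest of=/dev/null bs=64k
rm -f /tmp/ddtest
```

This would also explain the 2GB/sec "reads" seen earlier on the hardware array: that is RAM speed, not disk speed.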

So if one drive can do 75MB/sec read and 55MB/sec write, why does a RAID5 (hardware or software!) built from them get such horrible speeds? What am I doing wrong? What else should I try?

UPDATE 1: While continuing with the experiments, I noticed that one of the disks sometimes didn't want to be partitioned; parted and fdisk would simply refuse to actually write the partition table to it. I tried the same commands on all the other disks to make sure it wasn't a systemic problem, and the issue looked to be isolated to that one disk. I ran smartctl's health tests on it, and everything checked out fine. dmesg was the only source of any indication that something might be wrong with the drive, albeit with rather cryptic and not particularly helpful messages.

Out of sheer curiosity, I pulled the drive, rebooted, and redid everything I'd done so far for software RAID5 without LVM but with ext4 on it. On the first try, I got 200MB/sec reads and 120MB/sec writes to a five-drive array (I found two more 300GB drives in the meantime) when testing with dd dumping 4.2GB files in 64kB blocks onto the new partition. Apparently the drive, while not completely dead, wasn't particularly cooperative, and once it was out of the equation, everything ran MUCH better.
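A quick way to spot a misbehaving member like this, before SMART admits anything is wrong, is to benchmark each drive in isolation. A sketch, with illustrative device names (substitute the real array members):

```shell
# Read a fixed amount from each member disk; a sick drive stands out
# as a far lower MB/s figure than its siblings. In a RAID5, the whole
# array runs at roughly the pace of its slowest member.
for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
    echo "== $d =="
    # iflag=direct bypasses the page cache so each drive is measured alone
    dd if="$d" of=/dev/null bs=1M count=256 iflag=direct 2>&1 | tail -n1
done
```

The `tail -n1` keeps just dd's summary line (bytes copied, seconds, MB/s) per drive, so the outlier is obvious at a glance.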

I feel saner now, 8MB/sec just didn't feel right, no matter which RAID level.

Tomorrow: testing with LVM and maybe going back to hardware RAID.

  • I'm curious: it sounds like the only thing you haven't tried is software or hardware RAID without LVM (or did I misread?). I think this will help isolate LVM as a possible cause.
    – Kyle Smith
    Commented Jun 8, 2011 at 14:05
  • Are you sure you have the correct part number? The LSI53C1030T is showing up as a standard SCSI controller, not a RAID controller. lsi.com/storage_home/products_home/standard_product_ics/…
    – Greg Askew
    Commented Jun 8, 2011 at 14:12
  • straight out of lspci: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI and Kernel driver in use: mptspi. I have upgraded the firmware on the controller, I'll try to retest some of these cases, just to see if it makes any difference.
    – Marcin
    Commented Jun 8, 2011 at 15:31

1 Answer


RAID5 is notoriously bad for write performance. The reason is that any write smaller than a full stripe must also update that stripe's parity block: the array either reads the old data and old parity and recomputes (read-modify-write), or reads the remaining blocks of the stripe to reconstruct the parity, and then writes both the new data and the new parity. Either way, one logical write turns into several physical reads and writes.

This takes a long time, compared to just writing a single block.
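The penalty can be sketched numerically. In the read-modify-write case each small random write costs 4 disk I/Os, versus 2 for a mirror; the per-disk IOPS figure below is a hypothetical round number, not a measurement from the question:

```shell
IOPS_PER_DISK=80     # hypothetical figure for one spindle
DISKS=4

# RAID5: read old data + read old parity + write both = 4 I/Os per write
echo "RAID5  random-write IOPS ~ $((IOPS_PER_DISK * DISKS / 4))"
# RAID10: just the data write plus its mirror copy = 2 I/Os per write
echo "RAID10 random-write IOPS ~ $((IOPS_PER_DISK * DISKS / 2))"
```

So with the same four spindles, RAID5 delivers roughly single-disk random-write throughput (here ~80 IOPS) while RAID10 delivers about twice that, which is why the mirrored layouts below are the usual recommendation for write-heavy loads.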

If you want fast writes, a mirrored configuration is better, such as RAID1 or RAID10.

  • I know it's bad, but I don't think it's 8MB/sec bad, is it? That's kinda useless in today's world, when you can grab a USB enclosure with a $50 drive and get 30MB/sec out of it.
    – Marcin
    Commented Jun 8, 2011 at 13:53
  •
    RAID5 shouldn't be that much slower. Either the XOR engine on the RAID HBA is the bottleneck, or something's still misconfigured (alignment, stripe size, etc.).
    – Chris S
    Commented Jun 8, 2011 at 13:53
  • I agree the numbers don't look like they can entirely be accounted for by the RAID level; it could be a dodgy controller, bad drives, or misconfiguration. My point was that if Marcin is looking for decent write performance, RAID5 is architecturally the absolutely wrong choice.
    – growse
    Commented Jun 8, 2011 at 14:07
  • gets the accept for guessing 'bad drive' correctly.
    – Marcin
    Commented Jun 8, 2011 at 23:38
  • Wooohoo! Fluky win! :p
    – growse
    Commented Jun 9, 2011 at 8:29
