
Background

I use Ubuntu 16.04.4 LTS (converted from Desktop to Server) installed on a ZFS filesystem, which itself is on a Samsung 840 Evo 120 GB SSD. I also have four HDDs arranged as two ZFS mirrors (one 1 TB pair and one 2 TB pair).

I use the system for storage, but also for virtualisation with KVM on Ubuntu. For one of the VMs I store the OS disk image (mounted on '/' inside the guest) on the SSD, and the storage disk image (mounted on '/home') on the 1 TB HDD mirror.

The system has 16 GB of ECC RAM, and on the SSD there is an 8 GB swap volume (ZLE compression). The VM is allocated 8192 MB, so with the other services on the host that leaves approx. 7.8 GB for the ZFS ARC.
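For reference, a swap zvol like the one described can be created roughly as follows. This is only a sketch: 'rpool' is a placeholder for the actual SSD pool name, and the logbias/sync/primarycache settings are commonly recommended for swap on ZFS rather than something specific to my setup:

```shell
# Create an 8 GB zvol for swap with ZLE compression ("rpool" is a placeholder)
zfs create -V 8G \
    -o compression=zle \
    -o logbias=throughput \
    -o sync=always \
    -o primarycache=metadata \
    rpool/swap

# Format and enable it as swap
mkswap /dev/zvol/rpool/swap
swapon /dev/zvol/rpool/swap
```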

I have noticed that when writing a lot of data inside the VMs to the VM disk images, the ZFS ARC grows massively. While this is expected ARC behaviour, since the data is getting hits, it raises some questions about using ZFS to store and run VM images that I am unsure about.

Case

Please note the following:

  • The VM disk images do not store any important data - they are used only for hosting and experimenting, so integrity is not critical;
  • The disk images are stored in a filesystem within a filesystem - so if the directory is '/MyMirror/VM/', then 'MyMirror' is the parent filesystem and 'VM' is a child filesystem within it;
  • The disk images are in raw format (*.img).
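For clarity, the layout above can be sketched like this, assuming the pool itself is named 'MyMirror':

```shell
# "MyMirror" is the pool (and its root filesystem); "VM" is a child dataset
zfs create MyMirror/VM

# Listing shows both the parent and the nested filesystem
zfs list -r MyMirror

# The raw images then live under the child dataset, e.g.:
#   /MyMirror/VM/guest1.img
```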

Questions

Taking the above case into account ...

  1. Is there actually any benefit to having the VM disk images in ZFS compared to ext4?
  2. Is there any logical reason/benefit to the ARC caching the VM disk image?
  3. Am I doing it right?

Number 2 baffles me the most at the moment, as I am struggling to see the logic or any potential benefit of the ARC caching the disk images. I could be missing the point entirely (perhaps a cloudy mind), but how does caching the disk image on the host help the host, or even the VM itself?

I understand number 3 may be more of an opinion question (which I wanted to avoid), but I am unsure whether I am going about this the "right" way. Perhaps, given my setup, there is a better way to store and run VMs.

Thanks in advance!

  • Right now this is a very broad post with several questions. Please focus on a single specific question for each of your posts so that we can get a single, correct answer for each question. Commented Feb 27, 2018 at 23:22
  • Perhaps a slight edit. It seems from my reading that the three questions are really asking the same thing, or more precisely, an answer to any one of the three would necessarily cover the other two by way of background and elaboration. Commented Feb 28, 2018 at 2:54

1 Answer

  1. Is there any benefit to using ZFS for the VM disk storage?

Possibly. ZFS has built-in compression, dedup, snapshots and clones, and many other features which you could use to reduce your storage footprint, more easily replicate these filesystems to another system, and so on. It really depends on your use case. Not to be a zealot ;), but maybe having the flexibility to use these features if you need them is enough to say "yes"?
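As a rough sketch of what those features look like in practice (the dataset and remote host names here are placeholders, not from the question):

```shell
# Transparent compression on the dataset holding the images
zfs set compression=lz4 MyMirror/VM

# Point-in-time snapshot of the dataset (cheap, copy-on-write)
zfs snapshot MyMirror/VM@clean-install

# Writable clone of that snapshot for experimentation
zfs clone MyMirror/VM@clean-install MyMirror/VM-test

# Replicate the snapshot to another machine ("otherhost"/"tank" are placeholders)
zfs send MyMirror/VM@clean-install | ssh otherhost zfs receive tank/VM
```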

  2. Is the ARC doing anything useful?

From what you've said, I can't tell whether you have a lot of VMs, and if so, whether they share any blocks with each other (either through snapshots/clones or through dedup). If so, the cache could benefit you: having the data hot in one VM will improve the performance of all the others. If not, the benefits would probably be minimal, because the guest OSes usually keep a page cache of everything they've read or written anyway. However, this memory will be evicted quickly if there's any pressure from your host OS, because it will only have been accessed once (and so will sit on the LRU list in the ARC), so I wouldn't worry about the wasted space.
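If the double caching bothers you, one common option is to restrict the ARC to metadata for the dataset holding the images, and you can also check how much the ARC is actually using. A sketch, with a placeholder dataset name:

```shell
# Show current ARC size and its configured maximum, in MiB
# (fields from the kernel's arcstats; column 3 is the value in bytes)
awk '/^(size|c_max) / {printf "%s %.0f MiB\n", $1, $3/1024/1024}' \
    /proc/spl/kstat/zfs/arcstats

# Cache only metadata (not file data) for the VM dataset, so the guests'
# own page caches do the data caching
zfs set primarycache=metadata MyMirror/VM
```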

  3. Am I doing it right?

Totally depends on your use case. For the simple case you described, it seems fine, but then again, pretty much anything would be. If you were trying to build a storage appliance for a large farm of VMs:

  • you might consider using ZFS ZVOLs for your storage instead of files inside ZFS filesystems
  • you'd probably run your VMs on a different server from your storage (so that you can more easily scale RAM and CPU, and isolate storage performance from VM performance); in that case you'd expose the ZVOLs as iSCSI or Fibre Channel targets
  • for performance you might want to add a lot more RAM, enable dedup and compression for storage savings (in RAM and on disk), add a dedicated SLOG device for faster sync writes, add an L2ARC device to expand the cache to SSD
  • for data resiliency you'd probably want to enable RAID-Z or similar
  • to allow for maintenance downtime, you'd probably want another storage host on standby that you can replicate data to using zfs send and zfs receive, plus some way to cut over the VM hosts to point at the new location (or maybe some other way to fail over; that's just one option)
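A very rough sketch of some of those pieces; the pool, device, and host names below are all placeholders:

```shell
# A zvol instead of an .img file inside a filesystem
zfs create -V 50G -o volblocksize=16K tank/vm/guest1

# Dedicated mirrored SLOG for faster sync writes, and an L2ARC cache device
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
zpool add tank cache /dev/sdX

# Replicate to a standby storage host
zfs snapshot tank/vm/guest1@nightly
zfs send tank/vm/guest1@nightly | ssh standby zfs receive -F tank/vm/guest1
```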

You can get pretty much arbitrarily complex with your deployment if it's serious enough. But, if you don't have any complaints with how your system works now, what's the harm in keeping it the way it is?

  • Thanks for your response, Dan. You've highlighted a lot of good points. At the end of the day, as you pointed out, if it is working then what's the point of changing it? I was initially quite unsure what benefits the ARC has to offer for the VM disk images, but even if it offers next to none in this case, I should just take it as the way ZFS works. I would like to run the VMs on their own hardware, but the costs of new hardware and electricity, and the pain of migrating the VMs (if possible), put me off.
    – Joel
    Commented Mar 2, 2018 at 10:49
