VMFS

Over the course of the past week I was asked a couple of storage-related questions about VMFS volumes and LUN partitions. The questions touched on something that people with experience and knowledge of VMware virtualization platforms are already aware of: how VMFS volumes work, and why it is not good practice to create multiple VMFS volumes on a LUN that has been partitioned. I wanted to take a moment and explain it as simply as I can, so here is my take on why we don’t want multiple VMFS volumes per LUN. I hope it’s something that can be of help.

VMware’s vStorage VMFS is a clustered file system that is shared among many servers, any or all of which could be writing to the same shared LUN. So access to the LUN (which, by the way, usually holds a single partition) needs to be controlled for certain important functions.

One important function is an ESX server powering on a VM. There can’t be any confusion about which ESX server is running a VM, because only that ESX server may write to the VM’s files (otherwise the VM files could get corrupted). So we lock the entire LUN (the datastore, which is usually one partition) when we start up a VM. VMFS makes a note in the VMFS metadata to indicate which ESX server has that VM running (locked), or, as we say, that the VM is “registered” on that ESX server.
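
To make the “registered on one host” idea concrete, here is a minimal Python sketch. It is a toy model, not VMware’s implementation: the class and method names are hypothetical, a dictionary stands in for the ownership note in the VMFS metadata, and a single threading.Lock stands in for locking the whole LUN while that note is checked and updated.

```python
import threading
from typing import Dict


class VmfsVolume:
    """Toy model of a shared VMFS datastore (hypothetical, for illustration).

    `owners` plays the role of the metadata note recording which ESX host
    has each VM registered; `lun_lock` stands in for locking the whole LUN.
    """

    def __init__(self) -> None:
        self.owners: Dict[str, str] = {}
        self.lun_lock = threading.Lock()

    def power_on(self, vm: str, host: str) -> bool:
        # Hold the LUN-wide lock while checking and updating ownership so two
        # hosts cannot both register (and then write to) the same VM.
        with self.lun_lock:
            owner = self.owners.get(vm)
            if owner is not None and owner != host:
                return False  # another host already runs this VM
            self.owners[vm] = host
            return True

    def power_off(self, vm: str, host: str) -> None:
        # Only the registering host may release the VM.
        with self.lun_lock:
            if self.owners.get(vm) == host:
                del self.owners[vm]


datastore = VmfsVolume()
first = datastore.power_on("web01", "esx-a")   # esx-a registers web01
second = datastore.power_on("web01", "esx-b")  # refused: esx-a still owns it
datastore.power_off("web01", "esx-a")
third = datastore.power_on("web01", "esx-b")   # now free for esx-b
print(first, second, third)  # True False True
```

The point of the model is that the check (“is anyone else running this VM?”) and the update (“it’s mine now”) happen under the same lock, so no second host can slip in between them.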

Another important function is allocating space. When we allocate a block on VMFS there can’t be any confusion about whether the block is allocated or not, so again we lock the entire LUN.
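
Why allocation needs a lock at all: “find a free block, then mark it used” is two steps, and if two hosts interleave them they can both claim the same block. The sketch below is a simplified Python model, not VMFS’s real allocator; the free-block list and the single lock (standing in for the whole-LUN lock) are assumptions for illustration, and the four threads play the part of four ESX hosts.

```python
import threading
from typing import List, Optional


class BlockAllocator:
    """Toy free-block bitmap guarded by one lock that stands in for the LUN lock."""

    def __init__(self, total_blocks: int) -> None:
        self.free = [True] * total_blocks
        self.lun_lock = threading.Lock()  # hypothetical stand-in for a whole-LUN lock

    def allocate(self) -> Optional[int]:
        # Check-then-set must be atomic; otherwise two hosts could both see
        # the same block as free and both claim it.
        with self.lun_lock:
            for i, is_free in enumerate(self.free):
                if is_free:
                    self.free[i] = False
                    return i
            return None  # datastore is full


alloc = BlockAllocator(total_blocks=1000)
results: List[int] = []
results_lock = threading.Lock()


def host_worker(count: int) -> None:
    """Simulate one ESX host allocating `count` blocks."""
    for _ in range(count):
        block = alloc.allocate()
        if block is not None:
            with results_lock:
                results.append(block)


threads = [threading.Thread(target=host_worker, args=(250,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), len(set(results)))  # 1000 1000: every block handed out once
```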

How does VMware lock an entire LUN? It uses a feature of iSCSI and FC arrays called a SCSI-2 reserve; this is VMware’s “distributed locking mechanism.” The ESX server requests, and the array grants, a SCSI-2 reserve that gives that server exclusive access to the LUN. A SCSI-2 reserve is held until the ESX server releases it (I believe). In VMFS3, VMware tries to hold the reserve for the shortest possible time, just 1 or 2 I/Os, because while it is held other ESX servers cannot access the LUN and performance could suffer.
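
A rough Python model of the reserve/release behavior described above may help. It is deliberately simplified and the names are hypothetical: it ignores resets, timeouts, and SCSI-3 persistent reservations, and only shows that while one host holds the reserve, I/O from any other host is refused with a reservation conflict.

```python
from typing import Optional


class ReservationConflict(Exception):
    """Raised when a host issues I/O to a LUN reserved by another host."""


class Lun:
    """Toy model of SCSI-2 reserve/release on a whole LUN (simplified)."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None  # host currently holding the reserve

    def reserve(self, host: str) -> bool:
        # Granted if nobody holds the reserve, or the same host asks again.
        if self.holder is None or self.holder == host:
            self.holder = host
            return True
        return False  # the array would answer with a reservation conflict

    def release(self, host: str) -> None:
        # Only the holder can release its reserve.
        if self.holder == host:
            self.holder = None

    def io(self, host: str) -> str:
        if self.holder is not None and self.holder != host:
            raise ReservationConflict(f"{host}: LUN reserved by {self.holder}")
        return f"{host}: I/O ok"


lun = Lun()
print(lun.reserve("esx-a"))  # True
print(lun.reserve("esx-b"))  # False
try:
    lun.io("esx-b")
except ReservationConflict as err:
    print("conflict:", err)
lun.release("esx-a")  # release promptly so other hosts can proceed
print(lun.io("esx-b"))
```

Every moment between reserve() and release() is a moment when every other host gets a conflict, which is why VMFS3 tries to keep that window to one or two I/Os.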

So here’s what I think about multiple VMFS partitions: since it’s not possible to grant a SCSI-2 reserve on a single partition of a LUN that has multiple partitions, it doesn’t make sense to have separate VMFS file systems (i.e., multiple partitions) on one LUN. A SCSI-2 reserve works only on the whole LUN, so a host locking one VMFS volume also blocks every other VMFS volume on that LUN. And indeed, most VMFS volumes are a single partition.
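
The partition problem can be shown in a few lines of Python. This is a hypothetical layout for illustration only: two VMFS volumes sit on two partitions of lun0 and a third volume has lun1 to itself. Because the reservation table is keyed by LUN (there is no per-partition key), a host working on vmfs1 blocks a different host working on vmfs2, even though they are “different” datastores.

```python
from typing import Dict, Optional

# Hypothetical layout: two VMFS volumes carved from partitions of ONE LUN,
# plus a third volume on its own LUN.
PARTITION_TO_LUN: Dict[str, str] = {
    "vmfs1": "lun0",  # partition 1 of lun0
    "vmfs2": "lun0",  # partition 2 of lun0
    "vmfs3": "lun1",  # only partition of lun1
}

# SCSI-2 reservations are tracked per LUN; there is no per-partition key.
reservations: Dict[str, Optional[str]] = {"lun0": None, "lun1": None}


def reserve_for_volume(volume: str, host: str) -> bool:
    """Try to reserve the LUN backing `volume` on behalf of `host`."""
    lun = PARTITION_TO_LUN[volume]
    holder = reservations[lun]
    if holder is None or holder == host:
        reservations[lun] = host
        return True
    return False


a_vmfs1 = reserve_for_volume("vmfs1", "esx-a")  # esx-a reserves lun0
b_vmfs2 = reserve_for_volume("vmfs2", "esx-b")  # blocked: same LUN as vmfs1
b_vmfs3 = reserve_for_volume("vmfs3", "esx-b")  # fine: separate LUN
print(a_vmfs1, b_vmfs2, b_vmfs3)  # True False True
```

With one VMFS volume per LUN, as with vmfs3 here, a reserve only ever affects the one datastore it was taken for.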

These SCSI reserves apply to thin provisioning too, because if a VM needs to grow, space has to be allocated to it. To allocate that space, the ESX server has to obtain a SCSI-2 reserve granting it exclusive access to the LUN the thin-provisioned VM lives on, so that VMFS can allocate more space to the VM. This could be disastrous for performance if the VM grows a lot, because EACH TIME the VM fills up a block and needs a new one we have to: 1) lock the VMFS LUN with a SCSI-2 reserve, 2) allocate space to the VM, 3) release the SCSI-2 reserve, and 4) then do the actual I/O to the VM.
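
To see how those four steps add up, here is a small Python simulation. It is a toy model, not how ESX actually schedules I/O: the function name, the 1 MB block size, and the write pattern (8 MB written sequentially in 64 KB chunks) are assumptions chosen for illustration. Each write that lands in a not-yet-allocated block first pays for a reserve, an allocation, and a release.

```python
from typing import List, Set

MB = 1024 * 1024


def thin_write(offset: int, length: int, block_size: int,
               allocated: Set[int], log: List[str]) -> None:
    """Append the steps for one guest write to a thin-provisioned disk.

    Hypothetical toy model of the sequence above; sizes are in bytes.
    """
    first = offset // block_size
    last = (offset + length - 1) // block_size
    for block in range(first, last + 1):
        if block not in allocated:
            log.append("reserve LUN")              # 1) SCSI-2 reserve on the whole LUN
            log.append(f"allocate block {block}")  # 2) allocate space to the VM
            allocated.add(block)
            log.append("release LUN")              # 3) SCSI-2 release
    log.append(f"write {length} bytes at {offset}")  # 4) the actual guest I/O


disk_blocks: Set[int] = set()
steps: List[str] = []
# Assumed workload: 8 MB written sequentially in 64 KB chunks, 1 MB blocks.
for off in range(0, 8 * MB, 64 * 1024):
    thin_write(off, 64 * 1024, 1 * MB, disk_blocks, steps)

print(steps.count("reserve LUN"))                          # 8
print(sum(1 for s in steps if s.startswith("write")))      # 128
```

In this model only the first write into each new block pays the locking cost, so the number of reserve/release cycles tracks how much the VM grows divided by the block size; a VM that grows a lot keeps taking whole-LUN reserves that every other host on that LUN has to wait out.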

I hope this makes the topic clearer for the people who asked. I want to thank Connie Economou for taking the time to help me simplify it. Hope this helps.
