Don’t Use NFS 4 for Your ESXi Storage (and What to Do If It Goes Wrong)

NFS 4.1 brought a lot of new features compared to its predecessor, NFS 3: multipathing, “reliable” transmission based on TCP, and better authentication support. ESXi also comes with native support for NFS 4.1. But what if I told you that, under very specific circumstances, NFS 4.1 can result in all your VMs being forcefully powered off, causing data loss?

TL;DR: Avoid NFS 4.1 and stick with NFS 3 on your vSphere cluster, especially if your storage backend has relatively poor performance.

Setting up an ESXi Cluster

So you have a handful of brand-new ESXi servers and want VMs to automagically move here and there based on host availability and resource usage. vCenter has you covered with DRS and HA, but obviously you need to put all the hosts into a cluster for these things to work. What you might not know is that there are three ways of creating a cluster, which differ in certain respects, and you will regret it if you choose the wrong one. Trust me, I learned it the hard way.

Note: we are using ESXi 7.0 and vCenter 7.0 here.

