Today I added my fourth node to my Windows Server 2008 R2 Cluster.
The cluster validation reported no errors, and I was about to Live Migrate a VM over to the new node.
It resulted in the following error:
The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.
Strange, I thought, since my host was identical to the others, with every patch and update installed.
I came across this Hotfix from MSFT: http://support.microsoft.com/kb/978527
Additional info about the RHS.exe can be found here: http://blogs.msdn.com/b/clustering/archive/2009/06/27/9806160.aspx
But in my case, the error was related to ESET NOD32 Anti-Virus software.
After running Procmon, I found some registry keys referring to ESET that caused the operation to fail.
After manually removing those entries, my Live Migration started to behave.
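If you would rather script the hunt for leftovers than read through Procmon traces, something like the following might help. It is only a rough, read-only sketch and not the method used above: the starting path (SYSTEM\CurrentControlSet\Services), the search string "eset", the depth limit and the function names are all assumptions chosen for illustration. Note that "eset" also matches words such as "reset", so review every hit by hand and export a key before deleting anything.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None


def mentions(text, needle="eset"):
    """Case-insensitive substring check for key names and value data."""
    return needle.lower() in str(text).lower()


def walk_keys(root, path, needle="eset", max_depth=4):
    """Yield registry paths under root\\path whose key name or values mention needle.

    Hypothetical helper: it only reads the registry and never modifies it.
    """
    try:
        key = winreg.OpenKey(root, path)
    except OSError:
        return  # key is missing or access was denied, so skip it
    with key:
        n_subkeys, n_values, _ = winreg.QueryInfoKey(key)
        # Check the last component of the key path first.
        hit = mentions(path.rsplit("\\", 1)[-1], needle)
        for i in range(n_values):
            name, data, _type = winreg.EnumValue(key, i)
            if mentions(name, needle) or mentions(data, needle):
                hit = True
                break
        if hit:
            yield path
        if max_depth > 0:
            subkeys = [winreg.EnumKey(key, i) for i in range(n_subkeys)]
            for sub in subkeys:
                yield from walk_keys(root, path + "\\" + sub, needle, max_depth - 1)


if __name__ == "__main__" and winreg is not None:
    # Assumed starting point; the original troubleshooting did not name the keys.
    start = r"SYSTEM\CurrentControlSet\Services"
    for found in walk_keys(winreg.HKEY_LOCAL_MACHINE, start):
        print(found)
```

Run it from an elevated prompt on the affected host and compare the output with a node that never had the software installed.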
In general, I must say that I usually don’t install Anti-Virus on my Hyper-V hosts, since it causes so much instability with Hyper-V, especially when it comes to Failover Clustering and Live Migration. But this server had recently been used as a stand-alone Hyper-V host, and the ESET NOD32 uninstall procedure did not remove everything it should have.
Moral: Be careful if you consider Anti-Virus on your Hyper-V hosts.
You will run into some interesting problems with Hyper-V if you are using Anti-Virus on your host and do not configure it to exclude certain files and folders related to Hyper-V.
Remember to exclude the following in your Anti-Virus software:
Also exclude the root directory that contains your virtual machines and configuration files, and the following file extensions: