Wednesday, May 9, 2012

VM Monitoring with Hyper-V Clusters in Windows Server 2012


This is a pretty cool feature (at least in theory, but I'll explain later).

When you create a Hyper-V cluster with Windows Server 2012, you get some additional benefits of this technical wonder.

We're talking about VM Monitoring (light). Based on the behavior of a service running within the guest OS (which must be Windows Server 2012), you can let the hypervisors in your cluster take action to recover the service.

In a nutshell, here's what you have to do.

1.       Create a Hyper-V cluster (use traditional clustering or a CA File Server over SMB) and deploy some VMs.
2.       Join the VM to the same domain as your cluster (yes, this is a requirement), and configure the guest firewall to allow the "Virtual Machine Monitoring" app through on the domain profile.
You can also simplify this operation by firing a single line of PowerShell in the guest:  Set-NetFirewallRule -DisplayGroup "Virtual Machine Monitoring" -Enabled True




3.       Once this is done, go back to Failover Cluster Manager, right-click your VM and select ‘More Actions → Configure Monitoring’

4.       If you have followed the steps above, you should be able to communicate with your VM and get a list of the available services. Note that the dialog states that the VM will be rebooted, which means that you should configure recovery actions on the services themselves in the guest OS. When the guest is not able to solve the problem, the Hyper-V cluster will take over and reboot your VM.
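
If you'd rather script steps 3 and 4 than click through the GUI, something like the sketch below should do it. It assumes the FailoverClusters PowerShell module that ships with Windows Server 2012, and it uses a hypothetical VM named VM01 with the Print Spooler service as the monitored service; both names are placeholders, so substitute your own.

```powershell
# Run on a cluster node (or a machine with the Failover Clustering tools installed).
# "VM01" and "Spooler" are placeholder names used for illustration only.
Import-Module FailoverClusters

# Start monitoring the Print Spooler service inside the guest.
Add-ClusterVMMonitoredItem -VirtualMachine "VM01" -Service "Spooler"

# List what is currently being monitored for that VM.
Get-ClusterVMMonitoredItem -VirtualMachine "VM01"

# Stop monitoring the service again when you no longer need it.
Remove-ClusterVMMonitoredItem -VirtualMachine "VM01" -Service "Spooler"
```

The same domain and firewall requirements from step 2 apply here, since the cmdlets have to reach the guest just like the GUI does.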

So, what actions will Hyper-V Cluster take to fix the issue with a faulty service within the guest OS?
First, it will restart the VM. If that doesn’t solve the problem, the VM will be restarted again and eventually moved to another node in the cluster, if the failover policies on the VM allow it.
How does Hyper-V Cluster know that the service has some problems?

Event ID 1250 is logged on the cluster, indicating that the VM is in a critical state. The cluster detects this during its regular health check interval, and the recovery actions can then take place.
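
To see whether this has happened on a node, you can search the event log for ID 1250. This is a minimal sketch that assumes the event is written to the System log under the Microsoft-Windows-FailoverClustering provider; if nothing shows up in your environment, check the log and provider names first.

```powershell
# Run on a cluster node.
# Assumption: event 1250 is logged to the System log by the
# Microsoft-Windows-FailoverClustering provider.
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 1250
} -MaxEvents 10 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, MachineName, Message
```

The -ErrorAction SilentlyContinue switch is there because Get-WinEvent reports an error when no matching events exist, which is the normal case on a healthy cluster.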

I mentioned earlier that this is a good thing in theory. Let me explain.
This is about as far as you get from an enterprise solution for monitoring and troubleshooting application issues. The reason is simple: you can’t just reboot a server without notice because a single service fails. In an ideal world, everyone has dedicated servers for their workloads. One server for this and another server for that. But the ideal isn’t always possible, and organizations often have several applications on a single server. You won’t reboot your SQL server because the print spooler isn’t running.
And in some organizations, you'll need approval before you can reboot your servers, you'd like a complete overview for the sake of your SLA, and last but not least, you want to know what the heck just happened, why it happened and what to do about it. Read: SC 2012 Operations Manager. :)
That said, I really like this feature and I find it quite useful for several scenarios and deployments. If I have a bunch of load-balanced web servers, I can afford to have the one that's having problems rebooted, and hopefully the cluster will fix it. So I definitely see the bright side as well.
Enjoy testing this awesome feature in Windows Server 2012.

Cheers,

5 comments:

Subhasish Bhattacharya (MSFT) said...

Hey Kristian,

Thanks for the blog!

I agree, rebooting the VM on service failures is not always acceptable in enterprise scenarios, and as you mentioned, this could well trigger Ops Manager warnings. It's for this reason that we added an "Enable automatic recovery for application health monitoring" option under the VM properties settings tab. By unchecking this, you can customize your recovery action or trigger a customized recovery action off event 1250 and Ops Manager.

Best,

Subhasish Bhattacharya (Microsoft)
