Thursday, November 18, 2010

Failover Clustering and Domain Requirements (by example)

In my last post on this subject, I only briefly mentioned what might happen if you don't place a Domain Controller (DC) outside your cluster. Today, we will look at it in more detail.
My lab:
·         2 identical nodes with the Hyper-V role enabled
·         Both nodes are member servers of my domain
·         Cluster configured with CSV
·         iSCSI with MPIO enabled
·         Quorum: File Share Witness
·         One Domain Controller (as a VM on the cluster :-) )

I've simulated the following scenario:

1.       The entire cluster shuts down
2.       Both nodes come online again
3.       Now what?

(OK, I have to admit that I cheated a bit so I could demonstrate the stage AFTER you are able to log on to your hosts. Since the DC was powered off, both nodes had trouble logging on. And if you are wondering what options you have if you log on locally, well, here is the answer.)

Anyhow, we are now logged on to both nodes, and the Cluster service is in the 'Stopped' state.

Let’s try to start it on both nodes:

OK! So far, so good.

Now, let's try to start the Failover Cluster Manager console:

The console is empty. There is no cluster to manage, so we have to try to add our cluster.
As the error message indicates, we have a DNS lookup problem. That makes sense, since the only DC is powered off.

If we run the command 'cluster node' on both hosts, each host reports that everything is fine as far as it is concerned, but neither knows that the other node is 'joining' as well.

(When you tell the Cluster service to start in Windows Server 2008 R2 Failover Clustering, it starts immediately. It then sends notifications to the other nodes that it wants to join the cluster, and it calculates the number of 'votes' needed to achieve 'quorum'. Since name resolution between the nodes is not working in this example, both nodes stay in a 'joining' state. They just wait for each other. If both nodes and the witness could come online and reach each other, the cluster would achieve quorum and go on its way.)
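The vote arithmetic described above can be sketched in a few lines of Python. This is a simplified model, not the Cluster service's actual algorithm: it assumes every configured voter (each node plus the File Share Witness) holds exactly one vote, and the names `Voter`, `votes_needed` and `has_quorum` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Voter:
    name: str        # a node or the witness
    reachable: bool  # can this voter be contacted from where we are checking?


def votes_needed(total_votes: int) -> int:
    # Quorum requires a strict majority: more than half of all configured votes.
    return total_votes // 2 + 1


def has_quorum(voters: List[Voter]) -> bool:
    # Count only the votes we can actually see and compare with the majority.
    present = sum(1 for v in voters if v.reachable)
    return present >= votes_needed(len(voters))


# Seen from COLD while names cannot be resolved: only its own vote counts.
isolated = [Voter("COLD", True), Voter("STONE", False), Voter("FSW on SCVMM", False)]
print(has_quorum(isolated))  # prints False
```

With two nodes and a witness there are three votes, so two are required. A node that can reach neither its partner nor the witness holds only its own vote, which is why each node just sits and waits.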

OK, so we have a DNS issue.

Since I know the IP addresses of my cluster, the nodes and the witness share, and I know that the first thing the DNS client does is check the local hosts file (c:\windows\system32\drivers\etc\hosts), I will add the names of the involved servers there and hopefully achieve quorum.
(COLD and STONE are nodes in the cluster, CLASH is the cluster name, and SCVMM has the File Share Witness)
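To illustrate the lookup order described above, here is a small Python model of "hosts file first, DNS second". It is a simplification of what the Windows DNS client does: the IP addresses are placeholders from the 192.0.2.0/24 documentation range (not the lab's real addresses), and `parse_hosts`, `resolve` and `dc_is_down` are hypothetical helpers.

```python
from typing import Callable, Dict

# Hypothetical hosts-file entries for the lab; the IPs are placeholders.
SAMPLE_HOSTS = """\
192.0.2.11  COLD
192.0.2.12  STONE
192.0.2.10  CLASH    # cluster name
192.0.2.20  SCVMM    # hosts the File Share Witness
"""


def parse_hosts(text: str) -> Dict[str, str]:
    """Map lower-cased host names to IPs; the first entry for a name wins."""
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)
    return table


def resolve(name: str, hosts: Dict[str, str], dns_lookup: Callable[[str], str]) -> str:
    """Check the hosts table first and only fall back to DNS on a miss."""
    ip = hosts.get(name.lower())
    if ip is not None:
        return ip
    return dns_lookup(name)


def dc_is_down(name: str) -> str:
    # Stand-in for a DNS query while the only DC is powered off.
    raise OSError(f"DNS lookup failed for {name}")


if __name__ == "__main__":
    hosts = parse_hosts(SAMPLE_HOSTS)
    print(resolve("stone", hosts, dc_is_down))  # prints 192.0.2.12; DNS is never asked
```

Any name missing from the hosts file still falls through to DNS and fails, so every server the cluster needs (both nodes, the cluster name and the witness host) has to be listed.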

Now I'm able to ping the servers by name. Let's run the 'cluster node' command again:
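Pinging each name by hand works, but the same check can be scripted. A minimal Python sketch, assuming the four names from the lab; `check_names` is a hypothetical helper, and `socket.gethostbyname` goes through the operating system's resolver, which on Windows also honours the hosts file.

```python
import socket
from typing import Callable, Dict, Iterable, Optional

# Names from the lab: the two nodes, the cluster name and the witness host.
CLUSTER_NAMES = ["COLD", "STONE", "CLASH", "SCVMM"]


def check_names(
    names: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Return name -> IP address, or None for names that fail to resolve."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

Running `check_names(CLUSTER_NAMES)` on each node and looking for `None` values shows which entries are still missing. The resolver is a parameter so the function can be exercised without a live network.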

OK, looking good.

Let us try to add the cluster in Failover Cluster Manager again:

Are we saved?

- What happens if we try to bring one of our VMs online?

Nothing. You can't.

If we take a look at the event log on one of our nodes, it shows some important information:

So, after all this struggle, you are still unable to start your VMs. The moral?
Plan your cluster configuration carefully, including where you place your Domain Controller. This would easily have been solved if we had a Domain Controller outside the cluster. Since I already have one, let me show what happens after I boot that machine.
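The planning rule above can be expressed as a simple check. This is a hypothetical sketch, not a Microsoft tool: it assumes you record, for each DC, the machine or cluster it runs on, and the DC names used (`DC1`, `DC2`) are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DomainController:
    name: str
    # Where the DC runs: a physical host name (its own name if it is a
    # physical server) or the cluster name if it is a clustered VM.
    runs_on: str


def has_independent_dc(
    dcs: Iterable[DomainController], cluster_name: str, nodes: List[str]
) -> bool:
    """True if at least one DC runs somewhere other than the cluster or its nodes."""
    cluster_hosts = {cluster_name.upper(), *(n.upper() for n in nodes)}
    return any(dc.runs_on.upper() not in cluster_hosts for dc in dcs)


# The lab as described: the only DC is a VM on the cluster CLASH.
lab = [DomainController("DC1", "CLASH")]
print(has_independent_dc(lab, "CLASH", ["COLD", "STONE"]))  # prints False
```

A DC that runs as a non-clustered VM on COLD or STONE also counts as dependent here. That is a deliberately conservative choice: if the whole cluster goes down, that DC goes down with it.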

My VMs come online again, ready to play.


fawzi said...

Great post... finally I can comment :)

Kristian Nese said...

Tnx. It's a bit easier to find now, since the Norwegian has been translated into English.