
Monday, January 19, 2015

Business Continuity with SCVMM and Azure Site Recovery

Business Continuity for the management stamp

Back in November, I wrote a blog post about the DR integration in Windows Azure Pack, where service providers can provide managed DR for their tenants - http://kristiannese.blogspot.no/2014/11/windows-azure-pack-with-dr-add-on-asr.html

I’ve been working with many service providers over the last months, where both Azure Pack and Azure Site Recovery have been critical components.

However, looking at the relatively big footprint of the DR add-on in Update Rollup 4 for Windows Azure Pack, organizations have started at the other end in order to bring business continuity to their clouds.

For one of the larger service providers, we had to dive deep into the architecture of Hyper-V Replica, SCVMM and Azure Site Recovery before we knew how to design the optimal layout to ensure business continuity.

In each and every ASR design, you must look at your fabric and management stamp and start looking at the recovery design before you create the disaster design. Did I lose you there?

What I’m saying is that it’s relatively easy to perform the heavy lifting of the data, but once the shit hits the fan, you’d better know what to expect.

In this particular case, we had a common goal:

We want to ensure business continuity for the entire management stamp with a single click, so that tenants can create, manage and operate their workloads without interruption. This should be achieved in an efficient way with a minimal footprint.

When we first saw the release of Azure Site Recovery, it was called “Hyper-V Recovery Manager” and required two SCVMM management stamps to perform DR between sites. The feedback from potential customers was quite loud and clear: people wanted to leverage their existing SCVMM investment and perform DR operations with a single SCVMM management stamp. Microsoft listened and now lets us perform DR between SCVMM Clouds, using the same SCVMM server.

Actually, it’s been over a year since they made this available, and diving into my archive I managed to find the following blog post: http://kristiannese.blogspot.no/2013/12/how-to-setup-hyper-v-recovery-manager.html

So IMHO, using a single SCVMM stamp is always preferred whenever possible, and that was also my recommendation for the initial design in this case.

In this blog post, I will share my findings and workaround for making this possible, ensuring business continuity for the entire management stamp.

The initial configuration

The first step we had to make when designing the management stamp was to plan and prepare for SQL AlwaysOn Availability Groups.
System Center 2012 R2 – Virtual Machine Manager, Service Manager, Operations Manager and Orchestrator – all support AlwaysOn Availability Groups.

Why plan for SQL AlwaysOn Availability Groups when we have the traditional SQL Cluster solution available for High-Availability?

This is a really good question – and also very important as this is the key for realizing the big goal here. AlwaysOn is a high-availability and disaster recovery solution that provides an enterprise-level alternative to database mirroring. The solution maximizes the availability of a set of user databases and supports a failover environment for those selected databases.
Compared to a traditional SQL cluster – which could also use shared VHDXs – this was a no-brainer. A shared VHDX would have given us a headache and increased the complexity with Hyper-V Replica.
SQL AlwaysOn Availability Groups let us use local storage for each VM within the cluster configuration, and enable synchronous replication between the nodes for the selected user databases.
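As a minimal sketch of the preparation step (assuming the SQLPS module and placeholder instance names “SQL01”/“SQL02” – adjust to your environment), enabling the AlwaysOn feature on the two guest-cluster nodes could look like this:

```powershell
# Hedged sketch: enable the AlwaysOn feature on both SQL guest-cluster nodes.
# Instance names are placeholders; -Force restarts the SQL Server service.
Import-Module SQLPS -DisableNameChecking

Enable-SqlAlwaysOn -ServerInstance "SQL01" -Force
Enable-SqlAlwaysOn -ServerInstance "SQL02" -Force
```

The availability group itself – holding the System Center and WAP databases – can then be created through SQL Server Management Studio or the New-SqlAvailabilityGroup cmdlet.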

Alright, the SQL discussion is now over, and we proceeded to the fabric design.
In total, we would have several Hyper-V clusters for different kinds of workloads, such as:

·       Management
·       Edge
·       IaaS
·       DR


Since this was a greenfield project, we had to deploy everything from scratch.
We started with the Hyper-V management cluster, and from there we deployed two VM instances in a guest cluster configuration with SQL Server installed for AlwaysOn Availability Groups. Our plan was to put the System Center databases – as well as the WAP databases – onto this database cluster.

Once we had deployed a Highly-Available SCVMM solution, including a HA library server, we performed the initial configuration on the management cluster nodes.
As stated earlier, this is really a chicken-and-egg scenario. Since we are working with a cluster here, it’s straightforward to configure the nodes one at a time: put one node in maintenance mode, move the workload, and repeat the process on the remaining node(s). Our desired state at this point is to deploy the logical switch with its profile settings to all nodes, and later provision more storage and define classifications within the fabric.
The description here is relatively high-level, but to summarize: we do the normal fabric stuff in VMM at this point, and prepare the infrastructure to deploy and configure the remaining hosts and clusters.
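A hedged sketch of that per-node loop with the VMM cmdlets (the host names are placeholders, and the switch deployment step is elided since it depends on your logical switch design):

```powershell
# Hedged sketch: configure management cluster nodes one at a time.
# Host names are placeholders for this environment.
foreach ($nodeName in "mgmt01", "mgmt02")
{
    $node = Get-SCVMHost -ComputerName $nodeName

    # Put the node in maintenance mode, live migrating its VMs within the cluster
    Disable-SCVMHost -VMHost $node -MoveWithinCluster

    # ... deploy the logical switch / virtual switch to this node here ...

    # Take the node out of maintenance mode again
    Enable-SCVMHost -VMHost $node
}
```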

For more information around the details about the design, I used the following script that I have made available that turns SCVMM into a fabric controller for Windows Azure Pack and Azure Site Recovery integration:


Once the initial configuration was done, we deployed the NVGRE gateway hosts, DR hosts, IaaS hosts, Windows Azure Pack and the remaining System Center components in order to provide service offerings through the tenant portal.

If you are keen to know more about this process, I recommend reading our whitepaper, which covers this end-to-end:



Here’s an overview of the design after the initial configuration:





If we look at this from a different – and perhaps a more traditional perspective, mapping the different layers with each other, we have the following architecture and design of SCVMM, Windows Azure Pack, SPF and our host groups:



So far so good. The design of the stamp was finished and we were ready to proceed with the Azure Site Recovery implementation.

Integrating Azure Site Recovery

To be honest, at this point we thought the hardest part of the job was done – ensuring HA for all the workloads, integrating NVGRE into the environment, spinning up complex VM roles to improve the tenant experience, and so on.
We added ASR to the solution and were quite confident that it would work like a charm since we had SQL AlwaysOn as part of the solution.

We soon found out that we had to do some engineering before we could celebrate.

Here’s a description of the issue we encountered.

In the Microsoft Azure portal, you configure ASR and perform the mapping between your management servers and clouds and also the VM networks.

As I described earlier in this blog post, the initial design of Azure Site Recovery in an “Enterprise 2 Enterprise” (on-prem 2 on-prem) scenario was to leverage two SCVMM management servers. The administrator then had the opportunity to duplicate the network artifacts (network sites, VLANs, IP pools etc.) across sites, ensuring that each VM could be brought online on the secondary site with the same IP configuration as on the primary site.

Sounds quite obvious and really something you would expect, yeah?

Moving away from that design and instead using a single SCVMM management server (a single, highly available management server is not the same as two SCVMM management servers) gave us some challenges.

1)      We could (of course) not create the same networking artifacts twice within a single SCVMM management server
2)      We could not create an empty logical network and map the primary network to it. This would throw an error
3)      We could not use the primary network as our secondary as well, as this would give the VMs a new IP address from the IP pool
4)      Although we could update IP addresses in DNS, the customer required the exact same IP configuration on the secondary site post failover


Ok, what do we do now?
At that time it felt a bit awkward to say that we were struggling to keep the same IP configuration across sites.

After a few more cups of coffee, it was time to dive into the recovery plans in ASR to look for new opportunities.

A recovery plan groups virtual machines together for the purposes of failover and recovery, and it specifies the order in which groups of VMs should fail over. We were going to create several recovery plans, so that we could easily and logically group different kinds of workloads together and perform DR in a trusted way.

Here’s what the recovery plan for the entire stamp looks like:



So this recovery plan would power off the VMs in a specific order, perform the failover to the secondary site and then power on the VMs again in a certain order specified by the administrator.

What was interesting for us to see was that we could leverage our PowerShell skills as part of these steps.

Each step can have an associated script and a manual task assigned.
We found out that the first thing we had to do, before even shutting down the VMs, was to run a PowerShell script that would verify that the VMs were connected to the proper virtual switch in Hyper-V.

Ok, but why?

Another good question. Let me explain.

Once you are replicating a virtual machine using Hyper-V Replica, you have the option to assign an alternative IP address to the replica VM. This is very interesting when you have different networks across your sites so that the VMs can be online and available immediately after a failover.
In this specific customer case, the VLAN(s) were stretched and made available on the secondary site as well, hence the requirement to keep the exact network configuration. In addition, all of the VMs had assigned static IP addresses from the SCVMM IP Pools.

However, since we didn’t do any network mapping in the portal – to avoid the errors and the wrong outcome – we decided to handle this with PowerShell.

When enabling replication on a virtual machine in this environment, and not mapping to a specific VM network, the replica VM would have the following configuration:



As you can see, we are connected to a certain switch, but the “Failover TCP/IP” checkbox was enabled with no info. You probably know what this means? Yes, the VM will come up with an APIPA configuration. No good.

What we did

We created a PowerShell script that:

a)       Detected the active replica hosts before failover (using the Hyper-V PowerShell module)
b)      Ensured that the VM(s) were connected to the right virtual switch in Hyper-V (using the Hyper-V PowerShell module)
c)       Disabled the failover TCP/IP settings on every VM

If all of the above were successful, the recovery plan could continue and perform the failover. If any of them failed, the recovery plan was aborted.
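A heavily hedged sketch of such a pre-failover script, using the Hyper-V module – host and switch names are placeholders, the failover-configuration parameter is an assumption on my part, and the production script also aborted the recovery plan on any failure:

```powershell
# Hedged sketch: verify/repair the network configuration of replica VMs
# before failover. Host and switch names are placeholders.
$replicaHost    = "hv-dr01"
$expectedSwitch = "Tenant vSwitch"

# Only touch VMs that are replicas on the DR host
$vms = Get-VM -ComputerName $replicaHost | Where-Object { $_.ReplicationMode -eq "Replica" }

foreach ($vm in $vms)
{
    foreach ($nic in ($vm | Get-VMNetworkAdapter))
    {
        # (b) Ensure the VM is connected to the right virtual switch
        if ($nic.SwitchName -ne $expectedSwitch)
        {
            Connect-VMNetworkAdapter -VMNetworkAdapter $nic -SwitchName $expectedSwitch
        }

        # (c) Clear the failover TCP/IP settings so the replica keeps the static
        # IP configuration injected by SCVMM (parameter name assumed - verify)
        Set-VMNetworkAdapterFailoverConfiguration -VMNetworkAdapter $nic -ClearFailoverIPv4Settings
    }
}
```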


For this to work, you have to ensure that the following pre-reqs are met:

·        Ensure that you have at least one library server in your SCVMM deployment
·        If you have an HA SCVMM deployment as we had, you also have a remote library share (example: \\fileserver.domain.local\libraryshare). This is where you store your PowerShell script (nameofscript.ps1). Then you must configure the share as follows:
a.       Open the Registry Editor
b.       Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\DRAdapter\Registration
c.        Edit the value ScriptLibraryPath
d.       Set the value to \\fileserver.domain.local\libraryshare\, using the fully qualified domain name (FQDN)
e.       Provide permission to the share location

This registry setting will replicate across your SCVMM nodes, so you only have to do this once.
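The registry steps above can also be scripted; a minimal sketch (the share path is the example from above – substitute your own):

```powershell
# Hedged sketch: set the DR adapter script library path from PowerShell
# instead of using regedit. Run once on the active SCVMM node.
$key = "HKLM:\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\DRAdapter\Registration"

Set-ItemProperty -Path $key -Name "ScriptLibraryPath" -Value "\\fileserver.domain.local\libraryshare\"
```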

Once the script has been placed in the library and the registry changes are implemented, you can associate the script with one or more tasks within a recovery plan as shown below.



Running the recovery plan(s) now would ensure that every VM that was part of the plan was brought up at the recovery site with the same IP configuration as on the primary site.

With this, we had a “single-button” DR solution for the entire management stamp, including Windows Azure Pack and its resource providers.


-kn

Tuesday, November 4, 2014

Sessions from TechEd 2014 Barcelona

If you didn’t have time to attend TechEd in Barcelona, or for some reason missed an important session, you can now watch them all on-demand on Channel9.

I had the honor of presenting during this TechEd, with two sessions.

Planning & Designing Management Stamps for Windows Azure Pack
A topic that I personally think is very interesting, and one I work with on a day-to-day basis when designing real-world private and public clouds.


Microsoft Azure Site Recovery: Leveraging Azure as Your Disaster Recovery Site
Together with a living legend, Manoj Jain, we showed how both enterprises and hosting service providers (a new scenario) can leverage Azure as their DR site for Hyper-V workloads.


Hopefully you’ll find them interesting.


See you soon!

Monday, October 20, 2014

Understanding Windows Azure Pack and your service offerings


From time to time, I meet with customers (and also other system integrators) who are not fully aware of the definition of cloud computing.
I never expect people to know this down to the nitty-gritty details, but I do expect an overview of the following:

·         Deployment models
·         Service models
·         Essential characteristics

What’s particularly interesting when discussing Windows Azure Pack is that the relevant deployment model is the private cloud. Yes, we are touching your own datacenter with these bits – the one you are in charge of.

For the service models, we are embracing Infrastructure as a Service (IaaS – using the VM Cloud Resource Provider), and Platform as a Service (PaaS – Using the Web Site Cloud Resource Provider).

The essential characteristics are also very important, as we’ll find elasticity, billing/chargeback, self-service, resource pooling and broad network access.

If you combine just self-service and IaaS, this tells us that we empower our users to deploy virtual machines on their own. Right?
So having the flexibility to provide such service, we also rely on the underlying architecture to support this. Due to scalability (elasticity), we need to ensure that these users constantly have access to the solution – no matter what device they are using (broad network access), we need to find out who is consuming what (billing/chargeback), and last but not least – be able to produce these services in an efficient way that makes it cost effective and profitable (resource pooling).

So, it’s starting to make sense.

There is a reason for what we are seeing and we are providing these services by abstracting the underlying resources into clouds, plans and subscriptions with the Cloud OS.

Implementing a complete IaaS solution may bring some obstacles to the table.

Organizations tend to think that IaaS is something they have provided for years. Perhaps they have provided virtual machines, but not a complete IaaS solution.
The reason is that IaaS relies on abstraction at every layer. This is not only about virtual compute (memory, CPU), but also about virtual storage and virtual networking.
This is when it gets interesting, using network virtualization.

Remember that self-service is an essential characteristic of the cloud, right?
So delivering IaaS also means that the user is able to manage the networking aspect as well, with no interaction from the service provider/cloud administrator.
This is why Software-Defined Networking (NVGRE) is so essential to this service model – and hence we run into the following obstacles.

·         The customer (most often a service provider) wants to continue to provide managed services, such as:
o   Backup (both crash consistent and app consistent)
o   Monitoring (above the operating system level, covering the application stack)

This is what they are doing today with their infrastructure. But it also has a high operating cost, due to all the manual operations needed to keep the wheels moving.

Luckily, Windows Azure Pack is able to cover both scenarios, providing a consistent experience to users/tenants no matter if they are running resources in a “legacy” infrastructure, or a new modern IaaS infrastructure.

The following architecture shows that we are using two Virtual Machine Management Stamps.
Both of these are located behind the SPF endpoint – which present the capabilities, capacity and much more to the service management API in Azure Pack.



A cloud administrator then creates a Hosting Plan in the Admin Portal of Azure Pack, which is associated with the legacy cloud in the legacy VMM server. This plan is available for the users/tenants who are subscribing to managed services.

A new plan is created, associated with the IaaS cloud and the IaaS VMM server, available for the users/tenants that need IaaS, without the requirement of managed services. They are dealing with these themselves.

Hopefully this blog post gave you an overview of what’s possible to achieve using Azure Pack and combine both kind of services using a single solution.

(Want more info? – please join my TechEd session in Barcelona next week).

Thursday, October 9, 2014

The specific IP address is already allocated by the pool - SCVMM


Every now and then, I see fabric environments where the following have occurred:

·         They tried to create their Hyper-V Cluster in VMM
·         The process failed at some step (there can be many reasons for this – not necessarily VMM’s fault)
·         They go to one of the hosts and create the cluster from there
·         They refresh the nodes in VMM and the cluster appear

Now, that is quite common actually, and this works great.
One of the reasons why VMM complains a lot more than Failover Cluster Manager is that VMM has high expectations regarding networking, storage etc.

So what happens when you create the cluster outside of VMM and, rude as it is, it steals an IP address from the IP Pool in VMM?

You will see the following in the job view on a regular basis:



Frustrating. So imagine you have added the Hyper-V Replica Broker role to that cluster as well, stealing another IP from the pool in VMM. Then this can get noisy.

Workaround

First thing first, find out what IP address VMM is referring to.

(Get-SCStaticIPAddressPool).Name

Find the right name in your environment. I will use “MGMT IP Pool Copenhagen” as I know this is a Hyper-V Cluster in that site.



Next, put that in a variable like this:


See which addresses you have registered:
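The two steps above were originally shown as screenshots; presumably they look something like this (the pool name is from the example, and the Get-SCIPAddress parameter usage is an assumption – verify against your VMM version):

```powershell
# Hedged sketch: put the pool in a variable and list its registered addresses.
$pool = Get-SCStaticIPAddressPool -Name "MGMT IP Pool Copenhagen"

# Show the allocated addresses so you can spot the one taken outside VMM
Get-SCIPAddress -StaticIPAddressPool $pool | Select-Object Address, State, Description
```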


Once you have detected the IP, it is time to remove it.

Get-SCIPAddress -IPAddress "10.0.0.215" | Revoke-SCIPAddress

The only thing left is to reserve this IP address in the VMM IP Pool so that VMM will ignore it in the future.
Once this is done, perform a refresh of the cluster object in VMM to verify that it is green and happy.
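Reserving the address can likely be scripted as well; the -IPAddressReservedSet parameter below is an assumption on my part, so verify it against your VMM version:

```powershell
# Hedged sketch: add the stolen IP to the pool's reserved set so VMM ignores it.
$pool = Get-SCStaticIPAddressPool -Name "MGMT IP Pool Copenhagen"

Set-SCStaticIPAddressPool -StaticIPAddressPool $pool -IPAddressReservedSet "10.0.0.215"
```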




Monday, September 29, 2014

Deploying Web Site Clouds for Windows Azure Pack

On this blog, you have mainly found (useful?) information around Hyper-V, VMM and Azure Pack when it comes to the private cloud area.
However, Azure Pack gives us a lot more than “just” the VM Cloud Resource Provider – the holy grail of Infrastructure as a Service offerings in your private cloud.

I have attempted to cover the broader aspect of Azure Pack on this blog, like walking through the different portals, APIs and much more, especially related to the backend VM Cloud services.

Recently, I wrote about “Deploying Service Bus for Windows Azure Pack” - http://kristiannese.blogspot.no/2014/09/deploying-service-bus-for-windows-azure.html that shows the consistency related to PaaS between Microsoft Azure in the public cloud, and Azure Pack in the private cloud.

This blog post – which focuses on the Web Site Cloud – also falls under the PaaS umbrella in Azure Pack.

Without going into the repeating story around “one consistent platform” and all of that, I assume you know this already, but as an introduction to Web Site Clouds in WAP, we can say the following:

The Web Site Cloud enables an on-premises, high-density, multi-tenant web hosting service for service providers and enterprise IT. The web sites provide an experience similar to Microsoft Azure Web Sites. It is a scalable, shared, and secure web hosting platform that supports both template web applications and a broad range of programming languages like ASP.NET, PHP and Node.js.
In addition to the web site service, it includes a self-service management portal (tenant portal), uses both SQL and MySQL database servers, integrates with popular source control systems, and offers a customizable web application gallery of popular open source web applications.

Web Site Cloud

The Web Site Cloud consists of at least 6 server roles.

·         Controller – Provisions and manages the other web site roles. This is the first role you install and run the Web Site Cloud setup on
·         Management Server – This server exposes a REST endpoint that handles management traffic to the WAP Web Site Management API, and you connect the service management portal to this endpoint
·         Front End – Accepts web requests from clients, routes requests to web workers, and returns web worker responses to clients. Front End servers are responsible for load balancing and SSL termination
·         Web Worker – These are web servers that process client web requests. Web workers are either shared or reserved to provide differentiated levels of service to customers. Reserved workers are categorized into small, medium and large sizes.
·         File Server – Provides file services for hosting web site content. The file server houses all of the application files for every web site that runs on the web site cloud.
·         Publisher – Provides content publishing to the web site farm for FTP clients, Visual Studio and WebMatrix through the Web Deploy and FTP protocols

And as everything else, a SQL server is required for the runtime database.
The roles are separated from, and in addition to, the servers that form the (express or distributed) installation of the service management API (Portals and APIs).



Before you start to install the Web Site Cloud, you must deploy and prepare some VMs.

Obviously everyone is using VMM today, so here’s a short script that will create the six VMs to be used for the Web Site Cloud in your private cloud infrastructure:

$VMNames = @(1..6)
$Template = Get-SCVMTemplate -VMMServer vmm01 -ID "1ee6ba94-b5fc-49f1-8364-a3d1b2da3f40"

foreach ($Number in $VMNames)
{
    # Build the VM name, e.g. webroleblog1 through webroleblog6
    $VMName = "webroleblog" + $Number

    New-SCVirtualMachine -Name $VMName -VMTemplate $Template -VMHost "hv03" -Path "C:\ClusterStorage\CSV01" -HighlyAvailable $true -RunAsynchronously
}



Once this is done, you can start by logging into the controller VM where you will install the Web Site Cloud to deploy and configure the other core roles for this resource provider.

Note: Although I have not found it to be an official requirement, I have experienced some issues if .NET 3.5 is not enabled within the OSes before we start this process. Through the Web Site Cloud installer, we will download, install and enable many web server features on each guest through an agent. This seems to fail randomly if .NET 3.5 is left out.

On the controller, download and install Web Platform Installer version 5.0

Once this is done, start Web Platform installer and search for Web Site.
Add “Windows Azure Pack: Websites V2 Update 3” and install it on your server.



After the installation, you will be prompted by the well-known configuration page of Azure Pack that lets you connect to the SQL server, create the database, configure file server settings and a lot more.

The following screen shots will give you an understanding of what needs to be configured.





After the setup, we logon to our admin portal to add and configure the rest of the Web Site Cloud resource provider.

Click on Web Site Cloud and connect to your newly created environment.
Select a display name, connect to the management server (https://nameofmanagementVM), and type the credentials you specified during setup that have access to the REST endpoint.



After the web site cloud is added, you can drill into the configuration and click on ‘Roles’.
This is where you will add the additional and required web site roles you need in order to deliver web sites to your customers.



We already have the following in place:

-          Controller
-          Management
-          File server

What we need to add now, is:

-          Publisher
-          Front End
-          Worker

You simply type the DNS or the IP address of the specific server(s) you want to add, and then Azure Pack will deploy and configure these roles for you, through the controller.

Note: The Web Site Cloud has a default domain that you specify during setup. In our example, we are using paas.systemcenter365.com – which means that every website that gets deployed will have a suffix of website.paas.systemcenter365.com.
In order for this to work, you must also create some host records for the following roles:

·         Web Deploy DNS: publish.paas.systemcenter365.com
·         FTP Deploy DNS: ftp.paas.systemcenter365.com
·         Front End: *.paas.systemcenter365.com
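The records above can be created with the DnsServer module; in this sketch, the IP addresses (and the assumption that the Publisher handles both Web Deploy and FTP) are placeholders for this environment:

```powershell
# Hedged sketch: create the host records for the web site cloud roles.
# Zone name matches the example domain; IP addresses are placeholders.
Add-DnsServerResourceRecordA -ZoneName "systemcenter365.com" -Name "publish.paas" -IPv4Address "10.0.0.50"   # Web Deploy
Add-DnsServerResourceRecordA -ZoneName "systemcenter365.com" -Name "ftp.paas" -IPv4Address "10.0.0.50"       # FTP Deploy
Add-DnsServerResourceRecordA -ZoneName "systemcenter365.com" -Name "*.paas" -IPv4Address "10.0.0.51"         # Front End wildcard
```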

Also note that you are able to perform lifecycle management of your web site roles directly from the management portal:



Once the web site cloud is configured with the default settings, you can go ahead and add the web site cloud to an existing Hosting Plan.

You can configure the Web Site offering by clicking on the service within the plan, and edit the existing values. We will leave this alone for now, and head over to the tenant portal to see the interaction.



Now we have the Web Sites available in the tenant portal, since this tenant is subscribing to a hosting plan that contains this offering.
If we click on new, we can choose between quick create, custom create and from gallery.



Once the web site is deployed (regardless of the option chosen), we can perform several post-deployment operations from the tenant portal.



From a developer perspective, it is very interesting to see that you can download the publish profile, so that you can leverage TFS to deploy your applications. This is a solid value add, in addition to the already exposed tenant public API (for more info, see: http://kristiannese.blogspot.no/2014/06/azure-pack-working-with-tenant-public.html )

What I like the most about the web site cloud is the grade of integration with the other PaaS offerings in Azure Pack, just as you would find in Microsoft Azure.
Depending on the solution and application you create, you can link with SQL, MySQL and Service Bus which is basically everything you need in order to fulfill the PaaS solution you are working on.

Hopefully this gave you an overview of the Web Site Cloud in WAP.


Monday, September 1, 2014

Presenting at TechEd Barcelona 2014 - Windows Azure Pack

Hi everyone.
I just want to inform you that I will be presenting at TechEd in Barcelona in October.
This is truly an honor and I am really looking forward to meeting my friends from all around the globe.

I have one session that is titled “Planning and Designing Management Stamps for Windows Azure Pack”.



This session will indeed focus on the underlying stamp that we turn into a resource provider for the VM Cloud in Azure Pack.
Throughout the entire session, I will share best practices, things you would like to know and also things you should already know.
This is where you will get the inside tips on how to design and build a management stamp to serve cloud computing with WAP, designed to scale and be fault tolerant.
In essence, I will be explaining and demonstrating my bread and butter and what I have done the last 12 months.

I really hope to see you there and if you have any questions upfront and would like to have answered during the session, please let me know.





Tuesday, July 29, 2014

New whitepaper - Azure Technologies in the Private Cloud


Together with Savision, I am glad to announce that a new whitepaper has been published.

The content is focusing on the Cloud OS – and especially the private cloud enabled by Windows Azure Pack.

If you find this interesting, I suggest that you download it and sign up for one of our free webinars in the near future.

Feedback is highly appreciated.


The whitepaper can be downloaded by following this link:

http://www.savision.com/resources/news/free-whitepaper-azure-technologies-private-cloud-mvp-kristian-nese



Monday, July 28, 2014

Workaround for VM Roles and Storage Classifications

Solving classification for your VM Roles with VMM

Since you are reading this blog post, and hopefully this blog now and then, you are most likely familiar with the concept of Gallery Items in Windows Azure Pack.

Useful Resources

If not, I suggest that you read the following resources: http://kristiannese.blogspot.no/2013/10/windows-azure-gallery-items-getting.html

If you want all the details around everything, please download our “Hybrid Cloud with NVGRE (Cloud OS)” whitepaper that puts everything into context.

My good friend and fellow MVP, Marc van Eijk, will publish a couple of blog posts on the “Building Clouds” blog, where he dives into the nasty details around VM Roles. Here’s the link to his first contribution: http://blogs.technet.com/b/privatecloud/archive/2014/07/17/the-windows-azure-pack-vm-role-introduction.aspx

Gallery Items and VM Roles

Before we proceed: Gallery Items bring “VM Roles” into Azure Pack, which remind you a lot of service templates in VMM in the way they are able to climb up the stack and embrace applications and services during deployment. However, a VM Role is not completely similar to a service template in VMM, as it has no knowledge of any of the profiles (Hardware Profile, Application Profile, SQL Profile and Guest OS Profile).

This is where it gets tricky.

Gallery Items are designed for Azure and bring consistency to the Cloud OS vision, by letting you create VM Roles through the VMRoleAuthoringTool from CodePlex for both Azure Pack (Private Cloud) and Microsoft Azure (Public Cloud).

The components of a VM Role are:

·         Resource Definition file (required – and imported in Azure Pack)
·         View Definition file (required – presents the GUI/wizard to the tenants)
·         Resource Extension (optional – but required when you want to deploy applications, server roles/features and more to your VM Role)

The tool lets you create, alter and update all these components, and you can read more about the news in this blog post: http://blogs.technet.com/b/scvmm/archive/2014/04/22/update-now-available-for-the-virtual-machine-role-authoring-tool.aspx

So far in 2014, I have visited many customers who are trying to adopt Azure Pack and Gallery Items with VM Roles. They want to provide their tenants with brilliant solutions that are easy to understand and deploy, and that can be serviced easily by the cloud administrators.
However, there are some important things to note prior to embracing the VM Roles in Azure Pack, especially when it comes to storage.

·         VM Roles only use differencing disks
·         You can’t benefit from the storage classifications associated with your VMM Clouds – and determine where the VHDXs will be stored

Why are we using differencing disks for VM Roles?

This is a frequently asked question. In the VMM world, we are familiar with the BITS transfer during VM deployment. Luckily, fast file copy was introduced with VMM 2012 R2, and we can also leverage ODX for deployment now, so hopefully BITS is not something you see very often when deploying VMs anymore.
However, to speed things up even more, VM Roles use differencing disks. The intent is to reduce deployment time and improve performance: shared bits from the parent VHDX are served from cache in most scenarios, so a new VM simply creates a new differencing disk and boots up. This goes for both the OS disk and the data disks of a VM Role. No file copy needs to occur (except for the first VM that requires the disk). When you then scale out a VM Role in Azure Pack, the new instance can boot almost immediately and start walking through its setup.
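You can see this chain on a Hyper-V host: a deployed VM Role disk reports itself as a differencing disk pointing back at the shared parent. A quick sketch with the Hyper-V PowerShell module (the disk path is an example value, not from the original post):

```powershell
# Sketch only: inspect a deployed VM Role disk. A differencing disk shows
# VhdType "Differencing" and a ParentPath pointing at the shared parent VHDX.
Get-VHD -Path "C:\ClusterStorage\CSV01\VMRole01\disk_0.vhdx" |
    Select-Object VhdType, ParentPath, FileSize
```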


Ok, I understand the decision around differencing disks now, but what about storage classification, and where do these disks end up?

Since VM Roles in Azure Pack are only linked to the disks in the VMM library (by using tags), we can’t map them to any of the storage classifications.
Out of the box, there is no way to modify this prior to deployment.

Tip 1 – Default parent disks path on the hosts

In VMM, navigate to Fabric and open the properties of the hosts in the host groups associated with the cloud used by the hosting plans in Azure Pack.



Here you can specify the default parent disk paths to be used for the virtual machines (VM Roles).
If you have dedicated shares or CSVs, this can be helpful and streamlines where the VM Roles end up living.
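If you have many hosts, the same setting can be scripted instead of set per host in the console. A hedged sketch with the VMM PowerShell module; the host group name and CSV path are placeholders, and you should verify the -BaseDiskPaths parameter in your VMM version:

```powershell
# Sketch only: set the default parent disk path on every host in the host
# group backing the Azure Pack cloud. Group name and path are example values.
$group = Get-SCVMHostGroup -Name "Tenant Hosts"
foreach ($vmHost in (Get-SCVMHost -VMHostGroup $group))
{
    Set-SCVMHost -VMHost $vmHost -BaseDiskPaths "C:\ClusterStorage\CSV01\ParentDisks"
}
```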

Tip 2 – Live Storage Migration post deployment

At a customer site earlier this year, we ended up using PowerShell to move the disks around after deployment.

This was later added to SMA and triggered automatically after the create operation of every new VM Role.

Here’s the script:

### Get every VM Role in a specific VMM cloud used by Azure Pack
$vms = Get-SCVirtualMachine | Where-Object {$_.AvailabilitySetNames -cnotcontains $null -and $_.Cloud -like "Service Provider Cloud"}

### Move the storage to the preferred/dedicated directory
foreach ($vm in $vms)
{
    Move-SCVirtualMachine -VM $vm -Path "C:\ClusterStorage\CSV01\" -UseLAN -RunAsynchronously
}

As you can see, we query the virtual machines that have an availability set associated. Every time you deploy a VM Role with Azure Pack, the underlying cloud resource in VMM gets an availability set to ensure that when you scale out the VM Role, the instances are spread across different Hyper-V nodes in a cluster (assuming you are running your workloads on a Hyper-V cluster).

That’s it, and hopefully this gave you some ideas and more information around VM Roles in Azure Pack.



Monday, April 14, 2014

Co-existence of WAP and HRM - Fixed!

Co-existence of WAP and HRM is now fixed!


Microsoft has released Update Rollup 1 for Hyper-V Recovery Manager.
This update eliminates the need for the “Hyper-V Cloud Capability Profile” that had to be associated with the hardware profiles of the VMs, and that also had to be present in the cloud in VMM.

Why do we get an Update Rollup that specifically removes this requirement?

When testing HRM and Windows Azure Pack together, we saw that the combination of this capability profile with Gallery Items (VM Roles with SPF/VMM as the resource provider in a VM Cloud) didn’t work very well.
If you are familiar with VM Roles, you know that they are not associated with any hardware profiles; they use only a resdefpkg (imported in WAP) and a resextpkg (imported in VMM). These artifacts carry their own settings and bind only to physical objects in the VMM library, such as VHDs and scripts.

In other words: the deployment of a VM Role to a cloud with any cloud capability profile associated, would fail. Therefore, you could not have a cloud configured in VMM that could be used by both WAP and HRM.

This is now fixed and you must install the update on your VMM management server (where the HRM provider is installed) and perform a restart.


Thursday, August 4, 2011

Troubleshooting SCVMM crashes (collect traces)

I am currently testing the SCVMM 2012 BETA, and I want to share how you collect SCVMM traces.
(This applies to both SCVMM 2008 R2 and SCVMM 2012)

I had an issue where the VMM service crashed when I added a WSUS server to the Fabric. It only crashed when the server was properly configured. That means, if I configured the wrong TCP port for the WSUS connection, I got a proper error message. But when I hit the high notes, it crashed with no mercy.

Here is the recipe that Carmen Summers (a Program Manager on the SCVMM team) has made available:

From which computer should I collect the traces?

If it’s a console crash issue,

·         Please collect the traces from both the computer where you run the admin console and your VMM server.

If it’s an “Add Hosts” issue,
·         Please collect the traces from the VMM server;
If it’s a host status (Needs Attention) or VM issue,
·         Please collect the traces from both the VMM server and the host in question.
If it's a self-service portal issue,
·         Please collect the traces from the Web server and the VMM server
What are the steps to collect traces?

·         Install DebugView from http://www.microsoft.com/technet/sysinternals/utilities/debugview.mspx on your VMM server, the host in question and your Web server (if it's a self-service portal issue).
·         Save the following code into a text file and name it "odsflags.cmd":

@echo off
echo ODS control flags - only trace with set flags will go to ODS
if (%1)==() goto :HELP
if (%1)==(-?) goto :HELP
if (%1)==(/?) goto :HELP
echo Setting flag to %1...
reg ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Tracing\Microsoft\Carmine" /v ODSFLAGS /t REG_DWORD /d %1 /f
echo Done.
goto :EXIT

:HELP
echo Usage: odsflags [flag], where flag is
echo TRACE_ERROR = 0x2,
echo TRACE_DBG_NORMAL = 0x4,
echo TRACE_DBG_VERBOSE = 0x8,
echo TRACE_PERF = 0x10,
echo TRACE_TEST_INFO = 0x20,
echo TRACE_TEST_WARNING = 0x40,
echo TRACE_TEST_ERROR = 0x80,

:EXIT

·         Save the following code into a text file and name it as "odson.reg":

Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Tracing\Microsoft\Carmine]
"ODS"=dword:00000001

·         Save the following code into a text file and name it as "odsoff.reg":

Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Tracing\Microsoft\Carmine]
"ODS"=dword:00000000

·         Copy the above three files onto your VMM server, your host in question and your Web server (if it's a self-service portal issue).
·         In a command window on the machine that you want to capture VMM tracing, run “odson.reg” and “odsflags.cmd 255”. (If you need to collect traces for both VMM Server and the host or the Web server, make sure to run these commands on all computers.)
·         Open DebugView and run it as administrator, make sure that in its Capture menu, you have both "Capture Win32" and "Capture Global Win32" checked. You should be able to see tracing from the VMM components showing up in the DebugView. (If you need to collect traces for both VMM Server and the host, make sure to do these steps on all computers.)
·         Restart vmmservice on VMM server with “net stop vmmservice” and “net start vmmservice”.
·         Restart the agent service on the host with “net stop vmmagent” and “net start vmmagent”.
·         Restart the IIS service on the Web server with "iisreset".
·         Reproduce the issue that you found.
·         Save the output from the DebugView to a text file and email it to the people who can help you diagnose the issue.
·         Don't forget to turn off the tracing after you are done collecting by running "odsoff.reg" on the machine

EXAMPLE:

In my case, where the VMM service crashed when I added the WSUS server, I was able to locate the following in VMM.LOG afterwards:

00004729             77.38172150       [5092] 13E4.0868::07/27-21:26:31.413#04:UpdateServer.cs(265): Adding Update Server to Pangaea, ServerName - VMM.lab.local, Port - 8530, SSLEnabled - False             
00004730             77.38208008       [3668] 0E54.0AD4::07/27-21:26:31.411#21:Callback.cs(53): Client uuid:a3258eb4-18cf-4f58-811f-0692c049677e;id=1 - events processed       
00004731             77.48566437       [3668] 0E54.0AD0::07/27-21:26:31.523#24:ConsoleViewModel.cs(294): UI Load: ConsoleViewModel completed AddPage for Jobs - 00:00:00.1359864    
00004732             77.48571014       [3668] 0E54.0AD0::07/27-21:26:31.523#24:ConsoleViewModel.cs(303): ConsoleViewModel begin OnClientCacheInitialized
00004733             77.66319275       [3668] 0E54.0AD0::07/27-21:26:31.700#24:ConsoleViewModel.cs(329): UI Load: ConsoleViewModel completed OnClientCacheInitialized - 00:00:00.1769823      
00004734             77.94854736       [432]    
00004735             77.94854736       [432] *** HR originated: -2147024774  
00004736             77.94854736       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\copyout.cpp, line 1302               
00004737             77.94854736       [432]    
00004738             77.94861603       [432]    
00004739             77.94861603       [432] *** HR propagated: -2147024774              
00004740             77.94861603       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\enumidentityattribute.cpp, line 144           
00004741             77.94861603       [432]    
00004742             77.94880676       [432]    
00004743             77.94880676       [432] *** HR originated: -2147024774  
00004744             77.94880676       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\copyout.cpp, line 1302               
00004745             77.94880676       [432]    
00004746             77.94882202       [432]    
00004747             77.94882202       [432] *** HR propagated: -2147024774              
00004748             77.94882202       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\enumidentityattribute.cpp, line 144           
00004749             77.94882202       [432]    
00004750             77.94924927       [432]    
00004751             77.94924927       [432] *** HR originated: -2147024774  
00004752             77.94924927       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\copyout.cpp, line 1302               
00004753             77.94924927       [432]    
00004754             77.94928741       [432]    
00004755             77.94928741       [432] *** HR propagated: -2147024774              
00004756             77.94928741       [432] ***   Source File: d:\iso_whid\amd64fre\base\isolation\com\enumidentityattribute.cpp, line 144           
00004757             77.94928741       [432]    
00004758             82.27567291       [5092] 13E4.0868::07/27-21:26:36.311#04:WatsonExceptionReport.cs(756): Unhandled exception caught.          
00004759             82.27619934       [5092] 13E4.0868::07/27-21:26:36.312#04:WatsonExceptionReport.cs(757): Unhandled exception.         
00004760             82.27857208       [5092] 13E4.0868::07/27-21:26:36.314#04:WatsonExceptionReport.cs(757): System.ArgumentOutOfRangeException: An attempt was made to access an invalid or unsupported language.                

The last line indicates that the issue was caused by the regional settings on my servers. Since this is a Beta, there is no support for non-US regional settings.