Thursday, April 30, 2015

VM Checkpoints in Windows Azure Pack

Fresh from the factory, Update Rollup 6 has been released and shipped by Microsoft.

This isn’t a blog post that lists all the bug fixes and the amazing work the teams have been doing. Instead, it points you towards a highly requested feature that has finally made its way to the tenant portal in Windows Azure Pack.

With Update Rollup 6, Windows Azure Pack now supports creating and restoring Hyper-V checkpoints on virtual machines provided by the VM Cloud Resource Provider.

Tenants that have deployed virtual machines may now create checkpoints and restore them on their own, without any interaction from the cloud provider.

Let us have a closer look at how this actually works, how to configure it and what additional steps you might want to take as part of this implementation.

Enabling create, view and restore of virtual machine checkpoints at the Hosting Plan level

Once UR6 is installed for WAP and the underlying resource provider, you will notice some changes in the admin portal.

First, navigate to one of your Hosting Plans that contains the VM Cloud Resource Provider.
When you scroll down, you can see some new settings related to checkpoints.

Create, view and restore virtual machine checkpoints: lets tenants with subscriptions based on this hosting plan perform all of these actions on their virtual machines.

View and restore virtual machine checkpoints: lets tenants view and restore virtual machine checkpoints, but not create them. Checkpoint creation can then, for example, be performed by the cloud provider on an agreed schedule.

When you enable either of these options, an update job runs at the plan level and communicates the changes back to VMM. Once the job has completed, tenants have permission to perform these actions in the tenant portal.

If we switch over to the tenant portal and drill into one of the existing VMs (click on the VM → Dashboard), we can see that some new actions are available.

If you want to manage checkpoints for your VM Roles, you can of course do that too, but you then have to drill into each specific instance, as a VM Role can have multiple instances when it supports scale-out.

To create a new checkpoint, simply click on Checkpoint and type the name of the checkpoint and, optionally, a description.

If we switch back to the fabric and VMM, we can see that a VMM job has completed with details about the checkpoint process for this specific tenant, with the name and description we typed.

If we perform the same operation again, creating an additional checkpoint on the same virtual machine, we get a message telling us that the existing checkpoint will be deleted.

This is because the current checkpoint integration in WAP keeps only one checkpoint per virtual machine, which avoids the scenario where you could end up with a long chain of differencing disks.

When we create the second checkpoint, we can switch back to VMM to see what’s actually happening:

First, a new checkpoint is created.
Second, the previous checkpoint is deleted.

When we explore the checkpoint settings on the VM itself afterwards, we see that only the latest checkpoint is listed.
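
The job sequence above can be approximated directly against VMM. The following is only a sketch of that create-then-delete order, not the code WAP itself runs. The VMM server name `vmm01.contoso.local`, the VM name `TenantVM01` and the checkpoint name and description are hypothetical.

```powershell
# Sketch only: mimics the "create new, then delete previous" order seen in the VMM jobs.
# Assumes the VMM console (and its PowerShell module) is installed on this machine.
Import-Module virtualmachinemanager
Get-SCVMMServer -ComputerName 'vmm01.contoso.local' | Out-Null   # hypothetical server

$vm = Get-SCVirtualMachine -Name 'TenantVM01'                    # hypothetical VM

# Remember the checkpoints that exist before the new one is created.
$previous = @(Get-SCVMCheckpoint -VM $vm)

# First, create the new checkpoint.
New-SCVMCheckpoint -VM $vm -Name 'Before patching' -Description 'Created by tenant' | Out-Null

# Second, delete the older checkpoints so that only the latest one remains.
foreach ($checkpoint in $previous) {
    Remove-SCVMCheckpoint -VMCheckpoint $checkpoint | Out-Null
}
```

Creating the new checkpoint before removing the old one means the VM is never left without a restore point if the first job fails.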

Regarding the restore process, we can also perform this from the same view in the tenant portal.
Once you click on the restore button, the tenant portal will show you the metadata of the available checkpoint, such as name, description and when it was created. Once you click the confirm button, the restore process will start in VMM.
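
For comparison, a restore of the latest checkpoint can be sketched with the VMM cmdlets as shown here. This is an illustration under assumptions, not what the portal executes: the server and VM names are hypothetical, and `AddedTime` is assumed to be the property that holds the creation time.

```powershell
# Sketch only: show the metadata of the latest checkpoint, then restore it.
Import-Module virtualmachinemanager
Get-SCVMMServer -ComputerName 'vmm01.contoso.local' | Out-Null   # hypothetical server

$vm = Get-SCVirtualMachine -Name 'TenantVM01'                    # hypothetical VM
$checkpoint = Get-SCVMCheckpoint -VM $vm -MostRecent

# Name and description as typed by the tenant; AddedTime is an assumed property name.
$checkpoint | Format-List Name, Description, AddedTime

# Restoring discards changes made to the VM after the checkpoint was taken.
Restore-SCVMCheckpoint -VMCheckpoint $checkpoint
```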

Now what?

If you are familiar with how checkpoints in Hyper-V work, then you know that each virtual hard disk will be either .vhd or .vhdx, depending on the format you are using (.vhdx was introduced with Windows Server 2012 and should be the preferred format, but Azure is still using .vhd).
Once you create a checkpoint, a new differencing disk (.avhd or .avhdx) is created. It receives all new write operations, while read operations are served from the differencing disk for blocks that have changed and from the parent disk (.vhd/.vhdx) for everything else.
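
To see this chain for yourself, you can walk from each disk attached to a VM back to its parent on the Hyper-V host. The snippet below is a minimal sketch that assumes the Hyper-V PowerShell module is available on the host and uses a hypothetical VM name.

```powershell
# Sketch only: run on the Hyper-V host that owns the VM.
$vmName = 'TenantVM01'   # hypothetical VM name

foreach ($drive in Get-VMHardDiskDrive -VMName $vmName) {
    $path = $drive.Path
    # Follow ParentPath until we reach a disk without a parent (the base .vhd/.vhdx).
    while ($path) {
        $vhd = Get-VHD -Path $path
        '{0}  [{1}]' -f $vhd.Path, $vhd.VhdType
        $path = $vhd.ParentPath
    }
}
```

A VM running on a checkpoint lists a disk of type Differencing first, followed by its parent disk.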

To summarize, this might not be an ideal situation when it comes to performance, life-cycle management and storage optimization.

Since we don’t have any action in the tenant portal to perform a delete operation, this can be scary in some scenarios.
Once a checkpoint is created, the VM will always run on a checkpoint. That means you will always be able to restore to your latest checkpoint from the portal, but also that the VM never goes back to running on its parent disk alone.

In order to solve this challenge, we can leverage the integration of Service Management Automation in Azure Pack.
One of the best things about Azure Pack and the VM Cloud resource provider is that we can extend it and create value-added solutions and services by linking certain actions in the tenant portal to automated tasks that are executed by an SMA runbook in the backend.

The following screenshot shows that there’s an event related to creation of VMM Checkpoints performed by the tenant, which can easily be linked to a runbook.

Here’s an example of a runbook that checks for checkpoints on VMs belonging to a specific VMM Cloud that is used in a Hosting Plan in WAP. If any checkpoints exist, they are deleted and the VMs have their differencing disks merged back into the parent disk (.vhd/.vhdx).
Workflow to check for, and if needed delete, old VM checkpoints

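workflow delete-scvmcheckpoint
{
    # Connection asset holding the VMM server name and account.
    $VmmConnection = Get-AutomationConnection -Name 'SCVMM'
    $VmmServerName = $VmmConnection.ComputerName

    # Build the credential from the connection asset
    # (assumes the asset exposes Username and Password fields).
    $SecurePassword = ConvertTo-SecureString -AsPlainText -String $VmmConnection.Password -Force
    $VmmCredential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $VmmConnection.Username, $SecurePassword

    InlineScript
    {
        # Import the VMM module and connect to the VMM server.
        Import-Module virtualmachinemanager
        Get-SCVMMServer -ComputerName $Using:VmmServerName | Out-Null

        # Find the VMs in the cloud that currently have checkpoints.
        $vms = Get-SCVirtualMachine | Where-Object { $_.Cloud -like "*Copenhagen IaaS*" -and $_.VMCheckpoints }

        foreach ($vm in $vms)
        {
            # Removing a checkpoint merges its differencing disk back into the parent.
            Get-SCVMCheckpoint -VM $vm | Remove-SCVMCheckpoint -RunAsynchronously
        }
    } -PSComputerName $VmmServerName -PSCredential $VmmCredential
}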

This simple runbook can then be linked to a schedule that executes it on a daily basis, for example, ensuring that no VMs in the cloud run on a checkpoint in the long term.
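
The schedule can be created in the SMA section of the admin portal, or from PowerShell. The following is a rough sketch of the latter, assuming the SMA PowerShell module is installed and the runbook has been published. The endpoint URL and the schedule name are hypothetical, and the parameters should be checked against your SMA version.

```powershell
# Sketch only: create a daily schedule and link the runbook to it.
Import-Module Microsoft.SystemCenter.ServiceManagementAutomation

$endpoint  = 'https://sma01.contoso.local'            # hypothetical SMA web service endpoint
$startTime = (Get-Date).Date.AddDays(1).AddHours(2)   # tomorrow at 02:00
$expiry    = $startTime.AddYears(1)                   # schedule expires after one year

# Create (or update) a schedule that fires once a day.
Set-SmaSchedule -WebServiceEndpoint $endpoint -Name 'DailyCheckpointCleanup' `
    -ScheduleType DailySchedule -DayInterval 1 `
    -StartTime $startTime -ExpiryTime $expiry

# Link the runbook to the schedule instead of starting it immediately.
Start-SmaRunbook -WebServiceEndpoint $endpoint -Name 'delete-scvmcheckpoint' `
    -ScheduleName 'DailyCheckpointCleanup'
```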

Thanks for reading!
