Sunday, January 4, 2009

Hardware-Assisted Virtualization

"If you're booing now, all I can tell you is it's gonna get worse." Those were the words of comedian D.L. Hughley after someone booed his first joke during a performance. Interestingly enough, Hughley's response got me thinking about a particular set of problems associated with hardware-assisted virtualization.

Now that x86 virtualization is a mainstream element of the production infrastructure, management issues that previously were not considered are beginning to pop up. Over the next several issues, I'll use this column to address some of the boos I'm hearing with regard to virtualization management. Although developing tools are starting to ease the management burden, emerging technologies -- hardware-assisted virtualization, single- and multiroot I/O virtualization and storage virtualization -- threaten to make virtualization's underlying management hardships even worse. This month I'm going to start by examining the new challenges presented by hardware-assisted virtualization.

Hardware-assisted virtualization was introduced by AMD and Intel a couple of years ago. It's known as AMD Virtualization (AMD-V) and Intel Virtualization Technology (Intel VT), respectively, and is required by select hypervisors, namely Xen and Hyper-V. VMware did not latch onto the first iteration of hardware-assisted virtualization because its binary translation technology -- which provides the trapping and emulation needed for privileged-mode CPU instructions in the virtual machine (VM) guest -- could outperform what either vendor's first-generation hardware assist could deliver.

Hardware-Assist Architecture
One of the core elements of first-generation hardware-assisted virtualization was the creation of a new layer in the x86 CPU ring architecture, known as Ring -1. With hardware-assisted virtualization, hypervisors that support the technology could load at Ring -1 and guest OSes could access the CPU at Ring 0, just as they normally would when running on a physical host. As a result, VM guest OSes could be virtualized without any modifications to the guest OS. Previously, paravirtualization of the guest OS kernel -- adopted by the major Linux vendors -- was used to overcome the performance overhead associated with trapping and emulating privileged CPU instructions.

All was good in the hardware-assisted virtualization universe when the technology was initially shipped. Organizations deployed it and the technology worked as expected on Xen and Hyper-V hypervisors. VMware didn't start to adopt hardware-assisted virtualization until the summer 2008 release of ESX Server 3.5 Update 2, which officially supported AMD's second-generation hardware-assisted virtualization features such as hardware-assisted memory virtualization, known as Rapid Virtualization Indexing (RVI) and sometimes referred to as nested paging. AMD RVI has yielded substantial performance improvements for multithreaded enterprise applications such as Exchange, Oracle and XenApp. An equivalent feature from Intel, known as Extended Page Tables, is expected to ship in early 2009.

No Mixing and Matching
The presence of multiple hardware-assisted virtualization generations in the same physical host cluster can cause significant mobility problems. Mixing CPU generations is problematic because the hypervisor doesn't emulate the CPU, so the CPU seen by the hypervisor is presented as-is to the VM guest OS. That lack of emulation is important because it lets applications within the VM's guest OS take advantage of low-level CPU features. The downside is that a guest may come to depend on features that another host in the cluster lacks, which can leave you unable to live migrate a VM from one physical host to another or force you to reactivate an application whose activation is bound to a particular CPU type.

Intel and AMD thought of the potential problems caused by mixed CPU generations within the same physical cluster and developed Extended Migration (AMD) and Flex Migration (Intel). Extended Migration and Flex Migration allow the hypervisor to mask the underlying physical CPU and present it to the VM guest OS as an earlier CPU generation. In essence, this allows different CPU generations to reside in the same physical cluster. But there is a tradeoff: The cluster hardware's CPU features run at the lowest common denominator. Note that Extended Migration and Flex Migration do not provide CPU interoperability, so you still must commit to AMD or Intel in any given cluster; mixing Intel and AMD together in the same physical cluster is not permitted by any hypervisor.
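To make the lowest-common-denominator tradeoff concrete, here is a minimal Python sketch. It is not how any hypervisor implements Extended Migration or Flex Migration; the host names, feature flags and function names are invented for illustration. It simply treats each host's CPU as a set of features, refuses mixed-vendor clusters, and computes what every guest could safely be shown.

```python
# Illustrative sketch: host names, vendors and feature flags are invented.

def cluster_baseline(hosts):
    """Return the CPU features shared by every host in the cluster.

    hosts maps a host name to a (vendor, features) pair.
    """
    if not hosts:
        raise ValueError("cluster has no hosts")
    vendors = {vendor for vendor, _ in hosts.values()}
    if len(vendors) > 1:
        # Masking hides features within one vendor's line; it cannot
        # make AMD and Intel CPUs look alike.
        raise ValueError(f"mixed CPU vendors in cluster: {sorted(vendors)}")
    feature_sets = [set(features) for _, features in hosts.values()]
    return set.intersection(*feature_sets)


def hidden_per_host(hosts):
    """List the features each host must hide from guests to match the baseline."""
    baseline = cluster_baseline(hosts)
    return {name: sorted(set(features) - baseline)
            for name, (_, features) in hosts.items()}


cluster = {
    "host-a": ("AMD", {"sse2", "sse3", "nx"}),
    "host-b": ("AMD", {"sse2", "sse3", "nx", "sse4a", "abm"}),
}

print(sorted(cluster_baseline(cluster)))
print(hidden_per_host(cluster))
```

The newer host gives up its extra features so that guests can move freely, which is exactly the cost described above.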

Now let's assume that a particular ESX cluster is undergoing a hardware refresh or that budget restrictions are forcing you to scale out a hypervisor cluster over the period of a year. In either case, you may be faced with having multiple CPU generations reside within the cluster. In this situation, the first question to be addressed is: "Does my hypervisor support Extended or Flex Migration?" ESX Server 3.5 Update 2 is one of the few hypervisors that support this feature; however, you'll need to enable CPU masking on each VM in the cluster. You can do this with the VMware Infrastructure client by accessing a VM's Properties, clicking the Options tab and then clicking the "Hide the Nx flag from guest" radio button.

Of course, enabling CPU ID masking per VM can be an arduous task, to say the least, so VMware has offered a scripted solution to automate the process on a large scale.
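For readers curious about what hiding the NX flag amounts to, the sketch below shows the idea at the bit level in Python. It is not VMware's script and does not touch any VMware API. The one hard fact it relies on is that the NX flag is reported in bit 20 of EDX for CPUID leaf 0x80000001; the VM names, the sample register value and the function names are hypothetical.

```python
NX_BIT = 1 << 20  # NX/XD flag: CPUID leaf 0x80000001, register EDX, bit 20


def hide_nx(edx):
    """Return the EDX value with the NX bit cleared, as a guest would see it."""
    return edx & ~NX_BIT & 0xFFFFFFFF


def mask_all(vm_edx_values):
    """Apply the same mask to every VM in one pass instead of one at a time."""
    return {vm: hide_nx(edx) for vm, edx in vm_edx_values.items()}


# Hypothetical inventory: VM name -> EDX value reported by its current host.
inventory = {"web01": 0x2C100800, "db01": 0x2C100800}

for vm, edx in mask_all(inventory).items():
    print(f"{vm}: {edx:#010x} nx_visible={bool(edx & NX_BIT)}")
```

Looping over an inventory like this is the whole appeal of a scripted approach: the mask is defined once and applied uniformly, rather than clicked through one VM at a time.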

Feature Disablement
Now let's look at another potential management problem with hardware-assisted virtualization -- automatic feature disablement. AMD's RVI has shown substantial performance improvements for many applications, although some applications perform better with the feature disabled. For that reason, RVI can be enabled or disabled on a per-VM basis.

Figure 1. Enabling CPU ID masking on a VMware ESX Server VM.

Now here's the problem. Assume that a VM with RVI enabled is started at the organization's disaster recovery site following a major outage. The hardware at the recovery site doesn't support RVI, the difference is detected by the hypervisor, and RVI is automatically disabled when the VM starts at the recovery site. Sure, the application may run slower without RVI, but at least it's still available.

The catch is that while the hypervisor's default RVI handling allowed the VM to run at the recovery site in spite of the difference in hardware features, what will happen when the VM fails back to the original production site? You guessed it. RVI will remain disabled. So to use RVI again, you'd need to manually re-enable it, or have your fail-back procedures involve restoring the original production VM configuration file -- the .VMX file -- for each VM from backup. So while the default hypervisor behavior is designed to get you through the disaster, it may also create some performance problems once you recover VMs back to the original source site, or to a new site using new hardware.
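The fail-back problem comes down to a single stored setting doing two jobs: recording what the administrator asked for and what the current hardware can deliver. The Python sketch below models that behavior and contrasts it with one possible alternative that keeps the two values separate. It is a thought experiment only; the class, field and function names are invented, and nothing here reflects how ESX actually stores its settings.

```python
from dataclasses import dataclass


def power_on_overwriting(stored_rvi, host_supports_rvi):
    """Model of the behavior described above: the stored setting is overwritten."""
    return stored_rvi and host_supports_rvi


@dataclass
class VMConfig:
    name: str
    rvi_requested: bool          # what the administrator asked for
    rvi_effective: bool = False  # what the VM is running with right now


def power_on_preserving(vm, host_supports_rvi):
    """Alternative: derive the effective value and leave the request untouched."""
    vm.rvi_effective = vm.rvi_requested and host_supports_rvi
    return vm


# Single stored flag: production -> recovery site (no RVI) -> production.
stored = True
stored = power_on_overwriting(stored, host_supports_rvi=False)  # failover
stored = power_on_overwriting(stored, host_supports_rvi=True)   # fail-back
print("overwriting model, RVI after fail-back:", stored)

# Separate requested/effective values over the same round trip.
vm = VMConfig("exchange01", rvi_requested=True)
power_on_preserving(vm, host_supports_rvi=False)  # failover
power_on_preserving(vm, host_supports_rvi=True)   # fail-back
print("preserving model, RVI after fail-back:", vm.rvi_effective)
```

In the first model the feature stays off after the round trip, which is why restoring the original configuration file becomes part of the fail-back procedure.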

RVI is just a single example. Many new hardware-assisted virtualization features are on the horizon, and hypervisors will continue to have to deal with interoperability issues by disabling features that are not available to all physical nodes in a cluster. Hypervisor-management tools will also need the intelligence to alert administrators when particular features can't be used due to limitations in cluster hardware. Site failover tools such as VMware's Site Recovery Manager will need similar capabilities. I'd prefer to know that I will lose a certain performance feature before a failure occurs, rather than leaving it up to the hypervisor to automatically disable that feature for me.
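The kind of advance warning I have in mind could be as simple as the following Python sketch, run before any outage occurs. It is a hypothetical audit, not a feature of Site Recovery Manager or of any other product; the VM names, feature labels and function name are assumptions made for the example.

```python
def failover_audit(vms, recovery_site_features):
    """Return one warning per VM that would lose a feature at the recovery site.

    vms maps a VM name to the set of hardware-assist features it relies on.
    """
    warnings = []
    for name, required in sorted(vms.items()):
        lost = sorted(set(required) - set(recovery_site_features))
        if lost:
            warnings.append(
                f"{name}: would lose {', '.join(lost)} at the recovery site"
            )
    return warnings


# Hypothetical data: what each production VM uses and what the DR hardware offers.
production_vms = {
    "exchange01": {"hw-virt", "rvi"},
    "file01": {"hw-virt"},
}
recovery_site = {"hw-virt"}

for line in failover_audit(production_vms, recovery_site):
    print(line)
```

A report like this turns a surprise during a disaster into a planning decision made ahead of time.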

Hypervisor vendors' hardware-assisted virtualization implementations are beginning to mature along with AMD's and Intel's offerings in the space. As these new features emerge and become a part of your virtual infrastructure, you're going to need to look at your deployment processes to ensure that applications take advantage of features such as RVI when it makes sense. You'll also need to ensure that your organization's site failover and fail-back procedures take hardware platform differences -- and how the hypervisor responds to those differences -- into account. Of course, I haven't even mentioned the issues with moving a VM to "the cloud." While treating a service provider's physical infrastructure as a cloud sounds great on paper, differences in hardware-assisted virtualization make that notion impossible. Instead, VMs are going to need to use standards like the Open Virtualization Format (OVF) to advertise their hardware-assisted virtualization requirements to the cloud provider.

Hardware-assisted virtualization is an important technology. Moving forward, it will allow us to virtualize applications whose workloads once made the thought of virtualizing them laughable. Still, the new management challenges brought about by hardware-assisted virtualization are no laughing matter, and ignoring these challenges may lead to boos from upper management, something no IT administrator wants to hear.
