Many VMware customers have virtualized their remote and branch offices (ROBO, Remote Office Branch Office) to reap the benefits of VMware Infrastructure, such as hardware cost savings, business continuity, high availability, and lower maintenance costs. Because of ROI considerations and the desire to keep management of ESX hosts and virtual machines centralized, many of these customers keep a single VirtualCenter server and configure it to manage ESX hosts over the WAN.
This practice has been used by large enterprises that have ESX Servers distributed over large geographical distances.
Intended audience: VCPs and IT architects planning to manage a distributed VMware Infrastructure in which VirtualCenter manages ESX hosts at remote locations over a wide area network.
1.1 Challenge
1.1.1 WAN Performance of VC commands
1.1.2 Recommendations
2 Customer Scenarios
2.1 ABC International
2.1.1 VirtualCenter Server Configuration
2.1.2 Workloads
2.1.3 Proven Practices
2.1.3.1 Administrative Practice
2.1.3.2 Remote Console Access Practice
2.1.3.3 Storage Practice
2.1.3.4 Remote Patching Practice
2.1.3.5 Remote Provisioning Practice
2.1.3.6 Remote Monitoring Practice
2.1.3.7 Business Continuity Practice
2.1.4 Summary
Proven Practice: ROBO - Managing Remote ESX Hosts Over WAN with VirtualCenter
Many VMware customers have virtualized their remote and branch offices (ROBO, Remote Office Branch Office) to reap the benefits of VMware Infrastructure, such as hardware cost savings, business continuity, high availability, and lower maintenance costs. Because of ROI considerations and the desire to keep management of ESX hosts and virtual machines centralized, many of these customers keep a single VirtualCenter server and configure it to manage ESX hosts over the WAN.
Note: we use 'site' and 'datacenter' interchangeably in this document.
1.1 Challenge
WAN links are typically characterized by low bandwidth and high latency. Using a VirtualCenter (VC) server to manage ESX hosts over the WAN therefore makes it challenging to sustain practical response times. Some VC commands are carried out primarily on the host side, while others require moderate to heavy interaction between the ESX hosts and the VC Server. The latter are the ones that visibly affect response times for VMware Infrastructure (VI) administrators.
VMware has gathered WAN performance data for VC commands, which is presented in the next section.
1.1.1 WAN Performance of VC commands
In the performance benchmarks, we examined the VC commands below:
Add ESX host
Remove ESX host
Browse datastore
Add virtual machine
Clone virtual machine
Power on virtual machine
Activate virtual machine console
Power off virtual machine
Remove virtual machine
Power on multiple virtual machines
Power off multiple virtual machines
Change focus to ESX host
The performance of the VC commands listed above was examined under the following network configurations:
| Bandwidth | Delay (RTT) | Error Rate | Other Traffic |
|---|---|---|---|
| 56 kbps | 200-300 ms | 0.05%, 0.1% | Up to 60% |
| 128 kbps | 200-300 ms | 0.05%, 0.1% | Up to 60% |
| 192 kbps | 500-650 ms | 0.05% | Up to 50% |
| 256 kbps | 150-250 ms | 0.05% | Up to 50% |
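To get a feel for what these link profiles mean in practice, the sketch below estimates how long one command exchange might take. It is a back-of-the-envelope model, not the benchmark methodology: the payload size and the number of round trips are hypothetical inputs (the benchmark does not publish per-command payloads), and the model assumes that competing traffic simply reduces the usable bandwidth.

```python
def estimate_seconds(payload_kb, bandwidth_kbps, rtt_ms,
                     other_traffic=0.0, round_trips=1):
    """Rough time estimate for one command exchange over a WAN link.

    payload_kb and round_trips are hypothetical inputs chosen by the caller.
    other_traffic is the fraction of the link used by competing traffic.
    """
    if not 0.0 <= other_traffic < 1.0:
        raise ValueError("other_traffic must be in [0, 1)")
    usable_kbps = bandwidth_kbps * (1.0 - other_traffic)
    serialization = (payload_kb * 8.0) / usable_kbps  # kilobytes -> kilobits
    latency = round_trips * (rtt_ms / 1000.0)
    return serialization + latency


# Link profiles from the table above, using the worst-case RTT and the
# maximum share of competing traffic: (kbps, RTT in ms, other traffic).
LINKS = [
    (56, 300, 0.60),
    (128, 300, 0.60),
    (192, 650, 0.50),
    (256, 250, 0.50),
]

if __name__ == "__main__":
    for kbps, rtt, other in LINKS:
        secs = estimate_seconds(payload_kb=100, bandwidth_kbps=kbps,
                                rtt_ms=rtt, other_traffic=other,
                                round_trips=10)
        print(f"{kbps:>3} kbps: ~{secs:.1f} s for 100 KB and 10 round trips")
```

Even with these made-up inputs, the model shows the two separate costs at play: serialization time dominates on the slowest links, while commands with many round trips are penalized most by the high-latency profile.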
From our performance studies, we have divided the VC commands according to the extent that low speed, high latency links impact them. See table below:
| Low to moderate impact | High impact |
|---|---|
| Remove ESX host | Add ESX host |
| Clone virtual machine | Activate virtual machine console |
| Power on virtual machine | Change focus to an ESX host |
| Power off virtual machine | |
| Browse datastore | |
| Add virtual machine | |
| Remove virtual machine | |
| Power on multiple virtual machines | |
| Power off multiple virtual machines | |
Low impact: minor added delay and a small cosmetic effect on the progress bar.
Moderate impact: added delay, but usability is only slightly affected.
High impact: problems such as command time-outs, VC freezing for a few seconds, and severely degraded usability.
VC commands with low to moderate impact on WAN links can be used freely, regardless of the network properties. High-impact commands, however, need certain minimum network properties to sustain acceptable usability and response times. The following table examines each of the high-impact VC commands in detail and suggests known workarounds.
| High Impact VC commands | Network conditions that cause high impact | Symptoms | Known Workarounds |
|---|---|---|---|
| Add ESX host | <= 128 kbps | | |
| Activate Virtual Machine (VM) Console | <= 256 kbps, delay > 100 ms | | |
| Change focus to an ESX host | <= 128 kbps, delay < 150 ms | | |
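The thresholds in the table above can be turned into a simple pre-flight check before running a command against a remote site. The sketch below is only an illustration: the `Threshold` class and `is_high_impact` function are hypothetical names, the values are transcribed exactly as published, and it assumes that the bandwidth and delay conditions on a row must both hold (the table does not say whether they combine with "and" or "or").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    max_kbps: int                       # high impact at or below this bandwidth
    min_delay_ms: Optional[int] = None  # delay must exceed this, if set
    max_delay_ms: Optional[int] = None  # delay must be below this, if set


# Values transcribed from the table above, exactly as published.
HIGH_IMPACT = {
    "Add ESX host": Threshold(max_kbps=128),
    "Activate virtual machine console": Threshold(max_kbps=256, min_delay_ms=100),
    "Change focus to an ESX host": Threshold(max_kbps=128, max_delay_ms=150),
}


def is_high_impact(command, bandwidth_kbps, delay_ms):
    """Return True if the command falls in the high-impact range on this link.

    Assumption: bandwidth and delay conditions are combined with AND.
    Commands not listed in HIGH_IMPACT are treated as low to moderate impact.
    """
    t = HIGH_IMPACT.get(command)
    if t is None:
        return False
    if bandwidth_kbps > t.max_kbps:
        return False
    if t.min_delay_ms is not None and delay_ms <= t.min_delay_ms:
        return False
    if t.max_delay_ms is not None and delay_ms >= t.max_delay_ms:
        return False
    return True


if __name__ == "__main__":
    for name in HIGH_IMPACT:
        print(name, is_high_impact(name, bandwidth_kbps=128, delay_ms=120))
```

A check like this could, for example, prompt an administrator to fall back to one of the workarounds before opening a VM console over a slow link.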
1.1.2 Recommendations
From the benchmark results presented above, we have come to the following recommendations:
Do not use links of 56 kbps or slower. 128 kbps is generally usable.
Manually upgrade the vCenter Agent (vpxa) on the VMware ESX host before adding the host to VC.
Set the heartbeat timeout to a higher value to reduce host disconnects.
Reduce the resolution and color depth (number of colors) of all affected VM consoles, e.g. to 800x600/16-bit.
Use RDP to connect to Windows VMs.
Use SSH (for example, PuTTY) to connect to Unix VMs.
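The console recommendation above is easier to appreciate with some arithmetic. The sketch below computes the size of one raw, uncompressed screen frame and how long it would take to cross a link at full rate. This is an upper-bound illustration only: it assumes full frames are sent uncompressed, which is not how a real remote console protocol behaves, and the 1024x768/32-bit starting point is a hypothetical example.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed screen frame in bytes."""
    return width * height * bits_per_pixel // 8


def seconds_per_frame(width, height, bits_per_pixel, link_kbps):
    """Time to push one raw frame over a link, ignoring latency and overhead."""
    bits = frame_bytes(width, height, bits_per_pixel) * 8
    return bits / (link_kbps * 1000)  # 1 kbps = 1000 bits per second


if __name__ == "__main__":
    # Hypothetical larger console vs. the recommended 800x600/16-bit setting.
    for w, h, bpp in [(1024, 768, 32), (800, 600, 16)]:
        size = frame_bytes(w, h, bpp)
        secs = seconds_per_frame(w, h, bpp, link_kbps=256)
        print(f"{w}x{h}/{bpp}-bit: {size:,} bytes, ~{secs:.1f} s at 256 kbps")
```

Under these assumptions the smaller console carries roughly a third of the raw pixel data per frame, which is why reducing resolution and color depth helps on constrained links.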
2 Customer Scenarios
VMware customers have deployed VMware Infrastructure in distributed environments and sustained practical VirtualCenter Server management of ESX hosts over the WAN. This section presents a customer scenario for your reference. We will continue to add customer scenarios to this document, along with more details on existing scenarios. Updates can be expected on a monthly basis.
If you are interested in providing some data to help us understand your ROBO practices, you can fill out a survey at
http://www.surveymethods.com/EndUser.aspx?AF8BE7FFABE8FCF8.
2.1 ABC International
ABC International is a company with branches that are distributed around the world. Its IT environment has the following characteristics:
1. Datacenters are distributed in 4 main geographical areas (U.S., Europe, Asia, and Middle East).
2. Each geographical area is linked to the site hosting VirtualCenter server via network links of similar characteristics. See ABC International Network Architecture diagram below. There are 4 types of network links:
OC12 (622 Mbps)
T3 (43.3 Mbps)
T1 (1.5 Mbps), average latency ~ 169 ms
Satellite (1-6 Mbps), latency between 650 and 700 ms
3. Minimal or no IT coverage at remote sites.
4. Maximum number of ESX hosts per site is 32; minimum number of ESX hosts per site is 2. The majority of remote sites have only 2 ESX hosts each.
5. The growth rate of ESX hosts is 5% per year.
6. The growth rate of virtual machines is 5 to 10% per year.
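For capacity planning, the growth rates above can be projected forward. The sketch below assumes the rates compound annually, which the scenario does not state, and uses round starting figures of 100 ESX hosts and 900 virtual machines, taken from the approximate totals given in the summary of this scenario.

```python
def project(count, annual_rate, years):
    """Projected count after compounding annual growth (an assumption)."""
    return count * (1 + annual_rate) ** years


if __name__ == "__main__":
    years = 5
    hosts = project(100, 0.05, years)    # ESX hosts at 5% per year
    vms_low = project(900, 0.05, years)  # virtual machines at 5% per year
    vms_high = project(900, 0.10, years) # virtual machines at 10% per year
    print(f"After {years} years: ~{hosts:.0f} hosts, "
          f"~{vms_low:.0f} to ~{vms_high:.0f} virtual machines")
```

A projection like this helps judge whether a single VirtualCenter Server will remain adequate as the number of managed objects grows.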
2.1.1 VirtualCenter Server Configuration
ABC International has installed one VirtualCenter Server in its California datacenter. (Refer to ABC International Network Architecture diagram.)
The VC Server coexists on the same hardware as the VC database server. The hardware specification is as below:
DL 360 G5
2 way dual core
6 GB memory
The software specification is as below:
Operating system: Windows Server 2003
VC Server 2.5
Database server: SQL Server 2005 with the full recovery model. The database is dumped nightly to a CIFS share that is replicated to the Massachusetts datacenter.
2.1.2 Workloads
ABC International runs the following workloads in their remote sites:
Domain Controllers
File/Print Servers
Others
2.1.3 Proven Practices
This section presents the VC management practices that have been proven at ABC International. They are divided into several categories: Administrative Practice, Remote Console Access Practice, Storage Practice, Remote Patching Practice, Remote Provisioning Practice, Remote Monitoring Practice and Business Continuity Practice.
2.1.3.1 Administrative Practice
Small number of administrators. ABC International has 15 VI administrators. Of these, 4 are core VI administrators, all based in the United States (California and Massachusetts). The other 11 are application administrators who log on only to monitor application performance.
Privileges. ABC International sets VC permissions at the datacenter object or host object level. Other customers should consider also setting permissions at the virtual machine level.
2.1.3.2 Remote Console Access Practice
Use RDP. VI administrators at ABC International use RDP clients to access RDP-capable virtual machines hosted at remote sites. Alternatively, consider providing a machine at the remote site that is reachable over RDP, has the VI Client installed, and sits on the same LAN as the virtual machines you are trying to reach. From that machine you can use the VI Client and its virtual machine console to access both RDP and non-RDP virtual machines effectively.
2.1.3.3 Storage Practice
Image storage replication. ABC International stores virtual machine templates, installation (ISO) files, host images and virtual machine images in an NFS share on a storage appliance that supports remote incremental replication (only the deltas are replicated). This NFS share is located in California and is replicated to all other sites. This practice is useful for remote provisioning, for example.
2.1.3.4 Remote Patching Practice
Stage locally, then replicate. When patches are available for ESX Server or virtual machines (e.g. guest OS updates, application software updates), they are applied to the host image(s) or virtual machine image(s) located in California. The patched images are tested before they are placed in the NFS share mentioned in section 2.1.3.3 for replication to other sites. Since replication runs every 24 hours, the images at any remote site lag by at most 24 hours.
Using the replicated images, the VI administrators then 'refresh' the ESX host(s) or the virtual machine(s).
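The 24-hour replication cycle puts a ceiling on how much changed data can reach a remote site per cycle. The sketch below estimates that ceiling for a given link. It is a rough model under stated assumptions: the `utilization` parameter (the share of the link available to replication) is hypothetical, protocol overhead is ignored, and sizes use decimal gigabytes.

```python
def max_delta_gb(link_mbps, window_hours=24, utilization=1.0):
    """Largest delta (decimal GB) that fits through the link in one window.

    utilization is a hypothetical knob for the share of the link that
    replication may use; protocol overhead is ignored.
    """
    bits = link_mbps * 1_000_000 * window_hours * 3600 * utilization
    return bits / 8 / 1_000_000_000


def fits_in_window(delta_gb, link_mbps, window_hours=24, utilization=1.0):
    """True if a delta of delta_gb can replicate within the window."""
    return delta_gb <= max_delta_gb(link_mbps, window_hours, utilization)


if __name__ == "__main__":
    # Link speeds taken from the ABC International scenario.
    for name, mbps in [("T1", 1.5), ("Satellite (slowest)", 1.0)]:
        print(f"{name}: up to ~{max_delta_gb(mbps):.1f} GB per 24 hours")
```

If a patch cycle changes more data than the slowest link can carry in one window, the lag at that site will exceed 24 hours, so it is worth comparing expected delta sizes against this ceiling.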
2.1.3.5 Remote Provisioning Practice
Stage locally, then replicate. ABC International's remote provisioning practice is similar to its Remote Patching Practice. The images are created locally in California and replicated via storage replication to remote sites for provisioning. See sections 2.1.3.3 and 2.1.3.4 for more details.
2.1.3.6 Remote Monitoring Practice
Leverage VC for performance and uptime monitoring. ABC International uses VirtualCenter for the performance data associated with ESX hosts and virtual machines. Graph rendering can be slow, especially when the ESX hosts are connected via the 1-6 Mbps satellite link, yet the overall experience has still been practical for ABC International.
Besides VirtualCenter, ABC International also leverages vCharterPro by Vizioncore for virtual machine performance monitoring.
2.1.3.7 Business Continuity Practice
Leverage VMware DRS and HA. ABC International uses VMware Infrastructure services such as DRS and HA to improve the availability of its virtual machines (and workloads). The availability of 1 GbE network links allows ABC International to use full DRS and HA automation at every site.
2.1.4 Summary
ABC International has an IT environment that is distributed geographically around the world, including the United States, Europe, Asia and the Middle East. The company is a relatively large VMware shop, with more than 100 ESX hosts and around 900 virtual machines deployed in its datacenters. When it comes to managing these ESX hosts and virtual machines with a single VirtualCenter Server, the IT administrators are challenged by the relatively slow links to the remote sites (the slowest being 1 Mbps with 650 ms latency).
This challenge is met by the IT administrators through the following proven practices:
Administrative Practice:
Small number of administrators: 4 core VI administrators and 11 application administrators.
Remote Console Access Practice:
Use RDP instead of the virtual machine console in the VMware Infrastructure Client.
Storage Practice:
Image storage replication: replicate the NFS share, which stores virtual machine templates, installation (ISO) files, host images and virtual machine images, to all remote sites.
Dump the VirtualCenter database nightly to a CIFS share that is replicated from California to Massachusetts.
Remote Patching Practice:
Stage patched virtual machine or ESX host images locally and have them replicated to remote sites for refreshing.
Remote Provisioning Practice:
Stage virtual machine or ESX host images locally and have them replicated to remote sites for provisioning.
Business Continuity Practice:
Leverage DRS and HA clusters within each datacenter.
Resources
Author: Desmond Chan, dchan@vmware.com
Disclaimer: You use this proven practice at your discretion. VMware and the author do not guarantee any results from the use of this proven practice. This proven practice is provided on an as-is basis and is for demonstration purposes only.
This is great. We have a single VC in London controlling VI in Zurich, Abingdon and two sites in London. This helps me understand the occasional issues we have experienced and will help me plan for challenges in the event of a WAN failover to IPsec.
This is quite interesting, but I'm missing information on how much bandwidth the statistical information consumes. Are there any figures available? This might depend heavily on the number of VMs and hosts and the level of detail. In my opinion, this has a bigger impact on the decision whether to use a central VC installation.