VI3.Blueprint Capacity Workshop

VERSION 2 Published

Created on: Mar 24, 2009 10:54 AM by Steve Chambers - Last Modified:  Mar 25, 2009 8:11 AM by Steve Chambers

Introduction

 

Capacity Management is now a must-have capability for VMware Infrastructure, not a "nice to have" or "I'll do it later". VMware Infrastructure builds software mainframes, and mainframes have a lot of compute resource that must be well managed to:

 

  • Ensure the Enterprise receives ROI for their IT resources

  • Ensure capacity levels support established service level targets – Quality of Service

  • Ensure capacity is forecasted based on business events

 

The first step to efficient VI3 and effective capacity management is to bring together the VMware experts and Capacity Management experts: this workshop starts that process.

 

The outcome of this workshop is (a) a join between VMware and capacity management, in terms of understanding, and (b) a list of design decisions/research to make sure VI3 Capacity Management is fit for purpose.

 

Intended Audience

 

VMware Certified Professionals (VCPs) and Capacity Management professionals.

 

Outline

 

  • Key Metrics for VI3

  • Increasing Efficiency

  • Intelligent Reporting

  • Intelligent Trending / Forecasting

  • Intelligent Modeling

  • ITIL Capacity Management

  • 7-point action plan

 

1. Key Metrics for VI3

 

 

This part of the workshop explores which metrics to collect, monitor and alert on, and the reasons why.

 

 

Tool Requirements

 

Capacity data for VI3 is spread around multiple components, so you need an efficient and effective way to centralize this intelligently into a capacity management database. The tool to do this needs to have the following characteristics:

 

  • Data capture/collect/storage

    • Simple, consistent data retrieval

    • Scalable, accessible database

    • Auto-manage data (aggregate, copy, delete)

  • Automated reporting

    • How it was yesterday / last week / this year

    • How it might be if things carry on the same way

    • Create reports in HTML, Word, Excel, PDF,…

  • Hands-on tool needed

  • Capacity data lives on different levels and you need a tool to bring all the data together
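
As a rough illustration of the "auto-manage data" requirement above, the following minimal sketch aggregates raw samples into hourly averages and purges raw samples older than a retention window. The function names, the sample values and the 7-day retention period are assumptions made for illustration; they are not taken from any particular tool.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate_hourly(samples):
    """Average raw (timestamp, value) samples into one value per hour."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


def purge_older_than(samples, now, keep):
    """Drop raw samples that are older than the retention window `keep`."""
    cutoff = now - keep
    return [(ts, value) for ts, value in samples if ts >= cutoff]


# Hypothetical CPU utilisation samples (percent) for one host.
now = datetime(2009, 3, 25, 12, 0)
raw = [
    (datetime(2009, 3, 25, 10, 5), 40.0),
    (datetime(2009, 3, 25, 10, 35), 60.0),
    (datetime(2009, 3, 25, 11, 5), 30.0),
    (datetime(2009, 3, 17, 9, 0), 80.0),  # older than 7 days
]

hourly = aggregate_hourly(raw)
recent = purge_older_than(raw, now, keep=timedelta(days=7))
```

In practice you would aggregate before purging, so that the long-term averages survive after the raw detail is deleted.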

 

2. Increasing Efficiency

 

 

Capacity management is about increasing the efficiency of your VI3 by providing the right resources to the right workloads at the right time - and NOT over-provisioning (hurts ROI), and NOT under-provisioning (hurts service levels).

 

Designing for efficiency

 

Capacity management influences the design and implementation of VI within the Enterprise:

 

  • Allocate VMs just 1 vCPU rather than multiple vCPUs by default - only use SMP (2 vCPU and 4 vCPU) if you have data to back up that decision.

  • Transparent Page Sharing

  • Rogue and Under-Utilized VMs

    • Identify VMs that are idle (< 10% CPU?) and/or rogue (for example, created by an administrator on a Friday afternoon but not really needed, so they can be turned off, archived and probably deleted).

 

CPU efficiency

 

See how inefficient 2 vCPUs can be in the following diagram:

 

viops_metron_cpu_ready.png

Figure 1 - %READY time increase on a 2vCPU guest where there is contention for CPU

 

Figure 1 illustrates the potential for blocking work on a guest system that has more than one virtual CPU. Guest VM1GBVIF021 (the area graph) has two vCPUs allocated to it, but spends much more time wanting to run and being unable to do so than VM1GBVAP199 (the bar graph), which has only 1 vCPU. This effect is magnified even further with a 4-way guest.
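
To compare ready time between guests, it helps to express it as a percentage of the sample interval. The sketch below assumes the ready counter is reported as a summation in milliseconds over a 20-second sample interval (as vCenter real-time statistics do); the function name and the sample values are hypothetical, so check the interval used by your own collector.

```python
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert summed CPU ready time (ms) for one sample interval into a
    percentage of that interval, averaged per vCPU."""
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


# Hypothetical samples: 2000 ms of ready time in a 20 s interval.
single_vcpu = ready_percent(2000)           # 1 vCPU guest
dual_vcpu = ready_percent(4000, vcpus=2)    # 2 vCPU guest, summed over both vCPUs
```

Dividing by the vCPU count keeps the figure comparable between 1-way and SMP guests when the collector reports the sum over all vCPUs.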

 

Memory efficiency

 

Figure 2 shows some memory statistics from a single guest system using VMware's Transparent Page Sharing (TPS), which significantly improves the ratio of VMs to hosts, allowing for an even greater density of VMs on a single host. From the first column we can see that the guest has been granted use of 768 MB of memory.

 

viops_metron_tps.png

Figure 2 - Guest memory sharing

 

It’s using about 53 MB for its own things, which is pretty small. Supporting this VM costs ESX about 70 MB of memory, again fairly small.

 

Now look at the memory shared between VMs - about 272 MB. So without the transparent page sharing, EVERY VM on this host would want another 272 MB of real memory to be able to run.

 

With all this sharing going on, there is no pressure on real memory, as can be seen by the zeroes in the “Swapped Out” and “reclamation” columns.
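
The reading of Figure 2 above can be written down as a small check. This is only a sketch: the field names are hypothetical labels for the columns described in the text, and the values are the approximate figures quoted above.

```python
from dataclasses import dataclass


@dataclass
class GuestMemory:
    """Per-guest memory figures in MB (field names are illustrative)."""
    granted_mb: float
    private_mb: float
    overhead_mb: float
    shared_mb: float
    swapped_mb: float
    reclaimed_mb: float

    def under_pressure(self):
        """Non-zero swapping or reclamation suggests pressure on real memory."""
        return self.swapped_mb > 0 or self.reclaimed_mb > 0


guest = GuestMemory(granted_mb=768, private_mb=53, overhead_mb=70,
                    shared_mb=272, swapped_mb=0, reclaimed_mb=0)

# Memory this guest costs on its own: private pages plus ESX overhead.
unshared_cost_mb = guest.private_mb + guest.overhead_mb
# Extra real memory the guest would want if page sharing were not available.
extra_without_tps_mb = guest.shared_mb
```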

 

Rogue/Idle VMs

 

Figure 3 shows how you can identify guests that are idle or rogue by simply finding the ones that consistently use less than 10% of CPU - with this list, you can investigate whether to turn off, archive or isolate these VMs so they free up precious capacity.

 

viops_metron_rogue.png

Figure 3 - Identifying rogue/idle VMs
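
A minimal sketch of the idle/rogue check follows. It assumes "consistently" means at least 95% of a guest's CPU samples fall below the 10% threshold; that fraction, the function name and the VM names are assumptions for illustration.

```python
def find_idle_vms(cpu_by_vm, threshold_pct=10.0, min_fraction=0.95):
    """Return the names of VMs whose CPU samples are below `threshold_pct`
    in at least `min_fraction` of the observations."""
    idle = []
    for name, samples in cpu_by_vm.items():
        if not samples:
            continue  # no data: cannot classify this VM
        below = sum(1 for s in samples if s < threshold_pct)
        if below / len(samples) >= min_fraction:
            idle.append(name)
    return sorted(idle)


# Hypothetical CPU utilisation samples (percent) per guest.
cpu_by_vm = {
    "vm-a": [2.0, 3.0, 1.0, 4.0],      # always idle
    "vm-b": [5.0, 60.0, 70.0, 8.0],    # busy part of the time
    "vm-c": [9.0, 9.5, 8.0, 12.0],     # mostly, but not consistently, idle
}
candidates = find_idle_vms(cpu_by_vm)
```

The resulting list is only a starting point for investigation; a guest with low CPU may still be needed.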

 

3. Intelligent Reporting

 

 

The keys to intelligent reporting are:

 

  • Different schedules for different reports: Daily / Weekly / Monthly

  • Simple Dashboard

  • Use Web Portal Publishing instead of paper printouts (not just eco, but faster and easier!)

  • Intelligent data for easy understanding, and not time consuming to prepare or understand

 

Resource Pool

 

Figure 4 shows a Resource Pool report.

 

viops_metron_rp_report.png

Figure 4 - Resource Pool Report

 

A resource pool consists of an amount of CPU and Memory. Therefore reporting will only need to cover these 2 areas. As a resource pool is populated by VMs you may want to overlay the VM usage of the Resource Pool like we did previously with the ESX Host.

 

Monitor the utilisation of the resource pool against its Reservation and its Limit. If pools are consistently using more than their reservations, it may be time to reassess the settings you have.
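
One way to automate that comparison is sketched below. It flags pools whose usage exceeds the reservation in at least 80% of samples; the 80% cut-off, the dictionary keys and the pool names are assumptions for illustration, and a limit of None stands for "unlimited".

```python
def pools_over_reservation(usage_by_pool, settings, min_fraction=0.8):
    """Flag pools whose CPU usage (MHz) is above the reservation in at
    least `min_fraction` of the samples."""
    flagged = {}
    for pool, samples in usage_by_pool.items():
        reservation = settings[pool]["reservation_mhz"]
        limit = settings[pool]["limit_mhz"]  # None means unlimited
        share = sum(1 for s in samples if s > reservation) / len(samples)
        if share >= min_fraction:
            peak_pct = max(samples) * 100.0 / limit if limit else None
            flagged[pool] = {
                "share_over_reservation": share,
                "peak_pct_of_limit": peak_pct,
            }
    return flagged


# Hypothetical pool settings and usage samples.
settings = {
    "web": {"reservation_mhz": 2000, "limit_mhz": 4000},
    "batch": {"reservation_mhz": 1000, "limit_mhz": None},
}
usage_by_pool = {
    "web": [2100, 2500, 2300, 1900, 2600],
    "batch": [400, 500, 1200, 300, 200],
}
flagged = pools_over_reservation(usage_by_pool, settings)
```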

 

Cluster

 

Figure 5 shows a cluster dashboard.

 

viops_metron_cl_report.png

Figure 5 - Cluster Dashboard

 

With a cluster we need to report all the same items we were looking at for the ESX Host. But as we get to these higher levels there is likely to be significantly more management interest in the “health” of the Virtual Infrastructure. So it’s at these levels you might need to start including management type reports.

 

Figure 6 shows a cluster report.

 

viops_metron_cl_report2.png

Figure 6 - Cluster Report

 

Host

 

Figure 7 shows a host report.

 

viops_metron_host_report.png

Figure 7 - Host Report

 

A host has the same hardware as a conventional physical server, so we need to monitor the same items: CPU, Memory, Disk and Network cards, against agreed thresholds.

 

Figure 8 shows a host report, focusing on overhead.

 

viops_metron_host_report2.png

Figure 8 - Host Report (Overhead)

 

While we need to monitor all the usual items, we can introduce some unique VMware data as a comparison. Here we can see the Pink area representing the observed CPU usage of the ESX Host. The stacked line graph in front represents the CPU usage of the ESX Host by the Guest VMs.

 

As you can see at the beginning of the graph some VMs generate significant overheads in ESX. In this case the workload in the “blue” VM (VM001) was almost entirely graphical.

 

Figure 9 shows a host report for memory.

 

viops_metron_host_memory.png

Figure 9 - Host Memory Report

 

This chart illustrates both the usage of real memory in GB by an ESX host and the average memory percentage used by all the VMs it is currently supporting.

 

The area graph is the percentage of memory used by the VMs, and generally tracks the shape of the line graph, which provides detail of the amount of memory ESX is really using. The more variable nature of the ESX host memory line can be explained by the additional work ESX is performing on behalf of guests, which they do not “see”.

 

Guest

 

Figure 10 shows stacked VM CPU.

 

viops_metron_vm_cpu.png

Figure 10 - Stacked vCPU

 

This chart illustrates the potential for blocking work on a guest system that has more than one virtual CPU. Guest VM1GBVIF021 (the area graph) has two vCPUs allocated to it, but spends much more time wanting to run and being unable to do so than VM1GBVAP199 (the bar graph), which has only 1 vCPU.

 

This effect is magnified even further with a 4-way guest.

 

Alerting

 

We've talked about metrics, talked about monitoring and reports, now for the alerting approach:

 

  • Determine what to alert on

  • Determine how often to alert

  • Determine what reports to have available when an alert is received

    • Have a tool box of reports that you run when an alert is received

    • Review past reports to determine if it is an anomaly or indication of a future problem where action needs to take place
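
The "what to alert on" and "how often" decisions can be expressed as a threshold plus a number of consecutive breaches. The sketch below raises one alert per breach episode, so a single spike does not page anyone; the threshold, the count of three samples and the sample values are assumptions for illustration.

```python
def breach_alerts(samples, threshold, consecutive=3):
    """Return the indexes of the samples that complete `consecutive`
    breaches in a row. Each breach episode produces at most one alert."""
    alerts = []
    run = 0
    for i, value in enumerate(samples):
        if value > threshold:
            run += 1
            if run == consecutive:
                alerts.append(i)
        else:
            run = 0  # episode over; the next breach starts a new count
    return alerts


# Hypothetical host CPU utilisation samples (percent).
cpu = [70, 85, 90, 88, 92, 60, 95, 96, 97]
alerts = breach_alerts(cpu, threshold=80, consecutive=3)
```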

 

4. Intelligent Trending / Forecasting

 

 

Figure 11 shows a threshold/trend alert.

 

viops_metron_alert.png

Figure 11 - Threshold alerting

 

  • Forecasting

    • When will I run out of capacity

    • Business data needed

  • Trending

    • Straight-line trends

    • Trends with a point-in-time increase (“Dog-Leg Trend”)

 

Figure 12 shows a trend report that you can build alerts on using thresholds.

 

viops_metron_trend.png

Figure 12 - Trends and Thresholds

 

Figure 13 shows a dog-leg trend report, which shows the likely effect of a 20% jump in CPU utilisation if more work were added to the cluster at a given date/time.

 

viops_metron_dogleg.png

Figure 13 - Dog-leg Trend
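
Both trend types can be sketched with an ordinary least-squares line. The code below fits a straight line to evenly spaced observations and reports how many periods remain until a threshold is reached, optionally adding a one-off step to model a dog leg. It assumes the 20% jump is 20 percentage points added from a given period onward; that interpretation, the function names and the sample values are assumptions.

```python
def fit_line(ys):
    """Least-squares slope and intercept for evenly spaced observations."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def periods_until(ys, threshold, step=0.0, step_at=None, horizon=520):
    """Periods after the last observation until the projected trend reaches
    `threshold`. `step` is added from period `step_at` onward (dog leg).
    Returns None if the threshold is not reached within `horizon` periods."""
    slope, intercept = fit_line(ys)
    last = len(ys) - 1
    for ahead in range(1, horizon + 1):
        x = last + ahead
        value = slope * x + intercept
        if step_at is not None and x >= step_at:
            value += step
        if value >= threshold:
            return ahead
    return None


# Hypothetical weekly cluster CPU utilisation (percent).
weekly_cpu = [40, 42, 44, 46, 48]
straight = periods_until(weekly_cpu, threshold=80)
dog_leg = periods_until(weekly_cpu, threshold=80, step=20, step_at=8)
```

With these sample values the dog leg brings the threshold crossing forward by ten weeks, which is the kind of answer the "when will I run out of capacity" question is after.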

 

5. Intelligent Modeling

 

 

Modeling is used to answer questions like: where do I put the next VMs? Assuming all other things are equal and only capacity differs between two clusters, which cluster is best for my next VM?

 

Cluster capacity attributes that might affect your decision include the capacity goals for each cluster (e.g. Cluster 1 <50% full for a Gold Standard, Cluster 2 <80% full for a Low Cost Standard).

 

Ideally, from modeling you want to get a suggestion “at a glance” without having to study reports.
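
A very simple "at a glance" suggestion could compare each cluster's current usage with its own capacity goal and pick the one with the most headroom. The sketch below does only that; the dictionary keys and the usage figures are hypothetical, and a real model would work at host level as the requirements that follow describe.

```python
def best_cluster(clusters):
    """Return (name, headroom) for the cluster with the most headroom
    below its own capacity goal, or (None, 0) if every cluster is full."""
    headroom = {
        name: c["goal_pct"] - c["used_pct"] for name, c in clusters.items()
    }
    name = max(headroom, key=headroom.get)
    if headroom[name] <= 0:
        return None, 0
    return name, headroom[name]


# Goals from the text above; current usage figures are made up.
clusters = {
    "Cluster 1": {"goal_pct": 50, "used_pct": 45},  # Gold Standard
    "Cluster 2": {"goal_pct": 80, "used_pct": 60},  # Low Cost Standard
}
suggestion = best_cluster(clusters)
```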

 

Modeling requirements:

  • More detailed than cluster level

  • Unit of planning = Host

  • Make workloads = VMs

  • Must be simple to set up and use

  • Must be able to incorporate business information or application data, if available

 

Typical Modeling scenario: Growing Virtual Workloads

 

You have a 4-CPU ESX Server currently running 5 virtual machines and your model requires that you grow all workloads by 90% over 10 quarters. Figure 14 shows the effect of this growth.

 

viops_metron_growth.png

Figure 14 - Model growth at 90% over 10 quarters
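
The growth scenario can be approximated with a few lines of arithmetic. This sketch assumes the 90% is total growth compounded evenly across the 10 quarters; the text does not say whether growth is compound or linear, and the 40% starting utilisation and 70% threshold are made-up values.

```python
def compound_growth(start, total_growth, periods):
    """Project a value that grows by `total_growth` (0.9 = 90%) in total,
    compounded evenly over `periods`. Returns periods + 1 values."""
    rate = (1 + total_growth) ** (1 / periods) - 1
    return [start * (1 + rate) ** q for q in range(periods + 1)]


def first_breach(values, threshold):
    """Index of the first projected value at or above `threshold`, else None."""
    for q, value in enumerate(values):
        if value >= threshold:
            return q
    return None


# Hypothetical host CPU utilisation of 40% today, 70% agreed threshold.
projection = compound_growth(start=40.0, total_growth=0.9, periods=10)
breach_quarter = first_breach(projection, threshold=70.0)
```

Running the same projection per workload and summing the results per host gives a first approximation of when the host runs out of capacity.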

 

You can then model what happens if you add more storage, in Figure 15.

 

viops_metron_growth2.png

Figure 15 - Modeling additional storage

 

You can then model the addition of more CPU, in Figure 16:

 

viops_metron_growth3.png

Figure 16 - Modeling additional CPU

 

6. ITIL Capacity Management

 

 

7. 7-point action plan

 

 

1. People

 

Have the VMware team talk with the Capacity team. Use this slide deck & VI:OPS.

Improve your knowledge of ITIL: VI is part of a larger entity. See VIOPS and Metron training webinars.

 

2. Tools

 

Automate laborious activities with a tool such as Athene. See the whole picture and make informed decisions. Fast payback.

 

3. Monitoring

 

Focus on the key metrics and create processes and reports around them.

 

4. Reporting

 

Set up reports shown in this presentation and meet regularly with Stakeholders to review.

 

5. Trending

 

Create charts of your capacity trends

Use this workshop and VI:OPS as a guide and automate it using a tool such as Athene

 

6. Modeling

 

Run scenarios on a regular basis. It’s easy to do with a tool. See when capacity and service levels are impacted. How long do you have to react and what decisions need to be made?

 

7. Improve

 

Use your new skills and knowledge to improve the efficiency of your infrastructure. Measure it, and use the KPIs in the presentation.

 

Capacity Management Best Practices

 

What should everyone be doing at a minimum:

 

  • Design

    • 1vCPU – avoid SMP contention, get more density (4x more VMs per Host)

    • Use Transparent Page Sharing, get more density (reduce memory needs for VMs)

  • Metric

  • Process

    • Track impact of idle + rogue VMs (remove resources not being used)

    • Capacity management hooked into release and change procedures

 

Resources

Author

Big thanks to Metron who provided all of the expertise for this workshop. If you want to talk Capacity Management and VMware, they are your primary team to work with.

 

Reach out to the Metron team on VIOPS:

 

Reach out to the VIOPS team on Capacity Management:

 

Disclaimer

standard text

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
