This is the first in a series of technical posts about vSphere Availability.
The intended audience is people who have some experience with VMware, such as admins who have inherited an existing solution.
Please note: if you are an experienced vSphere admin, you might find this post a bit basic.
Throughout my years in virtualization support I’ve noticed that admins who have inherited vSphere clusters, and app teams whose VMs have failed over, often need a detailed explanation of how the availability features of vSphere work.
Let’s admit it: downtime is a bad word in IT. Years ago, a planned maintenance task such as upgrading firmware or upgrading a host meant scheduling a change window. Most companies try to avoid that kind of change, since applications can’t (or don’t want to) take downtime.
The main purpose of vMotion is to let you migrate your VMs from one ESXi host to another without any interruption to the guest operating system or application. This is what we call a live migration. It allows IT administrators to perform maintenance on physical hardware without having to take downtime.
How does it work?
Note: this is a high-level summary of what happens when you trigger a vMotion. If you want a detailed version, please visit this link
When a vMotion request is issued, vCenter Server performs a compatibility check that validates the following:
– Network communication over the vMotion network between the source and destination hosts.
– Access to a shared datastore from both the source and destination ESXi hosts.
– The VM’s port group exists on the destination host.
– The destination host has the same CPU family and level as the source host.
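The four checks above can be pictured as a simple pre-flight function. This is only an illustrative sketch: the `Host` class, its fields and the `compatibility_issues` function are hypothetical names made up for this post, not part of any VMware API, and the real checks done by vCenter are more thorough.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical, simplified view of an ESXi host (not a VMware API object)."""
    name: str
    vmotion_reachable: bool                      # reachable over the vMotion network?
    datastores: set = field(default_factory=set)
    portgroups: set = field(default_factory=set)
    cpu_family: str = ""
    cpu_level: int = 0


def compatibility_issues(source, destination, vm_datastore, vm_portgroup):
    """Return the reasons why a migration would be refused (empty list = OK)."""
    issues = []
    if not (source.vmotion_reachable and destination.vmotion_reachable):
        issues.append("no connectivity over the vMotion network")
    if vm_datastore not in source.datastores or vm_datastore not in destination.datastores:
        issues.append(f"datastore {vm_datastore!r} is not shared by both hosts")
    if vm_portgroup not in destination.portgroups:
        issues.append(f"port group {vm_portgroup!r} is missing on the destination")
    if (source.cpu_family, source.cpu_level) != (destination.cpu_family, destination.cpu_level):
        issues.append("CPU family or level differs between hosts")
    return issues


# Made-up example hosts.
esx01 = Host("esx01", True, {"ds-shared"}, {"VM Network"}, "intel", 5)
esx02 = Host("esx02", True, {"ds-shared"}, {"VM Network", "DMZ"}, "intel", 5)
esx03 = Host("esx03", True, {"ds-local"}, {"DMZ"}, "amd", 3)

print(compatibility_issues(esx01, esx02, "ds-shared", "VM Network"))
print(compatibility_issues(esx01, esx03, "ds-shared", "VM Network"))
```

In this made-up example, the first call returns an empty list (the migration could proceed), while the second lists the datastore, port group and CPU problems.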
Once all compatibility checks have passed, vCenter issues a migration specification to both the source and destination ESXi hosts that includes:
– The VM that is being migrated
– Virtual machine configuration (vHardware, settings, etc.)
– Source ESXi host
– Destination ESXi host
– vMotion network details.
By now the vMotion task should be at around 22% in the Recent Tasks pane, and the VM on the source host continues to function normally.
At this point, all memory pages and vCPU data start being copied in the background, generating a “shadow VM” on the destination host.
Once the memory pages and vCPU data have been copied, the source virtual machine is stopped and the destination VM is started. Instead of going through a normal boot, the destination VM resumes from the copied memory pages, so the guest operating system is never actually shut down or restarted.
(Please note that data on the virtual disks is not migrated during a normal vMotion: the disks stay on the shared datastore, unaltered. Virtual disks can be migrated with Storage vMotion, which I will cover in a future post.)
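To get a feel for why the memory copy can happen while the VM keeps running, here is a toy model of an iterative "pre-copy". It assumes the copy happens in several passes: each pass sends the memory that is still outstanding, and whatever the running VM modified in the meantime has to be sent again in the next pass. The function name, the 64 MB switchover threshold and the rates are all made-up values for illustration; this is not VMware's actual algorithm.

```python
def simulate_precopy(memory_mb, dirty_mb_per_s, link_mb_per_s,
                     switchover_threshold_mb=64, max_rounds=20):
    """Toy model of iterative memory pre-copy.

    Returns (rounds, remaining_mb, converged). "Converged" means the memory
    still to be sent is small enough to copy during a very short switchover.
    """
    remaining = float(memory_mb)
    for round_no in range(1, max_rounds + 1):
        seconds = remaining / link_mb_per_s               # time to send this pass
        dirtied = min(memory_mb, dirty_mb_per_s * seconds)  # pages changed meanwhile
        remaining = dirtied                                # must be re-sent next pass
        if remaining <= switchover_threshold_mb:
            return round_no, remaining, True
    return max_rounds, remaining, False


# 8 GB VM, roughly a 1 Gbps link (125 MB/s), guest dirtying 12.5 MB/s.
print(simulate_precopy(8192, 12.5, 125))
# Same VM, but the guest dirties memory as fast as the link can send it.
print(simulate_precopy(8192, 125, 125))
```

In this model the first case converges after a few passes, while the second never does. That is the intuition behind two points in this post: more vMotion throughput gives quicker migrations, and very busy VMs are harder to move.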
– Once vMotion is triggered you cannot edit the virtual machine settings. This is a protection mechanism to keep VMs from becoming corrupt.
– At the moment of the VM switchover you might lose up to 3 pings, but not more than that.
– VMs with passthrough devices (like Raw Device Mappings, DirectPath I/O, and such) lose vMotion capabilities. This is because it is not possible to share those devices across ESXi hosts.
– VMs with operations in progress, such as snapshots, backups, or hung settings changes, can’t be migrated.
– VMs with ISOs mounted will fail to vMotion (check KB https://kb.vmware.com/s/article/2148813)
– VMs with a VMware Tools upgrade in progress will fail to vMotion.
– vMotion will not cover you in a disaster scenario. For that purpose we have vSphere HA and Fault Tolerance (I will cover them in future posts).
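If you keep your own inventory of VMs, the limitations above can be turned into a small checklist before you start a maintenance window. The sketch below is purely illustrative: the flag names in `BLOCKERS` and the dictionary describing the VM are hypothetical, and in a real environment you would have to fill them in from your own tooling.

```python
# Hypothetical flag names; each one maps to a limitation from the list above.
BLOCKERS = {
    "has_passthrough_device": "passthrough device attached",
    "operation_in_progress": "snapshot, backup or other operation in progress",
    "iso_mounted": "ISO mounted",
    "tools_upgrade_running": "VMware Tools upgrade in progress",
}


def vmotion_blockers(vm):
    """Return the reasons (if any) why this VM should not be migrated right now."""
    return [reason for flag, reason in BLOCKERS.items() if vm.get(flag, False)]


# Made-up VM record.
vm = {"name": "app01", "iso_mounted": True, "tools_upgrade_running": False}
print(vmotion_blockers(vm))
```

An empty list means none of the known blockers apply to that VM.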
– Shared storage access both on source and destination hosts (SAN, NAS, FC, FCoE)
– A VMkernel port for vMotion on each host (if you are using vSphere Standard Switches please make sure to keep the same name across all hosts)
– Use the same subnet for vMotion across all hosts
– vMotion requires at least 250 Mbps of dedicated throughput per concurrent migration; more throughput means quicker migrations.
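To see why extra throughput matters, here is a back-of-the-envelope estimate of how long it takes to send a VM's memory once over the vMotion network. It assumes the 250 figure is in megabits per second (as in VMware's documentation) and ignores protocol overhead and memory that changes during the copy, so treat the result as a lower bound. The function name is made up for this example.

```python
def min_transfer_seconds(memory_gb, link_mbps):
    """Lower bound on the time to copy a VM's memory once over the link."""
    memory_megabits = memory_gb * 1024 * 8   # GB -> MB -> megabits (approximation)
    return memory_megabits / link_mbps


# A VM with 16 GB of memory over three different link speeds.
for link in (250, 1000, 10000):
    print(f"{link:>6} Mbps: {min_transfer_seconds(16, link):8.1f} s")
```

At the 250 Mbps minimum a 16 GB VM needs several minutes just for one pass over its memory, while a 10 Gbps link brings that down to seconds.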
For further information please check: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vcenterhost.doc/GUID-3B41119A-1276-404B-8BFB-A32409052449.html
Does vMotion cause issues?
vMotion is an excellent tool that has revolutionized the IT market. It helps vSphere and cloud admins avoid downtime for planned maintenance on a daily basis. In a large company, vMotions happen thousands of times a day without causing any issues.
Most applications on the market (I would say 95%) won’t notice a vMotion. However, some extremely latency-sensitive apps can experience issues when the migration happens. For those cases please check this.
The main objective of all the products we will discuss in this series of posts is to provide business continuity, whether that means avoiding downtime during planned maintenance, minimizing (or avoiding) downtime during hardware failures, or keeping business-critical applications safe if a disaster occurs.
None of these products will do the trick if your vSphere design does not take redundancy into consideration (NICs, PSUs, shared storage, multipathing and such).
In addition, you must be prepared for lightning to strike: have a Disaster Recovery Plan and test it frequently.