#WIP #vSphere #VCP

vSphere HA is an integrated vSphere feature that provides high availability/failover within a cluster. In its usual configuration, HA detects the failures defined in its settings and automatically restarts the affected workloads on another host. This causes downtime for the VM. High availability with no downtime is generally only possible when using [[vSphere - Fault Tolerance.md|vSphere - Fault Tolerance]]. HA is therefore not a replacement for [[MSCS]]/[[WSFC]].

## Concepts

### Agent (FDM Agent)

The HA agent runs on each ESXi host; it reports general host health, handles heartbeating, VM placement and logging, and initiates failovers/restarts between the hosts. Although [[vSphere - vCenter.md]] is required to configure the agents, it is not needed for HA to function. The agent is resilient to network interruptions and APD conditions; it uses another communication path if one fails. The FDM agent communicates either over the management network or, if applicable, over the vSAN network.

#### Logging

Logs can be found on each ESXi host under ```/var/log/fdm.log```
- useful for troubleshooting restarts that have an unknown source
- shows which resource condition (storage, network etc.) the restart resulted from

### HOSTD Agent

FDM talks directly to the [[vSphere - ESXi.md#HOSTD]] component of the ESXi host and relies on [[vSphere - ESXi.md#HOSTD]] to function.

### vCenter

[[vSphere - vCenter.md]] is responsible for pushing the FDM agent out to all ESXi hosts. Any changes to the HA configuration have to be communicated via [[vSphere - vCenter.md]]. [[#VM Monitoring (VM-HA)|VM Monitoring (VM-HA)]] also depends on [[vSphere - vCenter.md]] for information about the VMs and their heartbeats. vCenter is not involved when HA responds to a failure.

### Master (Primary) / Slave (Secondary) Relationship

#### Master Agent/Primary

There is usually only one master agent in a cluster.
The master claims responsibility for a VM by taking ownership of the datastore the VM resides on. The master is responsible for exchanging state information with vCenter. It also initiates the restart of a VM when a host has failed. To take ownership of a datastore, a file is created on each datastore like this: ```/<root of datastore>/.vSphere-HA/<cluster-specific-directory>/protectedlist```
This file contains a list of the VMs that are protected by HA.

#### Election

A master is elected every time the agents are not able to contact a master. An election is therefore first held when HA is enabled, and again whenever one of the following scenarios occurs:
- host failure
- network partition or isolation
- disconnection from vCenter
- maintenance mode
- HA reconfiguration

The slaves send constant heartbeats to the master; if they have not received new network heartbeats from the master, they start an election. An election takes around 15 seconds and uses UDP. The host with the greatest number of connected datastores is elected master. Each slave then establishes a single, encrypted TCP connection to the master. The connection is [[SSL]] based. The slaves only communicate with the master, not amongst themselves.

#### Master/Primary failure response

If the master fails, the locked file on each datastore expires and the new master relocks the file.

#### Slave/Secondary

Monitors the state of the VMs it is running and informs the master about that state. Also monitors the health of the master via heartbeats.

#### Slave monitoring

If a slave stops sending heartbeats to the master, the master is responsible for reporting this state to vCenter and determines whether VMs need to be restarted.

#### Files

Different files are created by the master and the slaves. The power-on files are stored on the heartbeat datastores; the rest can be found on each ESXi host in ```/etc/opt/vmware/fdm```. Some files are not human readable; for these, the script ```/opt/vmware/fdm/fdm/prettyPrint.sh clusterconfig``` can be used.
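The election rule described above (the host that sees the greatest number of connected datastores wins) can be sketched as follows. This is a minimal illustration with a hypothetical data model, not a VMware API; the real FDM tie-breaking rule is an implementation detail, so ties are broken here simply by host name.

```python
def elect_master(hosts: dict[str, set[str]]) -> str:
    """Pick the HA master from a cluster.

    `hosts` maps a host name to the set of datastores that host is
    connected to. The host with the most connected datastores wins;
    ties are broken deterministically by host name (an assumption,
    not FDM's actual tie-breaker).
    """
    return max(hosts, key=lambda h: (len(hosts[h]), h))


cluster = {
    "esx01": {"ds1", "ds2"},
    "esx02": {"ds1", "ds2", "ds3"},  # sees the most datastores
    "esx03": {"ds1"},
}
print(elect_master(cluster))  # esx02
```

This also illustrates why the datastore count is a sensible criterion: the master protects VMs by locking files on their datastores, so the host with the broadest datastore visibility can protect the most VMs.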
##### Power on

This file is written by a host to the VM's datastore to track whether the VM is powered on. It is also used by the slaves to communicate with the master in case of a management network failure. It contains either a 0 (not isolated) or a 1 (isolated).

##### clusterconfig (local)

Contains the configuration details of the cluster.

##### vmmetadata (local)

Contains the compatibility info matrix for every HA-protected VM.

##### fdm.cfg (local)

Configuration file for logging.

##### hostlist (local)

List of the hosts participating in the cluster, including hostname, IP, MAC and heartbeat datastores.

#### Heartbeating

The action taken by HA is determined by which state the host enters. The state is determined by checking both heartbeat sources.

##### Network heartbeating

Each slave sends a heartbeat to its master and the master sends a heartbeat to each of the slaves. These heartbeats are sent every second by default.

##### Datastore heartbeating

Can prevent unnecessary restart attempts from occurring, as it allows vSphere HA to determine whether a host is isolated from the network or completely unavailable. Datastore heartbeating is done by checking the heartbeat datastores for the power-on files. HA determines heartbeat regions on the heartbeat datastore and checks whether a region has been updated. It is only really useful in non-converged infrastructures, as in a converged setup a NIC failure will probably cause both heartbeats to fail anyway.
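How the two heartbeat sources combine into a host state can be sketched as below. This is my own simplification of the behaviour described above, not FDM's actual state machine; the state names are illustrative.

```python
def host_state(network_hb: bool, datastore_hb: bool) -> str:
    """Map the two heartbeat sources to the state the master assumes."""
    if network_hb:
        return "healthy"                  # master still receives network heartbeats
    if datastore_hb:
        return "isolated or partitioned"  # host is alive but unreachable over the network
    return "failed"                       # neither source responds -> VMs are restarted


assert host_state(True, True) == "healthy"
assert host_state(False, True) == "isolated or partitioned"
assert host_state(False, False) == "failed"
```

The middle case is exactly why datastore heartbeating exists: without it, a management network failure would be indistinguishable from a dead host and would trigger unnecessary restarts.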
#### Failures

When a failure occurs, a host can enter different states.

![[_media/vSphere - HA-2023-12-19.png]]

##### Isolation

- a single host cannot communicate with the rest of the cluster over the network
- determined by the reachability of the isolation address
- datastore heartbeats are still received
- restarts are done based on the isolation response configured in HA's settings

###### Isolation response

- triggered after 30 seconds of detected isolation
- is configured per cluster
- Disabled (default)
	- the state of the VMs remains unchanged if isolation occurs
- Power off and restart VMs
	- VMs are powered off and restarted
- Shut down and restart VMs
	- VMs are shut down using [[vSphere - VMware Tools.md]]
	- if this is not successful within 5 minutes, the VMs are powered off
- deviating from the default is only useful if it is unlikely that only the management network can fail; otherwise this leads to unnecessary restarts while the VM networks are still accessible
- if a VM is shut down because of a configured isolation response, the shutdown is documented in the VM's directory with a power-off file

![[_media/vSphere - HA-2023-12-19-3.png]]
![[_media/vSphere - HA-2023-12-19-1-.png]]

##### Partition

- two or more hosts can communicate with each other but can no longer communicate with the remaining (two or more) hosts

If a cluster enters a partitioned state, each of the partitions elects its own master. When the partition is corrected, a single master is elected for the whole cluster again. The master determines a partition when it cannot communicate with a host over the management network but still observes its heartbeats via the heartbeat datastores.
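The "Shut down and restart VMs" isolation response above follows a timeout pattern: attempt a clean guest shutdown via VMware Tools, and fall back to a hard power-off after 5 minutes. A sketch of that decision logic, with a hypothetical `VM` class purely to illustrate the timing (not a vSphere API):

```python
SHUTDOWN_TIMEOUT_S = 300  # 5 minutes, the das.isolationshutdowntimeout default


class VM:
    """Hypothetical stand-in for a virtual machine's Tools status."""

    def __init__(self, tools_running: bool):
        self.tools_running = tools_running


def isolation_response(vm: VM, waited_s: int) -> str:
    """Action taken on a VM once the host's isolation response fires."""
    if vm.tools_running and waited_s < SHUTDOWN_TIMEOUT_S:
        return "guest shutdown via VMware Tools"
    # fallback: timeout elapsed, or Tools not available in the guest
    return "hard power off"


print(isolation_response(VM(tools_running=True), waited_s=0))    # guest shutdown via VMware Tools
print(isolation_response(VM(tools_running=True), waited_s=301))  # hard power off
```

Note that a VM without VMware Tools goes straight to the hard power-off path, which is one reason the document lists Tools as a dependency for graceful responses.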
##### Failed

- neither network heartbeats nor datastore heartbeats are received
- a restart of the VMs is initiated

![[_media/vSphere - HA-2023-12-19-2.png]]

#### VM Component Protection (VMCP)

- requires ESXi 6.0
- not supported with
	- FT
	- RDM
	- vSAN
	- vVols

##### PDL

The PDL (Permanent Device Loss) condition applies when ESXi receives a SCSI sense code indicating that a LUN has become unavailable to the host. This indicates that the storage array has unpresented the LUN from the ESXi host or set it offline. There are different responses that VMCP can take in this case:
- Disabled
	- does nothing; the VM crashes and stays that way
- Issue events
	- an event is generated
- Power off and restart VMs
	- the VM process on the ESXi host is killed and the VM is restarted on a host that still has access to the LUN

##### APD

The APD (All Paths Down) condition applies when access to a LUN is lost but the reason is unknown to ESXi (no SCSI sense code is received), typically because of a storage network problem. When an APD event occurs, the host starts a timer; after 140 seconds the APD condition is declared. After that, another timer is started by HA (default 3 minutes), after which HA takes action based on the chosen configuration:
- Disabled
	- nothing is done
- Issue events
	- events are displayed
- Power off and restart VMs - Conservative
	- tries to power off and restart the VM on another host
	- only if it knows that another host can reach the datastore
- Power off and restart VMs - Aggressive
	- tries to power off and restart the VM on another host
	- even if it does not know whether other hosts can reach the datastore

### Features

#### VM Monitoring (VM-HA)

This feature can be used to detect failures at the guest OS level and restart the VM itself. [[vSphere - VMware Tools.md]] is necessary for this to work. It can be activated in the HA configuration and is the responsibility of vCenter.
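The APD timeline above is simple arithmetic worth making explicit: the host-side timer (140 s) and HA's own delay (default 3 minutes) are sequential, so VMCP reacts roughly 320 seconds after all paths are lost. A small sketch of that calculation (constants taken from the text; the defaults are tunable in a real cluster):

```python
APD_DECLARE_S = 140       # host timer until the APD condition is declared
VMCP_DELAY_S = 3 * 60     # HA's default reaction delay after declaration


def seconds_until_vmcp_action(apd_declare_s: int = APD_DECLARE_S,
                              vmcp_delay_s: int = VMCP_DELAY_S) -> int:
    """Seconds from losing all paths until VMCP takes its configured action."""
    return apd_declare_s + vmcp_delay_s


print(seconds_until_vmcp_action())  # 320
```

In other words, storage outages shorter than about five and a half minutes are ridden out without any VMCP action, regardless of the response chosen.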
#### Admission Control

Is used to reserve resources to tolerate a failure. For example: 4 hosts with 25% admission control (or "1 host failure the cluster tolerates") means those resources are not accessible to VMs/services. In case of a failure, that 25%/1-host reservation can be used to fail over the resources of the failed host. Admission control can be configured in 3 modes:
- Cluster resource percentage
	- reserves a certain percentage of the cluster's resources for HA failover
- Slot policy
	- you define how many hosts can fail; the slot size used for failover capacity is calculated from the CPU and memory size of the workloads
	- takes into account the memory and CPU reservations of the VMs, selecting the largest value of each
	- the largest host is chosen and the amount of reserved capacity is calculated based on that host
- Dedicated failover hosts
	- one or more ESXi hosts are reserved for failover only and cannot be used for any workloads

A tolerable percentage of performance degradation can be configured for the VMs. This is based on the DRS score.

##### Parameters

- `das.isolationaddressX` - IP address pinged to determine whether a host is network-isolated; multiple addresses are possible (X = 0-9)
- `das.heartbeatDSPerHost` - number of heartbeat datastores to use
- `das.slotcpuinmhz` - defines the upper bound of the CPU slot size
- `das.slotmeminmb` - defines the upper bound of the memory slot size
- `das.isolationshutdowntimeout` - time a VM is given to shut down before being powered off (default 300 s)
- `das.ignoreRedundantNetWarning` - ignores the "no redundant HA network" warning message

#### Restart priorities

Priorities can be set to allow mission-critical VMs to be restarted ahead of less prioritized VMs.

#### HA Orchestrated Restart

Can be used to specify dependencies between VMs using VM-to-VM rules. For example:
1. restart the DNS VMs
2. restart the vCenter VM (which depends on DNS to function)
3. restart the other VMs that depend on vCenter

#### Proactive HA

Is used to migrate VMs from degraded, but not yet failed, hosts to healthy hosts, preventing the need for a full HA restart in case the host fails completely. For example: if HBA redundancy fails on a single host, Proactive HA migrates the VMs off the unhealthy host to a host that is still healthy.

## Requirements

Each feature has its own requirements. General requirements for plain HA:
- minimum of 2 ESXi hosts
- 4 GB RAM
- vCenter (for configuration; HA will keep working even if vCenter is down)
- shared storage for the VMs
- pingable gateway
- vSphere Standard license

### Networking requirements

HA communicates via port 8182 (UDP & TCP); these ports need to be accessible between the HA agents running on the ESXi hosts.

## Resources