ESXi 6.7 Update 3 (14320388) repeating Alarm ‘Host hardware sensor state’

After upgrading some of our ESXi hosts to ESXi 6.7 U3, we started seeing the alarm ‘Host hardware sensor state’ trigger repeatedly.

Example:

Alarm ‘Host hardware sensor state’ on <esxi_hostname> triggered by event 13762875 ‘Sensor -1 type , Description Intel Corporation Sky Lake-E Ubox Registers #8 state assert for . Part Name/Number N/A N/A Manufacturer N/A’
Alarm ‘Host hardware sensor state’ on <esxi_hostname> triggered by event 13762874 ‘Sensor -1 type , Description Intel Corporation Sky Lake-E IOAPIC #5 state assert for . Part Name/Number N/A N/A Manufacturer N/A’
Alarm ‘Host hardware sensor state’ on <esxi_hostname> triggered by event 13762873 ‘Sensor -1 type , Description Intel Corporation Sky Lake-E RAS #5 state assert for . Part Name/Number N/A N/A Manufacturer N/A’

We see this issue on HPE Gen9 and Gen10 servers. Others have reported that it also occurs on other hardware (Reddit thread). For now we have disabled the alarm, since it was spamming both our events and our syslog.
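The alarm can also be disabled from PowerCLI rather than from the client. This is a minimal sketch, assuming an existing Connect-VIServer session and that the alarm definition still carries the default name shown in the events above:

```powershell
# Assumes an existing Connect-VIServer session to the vCenter that owns the alarm.
# 'Host hardware sensor state' is the alarm name taken from the events above.
$alarm = Get-AlarmDefinition -Name 'Host hardware sensor state'

# Disable the definition; run again with -Enabled $true to turn it back on later.
$alarm | Set-AlarmDefinition -Enabled $false
```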

 

vMotion fails with error – “Failed to receive migration”

At some point I noticed that vMotion for several VMs failed with the message “Failed to receive migration”.

After some investigation I discovered that the VM advanced setting “mks.enable3d” had been changed from TRUE to FALSE without powering off the VM. After power cycling the VM, vMotion started to work again. I was not able to power cycle all the VMs, though, so on those I changed mks.enable3d back to TRUE, and vMotion started to work again as well.

I guess that’s why you should change advanced settings on powered-off VMs instead of powered-on VMs.
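To see which running VMs have the setting defined at all, something like the following could be used. It is only a sketch, assuming an existing Connect-VIServer session. It reports the configured value and cannot tell whether that value was changed while the VM was powered on.

```powershell
# Assumes an existing Connect-VIServer session.
# List powered-on VMs that have mks.enable3d defined, with the configured value.
Get-VM |
    Where-Object { $_.PowerState -eq 'PoweredOn' } |
    Get-AdvancedSetting -Name 'mks.enable3d' |
    Select-Object @{ Name = 'VM'; Expression = { $_.Entity.Name } }, Name, Value
```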

VM disk consolidation fails – “Unable to access file since it is locked”

A couple of times per month I see errors during backup where a VM’s orphaned snapshots are locked and prevent new backups from being performed. Under Tasks I see several failed tasks – “Consolidate virtual machine disk files” with status “Unable to access file since it is locked”

To unlock the file I usually restart the management agents from the console of the host where the VM was located when the error occurred.
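Before restarting the agents, it can help to know which VMs are affected and which hosts they run on. A minimal sketch, assuming an existing Connect-VIServer session; it relies on the ConsolidationNeeded flag that the vSphere API exposes in each VM’s runtime info:

```powershell
# Assumes an existing Connect-VIServer session.
# Runtime.ConsolidationNeeded is the vSphere API flag that marks a VM
# whose disks need consolidation.
Get-VM |
    Where-Object { $_.ExtensionData.Runtime.ConsolidationNeeded } |
    Select-Object Name, VMHost, PowerState
```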

I have written about this type of issue before, when it happened to me on ESXi 5.5 – VM DISK CONSOLIDATION FAILURES

ESXi stops sending syslog after upgrade from 6.5 U2 to 6.7 U2

Recently we upgraded a lot of ESXi hosts to 6.7 U2. After a while I noticed that the volume of logs arriving at our syslog server had decreased. After some investigation I discovered that after the upgrade the “syslog” rule in the ESXi firewall was no longer enabled.

Running this PowerCLI command enabled the rule again, and logs started appearing in the syslog server again – Get-VMHost | Get-VMHostFirewallException | where {$_.Name.StartsWith('syslog')} | Set-VMHostFirewallException -Enabled $true
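To check the rule before or after such an upgrade, its state can be reported per host. This is a sketch only, assuming an existing Connect-VIServer session and that the rule is named “syslog” as described above:

```powershell
# Assumes an existing Connect-VIServer session.
# Report the state of the 'syslog' firewall rule on every host.
foreach ($vmhost in Get-VMHost) {
    Get-VMHostFirewallException -VMHost $vmhost -Name 'syslog' |
        Select-Object @{ Name = 'Host'; Expression = { $vmhost.Name } }, Name, Enabled
}
```

Any host where Enabled shows False is a candidate for the fix above.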

HPE iLO problem with Embedded Flash/SD-CARD

Some time ago I discovered two HPE BL490c Gen9 servers with iLO in “Degraded” status. The diagnostic page showed that the error was related to the Embedded Flash/SD-CARD – “Embedded media manager failed initialization”. The login banner was also showing a warning.

With iLO 4 firmware 2.61 or newer there is a “Format” button to format the embedded Flash/SD-CARD. If you format the embedded Flash/SD-CARD, the iLO will reset and hopefully the error is fixed. It worked on one of my servers. The other one was still showing the error after the iLO reset, so I power-cycled the blade server using the E-Fuse process: I logged into the Onboard Administrator and issued “reset server <bay_number>”. After the server restarted, the iLO error disappeared.

Advisory from HPE regarding the issue – https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04996097

 

VMware virtual machine excluded from Veritas NetBackup query-based policy

Recently I discovered that some of our VMs were no longer being backed up by NetBackup for no obvious reason. After some investigation we found that all the affected VMs had one or more Independent Persistent hard disks configured. This alone should not be a problem, since NetBackup should simply skip those disks when backing up the VM. However, we had also configured a filter in the NetBackup query to exclude VMs with RDM (Raw Device Mapping) disks – “AND NOT VMHasRDM Equal TRUE”. It seems that NetBackup treats Independent Persistent disks and RDM disks the same way. After removing the filter from the query, the VMs were included in the backup again.
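To find out which VMs could be caught by such a filter, the disk modes can be listed from PowerCLI. A minimal sketch, assuming an existing Connect-VIServer session; it only reports what vCenter knows about the disks and does not reproduce how NetBackup evaluates its query:

```powershell
# Assumes an existing Connect-VIServer session.
# List every disk that is independent persistent or an RDM, per VM.
Get-VM | Get-HardDisk |
    Where-Object {
        $_.Persistence -eq 'IndependentPersistent' -or
        $_.DiskType -like 'Raw*'    # RawPhysical or RawVirtual, i.e. RDM
    } |
    Select-Object @{ Name = 'VM'; Expression = { $_.Parent.Name } }, Name, Persistence, DiskType
```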

 

vMotion stuck at 7%

After logging into vCenter (6.7 U2) I noticed several vMotions stuck at 7%. I also found several errors in tasks – “A general system error occurred: Operation timed out”. When I tried to vMotion a VM manually I got an error: vCenter failed to validate the target. After some investigation in different log files I decided to restart vCenter. After the restart all the stuck vMotions were gone and everything was working again.