After upgrading to ESXi 7 U2 (build 17630552), some of my hosts started dying after running for a while. All the affected hosts had one thing in common: ESXi was installed on an SD card. Hosts where ESXi was installed on an SSD do not seem to have this issue.
06.05.2021 update: VMware ESXi 7.0.2 build 17867351 seems to be affected by the same problem.
I saw the following error messages:
Bootbank cannot be found at path ‘/bootbank’
hostd performance has degraded due to high system latency
A lot of NMP warnings about vmhba32 and vmhba33
For now, I have rolled those hosts back to ESXi 7 U1. I will have to see whether this error is related to U2 or whether I have bad SD cards.
I was patching some hosts to VMware ESXi 7.0.2 build 17630552 with Lifecycle Manager and some of the hosts failed to boot. I was seeing the message “Failed to load crypto64.efi. Fatal error: 15 (Not found)”. The issue happened with HPE Gen9 and Dell servers.
In my lab I discovered a host whose root password was different from the one I usually use. I was not able to figure it out. Since the host was connected to vCenter, I used a PowerShell script to change it. NB! Make sure the new password meets the complexity requirements.
We were looking into the amount of ESXi logs we were collecting and discovered that two applications in ESXi were logging at the verbose level although we had set “config.HostAgent.log.level” to info. Those applications were rhttpproxy and fdm. They were generating millions of lines per day.
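A minimal PowerCLI sketch of how such a password reset could look; this is not the script from the original post. It assumes an existing Connect-VIServer session to the vCenter that manages the host, and the host name is a hypothetical placeholder. It goes through the esxcli "system account set" command exposed by Get-EsxCli.

```powershell
# Assumes Connect-VIServer has already been run against the managing vCenter.
# 'esxi01.example.local' is a hypothetical host name; replace it with your own.
$vmhost = Get-VMHost -Name 'esxi01.example.local'

# Prompt for the new password so it does not end up in the script or shell history.
$credential  = Get-Credential -UserName 'root' -Message 'Enter the new root password'
$newPassword = $credential.GetNetworkCredential().Password

# Call "esxcli system account set" through the V2 esxcli interface.
$esxcli    = Get-EsxCli -VMHost $vmhost -V2
$arguments = $esxcli.system.account.set.CreateArgs()
$arguments.id                   = 'root'
$arguments.password             = $newPassword
$arguments.passwordconfirmation = $newPassword

# The call fails if the password does not meet the host's complexity requirements.
$esxcli.system.account.set.Invoke($arguments)
```

Because the call is made through vCenter, the old root password is not needed.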
To reduce rhttpproxy log level you need to edit /etc/vmware/rhttpproxy/config.xml and replace the “verbose” value in log level section with “info” for example. After this restart the rhttpproxy service.
fdm (HA agent)
To change the HA agent log level you need to modify the cluster advanced settings and add the option “das.config.log.level” with the value “info”. After this, disable High Availability on the cluster and then re-enable it.
PowerCLI lines to do this:
New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name 'das.config.log.level' -Value info
Set-Cluster -Cluster $cluster -HAEnabled:$false
Set-Cluster -Cluster $cluster -HAEnabled:$true
We noticed that the NTP service does not start after ESXi 7 patching although it’s configured to “Start and Stop with host”. According to a VMware KB article, there is no fix for this issue at the moment.
I wrote a small script which I run every 5 minutes to check and start NTP service if it’s stopped.
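A check like this could be sketched in PowerCLI as follows. This is an illustration rather than the original script: it assumes an existing Connect-VIServer session and leaves the 5-minute scheduling to an external scheduler such as Windows Task Scheduler.

```powershell
# Assumes Connect-VIServer has already been run; scheduling is handled externally.
# Find the ntpd service on every reachable host where it is not running.
$stoppedNtp = Get-VMHost -State Connected, Maintenance |
    Get-VMHostService |
    Where-Object { $_.Key -eq 'ntpd' -and -not $_.Running }

foreach ($service in $stoppedNtp) {
    Write-Host "Starting ntpd on $($service.VMHost.Name)"
    Start-VMHostService -HostService $service -Confirm:$false | Out-Null
}
```

Filtering on host state avoids errors from disconnected hosts, which cannot report or start services.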
Recently, when we were solving the USB/SD-card boot issue, we noticed that “Scan entity” tasks on some hosts took more than 15 minutes. In the VMware Community there is a thread describing a similar issue: https://communities.vmware.com/thread/470949. One post there mentioned that the problem only exists on hosts with FC storage.
Based on that information we ran a test, with the following results. The Scan entity task took about 16 minutes on a host with 100+ FC LUNs. It took about 1 minute on the same host with all FC LUNs disconnected (FC ports disabled). We also tested on a host with about 10 LUNs, where the Scan entity task took about 3 minutes. I have opened a case with VMware to understand how the LUN count is related to scanning for patches.
Update: I never got any good solution from VMware support. The problem has disappeared for now.
We had several Dell servers with ESXi installed on an SD card which still showed missing patches after installing 7.0.1 build 16850804. When we restarted the ESXi host, it reverted to 7.0.0. After some investigation we noticed issues with /bootbank and /altbootbank. We also noticed this issue on a freshly installed ESXi 7.0.1 build 16850804.
We ran into an issue where 2 of our VMs froze, and after investigation we discovered it was an issue with the VMFS6 heap size on ESXi 7. The error on the ESXi host is “WARNING: Heap: 3651: Heap vmfs3 already at its maximum size. Cannot expand.”
Error from VM side: There is no more space for virtual disk ‘vm_1.vmdk’. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry. Click Cancel to terminate this session.