ESXi 7.0 U2 – Bootbank cannot be found at path ‘/bootbank’

After upgrading to ESXi 7.0 U2 (build 17630552), some of my hosts became unresponsive after running for a while. All the affected hosts had one thing in common: ESXi is installed on an SD card. Hosts where ESXi is installed on an SSD do not seem to have this issue.

Update 06.05.2021: VMware ESXi 7.0.2 build 17867351 also seems to be affected by the same problem.

I saw the following error messages:

Bootbank cannot be found at path ‘/bootbank’
hostd performance has degraded due to high system latency

A lot of NMP warnings about vmhba32 and vmhba33


For now, I have rolled those hosts back to ESXi 7.0 U1. I still have to find out whether this error is related to U2 or whether I have bad SD cards.

Links to forum posts about this issue:

https://communities.vmware.com/t5/ESXi-Discussions/New-ESXI-host-v7-0-2-Bootbank-cannot-be-found-at-path-bootbank/td-p/2843538

https://www.reddit.com/r/vmware/comments/mtwgem/702_unresponsive_host/

https://www.dell.com/community/PowerEdge-Hardware-General/VMware-7-0-U2-losing-contact-with-SD-card/td-p/7842787

VMware ESXi 7 fails to boot after upgrade to Update 2 (17630552)

I was patching some hosts to VMware ESXi 7.0.2 build 17630552 with Lifecycle Manager, and some of the hosts failed to boot with the message “Failed to load crypto64.efi. Fatal error: 15 (Not found)”. The issue happened on HPE Gen9 and Dell servers.

After some searching, I found a Reddit thread about the same error – https://www.reddit.com/r/vmware/comments/m1glgg/70_u1_u2_broken_boot/. In this thread, several people have had the same issue and the fix for this is to upgrade the broken host using an ISO mapped through iLO or iDRAC.

I can confirm that applying the upgrade from an ISO, after the Lifecycle Manager (VUM) upgrade had failed, fixed the issue for me.

UPDATE: William Lam has described a workaround to prevent this: use an Upgrade baseline instead of a Patch baseline. Link to his post: https://www.virtuallyghetto.com/2021/03/esxi-7-0-update-2-upgrade-issue-failed-to-load-crypto64-efi.html

Create a new local user on ESXi using esxcli through PowerShell

To create additional local users on ESXi hosts and assign roles to them, I use the following PowerShell script:

# Select all matching hosts that are still responding
$esx_hosts = Get-VMHost my_hosts* | Where-Object { $_.ConnectionState -ne 'NotResponding' }

foreach ($esx_host in $esx_hosts) {
    Write-Host $esx_host -ForegroundColor Green
    $esxcli = Get-EsxCli -VMHost $esx_host -V2

    # Create the local account
    $arguments1 = $esxcli.system.account.add.CreateArgs()
    $arguments1.id = 'my_local_user1'
    $arguments1.password = 'localuserpassword'
    $arguments1.passwordconfirmation = 'localuserpassword'
    $arguments1.description = 'local_user'
    $arguments1

    $esxcli.system.account.add.Invoke($arguments1)

    # Assign a role to the new account
    $arguments2 = $esxcli.system.permission.set.CreateArgs()
    $arguments2.id = 'my_local_user1'
    $arguments2.role = 'Admin'
    $arguments2

    $esxcli.system.permission.set.Invoke($arguments2)
}

Change ESXi root password using esxcli through PowerShell

In my lab I discovered a host whose root password was different from the one I usually use, and I was not able to figure it out. Since the host was still connected to vCenter, I used a PowerShell script to change it. NB! Make sure the new password meets the complexity requirements.

The script is as follows:

$esx_hosts = Get-VMHost my_host

foreach ($esx_host in $esx_hosts) {
  $esxcli = Get-EsxCli -VMHost $esx_host -V2
  $arguments = $esxcli.system.account.set.CreateArgs()
  $arguments.id = 'root'
  $arguments.password = 'Password123'
  $arguments.passwordconfirmation = 'Password123'
  $arguments

  $esxcli.system.account.set.Invoke($arguments)
}

This script can be used to change the password of root or any other local user on all ESXi hosts connected to vCenter.

Reduce amount of ESXi logs

We were looking into the amount of ESXi logs we were collecting and discovered that two applications in ESXi were at the verbose logging level, although we had set “config.HostAgent.log.level” to info. Those applications were rhttpproxy and fdm. They were generating millions of lines per day.

rhttpproxy

To reduce the rhttpproxy log level, edit /etc/vmware/rhttpproxy/config.xml and replace the “verbose” value in the log level section with, for example, “info”. Then restart the rhttpproxy service.
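
As a rough sketch, the same edit could be scripted from the ESXi shell with sed. This assumes the level is stored as a `<level>verbose</level>` element in config.xml, so check your own file first. The function name set_rhttpproxy_log_level and the .bak backup copy are made up for illustration.

```shell
# Sketch only: switch the rhttpproxy log level from verbose to another level.
# Assumption: the level is stored as <level>verbose</level> in config.xml.
set_rhttpproxy_log_level() {
    config_file="${1:-/etc/vmware/rhttpproxy/config.xml}"
    new_level="${2:-info}"

    # Keep a backup copy next to the original before changing anything
    cp "$config_file" "$config_file.bak" || return 1

    # Replace only the verbose level element and leave everything else untouched
    sed -i "s|<level>verbose</level>|<level>${new_level}</level>|" "$config_file"
}

# Usage on the host (not run automatically):
#   set_rhttpproxy_log_level
#   /etc/init.d/rhttpproxy restart
```

The function only touches the file; the service restart is left as a separate manual step so the change can be reviewed first.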

fdm (HA agent)

To change the HA agent log level, modify the cluster advanced settings and add the option “das.config.log.level” with the value “info”. Then disable High Availability on the cluster and re-enable it.

PowerCLI lines to do this:
New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name 'das.config.log.level' -Value info
Set-Cluster -Cluster $cluster -HAEnabled:$false
Set-Cluster -Cluster $cluster -HAEnabled:$true

NTP service not starting on ESXi 7 after restart.

We noticed that the NTP service does not start after ESXi 7 patching, although it is configured to “Start and Stop with host”. According to a VMware KB article (link), there is no fix for this issue at the moment.

I wrote a small script, which I run every 5 minutes, to check the NTP service and start it if it is stopped.

$esx_hosts = Get-VMHost -State Maintenance,Connected
# Start ntpd on every host where the service is not running
$esx_hosts | Get-VMHostService | Where-Object { $_.Key -eq 'ntpd' -and -not $_.Running } | Start-VMHostService

Scan entity (Check Compliance) task takes a long time.

Recently, when we were solving the USB/SD-card boot issue (link), we noticed that “Scan entity” tasks on some hosts took more than 15 minutes. There is a thread in the VMware Community describing a similar issue: https://communities.vmware.com/thread/470949. One post there mentioned that the problem only exists on hosts with FC storage.

Based on that information we ran a test with the following results. The Scan entity task took about 16 minutes on a host with 100+ FC LUNs, and about 1 minute on the same host with all FC LUNs disconnected (FC ports disabled). We also tested a host with about 10 LUNs, where the task took about 3 minutes. I have opened a case with VMware to understand how the LUN count is related to scanning for patches.

Update: I never got any good solution from VMware support. The problem has disappeared for now.

ESXi 7.0.1 loses access to USB/SD-card.

We had several Dell servers with ESXi installed on an SD card that still showed missing patches after installing 7.0.1 build 16850804. When we restarted such a host, it reverted to 7.0.0. After some investigation we noticed issues with /bootbank and /altbootbank. We also saw this issue on a freshly installed ESXi 7.0.1 build 16850804.

Some links about the issue:
https://www.reddit.com/r/vmware/comments/j92b40/fix_for_usb_booted_esxi_7_hosts_losing_access_to/
https://kb.vmware.com/s/article/2149444

The fix is to add a new parameter to the kernelopt line in boot.cfg. The parameter is devListStabilityCount=10. We also added this parameter on 6.7.0 hosts before upgrading them to 7.0.1.
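
A minimal sketch of that edit from the ESXi shell, assuming boot.cfg contains a single line starting with "kernelopt=". The function name add_devlist_stability_count, the default path /bootbank/boot.cfg and the .bak backup are my own choices for illustration, not something taken from the KB.

```shell
# Sketch only: append devListStabilityCount=10 to the kernelopt line of a boot.cfg.
# Assumption: boot.cfg has a single line starting with "kernelopt=".
add_devlist_stability_count() {
    boot_cfg="${1:-/bootbank/boot.cfg}"
    param="devListStabilityCount=10"

    # Do nothing if the parameter is already present
    if grep -q "^kernelopt=.*devListStabilityCount=" "$boot_cfg"; then
        return 0
    fi

    # Keep a backup copy before changing anything
    cp "$boot_cfg" "$boot_cfg.bak" || return 1

    # Append the parameter to the kernelopt line, then drop the leading
    # space that would be left behind if kernelopt was empty before
    sed -i -e "s|^kernelopt=\(.*\)\$|kernelopt=\1 ${param}|" \
           -e "s|^kernelopt= *|kernelopt=|" "$boot_cfg"
}

# Usage on the host (not run automatically):
#   add_devlist_stability_count /bootbank/boot.cfg
```

Running it twice leaves the file unchanged the second time, so it should be safe to include in a host preparation script.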

VMFS6 heap size issue on ESXi 7 affecting running VMs – Updated

We ran into an issue where two of our VMs froze, and after investigation we discovered it was caused by the VMFS6 heap size on ESXi 7. The error on the ESXi host is “WARNING: Heap: 3651: Heap vmfs3 already at its maximum size. Cannot expand.”

Error from VM side: There is no more space for virtual disk ‘vm_1.vmdk’. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry. Click Cancel to terminate this session.

VMware has a KB article about it: VMFS-6 heap memory exhaustion on vSphere 7.0 ESXi hosts (80188)

This issue is fixed in ESXi 7 Update 1.

To work around this issue, follow the steps below (from the KB article):

1. Create an eager zeroed thick disk on all of the mounted VMFS6 datastores.
vmkfstools -c 10M -d eagerzeroedthick /vmfs/volumes/datastore/eztDisk

2. Delete the eager zeroed thick disk created in step 1.
vmkfstools -U /vmfs/volumes/datastore/eztDisk

The workaround has to be applied to each datastore on each host.
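
Since this has to be repeated for every datastore, the two vmkfstools commands could be wrapped in a small loop on each host. The sketch below is not from the KB article. It assumes that `esxcli storage filesystem list` prints the mount point in the first column and "VMFS-6" in the type column. The function names and the VMKFSTOOLS override variable are hypothetical helpers.

```shell
# Sketch only: apply the create/delete workaround to every mounted VMFS-6 datastore.
# VMKFSTOOLS can be pointed at a stub command for a dry run.
VMKFSTOOLS="${VMKFSTOOLS:-vmkfstools}"

# Read "esxcli storage filesystem list" output on stdin and print VMFS-6 mount points.
# Assumption: mount point is the first column and the type column reads "VMFS-6".
list_vmfs6_mount_points() {
    awk '/VMFS-6/ && $1 ~ /^\/vmfs\/volumes\// { print $1 }'
}

# Create and then delete a small eager zeroed thick disk on each given datastore path
apply_heap_workaround() {
    for datastore in "$@"; do
        "$VMKFSTOOLS" -c 10M -d eagerzeroedthick "$datastore/eztDisk" || return 1
        "$VMKFSTOOLS" -U "$datastore/eztDisk" || return 1
    done
}

# Usage on the host (not run automatically):
#   apply_heap_workaround $(esxcli storage filesystem list | list_vmfs6_mount_points)
```

The loop stops at the first failing vmkfstools call, so a datastore that cannot be written to is noticed instead of being silently skipped.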

Checking ESXi firewall status via PowerCLI

I discovered that some hosts in our environment did not have the firewall enabled. So I wrote a small PowerShell script to check the firewall status and enable the firewall if it is not enabled.

The script:

$esx_hosts = Get-VMHost -State Maintenance,Connected

foreach ($esx_host in $esx_hosts)
{
    Write-Host "$esx_host checking"
    $esxcli = Get-EsxCli -VMHost $esx_host -V2
    $fw_status = ($esxcli.network.firewall.get.Invoke()).Enabled
    Write-Host "$esx_host - $fw_status"

    # Compare as text so the check works whether Enabled comes back as a string or a boolean
    if ("$fw_status" -eq 'false') {
        Write-Host 'Enabling FW' -ForegroundColor Green
        $arguments = $esxcli.network.firewall.set.CreateArgs()
        $arguments.enabled = $true
        $esxcli.network.firewall.set.Invoke($arguments)
    }
}