Change LUN queue depth in ESXi 6.7

Some time ago I had to change the default queue depth for all LUNs in a cluster.

First I needed to determine which driver (module) my HBAs were using. I used the following script for that.

### Script start
# Get the hosts to check (adjust the name filter to your environment)
$esx_hosts = Get-VMHost esxi1*

foreach ($esx_host in $esx_hosts) {
    Write-Host $esx_host
    $esxcli = Get-EsxCli -VMHost $esx_host -V2
    # List all storage adapters with the driver (module) they use
    $esxcli.storage.core.adapter.list.Invoke() | Select-Object HBAName, Driver, Description
}
### Script end

The output lists each host's HBAs with the driver in use and a description. In my case the adapters were using the brcmfcoe driver.

To change the LUN queue depth parameter I used the following script.

### Script start
$esx_hosts = Get-VMHost esx1*

foreach ($esx_host in $esx_hosts) {
    Write-Host $esx_host
    $esxcli = Get-EsxCli -VMHost $esx_host -V2
    # Set the LUN queue depth module parameter for the brcmfcoe driver
    $args1 = $esxcli.system.module.parameters.set.CreateArgs()
    $args1.parameterstring = "lpfc_lun_queue_depth=128"
    $args1.module = "brcmfcoe"
    $esxcli.system.module.parameters.set.Invoke($args1)
}
### Script end

After running this script you need to reboot each ESXi host for the module parameter to take effect.
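
The reboot can also be scripted with PowerCLI. Below is a minimal sketch, assuming the host can be evacuated into maintenance mode first; the host name is just a placeholder.

### Script start
# Put the host into maintenance mode (in a DRS cluster, evacuate the VMs first)
$esx_host = Get-VMHost esx1-01.example.local
Set-VMHost -VMHost $esx_host -State Maintenance

# Reboot the host
Restart-VMHost -VMHost $esx_host -Confirm:$false

# Run this only after the host has finished rebooting and reconnected
Set-VMHost -VMHost $esx_host -State Connected
### Script end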

After the reboot I used the following script to set “Maximum Outstanding Disk Requests for virtual machines” (DSNRO). Since DSNRO cannot be set higher than the LUN queue depth, I set it to the same value, 128.

### Script start
$esx_hosts = Get-VMHost esx1*

foreach ($esx_host in $esx_hosts) {
    $esxcli = Get-EsxCli -VMHost $esx_host -V2
    $devices = $esxcli.storage.core.device.list.Invoke()
    foreach ($device in $devices) {
        # Only touch SAN LUNs (NAA IDs starting with naa.6)
        if ($device.Device -imatch "naa.6") {
            $arguments3 = $esxcli.storage.core.device.set.CreateArgs()
            $arguments3.device = $device.Device
            $arguments3.schednumreqoutstanding = 128
            Write-Host $device.Device
            $esxcli.storage.core.device.set.Invoke($arguments3)
        }
    }
}
### Script end

To check the LUN queue depth I use the following script.

### Script start
$esx_hosts = Get-VMHost esx1*

foreach ($esx_host in $esx_hosts) {
    $esxcli = Get-EsxCli -VMHost $esx_host -V2
    $ds_list = $esxcli.storage.core.device.list.Invoke()

    foreach ($ds1 in $ds_list) {
        # Query each device individually to get its queue depth settings
        $arguments3 = $esxcli.storage.core.device.list.CreateArgs()
        $arguments3.device = $ds1.Device
        $esxcli.storage.core.device.list.Invoke($arguments3) |
            Select-Object Device, DeviceMaxQueueDepth, NoofoutstandingIOswithcompetingworlds
    }
}
### Script end

vCenter 6.7 Update 3 issue with rsyslog

Recently we noticed that logs from vCenter did not reach our log servers. When I restarted rsyslog on the vCenter appliance the logs started flowing again, but after some time they stopped once more. There is a thread in the VMware community about this issue – https://communities.vmware.com/thread/618178. In short, the fix will most likely be available in the next patch for vCenter. The thread also describes a workaround for manually updating the rsyslog package on the appliance.

ESXi 6.7 Update 3 (14320388) repeating Alarm ‘Host hardware sensor state’

Update: This issue has been fixed in ESXi 6.7 build 15018017.

After upgrading some of our ESXi hosts to 6.7 U3 we started seeing a lot of repeating ‘Host hardware sensor state’ alarms.

Examples:

Alarm ‘Host hardware sensor state’ on <esxi_hostname> triggered by event 13762875 ‘Sensor -1 type , Description Intel Corporation Sky Lake-E Ubox Registers #8 state assert for . Part Name/Number N/A N/A Manufacturer N/A’
Alarm ‘Host hardware sensor state’ on <esxi_hostname> triggered by event 13762874 ‘Sensor -1 type , Description Intel Corporation Sky Lake-E IOAPIC #5 state assert for . Part Name/Number N/A N/A Manufacturer N/A’
Alarm ‘Host hardware sensor state’ on <esxi_hostname> triggered by event 13762873 ‘Sensor -1 type , Description Intel Corporation Sky Lake-E RAS #5 state assert for . Part Name/Number N/A N/A Manufacturer N/A’

We see this issue on HPE Gen9 and Gen10 servers. Others have reported the same issue on other hardware as well (Reddit thread). For now we have disabled the alarm, since it was spamming our events and also our syslog.
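
Disabling the alarm can be done in the vSphere Client, or with a couple of lines of PowerCLI. A minimal sketch, using the alarm name as it appears in our vCenter:

### Script start
# Disable the alarm definition vCenter-wide
Get-AlarmDefinition -Name 'Host hardware sensor state' |
    Set-AlarmDefinition -Enabled:$false
### Script end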

vMotion fails with error – “Failed to receive migration”

At some point I noticed that vMotion for several VMs failed with the message “Failed to receive migration”.

After some investigation I discovered that the VM advanced setting “mks.enable3d” had been changed from TRUE to FALSE without powering off the VM. After power cycling the VM, vMotion started to work again. Since I was not able to power cycle all the VMs, I changed the mks.enable3d setting back to TRUE on the rest, and vMotion started working for them as well.
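
For reference, flipping the setting back with PowerCLI looks roughly like this (a sketch; the VM name is just a placeholder):

### Script start
# Change mks.enable3d back to TRUE on an affected VM
$vm = Get-VM affected-vm-name
Get-AdvancedSetting -Entity $vm -Name mks.enable3d |
    Set-AdvancedSetting -Value TRUE -Confirm:$false
### Script end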

I guess that’s why you should change advanced settings on powered-off VMs instead of powered-on ones.

VM disk consolidation fails – “Unable to access file since it is locked”

A couple of times per month I see errors during backups where a VM has orphaned snapshots that are locked and preventing new backups from being performed. Under Tasks I see several failed “Consolidate virtual machine disk files” tasks with the status “Unable to access file since it is locked”.

To unlock the file I usually restart the management agents, from the console, on the host where the VM was located when the error occurred.

I have written about this type of issue before, when it happened to me on ESXi 5.5 – VM DISK CONSOLIDATION FAILURES.

ESXi stops sending syslog after upgrade from 6.5 U2 to 6.7 U2

Recently we upgraded a lot of ESXi hosts to 6.7 U2. After a while I noticed that the volume of logs in our syslog server had decreased. After some investigation I discovered that after the upgrade the “syslog” rule in the ESXi firewall was no longer enabled.

Running the following PowerCLI command enabled the rule again, and logs appeared in the syslog server:

### Script start
Get-VMHost | Get-VMHostFirewallException | Where-Object {$_.Name.StartsWith('syslog')} | Set-VMHostFirewallException -Enabled $true
### Script end
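
To see which hosts still have the rule disabled, a quick check like this works:

### Script start
# Show the state of the syslog firewall rule on every host
Get-VMHost |
    Get-VMHostFirewallException -Name syslog* |
    Select-Object VMHost, Name, Enabled
### Script end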

HPE iLO problem with Embedded Flash/SD-CARD

Some time ago I discovered two HPE BL490c Gen9 servers with iLO in “Degraded” status. From the diagnostics page it was visible that the error was related to the Embedded Flash/SD-CARD – “Embedded media manager failed initialization”. The login banner was also showing a warning.

With iLO 4 firmware 2.61 or newer there is a “Format” button to format the embedded Flash/SD-CARD. After formatting, the iLO resets itself and hopefully the error is fixed. That worked on one of my servers. The other one was still showing the error after the iLO reset, so I power-cycled the blade server using the E-FUSE process: I logged into the Onboard Administrator and issued “server reset <bay_number>”. After the server restarted, the iLO error disappeared.

Advisory from HPE regarding the issue – https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04996097