VMware vCenter crashes when deploying a VM from a template

Recently we ran into an issue where deploying VMs from templates crashed vCenter Server. The deployment got stuck between 70% and 95%, and after a few minutes the vCenter Web Clients stopped responding. We had to restart the vCenter services to get it back up and running.

It seems that the templates were missing some info: check https://kb.vmware.com/s/article/20565429. After converting the templates back to VMs and then back to templates again, the deployments were successful and vCenter Server no longer crashed.

Update (13.07.2018): The same issue prevents Veritas NetBackup from getting an inventory from a vCenter Server; it fails with the error “Validation constraint violation”. More info: https://www.veritas.com/support/en_US/article.100033934

 

Cross Site vMotion validation error: Cannot complete login due to an incorrect user name or password.

After migration from vSphere 5.5 to vSphere 6.5 U2 we had issues with custom certificates and Site Recovery Manager – see the previous post.

Now I have discovered another error: when using either the HTML5 or the Flex client on one of the vCenter servers and trying to perform a cross site vMotion, we get a validation error: “Cannot complete login due to an incorrect user name or password.” The error does not appear when we use the clients on the other vCenter Server.

We have opened a case with VMware, but there is no solution yet. They have scanned through the logs and found an error indicating that there are still issues with certificates: “com.vmware.vim.vmomi.client.http.impl.ThumbprintTrustManager Server certificate chain is not trusted but thumbprint matches”

I will update the blog post as things progress with this issue.

 

VMware vCenter 6.5 U2 and SRM 8.1 – Server certificate chain is not trusted and thumbprint verification is not configured

Recently, during an upgrade, we stumbled on an issue where SRM was not able to work with a vCenter 6.5 U2 that had been migrated from vSphere 5.5. SRM 8.1 went into an error loop after creating a site pair. Looking through the different SRM log files, we found an error in dr.log in the “C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\runtime\srm-client\logs” folder. The error was: com.vmware.vim.vmomi.client.exception.SslException: Failed to connect to Lookup Service at https://<vcenterhostname>/lookupservice/sdk. Reason: com.vmware.vim.vmomi.core.exception.CertificateValidationException: Server certificate chain is not trusted and thumbprint verification is not configured

After a few days and no usable help from VMware support, we decided to try the process described in a couple of blog posts and KB articles:

https://vlenzker.net/2016/11/vcenter-6-5-srm-vsphere-replication-nsx-problems-after-ssl-change-ls_update_certs-py/
https://theithollow.com/2017/03/13/nsx-issues-replacing-vmware-self-signed-certs/
http://www.vjenner.com/2015/12/nsx-manager-6-2-lookup-service-error-when-using-vmca-enterprise-mode/
http://martinsvspace.blogspot.com/2015/08/srm-6-nightmare.html
https://kb.vmware.com/s/article/2121701
https://kb.vmware.com/kb/2121689

Before we did anything, we took snapshots of both vCenter servers while they were powered off at the same time.

After determining that the problem was one certificate which had not been updated, we performed the fix against both vCenters; on one of them, 7 services were updated by ls_update_certs.py. After that SRM worked correctly.
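The post does not say how the stale certificate was identified. As a rough first check (not the procedure from the KB articles above), the SHA-1 thumbprint of the certificate each vCenter currently presents can be printed and compared with the thumbprints stored in the Lookup Service registrations. The sketch below uses only the Python standard library; the host names in the usage comment are placeholders.

```python
import hashlib
import ssl


def thumbprint_from_der(der_bytes: bytes) -> str:
    """Return the SHA-1 thumbprint of a DER-encoded certificate as AA:BB:..."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_thumbprint(host: str, port: int = 443) -> str:
    """Fetch the certificate a host presents on `port` and return its thumbprint.

    The certificate is retrieved without validation, which is intended here:
    the point is to inspect a certificate that may not be trusted.
    """
    pem = ssl.get_server_certificate((host, port))
    return thumbprint_from_der(ssl.PEM_cert_to_DER_cert(pem))


# Usage (placeholder host names, replace with your own vCenter servers):
#   print(fetch_thumbprint("vcenter-a.example.com"))
#   print(fetch_thumbprint("vcenter-b.example.com"))
```

If a thumbprint printed this way differs from the one a service registration still holds, that registration is a likely candidate for the fix described above.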

 

The required VMware Tools ISO image does not exist or is inaccessible.

Recently I deployed some VMs from a template which I had created some years ago. When I tried to update VMware Tools I received the error “The required VMware Tools ISO image does not exist or is inaccessible.”

After some digging around in Google I found a thread in the VMware forums which pointed me in the right direction. I had set the following advanced option in the VM configuration: isolation.tools.autoInstall.disable = TRUE.

To allow the VMware Tools ISO to be mounted, I changed the value from TRUE to FALSE.
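I made the change in the VM's advanced configuration in the client. If many VMs were deployed from the same template, the same flip could be scripted. The following is only a sketch that works on the text of a .vmx file, assuming the VM is powered off and the file is backed up first; the function name and the sample content are made up for illustration.

```python
import re


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set, replacing an existing entry."""
    new_line = f'{key} = "{value}"'
    # Match the whole line for this key, whatever its current value is.
    pattern = re.compile(
        rf"^[ \t]*{re.escape(key)}[ \t]*=.*$",
        re.IGNORECASE | re.MULTILINE,
    )
    if pattern.search(vmx_text):
        return pattern.sub(lambda _match: new_line, vmx_text)
    # Key not present yet: append it on its own line.
    separator = "" if vmx_text.endswith("\n") or not vmx_text else "\n"
    return f"{vmx_text}{separator}{new_line}\n"


# Made-up sample content for illustration.
sample = 'displayName = "web01"\nisolation.tools.autoInstall.disable = "TRUE"\n'
print(set_vmx_option(sample, "isolation.tools.autoInstall.disable", "FALSE"))
```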

Modify VMware Update Manager host reboot timeouts in vSphere vCenter 6.5 appliance

I recently switched from the Windows-based VMware Update Manager (VUM) to the Update Manager that is embedded in the vCenter appliance. In the old VUM I had increased the host reboot timeouts to allow host firmware patching during a reboot without the remediation job timing out. In the appliance, the vci-integrity.xml file is located in “/usr/lib/vmware-updatemgr/bin”. You need to restart the VUM service or the appliance after the change.

The lines which need to be changed are the following:

<HostRebootWaitMaxSeconds>1800</HostRebootWaitMaxSeconds>
<HostRebootWaitMinSeconds>600</HostRebootWaitMinSeconds>

I changed the values to:

<HostRebootWaitMaxSeconds>5400</HostRebootWaitMaxSeconds>
<HostRebootWaitMinSeconds>1800</HostRebootWaitMinSeconds>

This change allows me to patch an ESXi host and install new firmware in the same reboot, with as few operations as possible.
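I edited the file by hand, but the same change could be scripted. This is a minimal sketch using Python's xml.etree.ElementTree; the two element names come from the file itself, while the function name and the wrapper element in the sample are made up. Note that ElementTree drops comments and the XML declaration when writing the result back, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_reboot_timeouts(xml_text: str, max_seconds: int, min_seconds: int) -> str:
    """Return xml_text with both host reboot wait timeouts replaced."""
    root = ET.fromstring(xml_text)
    wanted = {
        "HostRebootWaitMaxSeconds": max_seconds,
        "HostRebootWaitMinSeconds": min_seconds,
    }
    found = set()
    # Search the whole tree, since the nesting of the real file is not assumed.
    for element in root.iter():
        if element.tag in wanted:
            element.text = str(wanted[element.tag])
            found.add(element.tag)
    missing = set(wanted) - found
    if missing:
        raise ValueError(f"elements not found: {sorted(missing)}")
    return ET.tostring(root, encoding="unicode")


# Made-up wrapper element; the real vci-integrity.xml contains much more.
sample = (
    "<config>"
    "<HostRebootWaitMaxSeconds>1800</HostRebootWaitMaxSeconds>"
    "<HostRebootWaitMinSeconds>600</HostRebootWaitMinSeconds>"
    "</config>"
)
print(set_reboot_timeouts(sample, 5400, 1800))
```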

Illegal OpCode while booting an HPE ProLiant server

I was installing a new ESXi host and after some steps I got the error “Illegal OpCode” while booting. It happened after patching ESXi with VUM. After some debugging I found the issue.

The server had local storage on which I had created a VMFS datastore before patching. The boot order in the BIOS was CD/DVD ROM, Hard Disk and USB, and ESXi was installed on USB. The error happened when the server tried to boot from the disk which contained the VMFS datastore. After I moved USB ahead of Hard Disk in the boot order, the server booted correctly.