A couple of days ago I had to move several VMs from one datastore to another. For two of the VMs the migration button was grayed out. I had seen this before: backup software disables vMotion/Storage vMotion during a backup and sometimes does not remove the lock after the backup is finished. Until now I had always gone into the database and deleted those “locks” from the vpx_disabled_methods table, but it seems it is also possible to clear them using the Managed Object Browser (MOB). I followed the instructions in the KB article and removed the “locks” from the database – https://kb.vmware.com/s/article/1029926.
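For reference, the database route boils down to deleting the stale rows for the affected VM. The sketch below only simulates that idea with an in-memory SQLite database – the real vCenter database is SQL Server/Oracle/vPostgres, and the column names, method name, and the MoRef “vm-1234” are my placeholders, so follow the KB article (and take a database backup first) rather than this snippet.

```python
# Simulation only: the real vCenter DB is not SQLite, and the column and
# method names below are placeholders -- check KB 1029926 before touching VCDB.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vpx_disabled_methods (entity_mo_id_val TEXT, method_name TEXT)")
# A stale lock left behind by backup software on VM 'vm-1234' (placeholder MoRef).
db.execute("INSERT INTO vpx_disabled_methods VALUES ('vm-1234', 'RelocateVM_Task')")

# The cleanup: remove the disabled-method rows for the affected VM.
db.execute("DELETE FROM vpx_disabled_methods WHERE entity_mo_id_val = 'vm-1234'")
remaining = db.execute("SELECT COUNT(*) FROM vpx_disabled_methods").fetchone()[0]
print(remaining)  # 0 -- the migration methods are no longer blocked
```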
Recently I discovered that some of our VMs were no longer backed up by NetBackup, with no obvious reason. After some investigation we discovered that all of the affected VMs had one or more Independent Persistent hard disks configured. This alone should not be a problem – NetBackup should simply skip those disks while backing up the VM – but we had also configured a filter in the NetBackup query to exclude VMs with RDM (Raw Device Mapping) disks: “AND NOT VMHasRDM Equal TRUE”. It seems that NetBackup treats Independent Persistent disks and RDM disks the same way. After removing the filter from the query, the VMs were included in the backup again.
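The effect can be illustrated with a toy model (the field names below are made up for illustration, not NetBackup’s real data model): if the product’s RDM flag is also set for independent persistent disks, a “NOT VMHasRDM” clause silently drops those VMs from the policy.

```python
# Toy model of the behaviour we observed; field names are illustrative only.
vms = [
    {"name": "vm-app01", "has_rdm": False, "has_independent_disk": False},
    {"name": "vm-db01",  "has_rdm": True,  "has_independent_disk": False},
    {"name": "vm-db02",  "has_rdm": False, "has_independent_disk": True},
]

def vm_has_rdm(vm):
    # What we expected: only true RDM disks set the flag.
    return vm["has_rdm"]

def vm_has_rdm_observed(vm):
    # What NetBackup appeared to do: independent persistent disks count too.
    return vm["has_rdm"] or vm["has_independent_disk"]

expected = [v["name"] for v in vms if not vm_has_rdm(v)]
observed = [v["name"] for v in vms if not vm_has_rdm_observed(v)]
print(expected)  # ['vm-app01', 'vm-db02']
print(observed)  # ['vm-app01'] -- vm-db02 is silently left unprotected
```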
Recently we had a case where a VM restore failed even though all of its backups had finished successfully. We also noticed that single file recovery from that VM was not available. After taking another look at the backup jobs we noticed that the affected VM had only 5 files backed up instead of the thousands that are normal when “Enable file recovery from VM backup” is enabled.
After some investigation together with Veritas we discovered that the Changed Block Tracking (CBT) file was corrupted. We deleted the CBT files from the VM’s directory while the VM was powered off. After the VM was powered on again, new CBT files were created, and everything started to work correctly.
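The cleanup step can be sketched as a small helper. The glob pattern reflects how CBT files are named (*-ctk.vmdk files next to the VM’s disks); the path argument is a placeholder, and the VM must be powered off first. On ESXi you would do the same thing from the datastore browser or an SSH session – this just shows the shape of the operation.

```python
from pathlib import Path

def remove_ctk_files(vm_dir: str) -> list[str]:
    """Delete every CBT (*-ctk.vmdk) file in a powered-off VM's directory.

    Returns the names of the files removed; new CBT files are recreated
    automatically the next time the VM powers on with CBT enabled.
    """
    removed = []
    for ctk in sorted(Path(vm_dir).glob("*-ctk.vmdk")):
        ctk.unlink()
        removed.append(ctk.name)
    return removed
```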
VMware KB article about enabling/disabling Changed Block Tracking (CBT) – https://kb.vmware.com/s/article/1031873
Recently I noticed that some Linux VM backups were failing, and sometimes even crashing, with the following errors:
An error occurred while taking a snapshot: msg.snapshot.error-QUIESCINGERROR.
An error occurred while saving the snapshot: msg.snapshot.error-QUIESCINGERROR.
On closer look, another error was visible in the hostd.log file – “Error when enabling the sync provider”.
All of these VMs had one thing in common – they were running Docker containers.
I was not able to figure out why it happened, but I did find a workaround – disabling the VMware sync driver.
Copied from the Veritas KB article – https://www.veritas.com/support/en_US/article.000021419
Steps to Disable VMware vmsync driver
To prevent the vmsync driver from being called during the quiesce phase of a VMware snapshot, edit the VMware Tools configuration file as follows:
1) Open a console session to the Redhat Linux virtual machine.
2) Navigate to the /etc/vmware-tools directory.
3) Using a text editor, modify the tools.conf file with the following entry:
enableSyncDriver = false
Note: If the tools.conf file does not exist, create a new empty file and add the above parameter.
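The steps above can be sketched as a small idempotent script. The file path comes from the KB article; note that this sketch simply appends the bare line as the KB describes, so check whether your VMware Tools version expects the setting inside a section of tools.conf before relying on it.

```python
from pathlib import Path

def disable_sync_driver(conf_path: str = "/etc/vmware-tools/tools.conf") -> bool:
    """Add 'enableSyncDriver = false' to tools.conf unless already present.

    Creates the file if it does not exist; returns True if the file changed.
    """
    conf = Path(conf_path)
    lines = conf.read_text().splitlines() if conf.exists() else []
    if any(line.replace(" ", "").lower().startswith("enablesyncdriver=false")
           for line in lines):
        return False  # already disabled, nothing to do
    lines.append("enableSyncDriver = false")
    conf.write_text("\n".join(lines) + "\n")
    return True
```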
After upgrading to VMware ESXi 5.5 U3 we started seeing random snapshot errors during backups with the following message: “An error occurred while saving the snapshot: Change tracking target file already exists.”. The issue is caused by a leftover CBT file that is not deleted when the snapshot is removed by the backup software.
After we submitted several logs and traces to VMware, they acknowledged that the issue exists and said it would be fixed for ESXi 5.5 in the June patch release and for ESXi 6.0 in the July patch release.
For now, when we detect a problematic VM, we browse the datastore and delete the leftover CBT file.
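In our case the leftover was a snapshot-delta CBT file (e.g. vmname-000001-ctk.vmdk) whose matching delta disk had already been cleaned up. A hedged sketch of how one might spot those – the naming pattern is my assumption about how snapshot deltas are named, so verify every hit manually before deleting anything:

```python
from pathlib import Path

def stale_snapshot_ctk_files(vm_dir: str) -> list[str]:
    """Return snapshot-delta CBT files whose delta disk no longer exists.

    A delta CBT file 'name-000001-ctk.vmdk' is considered stale when the
    matching delta disk 'name-000001.vmdk' is gone (snapshot already removed).
    """
    stale = []
    for ctk in sorted(Path(vm_dir).glob("*-0000[0-9][0-9]-ctk.vmdk")):
        delta_disk = ctk.with_name(ctk.name.replace("-ctk.vmdk", ".vmdk"))
        if not delta_disk.exists():
            stale.append(ctk.name)
    return stale
```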
Some time ago we suddenly saw a significant drop in free space on our EMC Data Domain appliances, so we started to investigate what was causing it.
Our setup is Symantec NetBackup with a Linux-based media server, DD Boost, and Data Domain appliances.
We checked the usual things:
- Were any new backups added recently? … No
- Any unusual changes in the size of the backups? … No
None of the usual things panned out. The next thing we struggled with was how to see how much space each backup takes up on the Data Domain and which data has the worst dedupe ratio. We could not find much in the manuals, and not much was available online either.
Finally we found the place in Data Domain System Manager where you can download a “Compression Details” file, which lists each backup job with its time, backup size, post-dedupe-and-compression size, and dedupe ratio.
How to find it? Go to Data Domain System Manager -> Data Management -> Storage Units -> select a Storage Unit -> Download Compression Details. After some number crunching on the appliance’s side, you get a TSV file.
After saving the TSV file it was easy to open it in Excel and sort the data into columns. The columns we were interested in were post_lc and bytes/post_lc: post_lc contains the number of bytes actually written to the Data Domain, and bytes/post_lc is the dedupe/compression ratio. Using the values in post_lc we were able to identify the backup that was causing us to run out of free space on the Data Domain appliance.
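The same number crunching is easy in a short script instead of Excel. The column names post_lc and bytes/post_lc are from the file we got; I am assuming here that the file is plain tab-separated with a header row – yours may have preamble lines that need skipping first.

```python
import csv

def top_space_consumers(tsv_path: str, n: int = 10) -> list[dict]:
    """Return the n backups writing the most post-dedupe bytes to Data Domain.

    Assumes a tab-separated file with a header row containing a 'post_lc'
    column (bytes written after dedupe/local compression).
    """
    with open(tsv_path, newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    return sorted(rows, key=lambda r: int(r["post_lc"]), reverse=True)[:n]
```

Sorting descending on int(row["post_lc"]) immediately shows which backup eats the space; sorting ascending on float(row["bytes/post_lc"]) shows the worst dedupers.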
Using the Compression Details file it is possible to see which backups dedupe nicely and which do not.
Just to be clear, I am not sure whether this applies to setups other than ours: Symantec NetBackup, a Linux media server with DD Boost, and EMC Data Domain.