Finding backup jobs that dedupe badly in EMC Data Domain

Some time ago we suddenly saw a significant drop in free space on our EMC Data Domain appliances, so we started to investigate what was causing it.

Our setup is Symantec NetBackup with a Linux-based media server, DD Boost, and Data Domain appliances.

We checked the usual things:

  1. Were any new backups added recently? … No
  2. Any unusual changes in size of the backups? … No

None of the usual checks panned out. Next we struggled with the question of how to see how much space each backup takes on the Data Domain and which data has the worst dedupe ratio. We could not find much in the manuals, and very little was available online.

Finally we found a place in Data Domain System Manager where you can download a “Compression Details” file, which lists each backup job with its time, backup size, size after dedupe and compression, and dedupe ratio.

How to find it? Go to Data Domain System Manager -> Data Management -> Storage Units -> select a Storage Unit -> Download Compression Details. After some number crunching you get a TSV file.

[Image: data_domain_dedupe_results]

After saving the TSV file, it was easy to open it in Excel and arrange the data into columns. The columns we were interested in were post_lc and bytes/post_lc. post_lc is the number of bytes actually written to Data Domain, and bytes/post_lc is the combined dedupe and compression ratio. Using the post_lc values, we were able to identify the backup that was causing us to run out of free space on the Data Domain appliance.
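If you would rather not do this by hand in Excel, the same sorting can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: that the file has a header row, is tab-separated, and that post_lc holds a plain number. The "name" column and the sample rows are made up for illustration; only post_lc and bytes/post_lc come from the real file.

```python
import csv
import io


def largest_consumers(tsv_text, top=10):
    """Return the `top` rows with the largest post_lc value, biggest first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        try:
            # Assumption: post_lc is a plain numeric byte count.
            row["post_lc"] = int(float(row["post_lc"]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows that have no numeric post_lc value
        rows.append(row)
    rows.sort(key=lambda r: r["post_lc"], reverse=True)
    return rows[:top]


# Made-up sample data; "name" is a hypothetical column used for illustration.
sample = (
    "name\tpost_lc\tbytes/post_lc\n"
    "job_a\t1000\t20.0\n"
    "job_b\t900000\t1.1\n"
    "job_c\t5000\t8.5\n"
)

for row in largest_consumers(sample, top=2):
    print(row["name"], row["post_lc"], row["bytes/post_lc"])
```

With a real export you would read the downloaded file into a string and pass it in instead of the sample. The header names in your file may differ, so check them first.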

Using the Compression Details file, it is possible to see which backups dedupe nicely and which do not.
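To pick out the badly deduping backups directly, one option is to filter on the bytes/post_lc column. Again, this is just a sketch: it assumes a tab-separated file with a header row and a plain numeric ratio, the threshold of 2.0 is an arbitrary value chosen for the example, and the "name" column and sample rows are invented.

```python
import csv
import io


def poor_dedupe(tsv_text, max_ratio=2.0):
    """Return rows whose bytes/post_lc is at or below max_ratio, worst first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        try:
            # Assumption: the ratio is stored as a plain number such as 1.1.
            ratio = float(row["bytes/post_lc"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a numeric ratio
        if ratio <= max_ratio:
            flagged.append((ratio, row))
    flagged.sort(key=lambda pair: pair[0])  # lowest ratio first
    return [row for _, row in flagged]


# Made-up sample data; "name" is a hypothetical column used for illustration.
ratio_sample = (
    "name\tpost_lc\tbytes/post_lc\n"
    "job_a\t1000\t20.0\n"
    "job_b\t900000\t1.1\n"
    "job_c\t5000\t1.8\n"
)

for row in poor_dedupe(ratio_sample):
    print(row["name"], row["bytes/post_lc"])
```

A low ratio on its own is not always a problem. A small backup with a poor ratio costs little space, so it makes sense to look at the ratio together with post_lc.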

Just for clarification: I’m not sure whether this approach works for setups other than Symantec NetBackup, a Linux media server + DD Boost, and EMC Data Domain.