Finding backup jobs that dedupe badly in EMC Data Domain

Some time ago we suddenly saw a significant drop in free space on our EMC Data Domain appliances, so we started to investigate what was causing it.

Our setup is Symantec NetBackup, a Linux-based media server with DD Boost, and Data Domain appliances.

We checked the usual things:

  1. Were any new backups added recently? … No
  2. Any unusual changes in size of the backups? … No

None of the usual things panned out. The next thing we struggled with was how to see how much space each backup takes on the Data Domain and which data has the worst dedupe ratio. We could not find much in the manuals, and hardly anything about it was available online.

Finally we found a place in Data Domain System Manager where you can download a file called “Compression Details”. It lists each backup job with its time, the size of the backup, the size after dedupe and compression, and the dedupe ratio.

How to find it? Go to Data Domain System Manager -> Data Management -> Storage Units -> select the Storage Unit -> Download Compression Details. After some number crunching you get a TSV file.


After saving the TSV file it was easy to open it in Excel and split the data into columns. The columns we were interested in were post_lc and bytes/post_lc. post_lc contains the number of bytes actually saved to the Data Domain, and bytes/post_lc is the combined dedupe and compression ratio. Using the values in post_lc we were able to identify the backup that was causing us to run out of free space on the Data Domain appliance.
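If Excel is not at hand, the same sort can be sketched in a few lines of Python. This is only a rough sketch, not something we actually ran: it assumes the export has a header row, is tab separated, and stores post_lc as a plain number. The `name` column and the sample rows are made up for illustration, so check the header of your own file first.

```python
import csv
import io


def largest_backups(tsv_text, size_col="post_lc", top=10):
    """Return the rows with the largest stored size, biggest first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Skip rows where the size column is missing or empty.
    rows = [row for row in reader if (row.get(size_col) or "").strip()]
    rows.sort(key=lambda row: float(row[size_col]), reverse=True)
    return rows[:top]


# Made-up sample data. Only post_lc and bytes/post_lc are column names
# taken from the real file; "name" is a hypothetical label column.
SAMPLE = (
    "name\tpost_lc\tbytes/post_lc\n"
    "job_a\t1000\t20.0\n"
    "job_b\t900000\t1.1\n"
    "job_c\t50000\t8.5\n"
)

if __name__ == "__main__":
    for row in largest_backups(SAMPLE, top=2):
        print(row["name"], row["post_lc"], row["bytes/post_lc"])
```

For a real export, read the downloaded file into a string and pass it in place of `SAMPLE`.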

Using the Compression Details file it is possible to see which backups dedupe nicely and which do not.
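Picking out the badly deduping jobs can be sketched the same way, this time filtering on bytes/post_lc. Again this is an untested sketch under assumptions: the ratio is assumed to be a plain number, the cut-off of 2.0 is an arbitrary value chosen for the example, and the `name` column and the sample rows are invented.

```python
import csv
import io


def poorly_deduped(tsv_text, ratio_col="bytes/post_lc", max_ratio=2.0):
    """Return rows whose ratio is at or below max_ratio, worst first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        try:
            ratio = float(row.get(ratio_col) or "")
        except ValueError:
            # Empty or non-numeric ratio: nothing to compare, skip the row.
            continue
        if ratio <= max_ratio:
            flagged.append(row)
    flagged.sort(key=lambda row: float(row[ratio_col]))
    return flagged


# Made-up sample data; "name" is a hypothetical label column and
# max_ratio=2.0 is an arbitrary threshold, not a recommended value.
DEMO = (
    "name\tpost_lc\tbytes/post_lc\n"
    "job_a\t1000\t20.0\n"
    "job_b\t900000\t1.1\n"
    "job_c\t50000\t1.8\n"
    "job_d\t10\t\n"
)

if __name__ == "__main__":
    for row in poorly_deduped(DEMO):
        print(row["name"], row["bytes/post_lc"])
```

A low ratio on a small backup hardly matters, so it is worth looking at the ratio together with post_lc rather than on its own.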

Just to be clear, I’m not sure whether this approach is valid for setups other than Symantec NetBackup, a Linux media server with DD Boost, and EMC Data Domain.

 


3 thoughts on “Finding backup jobs that dedupe badly in EMC Data Domain”

  1. This works well for NetBackup, where the backups are reasonably labeled. I tried doing the same process for my Avamar environment and it is just a bunch of GSAN checkpoints. Oh well.

  2. Hi Kalle,
    I am trying to figure out how much space each Veeam backup file takes after dedupe, because in Veeam Backup & Replication you only see the value before the data was sent to the ddboost process. Can you explain in more detail how I can interpret the columns of this TSV file? I tried to map the sizes to my backup files, but they don’t match.
