Running data deduplication on a Hyper-V host

With Windows Server 2012 R2, Microsoft announced an extension of the data deduplication feature: deduplication for Virtual Desktop Infrastructure (VDI) deployments. This means that data deduplication is now able to optimize open files. However, the only supported configuration is one where the Hyper-V host's storage is connected remotely over SMB, so running deduplication directly on the Hyper-V host is unsupported. Deduplication is also currently supported only for VDI workloads, not for server workloads. But unsupported does not mean it does not work, so I decided to try deduplicating some Windows Server virtual machines running on Hyper-V and stored locally on a SAN disk.

My setup:

  • 1 Hyper-V 2012 R2 host
  • 1 TB disk from an EMC VNX array, formatted NTFS with a 64K allocation unit size
  • Four Windows Server 2008 R2 virtual machines with low I/O requirements

Deduplication settings:

  • Data deduplication type: Virtual Desktop Infrastructure (VDI) server
  • Deduplicate files older than (in days): 0
  • Custom file extensions to exclude: empty
  • Excluded folders: empty
  • Schedule: Throughput optimization starting at 2:00 AM, duration 4 hours.
  • Background optimization disabled.

By limiting the optimization window to 2:00 AM through 6:00 AM I avoid high load during the daytime (a PowerShell sketch of these settings follows the screenshot below).

deduplication_settings
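For anyone who prefers PowerShell over Server Manager, the configuration above can be expressed roughly as follows. This is only a sketch: I applied the settings through the GUI, and the drive letter H: and the schedule name are assumptions based on my setup.

    # Assumes the Data Deduplication role service is installed and the volume is H:
    Import-Module Deduplication

    # Enable deduplication with the VDI (Hyper-V) usage type and optimize files regardless of age
    Enable-DedupVolume -Volume "H:" -UsageType HyperV
    Set-DedupVolume -Volume "H:" -MinimumFileAgeDays 0

    # Disable the default background optimization schedule
    Set-DedupSchedule -Name "BackgroundOptimization" -Enabled $false

    # Run throughput optimization every night from 2:00 AM for 4 hours
    New-DedupSchedule -Name "NightlyOptimization" -Type Optimization `
        -Start (Get-Date "02:00") -DurationHours 4 `
        -Days Sunday,Monday,Tuesday,Wednesday,Thursday,Friday,Saturday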

So far I have had no problems. I collected and analyzed disk performance logs from the Hyper-V host and the guests and did not find anything alarming. There were some latency spikes during the optimization process, but nothing severe enough to affect the guests.
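For reference, this kind of latency sampling can be done with the built-in Get-Counter cmdlet; the counters, interval and output path below are only an illustration, not the exact collector set I used.

    # Sample host-side disk latency every 15 seconds for one hour (240 samples)
    Get-Counter -Counter @(
        "\PhysicalDisk(*)\Avg. Disk sec/Read",
        "\PhysicalDisk(*)\Avg. Disk sec/Write"
    ) -SampleInterval 15 -MaxSamples 240 |
        Export-Counter -Path "C:\PerfLogs\dedup_latency.blg" -FileFormat BLG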

With four virtual machines on the volume, deduplication achieved a savings rate of 65%, which in this case amounts to 113 GB of saved space.

diskH_dedupe_info
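The same figures can also be pulled from PowerShell; a quick example, again assuming the volume is H:.

    # Space savings summary for the deduplicated volume
    Get-DedupVolume -Volume "H:" |
        Format-List Volume, Capacity, FreeSpace, SavedSpace, SavingsRate

    # File-level statistics and the results of the last dedup jobs
    Get-DedupStatus -Volume "H:" | Format-List *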

Over the course of the day, new data that has been written but not yet optimized consumes additional space on the volume. After the optimization process has moved the unique data to the ChunkStore, the virtual machines shrink back to consuming a small amount of space. The ChunkStore is located on the same disk under the System Volume Information\Dedup folder. As more unique data is processed by optimization, the ChunkStore folder grows. To clean out data that is no longer in use, deduplication runs a garbage collection job every week. Below you can see screenshots from TreeSize Free: the first shows disk usage before optimization, the second after optimization, and the third after the garbage collection process.

dedup_jobs
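If you do not want to wait for the weekly schedule, garbage collection can also be started by hand. A minimal sketch, once more assuming the volume is H:.

    # Start a garbage collection job manually and wait for it to complete
    Start-DedupJob -Volume "H:" -Type GarbageCollection -Wait

    # Verify when garbage collection last ran and whether it succeeded
    Get-DedupStatus -Volume "H:" |
        Format-List LastGarbageCollectionTime, LastGarbageCollectionResult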

Even though I have not had any problems, I do not recommend running any important workload in this configuration. There has to be a reason why Microsoft does not support it. Hopefully it will be supported in the future. In the meantime, anyone running deduplication directly on a Hyper-V host has to accept the risk that something may go wrong. If you have experience with a similar setup, please let me know in the comments section.