Deduplication and compression with ScaleIO

ScaleIO does not currently support deduplication or compression natively. Since ScaleIO can use almost any disk device, I decided to test it in combination with QUADStor storage virtualization software, which provides deduplication and compression.

For the test I built a small setup: three CentOS 7 servers, each with a 200GB local disk and running both the QUADStor software and the ScaleIO SDS software, plus one ScaleIO MDM server and a ScaleIO client based on Windows Server 2012 R2. On each CentOS server QUADStor was used to create a 150GB disk with compression and deduplication enabled. ScaleIO SDS then used that same 150GB disk as its storage device.

I presented one 200GB disk to the client machine. To test deduplication I copied some ISO files to that disk. The statistics below show that my test data resulted in an almost 2x deduplication ratio. The ratio is affected by the way ScaleIO works: it distributes data across several nodes, and each node deduplicates only the data it holds itself. Example: block “A” from “dataset1” will end up on servers “One” and “Two”. Block “A” from “dataset2” will end up on servers “One” and “Three”. On server “One” block “A” will be deduplicated, since that server already had the block, but on server “Three” block “A” will not be deduplicated, since it is unique for that server.

Figure: QUADStor stats
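The placement effect described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `per_node_dedup` is hypothetical, the placement of each block is hard-coded to match the example with servers “One”, “Two” and “Three”, and it does not reproduce how ScaleIO or QUADStor actually choose nodes or detect duplicates.

```python
from collections import defaultdict

def per_node_dedup(writes):
    """Count written vs. stored blocks per node (hypothetical helper).

    writes: iterable of (block_content, [nodes holding a copy]).
    Each node deduplicates only against blocks it already holds.
    """
    written = defaultdict(int)   # block copies received by each node
    unique = defaultdict(set)    # distinct block contents kept by each node
    for content, nodes in writes:
        for node in nodes:
            written[node] += 1
            unique[node].add(content)
    return {node: (written[node], len(unique[node])) for node in written}

# Placement is assumed, copied from the example in the text.
writes = [
    ("A", ["One", "Two"]),    # block "A" from dataset1
    ("A", ["One", "Three"]),  # block "A" from dataset2
]

stats = per_node_dedup(writes)
total_written = sum(w for w, _ in stats.values())
total_stored = sum(s for _, s in stats.values())
overall_ratio = total_written / total_stored

for node, (w, s) in stats.items():
    print(f"{node}: written={w} stored={s}")
print(f"overall ratio: {overall_ratio:.2f}")
```

In this toy case four block copies are written and three are stored, so only server “One” saves any space. The same data sitting behind a single deduplicating device would dedupe better, which is why the ratio measured on a distributed setup tends to be lower than the data alone would suggest.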

I did not run any performance tests, since all my test systems were running on a single host and on a single SSD drive.


In conclusion, it is possible to add features such as deduplication and tiering to ScaleIO by using third-party software. Mixing and matching different software adds complexity, but sometimes the added value makes it worthwhile.

Related posts 

Enabling data deduplication in Linux with QUADStor

Speeding up writes for ScaleIO with DRAM

Automatic storage tiering with ScaleIO


QUADStor homepage