In my opinion, the honest answer is that you can’t estimate it. There is no good and accurate way to estimate deduplication and compression ratios because many variables affect the ratio. Estimation tools are available from different vendors, but you will get the most accurate numbers by testing different solutions with your actual data. I have tested several solutions, and the deduplication and/or compression ratio has varied from 2.5x to 8x.
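To put ratios like these into capacity terms, here is a minimal sketch of the arithmetic. The function names and the 100 TB figure are made up for illustration, and it assumes the ratio is defined as logical size divided by physical size.

```python
def space_savings(ratio: float) -> float:
    """Fraction of capacity saved for a data reduction ratio (4 means 4x)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


def physical_capacity(logical_tb: float, ratio: float) -> float:
    """Physical capacity needed to hold logical_tb of data at the given ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return logical_tb / ratio


if __name__ == "__main__":
    # 100 TB of logical data is an arbitrary example value.
    for ratio in (2.5, 4, 8):
        saved = space_savings(ratio)
        needed = physical_capacity(100, ratio)
        print(f"{ratio}x: {saved:.1%} saved, {needed:.1f} TB physical per 100 TB logical")
```

With this definition, 2.5x corresponds to 60% savings and 8x to 87.5%, so the gap between a mediocre and a good result is smaller in capacity terms than the raw ratios suggest.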
- Test different solutions – to understand how well each one works. For example, solution #1 has a dedupe ratio of 4, which looks good, but solution #2 reaches a dedupe ratio of 8 with the same data.
- Try to use the same test data across the different tests so that the results are comparable.
- Microsoft Windows Server 2012 R2 – built-in post-process deduplication engine. Check this page for more information.
- QuadStor software – inline deduplication and compression. Check this page for more information.
- Nutanix Community Edition – has both deduplication and compression options.
- All-flash arrays (AFAs) – most AFAs include deduplication and/or compression for data reduction. If you are interested in AFAs, most vendors can provide POC equipment that you can use to test the solution. AFA vendors to check: EMC, Pure Storage, Kaminario, SolidFire, etc.
Results will vary between solutions. Deduplication works well for similar data (VDI, server OS disks), while compression works better for databases (Oracle, MSSQL). The deduplication ratio is also affected by the deduplication chunk size: 512 bytes, 4K, 8K, 16K, etc. A smaller chunk size usually results in a better ratio.
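The effect of chunk size can be illustrated with a rough sketch. This is not how any of the products above work internally. It assumes plain fixed-size chunking with SHA-256 fingerprints and uses zlib as a stand-in compressor. The function names and the synthetic sample are hypothetical, so treat the numbers as a toy illustration and not as a prediction for real data.

```python
import hashlib
import zlib


def dedupe_ratio(data: bytes, chunk_size: int) -> float:
    """Logical bytes divided by unique bytes, using fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not data:
        return 1.0
    seen = set()
    unique_bytes = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            # First time this chunk content appears: it must be stored.
            seen.add(digest)
            unique_bytes += len(chunk)
    return len(data) / unique_bytes


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by zlib-compressed size."""
    if not data:
        return 1.0
    return len(data) / len(zlib.compress(data, level))


def make_sample() -> bytes:
    """Synthetic data: four distinct 4 KiB blocks repeated in a shuffled order."""
    blocks = [bytes([i]) * 4096 for i in range(4)]
    order = [0, 1, 2, 3, 1, 0, 3, 2, 0, 2, 1, 3, 3, 3, 0, 1]
    return b"".join(blocks[i] for i in order)


if __name__ == "__main__":
    sample = make_sample()
    for size in (512, 4096, 8192, 16384):
        print(f"chunk {size:>5} bytes: dedupe ratio {dedupe_ratio(sample, size):.1f}x")
    print(f"zlib compression ratio: {compression_ratio(sample):.1f}x")
```

On this sample the 4 KiB blocks repeat but never line up on 16 KiB boundaries, so the larger chunk size finds no duplicates while the smaller ones do. To get a first impression on real data, you could read a file with `open(path, "rb").read()` and pass the bytes to the same functions, keeping in mind that a real engine may chunk, hash, and compress differently.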
More information about this topic can be found at the following links.