Enabling data deduplication in Linux with QUADStor

QUADStor is storage virtualization software that features inline deduplication and/or compression. It can present disks to the local server or to remote servers over iSCSI, FC and InfiniBand. Local disks can be shared with NFS or SMB. QUADStor supports VAAI for VMware (iSCSI and FC) and ODX for Microsoft Windows. Documentation and downloads are available at the QUADStor homepage – http://www.quadstor.com.

Config

Virtual Machine in VMware Workstation
6 vCPUs
16GB RAM
25GB virtual disk on Samsung 840 EVO
CentOS 7 64 bit
QUADStor 3.1.81.

Installation

Install needed packages
yum install httpd gcc perl kernel-devel sg3_utils

Install QUADStor
rpm -i quadstor-virt-3.1.81-rhel.x86_64.rpm

Make Apache and QUADStor start during boot
chkconfig httpd on
chkconfig quadstor on

Configuration

Creating a VDisk

  1. Open a web browser and go to http://<yourserver>/
  2. Go to “Physical Storage” and add the disk on which you want to enable deduplication to the “Default” storage pool. When adding the disk, enable Log and Compression.
  3. Wait until the disk has been initialized.
  4. Go to “Virtual Disks” and add a VDisk. Specify a name and size and click Submit. Then modify the VDisk and enable compression. Since the VDisk is thin provisioned, its size can be bigger than the physical disk.

Accessing the VDisk locally

  1. Identify the VDisk using the fdisk -l command. On my system the VDisk appeared as /dev/sdc.
  2. Use fdisk or another tool to create a partition on the disk.
  3. Create a file system on the newly created partition -> mkfs.ext4 /dev/sdc1
  4. Mount the file system.
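
The four steps above can be sketched as a small script. This is only an illustration under assumptions: the device is /dev/sdc as on my system, the mount point is /mnt/dedupe (the path used in the tests later in this post), and parted is used in place of interactive fdisk so that partitioning can be scripted. The helper names run and prepare_vdisk are made up for this example. By default the script only prints the commands; set DRY_RUN=0 to actually run them, and double-check the device name first, because partitioning destroys existing data.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
# DISK and MOUNTPOINT are assumptions; confirm the device with "fdisk -l" first.
DISK="${DISK:-/dev/sdc}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/dedupe}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Partition the VDisk, create an ext4 file system and mount it.
prepare_vdisk() {
    run parted -s "$DISK" mklabel gpt
    run parted -s "$DISK" mkpart primary ext4 0% 100%
    run mkfs.ext4 "${DISK}1"
    run mkdir -p "$MOUNTPOINT"
    run mount "${DISK}1" "$MOUNTPOINT"
}

prepare_vdisk
```

Running it unchanged prints the five commands prefixed with "+", which is a convenient way to review them before running with DRY_RUN=0 as root.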

Performance

Before testing large block writes and small block writes, the disk was cleaned and all data was unmapped. The test file /mnt/a.iso was cached in memory before testing. The system cache was flushed before each read test with the following command – “sync; echo 3 > /proc/sys/vm/drop_caches”.
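
To make the read procedure repeatable, the flush and the dd read can be wrapped in two small helpers. This is a minimal sketch, not the exact script I used: the function names flush_cache and timed_read are made up for illustration, and the flush needs root.

```shell
#!/bin/sh
# Hypothetical helpers; the names flush_cache and timed_read are illustrative.

# Flush the Linux page cache so the next read is not served from RAM.
# Needs root; otherwise it only reports that the flush was skipped.
flush_cache() {
    sync
    if [ "$(id -u)" -eq 0 ] && [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches
    else
        echo "not root, cache not flushed" >&2
    fi
}

# timed_read FILE BS: read FILE sequentially with block size BS
# and print only the summary line that dd writes to stderr.
timed_read() {
    dd if="$1" of=/dev/null bs="$2" 2>&1 | tail -n 1
}

# Example (commented out): three reads of the same file, flushing before each.
# for i in 1 2 3; do flush_cache; timed_read /mnt/dedupe/a1.iso 128K; done
```

Note that this flushes only the Linux page cache. Any caching done inside QUADStor itself is not affected.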

Looking at the read speed results, I assume that QUADStor has its own cache for read blocks. Every read test after the first one was above 1GB/s.

One concerning thing I noticed during write testing was that the Linux load average spiked significantly – up to 30 (1 min).

Large block writes

First write

dd if=/mnt/a.iso of=/mnt/dedupe/a.iso bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 52.8046 s, 78.6 MB/s

Second write

dd if=/mnt/a.iso of=/mnt/dedupe/a2.iso bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 33.1746 s, 125 MB/s

Third write

dd if=/mnt/a.iso of=/mnt/dedupe/a3.iso bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 10.3697 s, 400 MB/s
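
The rates printed by dd are simply bytes divided by elapsed seconds, in decimal megabytes (1 MB = 1,000,000 bytes). The short sketch below recomputes them from the figures above and also shows how much faster the third write was than the first. The function names rate_mbs and speedup are made up for this example.

```shell
#!/bin/sh
# rate_mbs BYTES SECONDS: decimal megabytes per second, the unit dd reports.
rate_mbs() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# speedup SLOW_SECONDS FAST_SECONDS: how many times faster the second run was.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

rate_mbs 4148166656 52.8046   # first write
rate_mbs 4148166656 10.3697   # third write
speedup 52.8046 10.3697       # first write vs. third write
```

This prints 78.6, 400.0 and 5.1: the third write of the same data finished about five times faster than the first.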

Small block writes

First write

dd if=/mnt/a.iso of=/mnt/dedupe/a1.iso bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 12.5778 s, 330 MB/s

Second write

dd if=/mnt/a.iso of=/mnt/dedupe/a2.iso bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 11.1917 s, 371 MB/s

Third write

dd if=/mnt/a.iso of=/mnt/dedupe/a3.iso bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 11.9529 s, 347 MB/s

Large block reads

First read

dd if=/mnt/dedupe/a1.iso of=/dev/null bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 8.77728 s, 473 MB/s

Second read

dd if=/mnt/dedupe/a2.iso of=/dev/null bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 2.97584 s, 1.4 GB/s

Third read

dd if=/mnt/dedupe/a3.iso of=/dev/null bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 3.08643 s, 1.3 GB/s

Small block reads

First read

dd if=/mnt/dedupe/a1.iso of=/dev/null bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 3.60003 s, 1.2 GB/s

Second read

dd if=/mnt/dedupe/a2.iso of=/dev/null bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 3.24465 s, 1.3 GB/s

Third read

dd if=/mnt/dedupe/a3.iso of=/dev/null bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 3.34678 s, 1.2 GB/s

Use cases

Backup destination – deduplication is very effective at reducing the disk space needed to store backups.
Image library – store for ISO files and VM templates. Shared with SMB or NFS to virtualization hosts.
Document library – store for all kinds of documents.
Virtual Machine storage – storage for virtual machines.

Deduplication ratios

I also ran another test to measure deduplication ratios. I stored about 800GB of Windows virtual machines on a QUADStor disk and achieved a deduplication ratio of more than 4x.
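
To put that ratio in perspective, here is a small calculation. It assumes the usual definition of deduplication ratio as logical data size divided by physical space used. I have not checked that QUADStor computes its figure exactly this way, so treat it as an estimate. The function names physical_gb and saved_pct are made up for this example.

```shell
#!/bin/sh
# physical_gb LOGICAL_GB RATIO: physical space used, assuming
# ratio = logical size / physical size.
physical_gb() {
    awk -v l="$1" -v r="$2" 'BEGIN { printf "%.0f\n", l / r }'
}

# saved_pct RATIO: percentage of disk space saved at the given ratio.
saved_pct() {
    awk -v r="$1" 'BEGIN { printf "%.0f\n", (1 - 1 / r) * 100 }'
}

physical_gb 800 4   # GB on disk for 800GB of data at exactly 4x
saved_pct 4         # percent saved at exactly 4x
```

At exactly 4x, 800GB of virtual machines would occupy 200GB of physical disk, a saving of 75%. Since my ratio was above 4x, the real footprint was somewhat smaller.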

Conclusion

QUADStor is interesting software and I will definitely look into using it more. Performance seems acceptable and the deduplication ratios seem compelling.