Deduplication and compression with ScaleIO

ScaleIO does not currently support deduplication and compression natively. Since ScaleIO can use almost any disk device, I decided to test ScaleIO combined with the QUADStor storage virtualization software, which enables deduplication and compression.

For the test I built a small setup – three CentOS 7 servers, each with a 200GB local disk, running the QUADStor and ScaleIO SDS software, plus a ScaleIO MDM server and a ScaleIO client based on Windows Server 2012 R2. On each CentOS server QUADStor was used to create a 150GB disk with compression and deduplication enabled, and that same 150GB disk was used by ScaleIO SDS as its storage device.

To the client machine I presented one 200GB disk. To test deduplication I copied some ISO files to that disk. As shown below, my test data resulted in almost a 2x deduplication ratio. The deduplication ratio is affected by the way ScaleIO works – it distributes data across several nodes. Example: block “A” from “dataset1” will end up on servers “One” and “Two”. Block “A” from “dataset2” will end up on servers “One” and “Three”. On server “One” block “A” will be deduplicated since the server already had the block, but on server “Three” block “A” will not be deduplicated since it is unique to that server.

QUADStor stats

I did not perform any performance tests since my test systems were running on a single host and a single SSD drive.

Conclusion

In conclusion, using 3rd party software it is possible to add features to ScaleIO – deduplication, tiering, etc. Mixing and matching different software can add complexity, but sometimes the added value is worth it.

Related posts 

Enabling data deduplication in Linux with QUADStor

Speeding up writes for ScaleIO with DRAM

Automatic storage tiering with ScaleIO

Links

QUADStor homepage

Enabling data deduplication in Linux with QUADStor

QUADStor is storage virtualization software that features inline deduplication and/or compression. It can be used to present disks to the local server or to remote servers over iSCSI, FC and InfiniBand. Local disks can be shared with NFS or SMB. QUADStor supports VAAI for VMware (iSCSI and FC) and ODX for Microsoft Windows. Documentation and downloads are available at the QUADStor homepage – http://www.quadstor.com.

Config

Virtual Machine in VMware Workstation
6 vCPUs
16GB RAM
25GB virtual disk on Samsung 840 EVO
CentOS 7 64 bit
QUADStor 3.1.81

Installation

Install needed packages
yum install httpd gcc perl kernel-devel sg3_utils

Install QUADStor
rpm -i quadstor-virt-3.1.81-rhel.x86_64.rpm

Make Apache and QUADStor start at boot
chkconfig httpd on
chkconfig quadstor on
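
CentOS 7 is systemd-based, so the chkconfig commands above are handled through the SysV compatibility layer. The native equivalent should be the following (a sketch – I am assuming the QUADStor package registers a quadstor service):
systemctl enable httpd
systemctl enable quadstor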

Configuration

Creating a VDisk

  1. Open a web browser and go to http://<yourserver>/
  2. Go to “Physical Storage” and add the disk on which you want to enable deduplication to the “Default” storage pool. When adding the disk, enable Log and Compression.
  3. Wait until the disk has been initialized.
  4. Go to “Virtual Disks” and add a VDisk. Specify a name and size and click Submit. Then modify the VDisk and enable compression. Since the VDisk is thin provisioned, its size can be larger than the physical disk.

Accessing the VDisk locally

  1. Identify the VDisk using the fdisk -l command. On my system the VDisk appeared as /dev/sdc.
  2. Use fdisk or another tool to create a partition on the disk.
  3. Create a file system on the newly created partition, e.g. mkfs.ext4 /dev/sdc1.
  4. Mount the file system. The full sequence is sketched below.
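
Put together, the whole sequence looks roughly like this (a minimal sketch, assuming the VDisk appeared as /dev/sdc and using the /mnt/dedupe mount point from the tests below):

fdisk -l                                  # identify the new VDisk
parted /dev/sdc mklabel msdos             # create a partition table
parted /dev/sdc mkpart primary 0% 100%    # one partition spanning the disk
mkfs.ext4 /dev/sdc1                       # create the file system
mkdir -p /mnt/dedupe
mount /dev/sdc1 /mnt/dedupe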

Performance

Before testing large block and small block writes, the disk was cleaned and all data was unmapped. The test file /mnt/a.iso was cached into memory before testing. The system cache was flushed before each read test with the following command – “sync; echo 3 > /proc/sys/vm/drop_caches”.
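
The read tests can be repeated with a small loop like this (a sketch for the large block case, assuming the three test copies a1.iso–a3.iso written below):

for i in 1 2 3; do
  sync; echo 3 > /proc/sys/vm/drop_caches    # flush the Linux page cache
  dd if=/mnt/dedupe/a$i.iso of=/dev/null bs=128K
done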

Looking at the read speed results, I can assume that QUADStor has its own cache where it keeps recently read blocks. Every read test after the first one was above 1GB/s.

One concerning thing I noticed during write testing was that the Linux load average spiked significantly – up to 30 (1 minute average).

Large block write

First write

dd if=/mnt/a.iso of=/mnt/dedupe/a.iso bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 52.8046 s, 78.6 MB/s

Second write

dd if=/mnt/a.iso of=/mnt/dedupe/a2.iso bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 33.1746 s, 125 MB/s

Third write

dd if=/mnt/a.iso of=/mnt/dedupe/a3.iso bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 10.3697 s, 400 MB/s

Small block writes

First write

dd if=/mnt/a.iso of=/mnt/dedupe/a1.iso bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 12.5778 s, 330 MB/s

Second write

dd if=/mnt/a.iso of=/mnt/dedupe/a2.iso bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 11.1917 s, 371 MB/s

Third write

dd if=/mnt/a.iso of=/mnt/dedupe/a3.iso bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 11.9529 s, 347 MB/s

Large block reads

First read

dd if=/mnt/dedupe/a1.iso of=/dev/null bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 8.77728 s, 473 MB/s

Second read

dd if=/mnt/dedupe/a2.iso of=/dev/null bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 2.97584 s, 1.4 GB/s

Third read

dd if=/mnt/dedupe/a3.iso of=/dev/null bs=128K
31648+0 records in
31648+0 records out
4148166656 bytes (4.1 GB) copied, 3.08643 s, 1.3 GB/s

Small block reads

First read

dd if=/mnt/dedupe/a1.iso of=/dev/null bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 3.60003 s, 1.2 GB/s

Second read

dd if=/mnt/dedupe/a2.iso of=/dev/null bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 3.24465 s, 1.3 GB/s

Third read

dd if=/mnt/dedupe/a3.iso of=/dev/null bs=8K
506368+0 records in
506368+0 records out
4148166656 bytes (4.1 GB) copied, 3.34678 s, 1.2 GB/s

Use cases

Backup destination – deduplication is very effective at reducing the disk space needed to store backups.
Image library – a store for ISO files and VM templates, shared over SMB or NFS to virtualization hosts (a sample NFS export is sketched below).
Document library – a store for all kinds of documents.
Virtual machine storage – backing storage for virtual machines.
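
For the image library case, a minimal NFS export could look like this (the subnet and options are example assumptions, not from my test setup):

echo '/mnt/dedupe 192.168.0.0/24(rw,no_root_squash)' >> /etc/exports
exportfs -ra    # re-export everything listed in /etc/exports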

Deduplication ratios

I also did another test to measure deduplication ratios. I stored about 800GB of Windows virtual machines on a QUADStor disk and achieved more than a 4x deduplication ratio.

Conclusion

QUADStor is interesting software and I will definitely look into using it more. Performance seems acceptable and the deduplication ratios are compelling.

Red Hat Summit 2014 presentations

I went through some Red Hat Summit 2014 presentations and found a few interesting things. The presentations are available on the Red Hat website – https://www.redhat.com/summit/2014/presentations/.

Linux Containers in RHEL 7 – Key Takeaways (Link to presentation)

  • Application isolation mechanism for light-weight multi-tenancy
  • Application-centric packaging with Docker image-based containers
  • Linux Containers productization
    • Key kernel enablers – full support in RHEL 7 GA
    • Docker 1.0 – shipped with RHEL 7 GA
  • Linux Container Certification
  • Red Hat and Docker partnership to build enterprise-grade Docker containers
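
As a quick illustration of the image-based container workflow (the rhel7 image name is an assumption – Red Hat publishes certified base images through its own registry):

docker pull rhel7               # pull a certified base image
docker run -it rhel7 /bin/bash  # start an interactive container from it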

RHEL roadmap (Link to presentation)

Theoretical Limits on X86_64

  • Logical CPUs – maximum 5120
  • Memory – maximum 64TB

RHEL 7 will support XFS, ext4, ext3, ext2, NFS, and GFS2

  • Maximum supported filesystem sizes increase
    • XFS 100TB -> 500TB
    • ext4 16TB -> 50TB
  • btrfs is a technology preview feature in RHEL 7

Red Hat Enterprise Linux 7 has XFS as the new default file system

  • XFS will be the default for boot, root and user data partitions on all
    supported architectures
  • Included without additional charge as part of RHEL 7 subscription

RHEL 7 Storage Enhancements

  • New protocols and driver support
    • Shipping NVMe driver for standard PCIe SSDs
    • Support for 16Gb/s FC and 12Gb/s SAS-3
    • Linux-IO SCSI Target (LIO)
    • User-specified action on SCSI events, e.g. LUN create/delete, thin provisioning threshold reached, parameter change.
  • LVM
    • RAID, thin provisioning and snapshot enhancements
    • Tiered storage, using LVM/DM cache, in technology preview (a rough sketch follows below)
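
The tiered storage item can be sketched with LVM cache roughly like this (the volume group vg, logical volume data, and fast device /dev/sdb are assumptions, and the exact syntax varies by LVM version):

lvcreate -L 10G -n cachepool vg /dev/sdb                  # carve a cache pool from the fast device
lvconvert --type cache --cachepool vg/cachepool vg/data   # attach it to the slow LV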

Red Hat Enterprise Virtualization Hypervisor roadmap (Link to presentation)

Performance: Windows Guest Improvements

  • Make Windows guests think they are running on Hyper-V

Scalability: Large Guests

  • Host: 160 cores; 4TiB RAM
  • Virtual machine CPU limit: 160 vCPUs
  • RHEL 6: 4000GiB guest RAM
  • RHEL 7: 4TiB guest RAM

Storage reclamation – part 3 – Linux

This is the third post in the “Storage reclamation” series, focusing on Linux.

As in Windows and VMware ESXi, dead space is left behind in Linux when data is removed.

For space reclamation under Linux, a Linux version of EMC StorReclaim is available with the same options as the Windows version.

EMC StorReclaim

To get EMC StorReclaim contact your EMC representative.

Newer Linux operating systems have native TRIM operations available to get back unused space; a small sketch is shown below. I will write more about TRIM in the future when I have had more time to test it.
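
A minimal TRIM sketch (assuming an ext4 or XFS file system on a device that supports discard):

fstrim -v /mnt                     # trim unused blocks once, verbosely
mount -o discard /dev/sdb1 /mnt    # or mount with continuous discard enabled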

If anyone knows of more tools that can be used to reclaim space in Linux, please let me know in the comments section.

Other posts in this series:

Storage reclamation – part 1 – VMWare vSphere

Storage reclamation – part 2 – Windows

Storage reclamation – part 4 – Zero fill and array level reclamation