ScaleIO 2.0 now available

EMC released ScaleIO 2.0 a couple of days ago. More information – https://community.emc.com/docs/DOC-52581

Some new features (source: ScaleIO 2.0 release notes):

  • Extended MDM cluster – introduces the option of a 5-node MDM cluster, which is able to withstand two points of failure.
  • Read Flash Cache (RFcache) – use PCI flash cards and/or SSDs for caching of the HDDs in the SDS.
  • User authentication using Active Directory (AD) over LDAP.
  • The multiple SDS feature – allows the installation of multiple SDSs on a single Linux or VMware-based server.
  • Oscillating failure handling – provides the ability to handle error situations, and to reduce their impact on normal system operation. This feature detects and reports various oscillating failures, in cases when components fail repeatedly and cause unnecessary failovers.
  • Instant maintenance mode – allows you to restart a server that hosts an SDS, without initiating data migration or exposing the system to the danger of having only a single copy of data.
  • Communication between the ScaleIO system and ESRS (EMC Secure Remote Support) servers is now supported – this feature replaces the call-home mechanism. It allows authorized access to the ScaleIO system for support sessions.
  • Authenticate communication between the ScaleIO MDM and SDS components, and between the MDM and external components, using a Public and Private Key (Key-Pair) associated with a certificate – this will allow strong authentication of components associated with a given ScaleIO system. A Certificate Authority certificate or self-signed certificate can be used.
  • In-flight checksum protection provided for data reads and writes – this feature addresses errors that change the payload during the transit through the ScaleIO system.
  • Performance profiles – predefined settings that affect system performance.

ScaleIO can be downloaded from the EMC website – http://www.emc.com/products-solutions/trial-software-download/scaleio.htm. ScaleIO 2.0 supports VMware (5.5 and 6.0), Linux and Windows.

More info about ScaleIO 2.0 can be found on Chad Sakac's blog: http://virtualgeek.typepad.com/virtual_geek/2016/03/scaleio-20-the-march-towards-a-software-defined-future-continues.html

Check out all of my posts about ScaleIO here.

SSD caching could decrease performance – part 2

In the second part of “SSD caching could decrease performance” I will cover how the IO read/write ratio and IO size affect SSD caching. Part 1 is accessible here.

Read IO and write IO ratio

Most real workloads are mixed IO workloads – both disk reads and writes. The read/write ratio describes how the IO is split between disk reads and disk writes. In many cases enterprise SSDs have equally good read and write performance, but lately MLC and especially TLC drives have made their way into the enterprise market, and with some of them read and write performance is not equal. In addition, SSDs may become slower over time due to a small number of free blocks. To mitigate the free block issue, SSD vendors install extra capacity inside the disks – for example, the Intel DC S3700 SSD has about 32% extra capacity.

SSDs usually handle reads better than writes. Before selecting your SSD vendor and model, I recommend doing some research. If possible, purchase a few different models that would suit your needs and test them in real-life scenarios, for example as sketched below.
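For example, fio can be used to run a mixed random read/write test against a candidate SSD. This is only a rough sketch – the device path and the 70/30 read/write mix are assumptions you should adjust to match your workload, and the test will overwrite data on the target device:

Command: fio --name=mixedtest --filename=/dev/sdX --direct=1 --ioengine=libaio --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting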

IO size

When it comes to performance, IO size matters. Large IOs can overload a small number of SSDs and thereby hurt performance. I would avoid caching workloads with IO sizes above 128KB. In my personal experience I have seen a situation where SSD caching was hindering database performance because of the database's multi-block reads.
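If you are not sure what IO sizes a Linux workload generates, iostat from the sysstat package gives a quick overview (a sketch; the device name is an assumption):

Command: iostat -x sdb 5

Depending on the sysstat version, the average request size is reported in the avgrq-sz column (in 512-byte sectors) or in the rareq-sz/wareq-sz columns (in kilobytes).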

My recommendations for a more successful SSD caching project

  • Determine which VMs would benefit from SSD caching – VMs doing a certain amount of IO or more.
  • Analyze the IO workloads – there is no point in doing read caching when a server is only writing. Also look at the IO size.
  • Check your hardware – controller speeds and capabilities. There is no point in connecting a fast SSD to a crappy controller (see the example after this list).
  • Find suitable SSD disks for caching – price vs performance.
  • Talk with the storage admins – it might be that SSDs in the array would make more sense than SSDs in the server.
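For the hardware check above, smartctl gives a quick way to see the negotiated link speed of a SATA SSD on Linux (a sketch; the device name is an assumption):

Command: smartctl -i /dev/sda

The output contains a line such as “SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)” – if the current speed is lower than what the drive supports, the controller or port is the bottleneck.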

Nutanix – No rebuild capacity available

Nutanix resiliency status

The “No rebuild capacity available” warning appears on Nutanix clusters when the amount of free disk space in the cluster is lower than the capacity of a single node. This means that if you lose one node, it is not possible to fully rebuild data resiliency. There are two options to fix this – free up space or add more nodes to your cluster. You can free up space by removing obsolete VMs or by reclaiming dead space within running VMs. I covered here how to reclaim space within VMs.

To check how much storage capacity a single node provides, go to Hardware -> Table -> select one host -> check the “Storage Capacity” value.

Storage space reclamation in VMware with Nutanix

I have been researching storage reclamation for a while. When I got my hands on a Nutanix cluster running VMware, I was interested in how to get maximum storage space efficiency out of it. Since Nutanix presents an NFS share to the ESXi hosts, datastore level reclamation is not needed. This left me with in-guest reclamation.

After some testing I discovered that writing zeros to virtual machine disks had an interesting effect on VMDK files residing on Nutanix. The VM size shrank to the size it actually consumes. No dead space was left inside the VMDK files.

VM size before writing zeros to disk

VM size after writing zeros to disk

Storage container space did not change immediately – it was still using the same amount of space as before. Container space usage had gone down by the next morning; the Analysis page showed that it decreased gradually over night.
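If you want to see the same effect on a single VM, you can compare the provisioned size of a VMDK with the space it actually consumes directly from the ESXi shell (a rough sketch – the datastore and VM names are placeholders):

Command: ls -lh /vmfs/volumes/<datastore>/<vm>/<vm>-flat.vmdk
Command: du -h /vmfs/volumes/<datastore>/<vm>/<vm>-flat.vmdk

ls shows the provisioned size of the disk, while du shows the space it actually uses on the NFS datastore.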

Writing zeros

Sdelete is the most widely known tool for writing zeros to Windows disks. But a caveat of using Sdelete is that for a short time the disk is completely full, which can cause problems for applications. I found a better solution – the “Davidt Fast Space reclaimer” script written by David Tan. The script generates a 1GB file filled with zeros and copies it until less than 1GB of free space is left on the drive. You can download the script from here.

There is also another script written by Chris Duck called “PowerShell Alternative to SDelete”, which can be found here.

There are also commercial products available for Windows that will write zeros over dead space – Raxco PerfectStorage and Condusiv V-locity. They might be worth checking out.

For Linux I wrote a script myself that writes zeros until the free space on a mount point is down to roughly 10% of its total size. I leave 10% free to avoid an out-of-space condition and unneeded alerts from monitoring software. My shell scripting skills are not that good, so all ideas and suggestions on how to make this script better are welcome.

— Script begins here —

#!/bin/bash
# Author: Kalle Pihelgas
# Version: 1.0
# Date: 10.12.2014

# list of mount points with ext3/ext4 file systems
localmountpoints=`mount | grep "type ext" | awk '{ print $3 }'`

# loop through all mount points
for mountpoint in $localmountpoints; do

# get free space in KB
freespaceint=`df -k $mountpoint | tail -1 | tr -s ' ' | cut -d' ' -f4`

# get 10% of the total size in KB
totalspaceint=`df -k $mountpoint | tail -1 | tr -s ' ' | cut -d' ' -f2`
ten_totalspace=`echo "$totalspaceint*0.1" | bc`

# free space (in GB) that will be filled with zeros, leaving 10% of the mount point free
freespace=`echo "($freespaceint-$ten_totalspace)/1024/1024" | bc`

# counter for zero files
a=1

# write zeros until only 10% of the mount point is free
while [ $freespace -gt 0 ]
do
echo "Mount point: $mountpoint, zeros left to write: $freespace GB"
dd if=/dev/zero of=$mountpoint/zerofile$a bs=1M count=1000
sleep 5
a=`expr $a + 1`

# recalculate the remaining free space
freespaceint=`df -k $mountpoint | tail -1 | tr -s ' ' | cut -d' ' -f4`
freespace=`echo "($freespaceint-$ten_totalspace)/1024/1024" | bc`
done

# remove the zero files
rm -f $mountpoint/zerofile*
done

— Script ends here —
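A minimal usage example, assuming the script has been saved as zerofill.sh (the file name is my own choice): run it as root, or as a user with write access to all the ext3/ext4 mount points, preferably during a quiet period:

chmod +x zerofill.sh
./zerofill.sh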

DISCLAIMER! I have not tested these scripts extensively. So I urge you to test these scripts thoroughly before running them on your server! I will not be responsible for any damage caused by these scripts!

Storage reclamation – part 4 – Zero fill and array level reclamation

One way to reclaim storage space is to overwrite the dead space with zeros and then get rid of the zeros.

Writing Zeros

One thing to keep in mind is that when you write zeros to a thin provisioned disk/LUN, it will grow to its full size.

Windows

Sdelete – a free tool from Microsoft that can be used to write zeros to disks.
Command: sdelete -z E:\

Raxco PerfectDisk – a commercial tool to intelligently overwrite dead space with zeros.

Condusiv V-locity – a commercial tool to overwrite dead space with zeros.

Linux

dd – a command-line utility for Unix and Unix-like operating systems that can be used to copy disks and files.
Command: dd if=/dev/zero of=/home/zerofile.0 bs=1M
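dd will stop with a “no space left on device” error once the file system is full. After that the zero file should be deleted so the space becomes usable again (the file name matches the dd example above):

Command: rm -f /home/zerofile.0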

Getting rid of zeros in VMware level

Zeros can be removed using Storage vMotion. When performing the Storage vMotion, thin provisioning must be selected and the source and destination datastores have to have different block sizes. I have used the following combinations (an alternative approach is sketched after them):
VMFS5 -> select thin -> VMFS3 (8MB block) -> VMFS5
VMFS5 -> select thin -> NFS -> VMFS5
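As an alternative to Storage vMotion, vmkfstools can punch out the zeroed blocks of a thin provisioned VMDK while the VM is powered off. This is a different method than the one described above, so treat it as an option to test rather than a recommendation (the path is a placeholder):

Command: vmkfstools -K /vmfs/volumes/<datastore>/<vm>/<vm>.vmdk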

Reclaim zeros in array level

Different arrays support zero space reclamation in different ways. Check your vendor documentation for how exactly to accomplish this.

EMC VMAX

A command can be executed to reclaim zero space from a LUN.
Solutions Enabler command example: symconfigure -cmd "start free on tdev <TDEVID> start_cyl=0 end_cyl=last_cyl type=reclaim;" commit -sid <SID>

Hitachi HUSVM

Zero space from a LUN can be reclaimed by running “Reclaim Zero Pages” from Hitachi Command Suite.

EMC VNX

Performing a LUN migration will reclaim zero space from the LUN. In addition, compressing and then uncompressing the LUN will also discard the zeros.

Other arrays

Any array that supports inline compression and/or deduplication will probably reclaim any zero space during write operation.

Other posts in this series:

Storage reclamation – part 1 – VMWare vSphere

Storage reclamation – part 2 – Windows

Storage reclamation – part 3 – Linux

Red Hat Summit 2014 presentations

I went through some Red Hat Summit 2014 presentations and found a few interesting things. The presentations are available on the Red Hat website – https://www.redhat.com/summit/2014/presentations/.

Linux Containers in RHEL 7 – Key Takeaways (Link to presentation)

  • Application isolation mechanism for Light-weight multi-tenancy
  • Application centric packaging w/ Docker image-based containers
  • Linux Containers Productization
    • Key kernel enablers – full support in RHEL 7 GA
    • Docker 1.0 – shipped with RHEL 7 GA
  • Linux Container Certification
  • Red Hat and Docker partnership to build enterprise grade Docker
    containers

RHEL roadmap (Link to presentation)

Theoretical Limits on X86_64

  • Logical CPU – maximum 5120 logical CPUs
  • Memory – maximum 64T

RHEL 7 will support XFS, ext4, ext3, ext2, NFS, and GFS2

  • Maximum supported filesystem sizes increase
    • XFS 100TB -> 500TB
    • ext4 16TB -> 50TB
  • btrfs is a technology preview feature in RHEL 7

Red Hat Enterprise Linux 7 has XFS as the new default file system

  • XFS will be the default for boot, root and user data partitions on all supported architectures (see the generic example after this list)
  • Included without additional charge as part of RHEL 7 subscription
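Since XFS is now the default, here is a generic example of creating and inspecting an XFS file system – this is not from the presentation, and the device and mount point are assumptions:

Command: mkfs.xfs /dev/sdb1
Command: mount /dev/sdb1 /data
Command: xfs_info /data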

RHEL 7 Storage Enhancements

  • New protocols and driver support
    • Shipping NVMe driver for standard PCI-e SSD’s
    • Support for 16Gb/s FC and 12Gb/s SAS-3
    • Linux-IO SCSI Target (LIO)
    • User-specified action on SCSI events, e.g. LUN create/delete, thin provisioning threshold reached, parameter change.
  • LVM
    • RAID, thin provisioning and snapshot enhancements
    • Tiered storage, using LVM/DM cache, in technology preview

Red Hat Enterprise Virtualization Hypervisor roadmap (Link to presentation)

Performance: Windows Guest Improvements

  • Make Windows guests think they are running on Hyper-V

Scalability: Large Guests

  • Host: 160 cores; 4TiB RAM
  • Virtual Machine CPU Limit : 160 vCPUs
  • RHEL6 4000GiB guest RAM
  • RHEL7 4 TiB guest RAM

Missing Array Pair when adding protection groups in VMware SRM

Recently I reinstalled my infra test environment and stumbled on a problem in VMware SRM while creating protection groups. The problem was that the list from which I should have selected the Array Pair for which I wanted to create protection groups was empty.

After doing some digging in the logs I discovered the following lines in the SRM log:

[01212 error 'DatastoreGroupManager' opID=420f60c] Device '82' matches two different devices 'naa.60060e8013294d005020294d00000052' and 'hitachi_hus_vm0_0052'
[01212 error 'DatastoreGroupManager' opID=420f60c] Device '83' matches two different devices 'naa.60060e8013294d005020294d00000053' and 'hitachi_hus_vm0_0053'
[01212 verbose 'DatastoreGroupManager' opID=420f60c] Matched 0 devices of total 60
[01212 warning 'DatastoreGroupManager' opID=420f60c] No replicated datastores found for array pair 'array-pair-7037'
[01212 verbose 'DatastoreGroupManager' opID=420f60c] Recomputed datastore groups for array pair 'array-pair-7037': 0 replicated datastores, 0 replicated RDMs, 0 free devices, 0 datastore groups

Devices 82 and 83 were the LUNs which are replicated and which contain the VMs. The problem was that each of these devices was identified as two different ESXi devices. After checking the ESXi hosts I figured out the problem – out of the three hosts, two had 3rd party multipathing software installed. This caused the LUNs to be identified differently. After I unpresented the LUNs from the host which did not have the 3rd party multipathing software installed, the array pair showed up in SRM.
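To compare how each ESXi host identifies a LUN, the device can be checked from the ESXi shell on every host (a quick sketch – the NAA identifier is taken from the log lines above):

Command: esxcli storage core device list -d naa.60060e8013294d005020294d00000052

Among other fields, the output should show the display name and the multipathing plugin that owns the device, which makes it easy to spot hosts where a 3rd party plugin has claimed the LUN.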

It seems that ESXi hosts need to be kept identical if they are presented with replicated LUNs that you want to use with VMware SRM.