What’s new in Virtual SAN 6.2
Virtual SAN 6.2 introduces many new features, and two of the most interesting in this release are deduplication and compression. These features were frequently requested by customers, and VMware listened. Before using them, it helps to understand why an organisation would want deduplication and compression and what the features actually do. One of the main reasons to use them is to lower TCO: with deduplication and compression enabled, the customer benefits from space efficiency, as the VSAN cluster consumes less storage than it otherwise would, saving money. It is also important to note that deduplication and compression are supported on all-flash VSAN configurations only.
What are Deduplication and Compression?
The basics of deduplication can be seen in the figure below. Blocks of data stay in the cache tier while they are being accessed regularly; once access cools off, the deduplication engine checks whether the block being destaged has already been stored on the capacity tier, so only unique chunks of data are written there.
So imagine a customer with many VMs sharing a datastore, all reusing the same block of data because a certain file is written to often. Each time a duplicate copy of that data is stored, space is wasted; to store efficiently, each block should be stored only once. The deduplication and compression operation happens during the destage from the cache tier to the capacity tier.
Hashing is used to track each block of data. Hashing is the process of creating a short, fixed-length string from a large block of data; the hash identifies the data chunk and is used in the deduplication process to determine whether the chunk has been stored before. Compression is enabled at the cluster level along with deduplication; it cannot be enabled through Storage Policy Based Management. The default block size for deduplication is 4 KB. For each unique 4 KB block, compression is only applied if the output is smaller than the fixed compressed block size; the goal is to get the 4 KB block compressed down to 2 KB, as seen below. A compressed block is allocated and tracked in the translation maps.
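The destage logic described above can be sketched in a few lines of Python. This is an illustration only, not VSAN's actual implementation: the hash map and capacity store are plain in-memory structures, and SHA-256 and zlib stand in for whatever hash and compression algorithms VSAN uses internally.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096          # dedupe works on fixed 4 KB blocks
COMPRESSED_TARGET = 2048   # store compressed only if the block fits in 2 KB

def destage_block(block, hash_map, capacity_store):
    """Model the destage path: deduplicate first, then try compression.

    hash_map maps a block's hash to its address in capacity_store, which
    stands in for the capacity tier. Illustrative names, not VSAN internals.
    """
    digest = hashlib.sha256(block).hexdigest()
    if digest in hash_map:
        return hash_map[digest]          # duplicate: reference the existing copy
    compressed = zlib.compress(block)
    payload = compressed if len(compressed) <= COMPRESSED_TARGET else block
    address = len(capacity_store)
    capacity_store.append(payload)       # unique block: one physical copy
    hash_map[digest] = address
    return address

# Writing the same 4 KB block twice consumes capacity only once
hash_map, store = {}, []
first = destage_block(b"A" * BLOCK_SIZE, hash_map, store)
second = destage_block(b"A" * BLOCK_SIZE, hash_map, store)
```

Both writes resolve to the same address, and the highly repetitive block also lands on the capacity store in compressed form.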
Enabling Dedupe & Compression
Enabling deduplication and compression is not rocket science by any means: simply go to the VSAN cluster and enable it from the Edit Virtual SAN Settings screen. Once deduplication has been enabled, all hosts and disk groups in the cluster participate in it. We sometimes speak about dedupe domains; a dedupe domain is the same as a disk group. The best way to think about this is that all redundant copies of data within a disk group are reduced to a single copy, while redundant copies across disk groups are not deduplicated, so the space efficiency is limited to the disk group. In other words, if multiple components in a disk group use the same block, they share one single copy.
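The disk-group-as-dedupe-domain behaviour can be modelled as follows. This is a hypothetical sketch; `DiskGroup` and its members are illustrative names, not VSAN internals.

```python
import hashlib

class DiskGroup:
    """A dedupe domain in this sketch: deduplication happens only inside one disk group."""

    def __init__(self):
        self.hash_map = {}   # block hash -> index into self.blocks
        self.blocks = []     # stand-in for the disk group's capacity tier

    def write(self, block):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.hash_map:
            self.hash_map[digest] = len(self.blocks)
            self.blocks.append(block)
        return self.hash_map[digest]

# Two disk groups in the same cluster
dg1, dg2 = DiskGroup(), DiskGroup()
dg1.write(b"same 4k block")
dg1.write(b"same 4k block")   # duplicate within dg1: deduplicated
dg2.write(b"same 4k block")   # same data, different disk group: stored again
```

Each disk group ends up holding its own physical copy, which is exactly why the space savings are scoped to the disk group.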
Deduplication can be enabled and disabled on a live cluster, but there are implications to doing this: turning it on means going through every disk group in the cluster, evacuating all of its data, and reformatting the disk group, after which VSAN performs deduplication on it. So it is a rolling upgrade. VMware decided not to introduce deduplication and compression separately, so enabling deduplication also enables compression, as seen below.
Deduplication is an I/O-intensive operation. Without deduplication, data is simply written from tier 1 (cache) to tier 2 (capacity). With deduplication, the first part remains the same, but the destage part requires additional operations: I/O goes through an extra dedupe path, regardless of whether the data is dedupe friendly.
Read – When performing a read, extra reads must be sent to the capacity SSD to resolve the logical Virsto address into the physical capacity (SSD) address.
Write – During destage, extra writes are required to the translation map and the hash map tables, which are used to keep the overhead down. This overhead must be accounted for, and a 4 KB block size is used.
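The read path described above amounts to a two-step lookup, sketched below. `translation_map` and `capacity_ssd` are hypothetical in-memory stand-ins for the real on-disk structures.

```python
def read_block(logical_addr, translation_map, capacity_ssd):
    """Resolve a logical (Virsto) address to a physical capacity-SSD address,
    then read the block. With dedupe enabled, this costs an extra metadata
    lookup before the actual data read."""
    physical_addr = translation_map[logical_addr]   # extra read: metadata
    return capacity_ssd[physical_addr]              # data read from capacity tier

# Hypothetical layout: logical address 0 maps to physical slot 5
translation_map = {0: 5}
capacity_ssd = {5: b"deduped 4k block"}
```

With several logical addresses mapping to the same physical slot, this also shows how deduplicated data is shared on read.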
The Summary screen for the datastore shows the different capacities and the dedupe ratio. Logical capacity is a new term: it is the capacity footprint the data would have if deduplication and compression were not turned on. In the example below, the physical used is 10 GB and the dedupe ratio is 3.2, so the logical capacity is 32 GB. CLOM needs to be aware of both the logical and physical capacities for placement decisions.
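The arithmetic behind the reported figures is a one-liner, shown here with the numbers from the example:

```python
def logical_capacity(physical_used_gb, dedupe_ratio):
    """Logical capacity = the footprint the data would have without
    deduplication and compression."""
    return physical_used_gb * dedupe_ratio

# Figures from the Summary screen example: 10 GB physical at a 3.2x ratio
print(logical_capacity(10, 3.2))   # prints 32.0
```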
In summary, deduplication and compression are fantastic features that will be very useful to customers with all-flash configurations: they reduce TCO, and from a technical standpoint there is nothing new to learn, so there is no ramp-up on the technology from a learning perspective.