Tuesday 5 April 2016

StarWind Virtual SAN review - Part 5 - Performance tests

In this part of the StarWind Virtual SAN review I will be testing and comparing the performance of different disk types using different combinations of features. But let me first start with a short description of my test bed.
  • 2 x ESXi servers with Avoton CPUs (C2750D4I boards)
  • Each ESXi host has:
    • 1 x 7.2K RPM HDD
    • 1 x 256GB SSD
    • 6 x 1Gbps NICs. All NICs are connected to a single TP-Link switch.
  • The first pair of NICs on each host is used for management and VM traffic, the second pair is dedicated to iSCSI traffic to the Synology box, and the last two NICs are reserved for the StarWind servers.
  • All VMs, including the StarWind Virtual SAN servers, run on shared iSCSI storage (Synology).
  • Each ESXi server hosts one StarWind Virtual SAN server and has two local VMFS datastores - one created on the HDD and one on the SSD.
  • The test disks attached to the Virtual SAN servers are placed on the local HDD (data) and SSD (L2 cache), so there is no other workload on the test disks.
  • Each Virtual SAN server has two additional disks - one per local datastore. That's where the test devices and the corresponding Level 2 cache files are created.

There is another important thing to mention before we get to the test results. I don't intend to provide results that would be valid for real-life scenarios, for the simple reason that my lab setup doesn't meet StarWind's minimum hardware requirements, e.g. an Intel Xeon E5620 (or higher) CPU.

My idea was to compare different combinations of disks and features and to see how each feature contributes to performance. For instance, I was curious to check how much faster a Flat disk with L1 and L2 caches is than a regular disk, or how the performance of HA disks depends on the multipathing policy.

Here are all the combinations I wanted to test, summed up in one table
StarWind Tests Config



The plan was to get a baseline by running IOmeter natively on the StarWind server, so that I could compare it with the performance of the Virtual SAN disks running on top of that server and estimate the overhead.

Then I created 11 different StarWind devices as per the table above and presented them to the ESXi hosts. I have also set the Round Robin IOPS limit to 1, i.e. the number of I/Os after which an ESXi host switches to the next path to the Virtual SAN servers, as per VMware KB2069356. That means each host talks to both Virtual SAN appliances (the local and the remote one) whenever a StarWind device is mirrored to another appliance.
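For reference, this is roughly how that change can be scripted with the esxcli commands from KB2069356. It is only a sketch for my setup: the device ID is a placeholder, and you would list your own StarWind devices with "esxcli storage nmp device list" first (ESXi bundles a Python interpreter, or you can simply run the two esxcli commands by hand over SSH).

# Sketch only: set Round Robin and an IOPS limit of 1 on StarWind-backed devices,
# as per VMware KB2069356. The device ID below is a placeholder.
import subprocess

STARWIND_DEVICES = ["eui.xxxxxxxxxxxxxxxx"]  # replace with your own device IDs

for dev in STARWIND_DEVICES:
    # Switch the path selection policy to Round Robin
    subprocess.check_call(["esxcli", "storage", "nmp", "device", "set",
                           "--device", dev, "--psp", "VMW_PSP_RR"])
    # Switch to the next path after every I/O instead of the default 1000
    subprocess.check_call(["esxcli", "storage", "nmp", "psp", "roundrobin",
                           "deviceconfig", "set", "--device", dev,
                           "--type", "iops", "--iops", "1"])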

Then I created a test VM with 11 disks placed on the corresponding Virtual SAN datastores.


That's the final picture of my datastores

And this is what the disks in the test VM look like




I hope you get the idea of the test setup now.

IOmeter was chosen as the test tool; however, it was taking IOmeter too long to create the test file, so I used the RDFC (Random Data File Creator) utility to create the test files instead.
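If you would rather not use another tool, a few lines of Python will do the same job. This is just a sketch for my setup: the drive letter is hypothetical, and the idea is simply that IOmeter should reuse an existing iobw.tst of the right size instead of building it itself.

# Sketch: pre-create a 20GB iobw.tst filled with random data so IOmeter
# doesn't spend time building the test file. The path is hypothetical.
import os

SIZE_GB = 20
BLOCK = 1024 * 1024  # 1 MiB of fresh random data per write

with open(r"E:\iobw.tst", "wb") as f:
    for _ in range(SIZE_GB * 1024):
        f.write(os.urandom(BLOCK))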

Here is the IOmeter test specification: 

  • 20GB test file
  • 4KB block size
  • 32 outstanding I/Os
  • 100% random, 67% reads, 33% writes
  • Duration: 2 minutes

The table below shows the final test results using Round Robin multipathing




Now let’s talk about performance results with regard to each disk type.  

Flat

As you can see, the Flat device result is very close to the baseline. That means the overhead of StarWind is minimal and can safely be neglected.

Flat-HA

Here you can see the benefit of Read Striping, where the ESXi host takes advantage of the disk spindles in both ESXi servers (387 vs 287 IOPS). However, this positive effect applies to reads only. If you have write-intensive applications, it is better to use the Fixed multipathing policy.

Here is a nice article which also demonstrates the benefits of multipathing with StarWind

Flat-L1

The L1 cache proved to be pretty helpful, as it lets the Virtual SAN servers absorb I/O bursts. During the IOmeter test, Virtual SAN cached all writes in RAM while the HDD served reads only. Once the test was stopped, I could see Virtual SAN de-staging all the writes onto the HDD.

Here is how it looks in the performance graph



Flat-L1-HA

This disk config shows a further performance improvement, again due to Read Striping across both hosts.

While testing this particular disk type I found out how important it is for the StarWind product that the Round Robin policy has the correct IOPS limit.

On the first run I had the default limit of 1000 IOPS, and I noticed that neither host seemed to cache any writes. As soon as IOmeter started, both Virtual SAN servers began pushing writes to the HDD; it looked as if the L1 cache wasn't utilised at all.
This didn't happen in the previous scenario with the standalone Flat-L1 disk. However, when I changed the IOPS limit to 1, the caching started to work again.

Here is what I believe happens with the IOPS limit set to the default value of 1000:
  • The ESXi host sends 1000 I/Os over the first path to the first Virtual SAN server.
  • Writes are stored in the L1 cache; reads are serviced from the HDD.
  • The ESXi host then sends another 1000 reads and writes over the second path to the second Virtual SAN server.
  • While the second Virtual SAN server processes these I/Os, the first StarWind server thinks the workload has dropped to zero and starts de-staging the data held in RAM.
  • When the ESXi host switches back to the first iSCSI path, the second StarWind server starts dumping its cached data onto the HDD as well.

So you end up with a situation where both Virtual SAN servers have plenty of cache to accommodate gigabytes of write I/Os, but StarWind doesn't take much advantage of it, since it keeps data in the cache for a very short period of time and constantly flushes it to the HDD, thus impacting read performance.

Therefore, with the IOPS limit at 1000 I was getting around 310 IOPS in IOmeter, but once the limit was changed to 1, performance jumped to 600.
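To make this a bit more concrete, here is a toy model of the two write-back caches. It is not StarWind's actual algorithm, just an illustration of the hypothesis above: with a 1000-I/O Round Robin limit each node regularly sits idle long enough to start de-staging, while with a limit of 1 neither node ever looks idle.

# Toy model of the hypothesis above (the numbers are arbitrary, not StarWind's).
# A node "de-stages" when it has seen no I/O for idle_flush_after ticks.
def destage_events(iops_limit, total_ios=10000, idle_flush_after=500):
    last_io = [0, 0]      # tick of the last I/O seen by node 0 and node 1
    destages = [0, 0]     # how many times each node decided to flush its cache
    node = 0
    for tick in range(total_ios):
        if tick > 0 and tick % iops_limit == 0:
            node = 1 - node                   # Round Robin switches the path
        last_io[node] = tick
        idle_node = 1 - node
        if tick - last_io[idle_node] == idle_flush_after:
            destages[idle_node] += 1          # "workload dropped to zero" - flush
    return destages

print(destage_events(iops_limit=1000))  # both nodes keep hitting the idle threshold
print(destage_events(iops_limit=1))     # neither node ever goes idle - no flushing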


Flat-L1-L2

Even though the results are pretty good, we can see that the presence of the L2 cache didn't improve them as much as I expected. I had a chat with StarWind engineers and was told that the L2 caching algorithm is going to be improved in the next build. It should also be noted that my test runs were too short to get enough data cached in L2 and gain a significant advantage from it.

Flat-L1-L2-HA

Another example of the benefits of Read Striping with Round Robin multipathing. Also, if you compare this result with Flat-L1-HA, you can see that the L2 cache hasn't been useful at all. Again, let's wait for a newer build and hope the commodity SSDs contribute more to overall performance.

LSFS

LSFS shows a serious increase in performance compared to the Flat disk - 341 vs 287 IOPS. This difference is valid only for this particular test configuration. As mentioned before, LSFS is pretty good at dealing with random writes, but it isn't efficient for sequential reads, for instance.

LSFS-HA

No surprises here. The result is almost double that of the standalone LSFS (654 vs 341 IOPS) due to twice the number of Virtual SAN servers and physical disks servicing IOmeter.

LSFS-DD

Deduplication increases the performance of LSFS even further (359 vs 341 IOPS). This happens because LSFS computes each data block's hash on the fly and compares it with a hash table stored in RAM. If the data already exists on the LSFS disk, Virtual SAN only has to update the metadata table without writing the actual data, which saves some write I/Os.

Deduplication applies to cached data too, which means more data can fit in the cache, thus increasing the cache hit ratio.
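To illustrate the idea (this is a generic inline-deduplication sketch, not StarWind's actual implementation), a write path with an in-RAM hash table only needs a metadata update when an identical block is already stored:

import hashlib

class DedupStore:
    """Toy inline deduplication: one physical copy per unique block."""
    def __init__(self):
        self.blocks = {}     # hash -> block data (stands in for the physical disk)
        self.metadata = {}   # logical block number -> hash

    def write(self, lbn, data):
        digest = hashlib.sha256(data).hexdigest()  # hash computed on the fly
        if digest not in self.blocks:
            self.blocks[digest] = data             # new data: a real write happens
        self.metadata[lbn] = digest                # duplicate: metadata update only

    def read(self, lbn):
        return self.blocks[self.metadata[lbn]]

store = DedupStore()
store.write(0, b"A" * 4096)
store.write(1, b"A" * 4096)   # duplicate block: no second physical write
print(len(store.blocks))      # 1 unique block stored for 2 logical writes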

LSFS-DD-HA

Again, performance is almost doubled by using Round Robin - 690 vs 359 IOPS.

LSFS-DD-L1-L2-HA

This is the final test scenario, where I wanted to see how Virtual SAN behaves with all the bells and whistles enabled. As you can see, I got an amazing result of 740 IOPS on my very humble test hardware, with data mirrored between just two 7.2K HDDs, while keeping latency under 50ms.

I have also tested the HA-enabled devices with the Fixed policy, just to see what kind of performance I would get if I blindly followed the best practices documentation.

I merged both sets of results - Round Robin and Fixed - into one table to see what gives the best results in my particular configuration




As you can see, in all configurations Round Robin multipathing outperformed the Fixed PSP. So, I think StarWind needs to review and update their Best Practices documentation, which says the Fixed PSP is the only supported configuration on ESXi hosts.





I know there are more device combinations and more IOmeter specifications to try, but I can only spend so much time on these tests. If there is interest in more test combinations or different hardware, I might write another post on this.

Again, I would like to stress that these test results are given here for the sake of comparing features and disk types. The test system does not meet the minimum system requirements, and therefore the results are valid only for this particular setup.



