Monday, November 5, 2018

What if Data Locality was not a design choice in Nutanix AOS?



  • What if data locality were not part of the Nutanix AOS architecture, as is the case with a few other HCI architectures on the market?
  • Would AOS suffer as much as those architectures do for high-throughput workloads such as DSS?
  • When would network bandwidth become a bottleneck for such workloads (e.g., on a 10Gb network)?
Note: Please see my earlier post on whether data locality really makes a difference.

As usual, X-Ray comes to the rescue: it answers these questions within a couple of hours and lets us see the results visually for ourselves. Keeping all other design choices of the AOS architecture the same, we compared the results with data locality turned on and turned off.

I took an AOS 5.9 build running on ESX 6.5, tuned the following parameters in AOS 5.9, and ran X-Ray's built-in scenario "Database Colocation: High Intensity".

  • Oplog's data locality (DL) is turned off  
  • Extent Store's data locality (DL) is turned off
  • Range Cache (RC), the DRAM data cache, is also turned off
    • This removes the effect of caching from our experiments

The following charts were created from the X-Ray runs. For readability, I have split a single screenshot into multiple images.

  • Blue: Data Locality On
  • Green: Data Locality Off


OLTP IOPS stay steady; this workload is not affected.

The absence of data locality leads to a slight increase in latency. Even while the high-throughput DSS workloads issue large reads, there is no significant impact on latency.

DSS workloads are also unaffected when data locality is turned off. I know this is not the case with other architectures.

The effect of turning data locality off is visible in the network traffic charts.

Network traffic jumped to about 1.5 GB/s. This all-flash cluster has a single 10Gb network, which can carry up to about 1.2 GB/s; that is the target throughput expected from the two DSS workloads running on two different nodes.
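The 1.2 GB/s figure follows from simple unit conversion: a 10 Gb/s link carries at most 10 / 8 = 1.25 GB/s before protocol overhead. The sketch below only does that arithmetic, assuming decimal units and ignoring Ethernet/TCP overhead; the function names are illustrative, not part of any Nutanix or X-Ray tool.

```python
# Convert a nominal link speed (Gb/s) into raw byte throughput (GB/s).
# Assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead,
# so usable throughput on a real link will be somewhat lower.

BITS_PER_BYTE = 8


def link_capacity_gbytes(link_gbits: float) -> float:
    """Raw capacity in GB/s of a link whose speed is given in Gb/s."""
    return link_gbits / BITS_PER_BYTE


def fits_on_link(demand_gbytes: float, link_gbits: float) -> bool:
    """True if a traffic demand (GB/s) fits within the link's raw capacity."""
    return demand_gbytes <= link_capacity_gbytes(link_gbits)


if __name__ == "__main__":
    print(f"10 Gb/s link ~= {link_capacity_gbytes(10):.2f} GB/s raw")
    print("1.2 GB/s DSS target fits on one 10Gb link:", fits_on_link(1.2, 10))
```

This is why roughly 1.2 GB/s is the practical ceiling for a single 10Gb link once some overhead is taken into account.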

Unlike other architectures, Nutanix AOS places the two RF2 copies of the data evenly across all nodes. As a result, about 50% of the data happens to be local even with data locality turned off, so total network usage is only about half of the throughput the workload generates.
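A simple probability model shows why even placement yields a sizeable local fraction. This is a sketch under stated assumptions, not AOS's actual placement algorithm: it assumes the RF copies of each extent land on distinct nodes chosen uniformly at random, that a read is served locally whenever one copy happens to sit on the reader's node, and it ignores write-replication traffic. Under those assumptions the local fraction is RF / N, so 2 / 4 = 50% for a hypothetical four-node cluster (the post does not state the node count). The helper names are hypothetical.

```python
import random


def expected_local_fraction(num_nodes: int, replication_factor: int = 2) -> float:
    """Probability that a given node holds one of the RF copies of an extent,
    assuming copies go to distinct nodes chosen uniformly at random."""
    if not 1 <= replication_factor <= num_nodes:
        raise ValueError("replication_factor must be between 1 and num_nodes")
    return replication_factor / num_nodes


def simulate_local_fraction(num_nodes: int, replication_factor: int = 2,
                            num_extents: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability for a reader on node 0."""
    rng = random.Random(seed)
    local = 0
    for _ in range(num_extents):
        # Pick RF distinct nodes to hold this extent's copies.
        replicas = rng.sample(range(num_nodes), replication_factor)
        if 0 in replicas:
            local += 1
    return local / num_extents


def remote_read_gbytes(workload_gbytes: float, num_nodes: int,
                       replication_factor: int = 2) -> float:
    """Read traffic (GB/s) that must cross the network, assuming only
    extents without a local copy are fetched from another node."""
    local = expected_local_fraction(num_nodes, replication_factor)
    return workload_gbytes * (1 - local)


if __name__ == "__main__":
    nodes = 4  # hypothetical cluster size; not stated in the post
    print(f"expected local fraction: {expected_local_fraction(nodes):.2f}")
    print(f"simulated local fraction: {simulate_local_fraction(nodes):.3f}")
    print(f"remote reads for 1.2 GB/s of workload: {remote_read_gbytes(1.2, nodes):.2f} GB/s")
```

Note that under this model the local fraction shrinks as the cluster grows (RF / N), so the "about half" figure is specific to small clusters.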

In summary:
  • Nutanix AOS still fares better than other known HCI architectures even when data locality is turned off.
  • Network traffic is relatively low compared to other HCI architectures because data is distributed evenly across all disks and all nodes in Nutanix AOS.
Refer to my earlier post on whether data locality really makes a difference.
