At Networking Field Day 17, VeloCloud by VMware presented at the VMware HQ.
After the acquisition by VMware, the VeloCloud team put together an interesting overview of their SD-WAN solution and its features and capabilities.
I will focus on one point that to me is especially interesting:
“The Elephant flow problem”
All SD-WAN vendors can use multiple WAN uplinks to load balance traffic across these connections. But in most cases that is only per-session load balancing. That works fine as long as we have a lot of sessions that can simply be distributed across multiple links.
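To make the per-session behaviour concrete, here is a minimal sketch in Python. It is not taken from any vendor's implementation; the link names, the 5-tuple and the choice of hash are assumptions for illustration. The point is only that every packet of one session hashes to the same uplink, so a single big session can never use more than one link.

```python
import hashlib

def pick_link(flow, links):
    """Map a flow 5-tuple to one link; every packet of the flow gets the same answer."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(links)
    return links[index]

links = ["mpls", "dsl", "lte"]  # hypothetical uplink names
# hypothetical backup session: src IP, src port, dst IP, dst port, protocol
backup = ("10.0.0.5", 51515, "192.0.2.10", 443, "tcp")

# Ask for a link once per packet of the same session.
chosen = {pick_link(backup, links) for _ in range(1000)}
print(chosen)  # always a set with exactly one link in it
```

Many small sessions spread out nicely with this approach because their 5-tuples differ. One elephant flow does not.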
It becomes more difficult when it comes to “Elephant Flows”. A typical example is backup traffic or large file transfers. More than once I have seen a backup that wasn't finished overnight, and the next morning, when the first users showed up, everything was terribly slow and the WAN uplink was still at 100% utilization.
In the past we normally solved the problem with more bandwidth. If you have just one big pipe, your heavy elephant can run faster on a big road with more lanes. When you have multiple WAN uplinks, there are some challenges that need to be addressed.
I will use the Autobahn as a comparison to describe the problem. More lanes on the Autobahn help your heavy transport vehicle. But if you have only 3 roads to transport one big load, you need to disassemble it into multiple smaller packets that can then be loaded onto smaller trucks and transported to the destination.
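Loading one big flow onto several trucks could look roughly like the following Python sketch. It uses a smooth weighted round-robin, which is my own pick for illustration and not necessarily what VeloCloud does; the link names and weights (think relative bandwidth) are hypothetical.

```python
from collections import Counter

def stripe(packets, weights):
    """Assign each packet of one flow to a link, in proportion to the link weights."""
    current = {link: 0 for link in weights}
    total = sum(weights.values())
    assignment = []
    for packet in packets:
        # Every link earns credit equal to its weight for each packet sent.
        for link, weight in weights.items():
            current[link] += weight
        # The link with the most credit carries this packet and pays for it.
        best = max(current, key=current.get)
        current[best] -= total
        assignment.append((packet, best))
    return assignment

weights = {"mpls": 5, "dsl": 3, "lte": 2}  # hypothetical relative link capacities
plan = stripe(range(10), weights)
counts = Counter(link for _, link in plan)
print(dict(counts))  # {'mpls': 5, 'dsl': 3, 'lte': 2}
```

The packets of the single flow now travel on all three roads at once, which is exactly what creates the reassembly work on the far side.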
Now comes the challenging part. Before the packet can be sent out on the LAN interface, it needs to be reassembled.
Out-of-order packets: some packets will not arrive in the right order, so they need to be buffered until all parts have arrived before they can be reassembled.
Packet loss: some of the packets may need to be retransmitted.
Link quality tracking: during the whole process, the link characteristics may change with regard to latency and throughput.
Packet size: on internet uplinks the maximum MTU can be smaller than on a private WAN. IPsec encryption and additional internal headers reduce the maximum payload that can be forwarded even further. VeloCloud also has a feature that addresses this problem and can provide a virtual MSS (Maximum Segment Size) for TCP packets.
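The out-of-order and retransmission points can be sketched with a small reorder buffer in Python. This is a simplified illustration under my own assumptions (per-flow sequence numbers starting at 0, no timeouts, no memory limit), not a description of how the VeloCloud Edge works internally.

```python
class ReorderBuffer:
    """Release packets in sequence order, holding back anything that arrives early."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number we may hand to the LAN
        self.pending = {}          # early arrivals, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one packet and return the payloads that can now be delivered in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate, for example from a retransmission
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

buf = ReorderBuffer()
delivered = []
# Packet 1 took the slow road, and a retransmitted copy of it shows up as well.
for seq, payload in [(0, "a"), (2, "c"), (3, "d"), (1, "b"), (1, "b")]:
    delivered.extend(buf.receive(seq, payload))
print(delivered)  # ['a', 'b', 'c', 'd']
```

A real device also has to decide how long to wait for a missing packet and how much it is willing to buffer, which is where the changing link latency comes into play.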
The Networkautobahn View
Amazing feature. The elephant flow problem was not solved by L2 link aggregation in the LAN or by L3 routing in the WAN. One flow was forwarded over only one link, and that was all we got. Getting it done right is quite challenging, and that may be one of the reasons why it wasn’t available sooner.
I can still remember some NetScreen devices that melted down when they had to do packet fragmentation and reassembly over IPsec tunnels. The CPU was at 100% load and you had nearly no throughput.
Getting Jumbo Frames transferred across WAN links also makes me excited. That is particularly interesting if you want to run NSX across your SD-WAN infrastructure.
I would like to see that in action, and I am also curious about how much impact it will have on the CPU of the VeloCloud Edge devices.