Data centers must be purpose-built to handle current and future workloads, which are evolving rapidly and are driven by high volumes of end users, application types, cluster nodes, and overall data movement in the cloud. In turn, the cross-sectional bandwidth of these cloud-scale data center networks is increasing rapidly, outpacing the increase in physical link speeds. ECMP (Equal Cost Multipathing) and port-channeling are common techniques that construct higher-capacity point-to-point logical paths from multiple redundant parallel physical paths. Traditionally, both ECMP and port-channel implementations attempt to distribute flows uniformly across the physical links that form the logical path. The decision about which flow uses which physical link has traditionally been based on a static hash of a fixed set of fields from the packet header. This static hashing scheme is sub-optimal and gives rise to network polarization, whereby multiple traffic flows traverse and burden the same link while other links are left underutilized.
This inflexible, static approach no longer suits the scalability needs of today’s data centers. Fortunately for network designers, a range of new hashing enhancements improve network performance and overcome limitations imposed by traditional schemes. To capitalize on these advancements, network operators must understand current data center traffic trends, their implications and how related hashing enhancements can effectively impact cloud network performance.
Table of Contents
- Understanding Traditional Hashing Mechanisms
- Key Implications of Today’s Data Center Traffic
- Resilient Hashing Effectively Addresses Link Failure
- Newer Encapsulations and Protocols Managed by Flexible Hashing
- Symmetric Hashing Ensures Packet Traceability and Stateful Protocol Debug
- Intelligent Hashing Optimizes Cloud Network Visibility and Performance for the Long-Term
Traditional load balancing systems split traffic bound for a logical fat link across multiple outgoing physical links, as shown in Figure 1. Typically, the physical link for a flow is determined by computing a hash over packet header fields and then applying a modulo operation based on the number of physical links. A good load balancing system should split the traffic evenly across the outgoing links. In addition, packets belonging to the same flow should arrive at the end destination in order.
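The hash-then-modulo selection described above can be sketched in a few lines of Python. This is an illustrative sketch only: the 5-tuple key, the use of CRC32 as the hash, and the names `flow_key` and `select_link` are assumptions made for the example, not details taken from any particular switch.

```python
import zlib

def flow_key(src_ip, dst_ip, proto, src_port, dst_port):
    """Serialize the header fields (an assumed 5-tuple) into bytes for hashing."""
    return f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()

def select_link(flow, num_links):
    """Static hashing: hash the header fields, then take modulo the link count."""
    return zlib.crc32(flow_key(*flow)) % num_links

# 1000 synthetic flows that differ only in their source port.
flows = [("10.0.0.1", "10.0.1.1", 6, 49152 + i, 80) for i in range(1000)]

counts = [0] * 4
for f in flows:
    counts[select_link(f, 4)] += 1

print(counts)  # number of flows per physical link
```

Because the hash depends only on header fields, every packet of a given flow maps to the same link, which is what preserves in-order delivery within the flow.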
This static hashing scheme has worked well in carrier and enterprise networks and was later borrowed for use in the data center. However, considering current trends in the data center traffic patterns, this scheme is no longer effective for data centers and cloud networks.
Web, application, and database server applications running as VMs (virtual machines) that can reside in any server in any rack, coupled with the increased use of clustered applications (such as Hadoop) in modern data centers, result in increased east-west traffic in data center networks. Such east-west traffic includes server-to-server, server-to-storage, and server rack-to-server rack traffic. This trend is changing the inherent design of network topologies, from oversubscribed and tiered networks to fast, fat, and flat networks, which require new features in network switches.
Driven by the latest silicon advances and by the increased bandwidth and port densities of switch-on-a-chip systems, traditional 3-tier network designs are quickly being replaced by these fast, fat, and flat networks, which are composed of resilient and flexible Clos topologies with very high cross-sectional bandwidth.
For designers, this shift results in some essential implications for network deployments. Data centers of massive scale, for example those using several thousand links, will experience frequent link failures resulting in network polarization. Even under such failure conditions, deployed networks must perform normally and deliver packets in order. Further, due to newer protocols and encapsulations introduced routinely as a means to improve data center automation and network management, newer packet header fields redefine flows. This in turn redefines how packets need to be treated. The introduction of new network features makes it even more critical to incorporate the general ability to debug and trace packets. Lastly, with the adoption of cloud hosting services, security has become ever more prominent and essential in network operations. Stateful packet inspection and intrusion detection systems will only continue to gain importance. Driven by these requirements, network operators are turning to switch architectures optimized for cloud networking.
Figure 2 demonstrates traditional static hashing during an episode of link failure.
M physical links are used to form a logical fat pipe. The static hash scheme uses a modulo-M operation to associate a flow with a physical link. When a link fails, this modulo operation changes to a modulo-(M-1) operation. As a result, even flows that never used the failed link may be assigned a new link. This, in turn, may temporarily cause out-of-order packet delivery even for flows that were not using the failed link.
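The effect of switching from modulo-M to modulo-(M-1) can be seen in a small simulation. The sketch below assumes M = 4, integer flow identifiers, and CRC32 as the hash; these are choices made for the example only.

```python
import zlib

def flow_hash(flow_id):
    """Hash an (assumed) integer flow identifier."""
    return zlib.crc32(str(flow_id).encode())

def static_assign(flow_id, links):
    """Static hashing: index into the list of live links with hash mod len(links)."""
    return links[flow_hash(flow_id) % len(links)]

links = [0, 1, 2, 3]          # M = 4 physical links
failed = 3
survivors = [l for l in links if l != failed]

flows = range(10_000)
before = {f: static_assign(f, links) for f in flows}      # modulo-M
after = {f: static_assign(f, survivors) for f in flows}   # modulo-(M-1)

# Flows that were never on the failed link, and the subset that moved anyway.
unaffected = [f for f in flows if before[f] != failed]
moved = [f for f in unaffected if after[f] != before[f]]

print(f"{len(moved)} of {len(unaffected)} flows that never used "
      f"link {failed} were reassigned")
```

With a well-mixed hash, a large share of the flows on healthy links is expected to move, because the result of `h % 4` and `h % 3` agrees for only a fraction of hash values.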
In contrast, the Smart-Hash Resilient Hashing (Fig. 3) scheme incorporates a resilient hashing engine to associate flows with physical ports. In case of link failure, only the affected flows are redistributed uniformly across the remaining good physical links. Flows originally using the good links remain unaffected and are not reassigned to a new link.
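One common way to obtain this behavior is to hash flows into a fixed-size table of buckets and, on failure, rewrite only the buckets that pointed at the failed link. The sketch below illustrates that general idea. It is not a description of the Smart-Hash engine's internals, and the table size, function names, and round-robin refill policy are all assumptions.

```python
import zlib
from itertools import cycle

NUM_BUCKETS = 64  # assumed table size; stays fixed when links go down

def flow_hash(flow_id):
    return zlib.crc32(str(flow_id).encode())

def build_table(links):
    """Fill the bucket table round-robin with the available links."""
    return [links[i % len(links)] for i in range(NUM_BUCKETS)]

def fail_link(table, failed, survivors):
    """Rewrite only the buckets that pointed at the failed link."""
    replacement = cycle(survivors)
    return [next(replacement) if link == failed else link for link in table]

def lookup(table, flow_id):
    """The modulus is the table size, so it does not change on failure."""
    return table[flow_hash(flow_id) % NUM_BUCKETS]

table = build_table([0, 1, 2, 3])
new_table = fail_link(table, failed=3, survivors=[0, 1, 2])

flows = range(10_000)
moved_unaffected = [
    f for f in flows
    if lookup(table, f) != 3 and lookup(new_table, f) != lookup(table, f)
]
print(len(moved_unaffected))  # 0: flows on healthy links keep their link
```

The key design choice is that the modulus is the table size rather than the number of live links, so a failure changes table contents but not which bucket a flow lands in.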
Data center features are evolving rapidly, as illustrated by the accelerated adoption of tunneling protocols such as VXLAN (Virtual eXtensible LAN) and NVGRE (Network Virtualization using Generic Routing Encapsulation). The common advantage of these tunneling schemes is that transit switches in the network primarily operate on the outer headers, so only switches at the periphery need to treat these packets differently.
From the vantage point of the L2oL3 (layer 2 over layer 3) transit switch, the packet’s inner header changes more frequently than its outer header. Traditional static hashing schemes are based on a standard set of fields in the outer packet header. With minimal variation in the outer header, it is difficult for network transit switches to distribute packets evenly across physical paths. Flexible Hashing enhances the hash to include more packet header fields at programmable offsets, and also supports L3, L3 Tunneling, and L4 packets.
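The idea of hashing on fields at programmable offsets can be illustrated with a toy example. The packet layout below is deliberately simplified and is an assumption for illustration; it is not the VXLAN or NVGRE wire format, and `flexible_hash` is a hypothetical helper rather than a real switch API.

```python
import zlib

def flexible_hash(packet: bytes, fields):
    """Hash the bytes found at each configured (offset, length) pair."""
    selected = b"".join(packet[off:off + length] for off, length in fields)
    return zlib.crc32(selected)

# Toy layout (assumed): bytes 0-7 hold the outer src/dst IPv4 addresses,
# bytes 8-15 hold the inner src/dst IPv4 addresses.
OUTER_ONLY = [(0, 8)]
OUTER_AND_INNER = [(0, 8), (8, 8)]

# Every packet shares the same tunnel endpoints; only the inner header varies.
outer = bytes([10, 0, 0, 1, 10, 0, 0, 2])
packets = [outer + bytes([192, 168, 0, i, 192, 168, 1, i]) for i in range(200)]

def links_used(fields, num_links=4):
    """Return the set of physical links that the packets are spread over."""
    return {flexible_hash(p, fields) % num_links for p in packets}

print(len(links_used(OUTER_ONLY)))       # 1: identical outer headers, one link
print(len(links_used(OUTER_AND_INNER)))  # inner fields add entropy to the hash
```

With only the outer fields selected, every packet hashes to the same value and lands on a single link. Adding the inner offsets to the field list gives the hash enough variation to spread the same traffic across links.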
Symmetric Hashing ensures that packets belonging to the same bi-directional network communication travel the same physical paths in both directions. This is necessary for intrusion detection systems, placed inline on the network to analyze higher level bi-directional transactions at the packet level. For network designers, this is also a useful and convenient debug feature. For example, protocol inspection devices can simply rely on the data captured on a single physical port. With this data, the analyzers will be able to reconstruct higher level transactions.
Figure 4 illustrates two service modules connected to the ToR (top of rack) switch through an EtherChannel. Service module0 must see IP traffic flowing in both directions. Symmetric Hashing in the ToR normalizes the hash computation and yields the same hash value for packets flowing in both directions. As a result, packets flowing in both directions travel through port0 to the intrusion detection device.
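A simple way to normalize the hash computation is to order the two endpoints canonically before hashing, so that swapping source and destination yields the same key. The sketch below assumes a 5-tuple key and CRC32; the function names and the sorting-based normalization are illustrative choices, not a statement of how any particular switch implements the feature.

```python
import zlib

def symmetric_hash(src_ip, dst_ip, proto, src_port, dst_port):
    """Hash a flow so that both directions produce the same value."""
    # Order the two (ip, port) endpoints canonically; direction no longer matters.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}|{a[1]}|{b[0]}|{b[1]}|{proto}".encode()
    return zlib.crc32(key)

def select_port(flow, num_ports=2):
    """Pick a member port of the (assumed two-port) channel."""
    return symmetric_hash(*flow) % num_ports

forward = ("10.1.1.5", "10.2.2.9", 6, 51000, 443)
reverse = ("10.2.2.9", "10.1.1.5", 6, 443, 51000)

print(select_port(forward) == select_port(reverse))  # True
```

Because both directions of the conversation select the same member port, a device attached to that port sees the complete bi-directional exchange.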
Data center networks are a hotbed for innovation and have seen unprecedented growth in recent years. Large enterprises are building enormous data centers with massive scale. Others are choosing to host their data centers in the cloud. Without smarter and more sophisticated hashing technologies that provide greater flexibility and network visibility, data centers of today’s massive scale may suffer from inefficiencies and performance challenges when deploying new and rapidly evolving protocols.
Traditional static hashing schemes work well for enterprise and carrier networks, yet the same approach, often adopted for data center networks, falls short there and points to the need for smarter solutions optimized for cloud network environments and performance demands. More intelligent solutions, demonstrated by Broadcom’s Smart-Hash technology and its range of enhancements, are necessary to effectively manage the requirements imposed by current trends in cloud and data center networking. This type of advanced technology offers an alternative to design approaches based on static hashing schemes, which can lead to prohibitively poor application performance under typical data center traffic loads. Cloud network operators are already facing daunting challenges in scaling their network infrastructure to tomorrow’s workloads. By understanding alternative hashing options, they can future-proof their network designs and be ready to deliver on critical long-term performance requirements.