In a short time, network infrastructure bandwidth has scaled exponentially to 10 Gbits/s, with designs for next-generation Ethernet promising between 40 and 100 Gbits/s. At the same time, the complexity and breadth of applications that run over these backbones are drastically changing.
Accordingly, to accommodate the ever-increasing need for security, control, and visibility over network traffic, communications equipment must be protocol-, content-, and application-aware at ever-higher speeds.
Deep packet inspection (DPI) is the ability to analyze and understand network traffic at L2-L7 for security, service assurance, quality of service (QoS), and application rate-limiting. DPI provides more extensive and detailed flow awareness to network applications than simple L2-L4 classification by examining the packet contents, as well as the packet headers (Fig. 1). DPI also enables network administrators to examine traffic at all network layers across a series of datagrams, giving insight into the source, destination, application, and intent of the traffic in question.
In contrast to DPI, traditional classification only provides L2-L4 header analysis and is not a dependable mechanism to determine protocol and application, nor is it an adequate technique to analyze specific application-level details within a flow between a set of hosts. Many protocols don’t use standard Internet Protocol (IP) protocol values, or they rely on non-standard and negotiated TCP/UDP port numbers for connection establishment. Application and protocol identification is often buried further in the packet or spread across several packets in the transaction, rendering individual packet header analysis ineffective.
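To make the contrast concrete, the minimal sketch below classifies the same traffic two ways: by destination port alone, and by the first bytes of the payload. The port table, the signature list, and the function names are illustrative assumptions, not part of any product's signature database or API.

```python
# Illustrative only: the port table and payload signatures are simplified
# assumptions, not a real signature database.
WELL_KNOWN_PORTS = {80: "http", 25: "smtp", 110: "pop3"}

PAYLOAD_SIGNATURES = [
    (b"\x13BitTorrent protocol", "bittorrent"),  # BitTorrent handshake prefix
    (b"GET ", "http"),
    (b"POST ", "http"),
]


def classify_by_port(dst_port):
    """L4-only classification: trusts the destination port number."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


def classify_by_payload(payload):
    """DPI-style classification: matches the start of the payload."""
    for prefix, name in PAYLOAD_SIGNATURES:
        if payload.startswith(prefix):
            return name
    return "unknown"


# A BitTorrent handshake sent to port 80 looks like HTTP to a port-based
# classifier, while a payload match identifies it correctly.
handshake = b"\x13BitTorrent protocol" + bytes(8)
print(classify_by_port(80))            # http
print(classify_by_payload(handshake))  # bittorrent
```

A real classifier would also need to match across several packets of a flow, since a single payload prefix is not enough for many protocols.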
This need for greater content awareness applies not only to fixed LAN/WAN-based networks, but increasingly to mobile networks, too. With the bandwidths offered by 3G wireless and Long-Term Evolution (LTE) networks (up to 100-Mbit/s downloads), along with the converged data, voice, and video services that users will utilize over these networks, we can expect wireless networks to support all of the same services as fixed networks and share vulnerabilities to the same types of threats.
DPI AND PLATFORM CHALLENGES
To satisfy the needs of network operators for DPI-based solutions, the platforms on which these applications are hosted share a common set of requirements. Deep-packet-inspection implementations must:
• Support traditional analysis of common L2-L4 packet header fields, including source and destination IP address, IP protocol, source and destination TCP/UDP port numbers, DiffServ codepoint (DSCP), and ingress interface/VLAN
• Support analysis of all network protocol layers and full packet payload/content
• Support identification of applications with static, dynamic, and negotiated protocol and port fields
• Be able to interrogate multiple packets during session establishment extending beyond the standard TCP handshake (SYN, SYN-ACK, ACK)
• Support a signature database for identification of common applications
• Based on application and protocol identification, support the ability to parse the traffic and provide the entire flow or relevant portions of the flow to other applications
• Be completely flexible and programmable to handle the ever-changing and evolving set of protocols, applications, services, and threats
• Provide full analysis at line rates; if only a portion of data can be deeply inspected without loss, the inspection process provides little value
• Support inline (active) and offline (passive) configurations
• Support the ability to take a varied set of actions based on packet and flow analysis, including active and passive packet dropping; marking or tagging of traffic; content insertion; queuing/policing/shaping/rate limiting of flows; redirection; load balancing; and counting/metering/statistics gathering and analysis
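The first requirement in the list, traditional L2-L4 header analysis, can be sketched in a few lines. The example below pulls the 5-tuple and the DSCP value out of a raw IPv4 packet carrying TCP or UDP. The function name and the sample packet are assumptions made for illustration; the sketch ignores VLAN tags, IPv6, and fragmentation.

```python
import socket
import struct


def parse_ipv4_5tuple(packet):
    """Return (src, dst, proto, sport, dport, dscp) from a raw IPv4 packet."""
    version_ihl, tos = packet[0], packet[1]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4   # IHL counts 32-bit words
    proto = packet[9]
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])
    sport = dport = None
    if proto in (6, 17):                    # 6 = TCP, 17 = UDP
        sport, dport = struct.unpack("!HH", packet[header_len:header_len + 4])
    return src, dst, proto, sport, dport, tos >> 2  # DSCP is the top 6 bits


# Hand-built sample: IPv4 header (DSCP 46) followed by a minimal TCP header.
ip_header = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0xB8, 40, 0, 0, 64, 6, 0,
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
)
tcp_header = struct.pack("!HH", 49152, 80) + bytes(16)
print(parse_ipv4_5tuple(ip_header + tcp_header))
# ('10.0.0.1', '10.0.0.2', 6, 49152, 80, 46)
```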
A NEW DPI ARCHITECTURE IS NEEDED
To achieve these combined DPI requirements, along with the increasing need for network I/O virtualization, a high-performance flow-processing architecture is necessary. This is best achieved through a heterogeneous processing architecture that combines virtualized I/O network-coprocessing with multicore x86 general-purpose CPUs (Fig. 2).
An example of this architecture is the new generation of network flow processors (NFPs) designed by Netronome. These processors are intended for use in heterogeneous processing architectures. The 40-microengine (ME) core NFP with integrated cryptography supports tight coupling with general-purpose multicore CPUs over PCI Express (PCIe). Because these ME cores contain instruction sets optimized around high-performance packet processing, detailed packet and flow inspection can be performed on greater levels of traffic without compromising either security or performance.
The NFP cores operate at 1.4 GHz, each supporting eight threads. This makes it possible to perform almost 2000 operations per packet on 10-Gigabit Ethernet line-rate traffic for minimum-sized datagrams (64 bytes). Such speed and flexibility enable developers to do far more than simple packet forwarding; they also can apply complex algorithms to both packet headers and content in hardware at line rate.
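A rough back-of-the-envelope check of the per-packet budget is sketched below. It assumes one instruction per clock cycle per core and standard Ethernet framing overhead (8-byte preamble plus 12-byte inter-frame gap). The article does not say how its figure was derived; counting both directions of a full-duplex 10-Gbit/s link is one reading that lands near "almost 2000".

```python
CORES = 40
CLOCK_HZ = 1.4e9
LINK_BPS = 10e9
FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(link_bps, frame_bytes):
    """Maximum frame rate on the wire for a given frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def cycles_per_packet(pps):
    """Aggregate core cycles available per packet (assumes 1 op per cycle)."""
    return CORES * CLOCK_HZ / pps


one_way = packets_per_second(LINK_BPS, FRAME_BYTES)
print(round(one_way / 1e6, 2))                # 14.88 (million packets/s)
print(round(cycles_per_packet(one_way)))      # 3763 (one direction)
print(round(cycles_per_packet(2 * one_way)))  # 1882 (both directions)
```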
Depending on the nature of the application, most DPI functionality is implemented in NFP ME cores. At times, these powerful microengine cores will be augmented by tightly coupled general-purpose processors or by specialized hardware like cryptography acceleration or regular expression hardware for pattern/signature matching to further extend expressiveness. A majority of these new solutions ideally suit such heterogeneous processing environments.
The new line of processors integrates an ARM11 core at 700 MHz; a Gen-II PCIe eight-lane interface for high-speed communications with multicore x86 processors; virtualized I/O support; cryptography acceleration; and high-speed interconnects, including XAUI, SPI, and Interlaken (ideal for interfacing with regular expression hardware) (Fig. 3).
In addition, the NFP is source-code compatible with the Intel IXP28XX network processors. This protects the investment in existing field-proven networking algorithms while accelerating time-to-market for new-generation products at the lowest development cost and risk.
Traditional network and communication processors are not adequate to meet these extensive and challenging L2-L7 requirements at such sustained line speeds. Other processing solutions such as multicore MIPS architectures are capable of DPI, though with performance penalties.
These architectures require all packet processing to occur in general-purpose processors. However, such processors lack integrated security and the required high-speed data-plane interconnect for communication with higher-speed multicore processors or look-aside packet-processing hardware.
As network speeds increase, the multicore MIPS model doesn’t scale in situations where every packet of every flow needs DPI. An alternative processing solution, fixed-function network processors (NPUs), may operate at high data rates. However, these devices provide neither the programming flexibility to run DPI algorithms in hardware nor the ability to parse data beyond L2-L4 headers.
GRANULAR FLOW ANALYSIS WITH DPI
Solutions available today range from high-performance, highly programmable network flow processors for custom designs to PCIe-based acceleration cards for x86-based platforms. This NFP hardware is coupled with software modules/blocks designed to leverage the intelligence and speed of these flow processors specifically tailored for DPI.
Networking manufacturers can leverage DPI capabilities through multicore MIPS or fixed-function NPU hardware, but are limited in flexibility or performance. Often, these architectures present challenging programming environments for developers. Exposing the DPI abilities of the network flow processor via simple application programming interfaces (APIs) provides a more scalable and flexible approach.
These APIs provide an abstraction layer that hides the packet processing occurring in the hardware-based MEs, and they offer a hardware-based way to identify applications and protocols from patterns and behavior analyzed deep within a flow. This API-based approach permits a user to classify, detect, and act upon specific protocols and applications while retaining the comfort of Linux and x86 as a development platform.
New technologies can provide a full suite of L2-L7 flow analysis and DPI capabilities, including:
• Classification of flows based on values of well-known packet header fields of Ethernet (including 802.1p/q) and IP
• Identification of applications and protocols with fixed or well-known TCP/UDP ports or IP types
• HTTP 1.0/1.1, including embedded transactions and chunked encoding
• E-mail protocols, including POP3, SMTP, and IMAP
• Additional protocols embedded in HTTP, such as SOAP and Web conferencing
• Associated media and data flows, including FTP and SIP
• VoIP, IPTV, and other streaming media, such as SIP, RTP, and VLC
• IP tunnels, including GRE, L2TP, PPTP, IPsec, IP in UDP, or TCP
• Common peer-to-peer (P2P) applications, such as BitTorrent, Gnutella, FastTrack, Jabber, and WinMX
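Support for HTTP chunked encoding matters because content has to be reassembled before it can be matched against signatures. As a minimal sketch, the decoder below reconstructs a body from HTTP/1.1 chunked transfer coding. The function name is an assumption for illustration, and the sketch expects a complete, well-formed body and skips trailer headers.

```python
def decode_chunked(body):
    """Reassemble an HTTP/1.1 chunked body into contiguous bytes."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ';' is an extension.
        size_field = body[pos:eol].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:          # zero-length chunk terminates the body
            break
        out += body[pos:pos + size]
        pos += size + 2        # skip the chunk data and its trailing CRLF
    return bytes(out)


print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))  # b'Wikipedia'
```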
In addition, the technology allows active (inline) or passive (offline) enterprise and networking applications to perform one or more unique actions on flows once they have been identified, including:
• “Cut-through”: All classified flows are switched through the appliance in hardware by the NPU
• “Load balancing”: Flows can be load-balanced across CPU cores for added application performance
• “Redirection”: All classified flows are diverted to the CPU for processing by the host application
• “Tee”: All approved flows are cut through the appliance; in addition, select classified flows are copied to the CPU where further processing can be performed by the host application
• “Tunnel mode”: IP tunnels (e.g., GRE) can be terminated and reoriginated, enabling the creation of virtual overlay networks
• “Statistics and monitoring mode”: Detailed packet and flow-level statistics are gathered and general application and flow monitoring are performed (Fig. 4).
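The cut-through, redirection, tee, and load-balancing actions can be modeled as a small dispatch function. In the sketch below, the `Action` enum, `pick_core`, and `destinations` are hypothetical names, and the CRC-based hash is an assumed stand-in for whatever the hardware actually uses. Sorting the two endpoints before hashing keeps both directions of a flow on the same CPU core.

```python
import zlib
from enum import Enum


class Action(Enum):
    CUT_THROUGH = "cut-through"
    REDIRECT = "redirect"
    TEE = "tee"


def pick_core(flow, num_cores):
    """Map a 5-tuple to a core; both flow directions get the same core."""
    src, sport, dst, dport, proto = flow
    a, b = sorted([(src, sport), (dst, dport)])
    key = f"{a}|{b}|{proto}".encode()
    return zlib.crc32(key) % num_cores


def destinations(action, flow, num_cores=4):
    """Return where a packet of this flow is delivered for a given action."""
    cpu = f"cpu{pick_core(flow, num_cores)}"
    if action is Action.CUT_THROUGH:
        return ["wire"]        # switched in hardware only
    if action is Action.REDIRECT:
        return [cpu]           # diverted to the host application
    return ["wire", cpu]       # TEE: forwarded and copied to the host


forward = ("10.0.0.1", 49152, "10.0.0.2", 80, 6)
reverse = ("10.0.0.2", 80, "10.0.0.1", 49152, 6)
print(pick_core(forward, 4) == pick_core(reverse, 4))  # True
print(destinations(Action.CUT_THROUGH, forward))       # ['wire']
```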
Specific pre-defined fields of each protocol are identified and exposed to the user in an easy-to-comprehend format. For example, when HTTP is classified, the user has access to fields within the protocol, such as the HTTP request header fields, request URL, request content type, and much more.
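As a rough illustration of what exposing protocol fields might look like to an application, the sketch below splits an HTTP request into named fields. The function name and the returned dictionary layout are assumptions, not the actual API; the sketch handles only a complete request head in a single buffer.

```python
def parse_http_request(data):
    """Split an HTTP request head into method, URL, version, and headers."""
    head, _, _ = data.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, url, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "url": url, "version": version, "headers": headers}


request = (
    b"POST /upload HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b"{}"
)
fields = parse_http_request(request)
print(fields["url"])                      # /upload
print(fields["headers"]["content-type"])  # application/json
```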
Classifying the protocol and segmenting the data into well-known fields saves development time and provides an easier DPI model to work with. Conversely, other DPI solutions simply provide a pointer to a packet, forcing the developer to locate and interpret the fields of interest for each supported application.
Identifying an application isn’t the only function of DPI. Based on DPI analysis, this approach also can apply actions to flows (classified by the 5-tuple within an application session), rather than to single packets, offering powerful, stateful flow analysis to applications.
After identifying an application, the NFP can break down the flow data into fields that represent the segments making up the protocol or application. Dissecting and presenting flows in this manner lets developers easily understand and analyze the protocols that require inspection. Once DPI is complete on a portion of a flow, actions like flow termination, fast-path/cut-through, rate limiting, guaranteed QoS, and load-balancing to x86 multicore CPUs can be implemented on subsequent packets of that flow.
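The idea of inspecting only the start of a flow and then applying a cached action to the remaining packets can be sketched with a small flow table. The `FlowTable` class, the toy classifier, the policy dictionary, and the inspection limit are all assumptions made for illustration; a real data plane would also age out idle flows.

```python
class FlowTable:
    """Inspect the first packets of a flow, then reuse the chosen action."""

    def __init__(self, classify, policy, max_inspect=4):
        self.classify = classify        # payload -> application name or None
        self.policy = policy            # application name -> action
        self.max_inspect = max_inspect  # give up after this many packets
        self.flows = {}

    def handle(self, five_tuple, payload):
        state = self.flows.setdefault(five_tuple, {"seen": 0, "action": None})
        if state["action"] is not None:
            return state["action"]      # fast path: no further inspection
        state["seen"] += 1
        app = self.classify(payload)
        if app is not None:
            state["action"] = self.policy.get(app, "cut-through")
        elif state["seen"] >= self.max_inspect:
            state["action"] = "cut-through"  # unidentified: stop inspecting
        return state["action"] or "inspect"


def toy_classify(payload):
    """Hypothetical classifier: recognizes only an HTTP GET request."""
    return "http" if payload.startswith(b"GET ") else None


table = FlowTable(toy_classify, {"http": "rate-limit"})
flow = ("10.0.0.1", 49152, "10.0.0.2", 80, 6)
print(table.handle(flow, b""))                   # inspect
print(table.handle(flow, b"GET / HTTP/1.1"))     # rate-limit
print(table.handle(flow, b"opaque body bytes"))  # rate-limit
```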
DPI FOR FUTURE SOLUTIONS
As part of the next generation of deep-packet-inspection solutions, it may be necessary to extend the DPI capability provided through the APIs for the network flow processors or flow engine acceleration cards. This is accomplished by coupling the NFP’s high-performance intelligent data plane with a software development kit (SDK) that extends the available DPI functionality. As a result, users will be able to easily design, develop, or enhance their own particular DPI solutions.
The software components that complement the network flow processor through the SDK are reusable pieces of code that can be used as a development reference to reduce time-to-market or as complete functional blocks of software in finished DPI products. These blocks will range from very small data or memory operations to complete functional blocks of code for complex packet-processing tasks. Such tasks include header or content classification, load balancing, IP forwarding, Ethernet switching, and TCP termination.