The Growing Rationale for Deep Packet Inspection Benchmarks

Check out the latest EEMBC benchmarks that target Next Generation Firewall and Unified Threat Management including deep packet inspection.

Jeff Caldwell

Nov. 24, 2010


Embedded Microprocessor Benchmark Consortium

The appearance of devices implementing Next Generation Firewall (NGFW) and Unified Threat Management (UTM) technologies at the network edge over the last five years is not the first time network managers have sought a common ingress point for security operations against threats that act on multiple OSI layers. The UTM system has its predecessors in encryption/firewall systems of the 1990s that combined bulk encryption using Data Encryption Standard or public-key encryption, digital-signature verification, and hardware-assisted firewalls. UTM and NGFW tasks require the use of deep packet inspection (DPI). The complexity of DPI and the inherent processing power requirements have escalated significantly, but the mission is still the same – at those chokepoints where an enterprise data center links with the public network, security appliances should handle multiple layers of key distribution, encryption, authentication, and verification.

Today, operations are carried out on packet contents at all seven layers of the Open Systems Interconnect protocol stack. The DPI system of 2010 performs stateful inspection, deep packet inspection (both integral elements of a firewall), intrusion detection and prevention, anti-virus, anti-malware, anti-spam, and content filtering. The NGFW also applies policy controls based on application type and the authentication of individual users. Such a set of tasks often requires dedicated RISC and/or DSP processors operating at both the control plane and data plane.

In fact, appliances implementing DPI are a prime example of why modern computer system architects talk about strict separation between the control plane, where management of compute operations takes place, and the data plane, where real-time operations are performed on streams of data entering or leaving a compute appliance. In traditional client-server IT applications, this segmentation is relevant only when considering very high speed server clusters. But where packets are operated on at the network edge, it is critical to look not only at the benchmarks for management and administration of a network element but, even more importantly, at the ability to examine and manipulate packets at wire speed. These latter functions take place in the datapath, through “data-plane” operations. In past decades, these operations might be limited to analysis of the packet header, the portion of a data packet that identifies source and destination addresses and packet characteristics. But in recent years, appliances have been expected to perform “deep packet inspection,” in which the actual contents of a data packet are probed, both for characterizing the traffic and for identifying potentially malicious content.
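The difference between header-only inspection and deep packet inspection can be illustrated with a minimal Python sketch. Everything named here (the `Packet` fields, `BLOCKED_PORTS`, `SIGNATURES`, and the two functions) is hypothetical and chosen for illustration; a real appliance works on raw frames in dedicated data-plane hardware and uses far more sophisticated matching than a substring search.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Simplified packet: a few header fields plus the payload bytes."""
    src: str
    dst: str
    dst_port: int
    payload: bytes


# Hypothetical policy data, for illustration only.
BLOCKED_PORTS = {23}
SIGNATURES = [b"EVIL-PAYLOAD"]


def header_inspect(pkt: Packet) -> bool:
    """Return True if the packet passes a header-only check."""
    return pkt.dst_port not in BLOCKED_PORTS


def deep_inspect(pkt: Packet) -> bool:
    """Return True if the packet passes the header check and a payload scan."""
    if not header_inspect(pkt):
        return False
    # Probe the actual contents of the packet for known byte patterns.
    return not any(sig in pkt.payload for sig in SIGNATURES)


clean = Packet("10.0.0.1", "10.0.0.2", 80, b"GET / HTTP/1.1\r\n")
bad = Packet("10.0.0.1", "10.0.0.2", 80, b"GET /x HTTP/1.1\r\n\r\nEVIL-PAYLOAD")
```

In this sketch the `bad` packet passes `header_inspect` because its header fields look ordinary, and only `deep_inspect` rejects it.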

The most common DPI appliance architecture has one or several data-plane processors operating directly on packet traffic at wire speed, along with a centralized control-plane processor that manages the operations of the packet engines that offload packet header inspection. In some DPI architectures, the Ethernet or T1/E1 processors that control transceivers at the physical or data-link layers also perform some rudimentary traffic management functions on behalf of the datapath processor. Crypto engines may be separate chips or may be integrated into network processors.

Benchmarking such functions, however, has taken a long time to catch up to capabilities. While the functions of DPI share commonality of definition not seen at the turn of the millennium, the specific implementations allow a wide disparity in how such functions are measured. For example, DPI engines can vary situationally, on the basis of throughput or packet latency introduced; quantitatively by the number of transformation engines offered per task, to operate on presentations or encapsulations; quantitatively in terms of numbers of applications or protocols; by application bandwidth in terms of the number of connections or flows supported; and qualitatively in the overall comparative efficacy of attack-vector detection, prevention, and eradication of a particular virus.

To date, there has been no common method of performance testing and validation of DPI throughput. Such a testing approach must consider the various threat vectors used in attempting to transfer infecting payloads into a network. Without a common standard by which to compare performance across all these variables, consumers of DPI technologies lack an objective means of selecting a solution from the myriad vendor offerings available today, and are often at a severe disadvantage when attempting to select the most suitable solution to protect their information systems. As a matter of fact, throughput numbers from various DPI system vendors have been known to be off by as much as 90% from those numbers claimed on datasheets and marketing collateral.

The Making of a Benchmark Standard

EEMBC, through its work on general microprocessor benchmarks, had been aware of the multi-faceted performance specifications for security edge devices even before the designation “DPI” became commonplace. EEMBC is an industry association whose members have traditionally focused their efforts on processor performance, defining and developing common suites of benchmarks, along with a certification process to ensure reliability and repeatability. From a processor perspective, benchmarking DPI systems presents an interesting challenge: even if two systems use the exact same chip, performance will vary depending on which features the system vendor has implemented. Furthermore, network processor performance can only be estimated in the absence of a complete system. Hence, in its definition and development of DPIBench, EEMBC focused on the testing of complete systems.

Extracted from our earlier discussion, the significant metrics for system performance are throughput, latency, and quantification of the number of concurrent flows. Although efficacy is also an extremely important metric, the analysis of the efficacy of next generation firewalls and DPI appliances is subject to too many issues due to the variety of methods utilized to track efficacy. Furthermore, for throughput and capacity, performance standards must come from the network side of the interface, because the relevant set of viruses and malware a DPI appliance has to confront can change daily, if not several times a day. A standard efficacy benchmark thus would have little relevance outside a single day’s virus portfolio. However, a measure of packet analysis at a given network speed would remain relatively constant even over changing content-filtering loads.

Thus, the goals established for the DPIBench involved defining benchmarks on the network traffic side of a DPI appliance utilizing test equipment from companies such as Ixia or Breaking Point Systems. The first task in the creation of such benchmarks involved identifying common protocols for the network side. The most important were judged to be HTTP, HTTPS, SMTP, P2P, and FTP. It is possible that higher-layer presentation and application protocols could be added as sites move to more general use of XML and related languages. The secure version of HTTP is included deliberately, as many attacks are now using redirection to encrypted methods and secure transport protocols to disguise the source of infection. What is important is not merely the use of the protocols themselves in packet transport, but which protocols use which ports. Port-spoofing and the use of non-standard ports must also be considered if a test methodology is to have validity in this context.
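To see why non-standard ports matter to a test methodology, consider the following Python sketch, which contrasts classifying a flow by its destination port with classifying it by the first bytes of its payload. The function names and the small set of payload prefixes are assumptions made for illustration; the port numbers are the well-known assignments for HTTP, HTTPS, SMTP, and FTP, and P2P is omitted because it has no single standard port.

```python
# Well-known port assignments for the protocols named above.
STANDARD_PORTS = {80: "HTTP", 443: "HTTPS", 25: "SMTP", 21: "FTP"}


def classify_by_port(port: int) -> str:
    """Guess the protocol from the destination port alone."""
    return STANDARD_PORTS.get(port, "UNKNOWN")


def classify_by_payload(first_bytes: bytes) -> str:
    """Guess the protocol from the first client bytes (illustrative prefixes only)."""
    head = first_bytes[:8].upper()
    if head.startswith((b"GET ", b"POST ", b"HEAD ")):
        return "HTTP"
    if head.startswith((b"EHLO ", b"HELO ")):
        return "SMTP"
    if head.startswith(b"USER "):
        return "FTP"
    # A TLS handshake record starts with content type 0x16 and major version 0x03.
    if first_bytes[:2] == b"\x16\x03":
        return "TLS"
    return "UNKNOWN"


# HTTP traffic moved to port 8080 defeats the port lookup but not the payload check.
request = b"GET /index.html HTTP/1.1\r\n"
print(classify_by_port(8080), classify_by_payload(request))
```

An appliance that relies on the port lookup alone would misclassify every flow in a test that moves common protocols to altered port numbers.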

For an industry-standard benchmark, the identification of viruses represents a difficult area in which to achieve any cross-platform standardization, as many software and DPI appliance companies treat their virus signature databases as proprietary intellectual property. Many such databases employ different formats, and updates to the databases may take place at different times, utilizing different update methods. The WildList.org site represents a partial standardization of common virus signatures. For the purposes of DPIBench, the most recent suite of common signatures published at http://www.wildlist.org/WildList could be utilized. One must assume, however, that 20 to 30 percent of the entire content of WildList changes on a monthly basis.

As a first model for a standard test regime, EEMBC has proposed running tests twice: a ‘C’ test measuring base performance, and a ‘V’ test in which viruses are injected at random points to verify that DPI is being performed. The latter test validates that the correct set of infected traffic flows is terminated. We have identified the following variables as relevant to the test:

Traffic Types, identified as I or G, where ‘I’ represents continuous data transfers to measure IDP techniques, and ‘G’ represents file transfers to exercise gateway anti-virus techniques
Data Model, identified as M or T, where ‘M’ represents minutes of maximum continuous traffic under IDP, and ‘T’ represents the size of the files transferred for anti-virus
Ports and Protocols, identified as S or N, where ‘S’ represents tests that run common protocols on standard ports, and ‘N’ represents tests running common protocols with altered port numbers
Encryption/Compression, identified as U, Z, or E, where ‘U’ represents unencrypted traffic, ‘Z’ represents testing the transfer of compressed files, and ‘E’ represents the testing of encrypted traffic using HTTPS (SSL)
Efficacy Sanity Check, identified as C or V, where ‘C’ represents tests without any malware, while ‘V’ represents tests that include virus patterns identified by WildList.org
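The five variables above combine into five-letter test names such as IMSUC. The Python sketch below decodes such a name, assuming one letter per variable in the order the variables are listed; the `FIELDS` table and the `decode_test_name` function are hypothetical helpers, not part of any EEMBC tooling.

```python
# One entry per variable, in the order the variables are listed above.
FIELDS = [
    ("traffic_type", {"I": "continuous traffic (IDP)",
                      "G": "file transfers (gateway anti-virus)"}),
    ("data_model", {"M": "minutes of continuous traffic",
                    "T": "transferred file size"}),
    ("ports", {"S": "standard ports",
               "N": "non-standard ports"}),
    ("encoding", {"U": "unencrypted",
                  "Z": "compressed",
                  "E": "encrypted (HTTPS/SSL)"}),
    ("efficacy", {"C": "clean traffic",
                  "V": "viruses injected"}),
]


def decode_test_name(name: str) -> dict:
    """Map a five-letter test name to a description of each variable."""
    if len(name) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} letters, got {len(name)}")
    decoded = {}
    for letter, (field, options) in zip(name.upper(), FIELDS):
        if letter not in options:
            raise ValueError(f"{letter!r} is not a valid code for {field}")
        decoded[field] = options[letter]
    return decoded


for field, meaning in decode_test_name("GTNZC").items():
    print(f"{field}: {meaning}")
```

A table-driven decoder like this makes it easy to add a letter if further variables or values are agreed on later.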

The assumption in developing such tests is that a full test suite can be run in 30 minutes or less. Tests are dynamic in their format, particularly due to the rapidly changing nature of virus signatures identified by WildList. It is important to run a clean test without viruses to generate a performance baseline, followed by tests with random virus insertion. The concern that someone could optimize a system for a benchmark could be minimized by constantly expanding the number of viruses the tests must detect, and by updating the tests for newer viruses in WildList. Demonstrating benchmarks across various virus lists shows the variance potential of a DPI appliance, but it raises a question unique to security benchmarks: Can the industry handle less than 100 percent repeatability of the benchmark, because of constant virus updates?
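The check performed by a run with virus insertion can be sketched in Python as a comparison between the flows that were infected and the flows the appliance terminated. The function names, the use of integer flow IDs, and the seeded random selection are all assumptions for illustration; the source does not specify how injection points are chosen or how results are reported.

```python
import random


def pick_infected_flows(flow_ids, count, seed):
    """Choose which flows receive a virus; a fixed seed makes the choice repeatable."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(flow_ids), count))


def validate_v_run(injected, terminated):
    """Compare the infected flows with the flows the appliance terminated."""
    injected, terminated = set(injected), set(terminated)
    return {
        "missed": injected - terminated,              # infected but allowed through
        "false_terminations": terminated - injected,  # clean but terminated
        "passed": injected == terminated,
    }


flows = range(1000)
infected = pick_infected_flows(flows, count=10, seed=2010)

# Hypothetical appliance result: one infected flow slipped through.
observed = set(sorted(infected)[1:])
report = validate_v_run(infected, observed)
print(report["passed"], len(report["missed"]))
```

Seeding the selection keeps the injection points repeatable for a given signature list, even though the list itself changes over time.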

IPS/IDP and Gateway Anti-Virus Test Types

EEMBC has developed a suite of test scenarios, eight of which are associated with IPS/IDP traffic, and eight of which are associated with gateway anti-virus and anti-spyware traffic. In all IPS cases, 1000 flows are spread across HTTP (or HTTPS), FTP, and SMTP. Gateway tests must show 100 successful file transfers. Full details of the test setups are available at the EEMBC web site, but can generally be represented by the matrix below.

Matrix of the proposed DPIBench testing regime

Test	Description	1	2	3	4	5	6	7	8
Traffic Type	I: streams of continuous traffic (IPS)	X	X	X	X
Traffic Type	G: file transfers (GAV)					X	X	X	X
Data Model	M: minutes of continuous traffic	X	X	X	X
Data Model	T: transferred file size					X	X	X	X
Ports and Protocols	S: standard port #s used to pass traffic			X	X		X		X
Ports and Protocols	N: non-standard port #s used to pass traffic	X	X			X		X
Encryption and Compression	U: cleartext traffic used	X		X		X	X
Encryption and Compression	Z: traffic is compressed (file transfers)							X	X
Encryption and Compression	E: traffic is encrypted (SSL)		X		X
Efficacy Sanity Check	C: clean traffic, generating DPI benchmark value	X	X	X	X	X	X	X	X
Efficacy Sanity Check	V: viruses injected, validates DPI actions

Test names for columns 1 through 8, in order: IMSUC, IMSEC, IMNUC, IMNEC, GTSUC, GTNUC, GTSZC, GTNZC

These test scenarios can rely on virus signature databases as small as 32 signatures, or larger than 25,000 signatures, and the databases can be composed of signatures with different certification levels. The comparison methods can range from simple pattern-matching to complex, multivariate state-based identification.

In a typical test setup, the system under test will be connected to a test device capable of generating test traffic matching the conditions of the above IDP and gateway test scenarios. The system under test will act as a router that transfers traffic from the source to the destination IP address.

Complete test conditions have yet to be finalized by the membership, and the organization encourages others to get involved in developing this standard benchmark suite. What is needed for further DPIBench development is consensus on secondary issues of test setup, as described above, and agreement on common test and certification procedures, as has been demonstrated in the past for other EEMBC benchmarks. An example list of current (2010) product lines that may be able to achieve DPIBench ratings includes, but is not limited to, CheckPoint UTM-1, Cisco ISR G2, Fortinet Fortigate, Juniper SRX, McAfee Firewall Enterprise, Netgear ProSecure, Palo Alto Networks PA Series, SonicWALL NSA, and Watchguard Firebox. Finally, at the end of this process, consumers of firewall technologies will have an objective means of selecting a solution from the myriad of vendor offerings.