High performance and predictability are prerequisites for any large-scale networked system that depends on real-time data processing and analysis. Data representing actual events or system status must be evaluated while it's still relevant to tactical conditions, making it imperative to know when specific data is available, and to aggregate and evaluate that data in real time. Unreliable receipt times make effective analysis difficult or impossible.
Fast and predictable performance is always an issue in the design of such a system. This is especially the case when designing distributed systems with thousands of nodes that must move lots of data around quickly in a dynamically changing environment.
Switched-fabric networks, which can provide fast and highly scalable hardware solutions, are now increasingly finding their way into such applications. What's needed beyond that is a software solution for bringing predictability, flexibility, and reliability to distributed data communications. In this article, we will describe how the Data Distribution Service (DDS) data-centric publish-subscribe middleware layer can realize the full potential of a hardware switched-fabric network to deliver a complete solution for application developers.
Data-Critical Systems Share Characteristics
Many large-scale data-critical applications can be characterized by three attributes: the need to gather and distribute data in real time, the large amount of data being transferred, and the various entities involved in this data exchange (which may even change over time). For instance, data-critical systems like air-traffic control, financial transaction processing, battlefield and naval command and control, and industrial automation all feature these three attributes.
Such systems aren't necessarily hard real time, but their predictability requirements represent an integral part of the functions they perform. They gather data from various sources (e.g., sensors) and distribute the data to a number of users like databases, display devices, or control algorithms. Furthermore, by their very nature, they are distributed.
Today's bus-based architectures, typically multi-CPU, VME backplane solutions with hard-wired I/O interfaces to sensors and effectors, fall short in several areas in addressing the needs of data-critical systems. For example, these hardware transport mechanisms don't scale, are difficult to make fault-tolerant, and are tough to modify and upgrade once they've been deployed.
For these reasons, designers of complex, data-critical distributed systems are turning to switched fabrics to replace bus backplane and serial interconnect technologies. StarFabric, PCI Express Advanced Switching, Serial RapidIO, and InfiniBand are some commercial products that implement different switched-fabric designs.
A switched-fabric bus is unique in that it allows all nodes on the bus to "logically" interconnect with all other nodes (Fig. 1). Each node is physically connected to one or more switches, and switches may be connected to each other. This topology results in a redundant network, or "fabric," in which there may be one or more redundant physical paths between any two nodes. A node may be logically connected to any other node via the switch(es). A logical path is temporary and can be reconfigured, or switched, among the available physical connections. Among other benefits, switched-fabric networks can provide fault tolerance and scalability without unpredictable degradation of performance.
Figure 1: Switched fabric architecture. Multiple switches can be used to expand the fabric and provide hardware redundancy.
Switched Fabrics and Data Distribution Service
A key characteristic of switched fabrics is that they allow peer-to-peer communication between nodes without having to physically connect every node to every other node. In a full mesh, where every node is physically wired to every other node, the cost of adding a node grows with the number of nodes already present, since each new node needs a dedicated link to each existing one. Because a switched-fabric network instead employs switching to achieve logical connectivity and reconfigurability, these systems can be designed to be highly scalable.
On the software side, publish-subscribe communication systems map very naturally onto switched fabrics. Publish-subscribe systems work by using endpoint nodes that communicate with each other by sending (publishing) data and receiving (subscribing) data anonymously via topics. A topic is identified by a name and a data type. A data producer declares the intent to publish data on a topic; a data consumer registers its interest in receiving data published on a topic. The middleware acts as the glue between the producers and the consumers. It delivers the data published on a topic by a producer to the consumers subscribing to that topic.
There can be as many topics as needed, a producer can publish on multiple topics, and a consumer can subscribe to multiple topics. The middleware layer isolates the data producers from the consumers: they have no knowledge of each other (Fig. 2).
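The topic-based decoupling described above can be sketched with a minimal dispatcher. This is an illustrative model, not the DDS API: the class and method names are invented for the example, and real middleware would add discovery, QoS, and network transport behind the same idea.

```python
from collections import defaultdict

class Middleware:
    """Minimal sketch of topic-based publish-subscribe dispatch.

    Producers and consumers never reference each other directly;
    they meet only through a (topic name, data type) pair.
    """
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic name -> callbacks
        self._types = {}                       # topic name -> data type

    def register_topic(self, name, data_type):
        self._types[name] = data_type

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, sample):
        # Enforce the topic's declared data type before delivery.
        if not isinstance(sample, self._types[topic]):
            raise TypeError(f"wrong type for topic {topic!r}")
        for deliver in self._subscribers[topic]:
            deliver(sample)
```

A producer calls `publish("Temperature", 21.5)` without knowing how many consumers exist, or whether any exist at all; consumers are added or removed without touching producer code.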
Figure 2: The Data Distribution Service (DDS) data-centric publish-subscribe architecture anonymously sets up direct data flows between DataWriters and DataReaders, resulting in scalable and fault-tolerant data distribution.

A publish-subscribe software architecture allows producers and consumers to be loosely coupled. Therefore, it's naturally scalable, and can easily adapt to the changing needs of distributed data-critical systems. The producers and consumers are peers: they communicate directly with each other, so the topology of publish-subscribe systems can be closely matched to that of switched-fabric systems. Thus, a publish-subscribe middleware layer can fully exploit the potential of switched-fabric network hardware.
The Data Distribution Service (DDS) standard (see "The DDS Standard," ED Online 13516) specifies a data-centric publish-subscribe middleware layer, developed with the needs of distributed data-critical applications in mind. A well-designed DDS middleware implementation can excel at real-time data distribution, be easily field-upgradable, and be transport-agnostic. It can be better at real-time data distribution because publish-subscribe is more efficient than traditional request/reply architectures in both latency and bandwidth for periodic data exchange. Furthermore, it can be easier to upgrade in the field because publishers and subscribers don't depend on the number or type of their counterparts. And, finally, since the middleware is layered on top of the physical means of getting the data from one place to another, it needn't depend on the network transport or topology used.
New Choices for System Architects
This marriage of switched fabrics and DDS real-time middleware offers architects new flexibility in adding capabilities that were once quite difficult to achieve. Many of the features offered by switched fabrics have complementary capabilities in the DDS-compliant middleware. For example, switched fabrics typically offer rich error-management features, such as the ability to recognize, report, and route around failed paths. With DDS-compliant software, system designers can also take advantage of DDS error-reporting facilities.
A key feature of switched fabrics is support for multiple paths between nodes (Fig. 3). With such support, systems can easily implement multiple physical interconnects that can be combined with sophisticated error management. Likewise, with DDS, applications can take advantage of redundant publishers that have different strengths or bandwidths. When a higher-strength publisher fails, one with lower strength is automatically switched in by the DDS middleware. In addition to fault tolerance, this can also help with load balancing on heavily used networks (Fig. 4).
Figure 3: DDS publish-subscribe involves direct anonymous communication between producers and consumers of data. Topic A has a primary producer 1 and a backup producer 2. Note that while producer 1 is active, consumers do not receive data from the backup producer.

Switched-fabric specifications already provide for a hot-plug or hot-swap capability. This hardware capability can be combined with a "virtual" hot-plug capability at the application level using DDS middleware. Unlike traditional tightly coupled client/server architectures, DDS middleware allows producers and consumers to be dynamically added to or removed from an operational system.
Figure 4: DDS provides for automatic failover. When the primary producer 1 of topic A crashes, the middleware automatically switches to the backup producer 2 of topic A. The consumers continue to receive data without interruption.

Many switched fabrics provide sophisticated features that allow, for instance, bandwidth-reserved, isochronous transactions across the fabric, something that is not supported by, say, Ethernet. Corresponding to these hardware QoS facilities, DDS-compliant middleware can offer a number of QoS policies that make predictability at the application level possible. For instance, the TRANSPORT_PRIORITY policy lets developers prioritize one data flow over another.
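The strength-based failover just described can be sketched in a few lines. This is an illustrative model loosely mirroring the DDS OWNERSHIP_STRENGTH QoS policy, not a DDS API: the class name, method names, and the numeric strengths are invented for the example.

```python
class OwnershipFilter:
    """Sketch of strength-based failover: for each topic, only samples
    from the strongest currently-live producer are delivered. If that
    producer fails, the next strongest automatically takes over.
    """
    def __init__(self):
        self._live = {}  # producer id -> strength

    def producer_alive(self, producer, strength):
        self._live[producer] = strength

    def producer_failed(self, producer):
        self._live.pop(producer, None)

    def accept(self, producer):
        # Deliver a sample only if its producer is the strongest
        # producer still known to be alive.
        if producer not in self._live:
            return False
        return self._live[producer] == max(self._live.values())
```

Consumers run such a filter per topic, so the switchover requires no action from the application: the backup's samples simply start passing the filter once the primary's liveliness lapses.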
The Roadmap for Distributed Data Services
The existence of DDS as a standard specification endorsed by the Department of Defense (DoD) paves the way for addressing the challenge of distributing data among a myriad of defense systems. DDS is now mandated for data distribution by the Navy Open Architecture Computing Environment (Navy OACE) and the DoD Information Technology Standards Registry (DISR), and has already been adopted by programs such as the Future Combat Systems (FCS), DD(X), the Littoral Combat Ship (LCS), and the Ship Self Defense System (SSDS).
But despite the existence of a standard specification, the value of the solution is highly dependent upon its implementation. The specification defines certain features and capabilities, but not how they should be implemented.
A carefully designed middleware architecture can reduce the likelihood of a fault, limit the damage of a fault if it does occur, help detect faults immediately, protect the middleware from errors in application code, and isolate applications from errors in other applications. That architecture can also deliver significant advantages in the performance and flexibility of network distributed data communications.
For example, the DDS specification defines how a publish-subscribe communication model should work for a distributed real-time network. The DDS specification defines DataWriters for publishing and DataReaders for subscribing to a single topic on a user-defined data type. This in itself is standard and straightforward, but how it's implemented can significantly impact network performance and scalability.
A robust implementation improves both performance and scalability by defining an architecture that supplies each DataWriter or DataReader with a queue that buffers messages bound for another endpoint through a transport. This architecture supports direct end-to-end messaging, since each endpoint (a DataReader or DataWriter) in each application communicates directly with a sister set of endpoints. Each endpoint has a dedicated set of buffers to hold messages in transit to other endpoints.
Such a queuing architecture provides for an optimized transfer of messages from DataWriter to DataReader, no matter where each resides on the network. Also, because the endpoints queue and buffer transmissions to other endpoints, this architecture can easily scale to large and complex networks with predictable delivery times.
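The per-endpoint queuing architecture described above can be sketched as follows. This is an illustrative model, not an actual DDS implementation: the class and method names are invented, and a real writer would bound its queues and track acknowledgments.

```python
from collections import deque

class DataWriter:
    """Sketch of per-endpoint buffering: each writer keeps a dedicated
    queue of in-transit messages per matched reader, so a slow or
    unreachable peer never blocks delivery to the others.
    """
    def __init__(self):
        self._queues = {}  # matched reader id -> deque of samples

    def match_reader(self, reader_id):
        self._queues[reader_id] = deque()

    def write(self, sample):
        # Buffer the sample once per matched reader.
        for q in self._queues.values():
            q.append(sample)

    def take_pending(self, reader_id):
        # The transport drains each destination's queue independently.
        q = self._queues[reader_id]
        pending = list(q)
        q.clear()
        return pending
```

Because each destination has its own buffer, the transport can drain them at each link's own pace, which is what keeps delivery times predictable as the network grows.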
In a similar manner, DDS defines the concept of a "DomainParticipant," which is the fundamental container entity that can participate in a publish-subscribe network. A DomainParticipant can contain many DataReaders and DataWriters. Typical applications may use only one domain, and therefore have one DomainParticipant. However, applications are free to create several DomainParticipants, so that multiple instances of this entity can exist simultaneously.
Multiple execution threads are a way to optimize responsiveness and performance, while also allowing the system to scale across a broad fabric-based network. One possible approach is to use several dedicated threads for each DomainParticipant, in this manner:
Event thread: Manages both timing delays and periodic events, such as protocol heartbeats, deadlines, and liveliness.
Database cleanup thread: Purges old information from the internal data structures, such as publication declarations and subscription requests.
Receive threads: Process the data packets received from the underlying network transports. One receive thread is created per transport "port," a transport-specific resource for receiving incoming messages.
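The thread layout above can be sketched with standard threading primitives. This is a simplified model, not an actual DDS implementation: the class name, the 10-ms timer period, and the port naming are all illustrative, and the database-cleanup thread is omitted for brevity.

```python
import queue
import threading

class ParticipantThreads:
    """Sketch of a per-participant thread set: one event thread for
    periodic work (heartbeats, deadlines, liveliness) plus one receive
    thread per transport port.
    """
    def __init__(self, ports, on_packet, on_timer):
        self._stop = threading.Event()
        self._ports = {p: queue.Queue() for p in ports}
        self._threads = [threading.Thread(
            target=self._event_loop, args=(on_timer,), daemon=True)]
        for q in self._ports.values():
            self._threads.append(threading.Thread(
                target=self._receive_loop, args=(q, on_packet), daemon=True))

    def _event_loop(self, on_timer):
        # Fire periodic events until shutdown; wait() doubles as a timer.
        while not self._stop.wait(0.01):
            on_timer()

    def _receive_loop(self, q, on_packet):
        # Drain one port's incoming packets until shutdown.
        while not self._stop.is_set():
            try:
                on_packet(q.get(timeout=0.01))
            except queue.Empty:
                pass

    def deliver(self, port, packet):
        self._ports[port].put(packet)  # simulate arrival on a port

    def start(self):
        for t in self._threads:
            t.start()

    def shutdown(self):
        self._stop.set()
        for t in self._threads:
            t.join()
```

Dedicating a receive thread to each port keeps independent transports from interfering with one another, while the single event thread centralizes all timing-sensitive bookkeeping.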
When the application provides new data to the DDS middleware, the message passes all the way through to the network in a single operation. In the user's thread context, the message is serialized, deposited into the writer queue, encapsulated into a wire-protocol packet, and passed to the transport for delivery.
In the common case, the entire operation's critical path takes no inter-application locks and suffers no context switches. The event thread is only involved if the initial transport operation fails, or if it must execute follow-on processing (such as ensuring reliable delivery).
The event thread has ready access to the message, since it's already stored in the writer queue. When the transport receives a new packet, the appropriate receive thread processes the packet, retrieves the message, stores it in the reader queue, and immediately executes the listener callback. In the common-case critical path, there are no inter-application locks or context switches. If the application requires the message to be handled with user threads, it can do so with DDS WaitSets. Both flexibility and performance are optimized, even as the network scales.
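The send-side fast path described above can be sketched as a single function executed in the user's thread context. The JSON serialization and the 4-byte length framing are stand-ins for the real wire protocol, chosen only to make the sketch runnable; the function name and parameters are likewise invented for the example.

```python
import json
import struct

def write_fast_path(sample, writer_queue, transport_send):
    """Sketch of the single-operation send path: serialize, deposit in
    the writer queue (retained for possible repair by the event
    thread), frame into a wire packet, and hand off to the transport,
    all in the caller's thread with no context switch.
    """
    payload = json.dumps(sample).encode()               # 1. serialize
    writer_queue.append(payload)                        # 2. keep for reliability
    packet = struct.pack("!I", len(payload)) + payload  # 3. frame for the wire
    transport_send(packet)                              # 4. hand to transport
    return packet
```

Because the serialized sample stays in the writer queue, the event thread can later resend it for reliable delivery without involving the application again.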
Performance can also suffer from a poorly chosen code execution path. Since lock contention can have a significant detrimental impact on performance, a fast-path optimization moves data between the application and the network transport while taking a single mutex per message, greatly simplifying the locking protocol.
Finally, instead of using lists to store the information needed to dispatch and manipulate messages, hash tables can be used. Although hash tables are more complex than lists, they offer constant-time access on average, provided that the initial allocation of space is sufficient. Even when collisions degrade a bucket, lookups remain far faster than a linear scan of a linked list.
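The dispatch structure just described can be sketched with a hash table keyed by endpoint identifier. The class name and the use of a GUID-like string key are illustrative choices, not taken from any particular DDS implementation.

```python
class DispatchTable:
    """Sketch of hash-table message dispatch: endpoints are keyed by a
    GUID-like identifier, so the target is found in constant expected
    time instead of by scanning a linked list of all endpoints.
    """
    def __init__(self):
        self._by_guid = {}  # hash table: guid -> endpoint callback

    def add(self, guid, endpoint):
        self._by_guid[guid] = endpoint

    def dispatch(self, guid, message):
        # O(1) expected lookup, independent of how many endpoints exist.
        endpoint = self._by_guid.get(guid)
        if endpoint is None:
            return False  # unknown destination; drop or log
        endpoint(message)
        return True
```

With a list, dispatch cost would grow linearly with the number of endpoints; with the table, it stays flat as the network scales, at the cost of up-front space allocation.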
Implementation Optimizes Performance, Flexibility, and Reliability
Implementing the DDS specification well can enhance the three critical characteristics of data communication over a distributed network: reliability, performance, and flexibility. Conversely, a poor implementation can mean that the architecture works well under certain conditions, but fails to take advantage of greater resources or to scale as the network grows.
Data-communications system developers don't want to change their application code when the fabric is updated, changed, or augmented. However, many possible implementations deliver suboptimal results when the network topology changes. A well-designed DDS implementation takes this into account, enabling easy re-optimization of the application. Thus, the application will be able to deliver a comparable level of performance in the face of evolving and changing fabrics.
As switched-fabric technology advances, the middleware must support those advances by being able to adapt to new transport mechanisms and different resource requirements and availability. Being able to plug in different transports at the middleware layer makes it possible to incorporate new fabric technologies as they become available without making any changes at the application layer.
With a superior implementation of the DDS standard, network performance can be optimized to the particular application. It matches the performance needs with the underlying fabrics and availability of system resources, such as memory. The design's flexibility allows it to target a broad array of applications and network topologies by supporting many transports and maintaining individual resources for each connection. Finally, the design avoids most key single points of failure, increasing reliability.
References
- OMG, "Data Distribution Service for Real-Time Systems, v1.1," Document formal/2005-12-04, http://www.omg.org, December 2005.
- Pardo-Castellote, Gerardo, "OMG Data Distribution Service: Architectural Overview," IEEE International Conference on Distributed Computing Systems, 2003.
- Cotton, David B., "Switched Fabrics and the Military Market," COTS Journal, April 2005.
- Ashad, Nauman, Dewar, Stewart, and Stalker, Ian, "Serial Switched Fabrics Enable New Military System Architectures," COTS Journal, December 2005.