As the cost to create ASICs continues to climb, a board design featuring multiple, more generic ICs becomes attractive. But arriving at a standardized, high-speed board interconnect has proved difficult.
The XMOS Link, as implemented in the XS1 family of programmable devices, addresses this problem: it can serve as a standard board interconnect, seamlessly connecting multiple devices. In particular, an XMOS Link can be readily implemented in an off-the-shelf microcontroller.
The XMOS chip architecture combines a number of processing cores (“XCores”), each with its own memory and I/O. Each core runs at 400 MHz, delivering up to 400 MIPS, and directly supports communication, ports, and concurrent processing through multiple threads. C is supported, as well as “XC,” a C-like language with added capabilities for inter-thread communication and I/O. An on-chip switch supports communication between processors, and external XMOS Links are used to communicate with other chips.
XMOS Links enable communication between all processors in the system by allowing streams of data and control information to be transmitted with low latency across the network. Streams are circuit switched, but can be set up and terminated easily, allowing them to be used to create a packet-switching network.
From an application perspective, the architecture provides a channel end resource, with communication occurring between two channel ends. The programming model is independent of the destination channel end’s location, which may be on the same processor, on different cores within the same device, or on separate devices.
XMOS Link communication uses a non-return-to-zero transition-based scheme. Transmission involves a stream of tokens, each consisting of several symbols that may encode one or more bits depending on link mode. A token contains 8 bits and a control token flag. There are two modes of link operation.
The slower, serial mode uses two data wires, “0” and “1,” in each direction—four wires in total. A single transition corresponds to a single bit of information. The level of the wires is irrelevant; a transition should never occur on both wires simultaneously. For each token transmitted, 10 transitions (symbols) are sent. The first eight are data, followed by a control token flag, and lastly an even parity bit.
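For illustration, the ten-symbol framing might be encoded as below. The bit order (MSB first) and the exact coverage of the parity symbol are assumptions, as the text does not specify them:

```c
#include <stdint.h>

/* Sketch of serial-mode token encoding. encode_token() fills sym[] with
 * the 10 symbols; driving the wires would mean toggling the "0" wire for
 * a 0 symbol and the "1" wire for a 1 symbol. Bit order and parity
 * coverage (data bits plus control flag) are assumptions. */
static int encode_token(uint8_t data, int is_control, int sym[10])
{
    int ones = 0;
    for (int i = 0; i < 8; i++) {
        int b = (data >> (7 - i)) & 1;   /* MSB first (an assumption) */
        sym[i] = b;
        ones += b;
    }
    sym[8] = is_control;                 /* control-token flag */
    ones += is_control;
    sym[9] = ones & 1;                   /* make total count of 1s even */
    return 10;                           /* symbols (transitions) per token */
}
```

Because only transitions carry information, the receiver never needs to know the absolute wire levels, which is what makes the scheme robust against inversion.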
The faster link mode uses five data wires in each direction, with 10 wires in total. There are four data wires and an “escape” wire. Four transitions (symbols) are required to transmit a token, with a transition on the escape wire signaling a control token. A token transmitted in fast mode may result in zero, two, or four wires being high. To return to zero, an optional return-to-zero NOP token can be sent, resulting in all five wires being low.
A link is clocked from the System Switch, which runs by default at 400 MHz. The speed of a link can be adjusted by changing the number of clock cycles between tokens and the number of clock cycles between symbols. The minimum value for each field is 2, and the maximum is 2049. This results in a data throughput of 156 kbits/s to 160 Mbits/s for the serial link and 390 kbits/s to 400 Mbits/s for the fast link. The System Switch itself can also have its speed dynamically lowered using an 8-bit divider.
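The quoted figures follow from simple arithmetic. A minimal sketch, assuming the inter-symbol delay dominates (the inter-token delay is ignored) and each token carries 8 data bits over 10 symbols (serial) or 4 symbols (fast):

```c
#include <math.h>

/* Back-of-the-envelope check of the quoted throughput range:
 * throughput = clock / inter-symbol delay / symbols-per-token * 8 bits. */
static double throughput_bps(double clock_hz, int symbol_delay,
                             int symbols_per_token)
{
    double tokens_per_s = clock_hz / symbol_delay / symbols_per_token;
    return tokens_per_s * 8.0;   /* 8 data bits per token */
}
```

With a 400-MHz clock and delays of 2 to 2049 cycles, this reproduces the 156-kbit/s to 160-Mbit/s serial range and the 390-kbit/s to 400-Mbit/s fast range.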
XMOS Link wires are multiplexed with standard general-purpose I/O pins on XS1 devices. Software can enable the links, which will then take priority on the multiplex.
The link uses a full-duplex point-to-point connection with a credit scheme for synchronizing communication. Each link includes a credit counter and a credits-issued counter, with three reserved control tokens used to transmit credits. To initialize the link, software must first enable it, then request a HELLO token to be sent. This will reset the internal credit counter. The counterpart link will reset its credits-issued counter and issue credits.
Whenever a link sends a token, it will decrement its credit counter. When a link is aware its counterpart is low on credits and is ready to receive more data, it will transmit credits as a control token. This method allows a link to control the transmission of tokens, making it possible to throttle down the data rate if required.
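The scheme can be modeled with two counters per link end. A toy sketch, with invented values for the credit batch granted by one CREDIT token and for the low-water mark:

```c
/* Toy model of the credit machinery. CREDIT_BATCH and LOW_WATER are
 * invented example values, not figures from the XS1 specification. */
enum { CREDIT_BATCH = 8, LOW_WATER = 2 };

struct link_end {
    int credit;          /* tokens this end may still transmit */
    int credits_issued;  /* tokens the counterpart may still send to us */
};

/* Transmit side: every token sent consumes one credit. */
static int try_send_token(struct link_end *l)
{
    if (l->credit <= 0)
        return 0;                    /* stalled until a CREDIT arrives */
    l->credit--;
    return 1;
}

/* Receive side: replenish the counterpart when it runs low and we are
 * ready for more data. Returns the batch to carry in a CREDIT token. */
static int maybe_issue_credit(struct link_end *l, int ready_for_data)
{
    if (!ready_for_data || l->credits_issued > LOW_WATER)
        return 0;
    l->credits_issued += CREDIT_BATCH;
    return CREDIT_BATCH;
}
```

Withholding the CREDIT token is exactly the throttling mechanism the text describes: a sender with zero credit simply cannot transmit.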
The XMOS Link relies on a unique identifier being assigned to every device in the system. This ID is sometimes referred to as “node,” “core,” or “processor” ID. Upon transmission of a data stream, a packet header is first sent. This header includes the ID of the packet’s destination node, which may be any number of hops away. Assuming a path can be found through the network, this header will establish a connection until the end of the packet.
Routing tables, stored in every node in the system, determine a route. When the System Switch of a node receives a data stream, it compares the destination processor ID with its own. If it matches, the packet has reached its destination, and all subsequent traffic is then routed into the core itself.
In case of a non-match, the switch must choose an outgoing link for the packet to travel down. It does this by considering the position of the first non-matching bit. This is then passed into the routing table and used to look up the dimension (direction) to route the stream. Every direction is assigned one XMOS Link or more. This way, systems can be constructed using most common network topologies: pipelines, grids, stars, and trees, for example.
A good illustration of the elegance and efficiency of the routing method is a 2-by-2 grid (Fig. 1). In this case, processor IDs would be generated as a bit field, using one bit (say bit 1) for the “x” direction and another bit (say bit 0) for the “y” direction.
Consider a packet that’s leaving node “0,0” (with processor ID 00b) destined for node “1,1” (with ID 11b). Node “0,0” would be configured to send packets with a differing bit 1 in the positive “x” direction. Hence, this packet would travel out of the east link to node “1,0” with ID “10b.” Now there’s a match with bit 1, but bit 0 differs, and the packet would be sent on to its destination through the north link. A packet in the reverse direction would not travel via node “1,0.” Instead, it goes via node “0,1.”
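The described hops fall out of a few lines of code. A sketch, assuming “first non-matching bit” means the most significant differing bit, which is what makes the example's paths come out as stated:

```c
/* Routing decision for the 2-by-2 grid: bit 1 of the node ID is "x",
 * bit 0 is "y". The switch XORs its own ID with the destination ID and
 * routes along the dimension of the most significant differing bit
 * (an assumption consistent with the article's example). */
enum { DIM_LOCAL = -1, DIM_Y = 0, DIM_X = 1 };

static int route_dimension(unsigned my_id, unsigned dest_id)
{
    unsigned diff = my_id ^ dest_id;
    if (diff == 0)
        return DIM_LOCAL;            /* destination reached: route into core */
    return (diff & 0x2) ? DIM_X : DIM_Y;
}
```

Resolving x before y is why the forward packet travels via node “1,0” while the reverse packet travels via node “0,1”.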
From a software perspective, links and processor IDs are set up before the “main” function executes. The XMOS tools provide a Mapping tool, which accepts an XML network description for multiple XCores, automatically initializes and configures links, and sets up the network’s routing tables.
Communication channels can be instantiated between any two threads, running on any two nodes in the system, exactly as if they were on the same core. Standard operators, provided by XC, can be used for outputting and inputting.
In the following example, an XMOS Link connects an XS1-L1 device to a standard microcontroller, which implements the link in software. For simplicity, only the two-wire serial link is implemented. The MCU will be utilized as an ancillary chip for the L1, providing analog-to-digital converters (ADCs) and nonvolatile memory expansion. An 8-bit low-cost AVR ATtiny device was employed for prototype evaluation, using AVR Studio and AVRGCC for development.
To test the system, the ATtiny’s built-in ADC was used to verify and calibrate a 1-bit digital-to-analog converter (DAC) output from the L1 (Figs. 2 and 3). The simplicity of the test design meant that the MCU source code could be written entirely in C. Hand-coding some of the routines in assembler would bring performance benefits, but at this stage an easily verifiable test model was the priority.
TOKEN RECEIVE LAYER
The ATtiny’s two interrupt vectors are configured as input-compare on the two Rx wires of the XMOS Link. Incoming transitions on either of the two wires fire interrupts to the AVR CPU. These edges are combined into a complete token, including a control token flag, which is then checked for correct parity. Valid tokens are passed to the credit manager, and invalid ones are junked. A code snippet shows the effect of a transition on the “1” wire (Code Listing 1).
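In outline, the receive path might look like the following plain-C sketch; all names are invented, the AVR interrupt plumbing is omitted, and the parity coverage (all ten symbols) is an assumption:

```c
#include <stdint.h>

/* Hypothetical receive path: each edge interrupt contributes one symbol,
 * and after ten symbols the assembled token is parity-checked. */
static uint16_t rx_shift;   /* 8 data bits, control flag, parity symbol */
static uint8_t  rx_count;   /* symbols received so far (0..9) */

/* Called with bit = 0 for a transition on the "0" wire, 1 for the "1"
 * wire. Returns 0 while the token is incomplete, 1 for a valid token
 * (stored in *token_out as data<<2 | ct<<1 | parity), -1 on a parity
 * error (the token is junked). */
static int rx_symbol(int bit, uint16_t *token_out)
{
    rx_shift = (uint16_t)((rx_shift << 1) | (bit & 1));
    if (++rx_count < 10)
        return 0;
    uint16_t tok = rx_shift & 0x3FF;
    rx_count = 0;
    rx_shift = 0;
    int ones = 0;
    for (int i = 0; i < 10; i++)        /* even parity across all symbols */
        ones += (tok >> i) & 1;
    *token_out = tok;
    return (ones & 1) ? -1 : 1;
}
```

On the real MCU, the two interrupt handlers would simply call this with bit 0 or bit 1 and pass complete, valid tokens up to the credit manager.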
In the credit manager, received CREDIT and HELLO tokens are used to modify the local credit counters. The receive FIFO stores all other tokens, including other control tokens.
Should it be required, credit is also given to the destination node at this stage. The link can be dynamically throttled here. Should the receive FIFO be nearly full, credit will not be given, and buffer overflow will be prevented. Outgoing credit tokens are stored in the Transmit FIFO, queued with any other tokens as required.
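The overflow-prevention rule is simply a headroom check on the receive FIFO. A sketch with invented sizes:

```c
/* Backpressure sketch: the receive FIFO's occupancy decides whether the
 * node is ready for more data. The FIFO size and headroom are invented
 * example values. */
enum { RX_FIFO_SIZE = 16, RX_HEADROOM = 4 };

static int ready_for_more_data(int rx_fifo_used)
{
    /* Withhold credit while the FIFO is nearly full; since the
     * counterpart cannot send without credit, overflow is impossible. */
    return rx_fifo_used <= RX_FIFO_SIZE - RX_HEADROOM;
}
```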
STREAM LAYER
Incoming and outgoing streams are managed in this layer, including parsing packet headers and checking that the node and target channel IDs are correct. No routing was implemented, as the MCU in this example was chosen to support only one link. Only one channel end is instantiated in this layer, and incoming streams targeting other processors or channel ends are junked.
The application layer has an application programming interface (API) identical to that presented by the XCore instruction set. When the application layer asks to send a token, and if credit is available, this token will be stored in the transmit FIFO, ready to be sent. If credit isn’t available, this request will hang until credit is received.
Transmission is performed immediately upon a transmit request. The transmit FIFO ensures that, while the stream layer is transmitting, a queued credit token cannot interrupt the current token; further transmit requests pause until the token has been completely transmitted.
At the application layer, MCU software is free to manage the connection in the same way as an XCore does. An API very similar to the XS1 instructions (testct, inchar, outchar) is provided, making application-level multichip communication straightforward.
A connection is initiated by the remote device, establishing an outgoing connection and sending its channel-end identifier over the link. This allows the MCU to connect and set up a full-duplex stream. A simple state machine that supports a small set of commands is then implemented. The transmission of a CT_END from the XCore will allow the MCU to terminate the stream (Code Listing 2).
The XCore implementation looks very similar to the MCU’s application-layer code. One difference is that threads wanting to communicate with the MCU can use the input and output operators built into XC. Once the link has been initialized, the channel end can be used as if it were a standard streaming channel.
The control thread instructs the DAC what voltage to output, and it then retrieves the actual voltage value from the ADC in the MCU (Code Listing 3). Note how the channel to the MCU is used identically to the channel that goes to the DAC, which is simply another thread on the XCore.
Data throughput from this software implementation of the link was low, at around 10 kbits/s, so a software link is best suited to low-data-rate applications such as the example given. The MCU firmware also supports only a single channel end, so more than one simultaneous connection is possible; this could be added easily, though. On larger MCUs, the XMOS System Switch’s routing capabilities could be added to the firmware, allowing multiple XMOS Links on one device.
For some applications, an MCU isn’t the right choice. Source is available for XMOS Links in an FPGA at www.xlinkers.org/projects/xlink_fpga.
An XMOS Link provides a versatile solution for inter-chip communications and can be readily implemented on a standard microcontroller. Its transition-based signaling and credit scheme make a low-speed software link possible, yet the link also scales to very high data rates with minimal overhead. Crucially, inter-chip connections are exposed at the application level, so large, complex systems can be built without difficulty.
All of the project source code can be found on the Internet at www.xlinkers.org/node/306.