Electronic Design

Multi-Threading Hardware Revs Up Internet Edge Processor

Deterministic eight-way hardware multi-threading and a memory-to-memory instruction set make the 32-bit IP3023 a fast and efficient embedded network processor. Software I/O and a large on-chip RAM keep the die size small. The approach is so unusual that the company, Ubicom, had to coin a new designation, MASI (multi-threaded architecture for software I/O), to categorize it. The no-overhead context switch for hardware threads keeps the chip's 10-stage superpipeline flowing. The IP3023's high performance, low power requirements, and memory-to-memory instruction set are optimized for its target market: gateways and embedded network devices. This market has lots of competition, but most other players use conventional RISC or CISC architectures.

The IP3023 contains eight register banks, each associated with a thread. A bank can be hard real time (HRT) or non-real time (NRT). The architecture is designed to support up to 32 threads, but the IP3023 implements only eight.

The 250-MIPS IP3023 uses the same ipOS operating system as the Ubicom IP2022, so compatibility is maintained for C-based applications. The ipOS is typically allocated one NRT thread slot. An HRT thread slot usually handles high-priority interrupts. Additional HRT threads are often allocated so that one of these threads handles a high-speed device.

HRT task scheduling uses a fixed reservation sequence stored in a time-slice table (see the figure). An HRT task requiring more time will have its thread number in more than one table entry. A task that can use up to half the processor performance would have its thread number in half of the table entries. Having two tables lets designers change scheduling requirements in real time. The pair can also be used to switch quickly between two configurations.
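The time-slice table can be modeled in a few lines of Python. This is only a behavioral sketch, not Ubicom's implementation: the eight-entry table length, the thread numbers, and the names `TABLE_A`, `TABLE_B`, `reserved_share`, `slot_owners`, and `Scheduler` are all assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical 8-entry time-slice tables; each entry names the HRT thread
# that owns that slot. The table length and thread numbers are assumptions.
TABLE_A = [0, 1, 0, 2, 0, 1, 0, 3]   # thread 0 is reserved half of the slots
TABLE_B = [1, 1, 0, 2, 1, 1, 0, 3]   # alternate configuration favoring thread 1


def reserved_share(table, thread):
    """Return the fraction of slots in `table` reserved for `thread`."""
    return Fraction(table.count(thread), len(table))


def slot_owners(table, cycles):
    """Return the owner of each of the next `cycles` slots; the table repeats."""
    return [table[i % len(table)] for i in range(cycles)]


class Scheduler:
    """Holds two tables and lets the active one be swapped at run time."""

    def __init__(self, table_a, table_b):
        self.tables = [table_a, table_b]
        self.active = 0

    def switch(self):
        """Make the other table the active one."""
        self.active ^= 1

    def owner(self, cycle):
        """Return the thread that owns slot number `cycle`."""
        table = self.tables[self.active]
        return table[cycle % len(table)]
```

In this model, `reserved_share(TABLE_A, 0)` is one half because thread 0 fills four of the eight entries, and calling `switch()` changes the reservation pattern without rebuilding either table.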

The architecture is interesting because it essentially makes the IP3023 look like an eight-processor system with each processor running at 31.25 MHz. The IP3023 approach is more flexible, though, because giving a task two slots doubles its execution speed. Conventional multitasking systems are less efficient because they incur task-switch overhead.
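The arithmetic behind that comparison is simple enough to check directly. The sketch below assumes that every slot is the same width and that the overall rate divides evenly among slots; `effective_rate` is a hypothetical helper name, not part of any Ubicom tool.

```python
def effective_rate(total_rate, slots_owned, table_length):
    """Rate seen by a thread that owns `slots_owned` of `table_length` slots.

    Assumes equal-width slots, so the share of the total rate is simply
    proportional to the number of slots the thread owns.
    """
    return total_rate * slots_owned / table_length


one_slot = effective_rate(250, 1, 8)    # one of eight slots
two_slots = effective_rate(250, 2, 8)   # two of eight slots
```

Under these assumptions, one slot in eight works out to 31.25 and two slots to 62.5, which is the doubling described above.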

A more conventional multitasking system can be built on top of the hardware multitasking system if more than eight threads are required. In this case, one or more register banks would be used to handle conventional threads by copying data to and from data or program memory.
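A rough Python model of that layering follows. It assumes a register bank can be treated as a flat list of values and that "memory" is a dictionary of saved copies; the register count and the `SoftwareThreadBank` class are hypothetical, since the article gives no details of the actual mechanism.

```python
class SoftwareThreadBank:
    """Models one hardware register bank shared by several software threads."""

    def __init__(self, num_registers=16):
        # The register count is an assumption; the article does not state it.
        self.registers = [0] * num_registers
        self.saved = {}       # thread id -> saved register copy ("data memory")
        self.current = None   # software thread currently occupying the bank

    def switch_to(self, thread_id):
        """Save the running thread's registers, then load those of `thread_id`."""
        if self.current is not None:
            self.saved[self.current] = list(self.registers)
        fresh = [0] * len(self.registers)
        self.registers = list(self.saved.get(thread_id, fresh))
        self.current = thread_id
```

The copying in `switch_to` is exactly the task-switch overhead that the hardware threads avoid, which is why this scheme would be reserved for work that does not fit in the eight hardware slots.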

If an HRT thread is waiting for an interrupt, or if it's not ready, then its slots can be used by NRT threads. These are scheduled in a round-robin fashion, with each thread getting one slot every time it runs.
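The fallback rule can be sketched as a small simulation. This is a simplified model under stated assumptions: HRT readiness is a fixed dictionary rather than something that changes cycle by cycle, NRT threads are always ready, and the function name `run_slots` and all thread numbers are made up for the example.

```python
from collections import deque


def run_slots(table, hrt_ready, nrt_threads, cycles):
    """Return which thread runs in each slot.

    table       -- time-slice table of HRT thread numbers (repeats)
    hrt_ready   -- dict mapping HRT thread number to True if it can run
    nrt_threads -- NRT thread numbers, served round-robin (assumed always ready)
    """
    nrt = deque(nrt_threads)
    trace = []
    for i in range(cycles):
        owner = table[i % len(table)]
        if hrt_ready.get(owner, False):
            trace.append(owner)        # the HRT owner uses its reserved slot
        elif nrt:
            trace.append(nrt[0])       # slot is donated to the next NRT thread
            nrt.rotate(-1)             # that thread moves to the back of the line
        else:
            trace.append(None)         # nothing can run; the slot is idle
    return trace


trace = run_slots([0, 1, 0, 2], {0: True, 1: False, 2: True}, [6, 7], 8)
```

Here HRT thread 1 is not ready, so its slots go alternately to NRT threads 6 and 7, while threads 0 and 2 keep their reserved slots untouched.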

Branching is the bane of a pipeline architecture, but it makes less of an impact on the IP3023 because the entire pipeline must rarely be flushed. If the branch prediction logic guessed wrong, only slots for the matching thread are flushed from the pipeline. All other threads are unaffected.
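A toy model makes the selective flush concrete. It assumes the pipeline can be represented as a list of `(thread, instruction)` pairs and that a flushed stage simply becomes an empty bubble; both the representation and the name `flush_thread` are illustrative assumptions, not a description of the real hardware.

```python
def flush_thread(pipeline, thread):
    """Return a copy of `pipeline` with every stage owned by `thread` emptied.

    Each stage is either None (a bubble) or a (thread, instruction) pair.
    Stages that belong to other threads are left exactly as they were.
    """
    return [
        None if stage is not None and stage[0] == thread else stage
        for stage in pipeline
    ]


# Four in-flight instructions from three threads; thread 0 mispredicts.
stages = [(0, "add"), (1, "load"), (0, "branch"), (2, "store")]
after = flush_thread(stages, 0)
```

After the call, only the two stages belonging to thread 0 are empty; the instructions from threads 1 and 2 continue through the pipeline.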

The IP3023 keeps its die size small by using a limited collection of hardware devices, including two SerDes (serializer/deserializer) and four MII (Media Independent Interface) devices, and by using software I/O for other devices. This is one reason why the IP3023 is so flexible. Devices like a PWM (pulse-width modulation) timer can be easily implemented in software via a GPIO (general-purpose I/O) pin. Hardware devices are dedicated to certain pins. However, if the device isn't used, the pins can go toward software devices.
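As an illustration of a software device, the sketch below models the pin levels a software PWM would drive on a GPIO pin, one level per timer tick. It is a plain Python model with hypothetical names (`pwm_levels`, `period_ticks`, `high_ticks`); real firmware would write each level to a pin register from a thread, and the article does not describe Ubicom's actual code.

```python
def pwm_levels(period_ticks, high_ticks, total_ticks):
    """Return the GPIO level (1 or 0) for each tick of a software PWM.

    The pin is high for the first `high_ticks` of every `period_ticks`
    and low for the remainder, giving a duty cycle of
    high_ticks / period_ticks.
    """
    return [
        1 if (tick % period_ticks) < high_ticks else 0
        for tick in range(total_ticks)
    ]


wave = pwm_levels(period_ticks=4, high_ticks=1, total_ticks=8)
```

With a period of four ticks and one high tick, the pin is high a quarter of the time. Changing `high_ticks` changes the duty cycle without any dedicated timer hardware, which is the flexibility the software I/O approach trades processor time for.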

Ubicom provides source code for a number of complex software devices, from a DSL Utopia interface and a PCI interface to simple interfaces like the PWM timer. Typically, complex or high-performance devices require an HRT thread. An HRT thread is also necessary to service most hardware devices, although it's possible for one HRT thread to service multiple devices depending upon their performance requirements.

MEMORY HIERARCHY
The IP3023 takes an interesting approach to memory. It has a large on-chip program RAM and a dual-port data RAM, both of which are accessed in one clock cycle. There is no need for caches, so cache-miss delays are avoided.

The flash interface is straightforward, and applications can run from flash. Unfortunately, performance is limited by the flash device and there's no on-chip caching. Typically, applications are copied from flash into on-chip program memory.

The SDRAM interface is strange because the SDRAM isn't part of the normal processor address space. Special instructions are used to read and write data. For the most part, data will be copied to on-chip memory for processing. The large on-chip memory usually makes external SDRAM unnecessary. When SDRAM is used, its performance is sufficient for most applications that need more memory, such as a multimedia Web server.

Using conventional means, the IP3023 minimizes power consumption. Its external clock is 10 MHz, and the internal speed of the processor and peripherals is independently controlled. The system can be slowed all the way down to the external clock rate in real time.

The IP3023 is available in a 208-pin plastic quad flat package (PQFP). Pricing is $12 in quantities of 100,000 units per month. Production quantities will be available in the third quarter. Samples will be available later this quarter.

UBICOM
www.ubicom.com • (650) 210-1500

IP3023 SPECIFICATIONS
Processor: 32-bit multi-threading RISC, source code compatible with the IP2022
Memory: 256-kbyte program SRAM, 64-kbyte data SRAM
Network: Two serializer/deserializers with one on-chip 10BT physical layer; other interfaces available using software peripherals and GPIO pins
Peripherals: 104 GPIO, watchdog timer, SPI random number generator
Other: SDRAM controller, SRAM/flash interface