Electronic Design

Vector Processor Makes Short Work Of Complex DSP Algorithms

With four vector pipelines, Telairity Semiconductor's TVP400 combines the signal-processing features of a high-performance DSP and the control capabilities of a 32-bit microcontroller (see the figure). It will be available later this year for use in ASIC applications as a 4- by 4-mm "hard" core. Initially, it will be offered using 0.13-µm design rules to ensure operation at 600 MHz.

The highly parallel architecture allows the TVP400 to keep up to 23 operations in flight at the same time. When running at 600 MHz, the core can execute a 256-point complex fast Fourier transform in just 2.1 µs, or a 64-tap finite impulse-response filter in just 29 ns/result (continuous).
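To put those timings in perspective, they can be converted into clock cycles. The short sketch below assumes only the 600-MHz clock and the two execution times quoted above; the function and variable names are illustrative and do not come from Telairity.

```python
CLOCK_HZ = 600e6  # clock rate quoted in the article

def cycles(seconds: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert an execution time into clock cycles at the given clock rate."""
    return seconds * clock_hz

fft_cycles = cycles(2.1e-6)        # 256-point complex FFT
fir_cycles = cycles(29e-9)         # one output of a 64-tap FIR filter
macs_per_cycle = 64 / fir_cycles   # taps per result / cycles per result

print(round(fft_cycles), round(fir_cycles, 1), round(macs_per_cycle, 2))
```

The FFT works out to roughly 1260 cycles, and each FIR result to about 17.4 cycles, or a little under four multiply-accumulates per cycle. That sits close to what four pipelines with one multiplier-accumulator apiece could sustain, though the article does not break the figure down this way.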

The core was crafted via Telairity's internally developed building-block design scheme. The approach includes a well-defined methodology, tools, and a complete library of pre-engineered, fully characterized hard IP building blocks that are reusable, generic, and portable.

To ensure that the processing blocks won't stall due to a lack of data, designers included 128 kbytes of SRAM (128 memory banks, each 512 words by 16 bits). It supplies data to the vector pipeline through a crossbar switch that permits eight memory reads and four writes simultaneously. To support the scalar processor, designers also included 16-kbyte instruction and data caches. Both eight-way set-associative caches have prefetch and lock capabilities.
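The memory figures can be checked with simple arithmetic, and a toy model shows why many banks help the crossbar serve several accesses at once. In the sketch below, the bank count, depth, and word width come from the article. The low-order interleaving in `bank_of` is purely an assumption made for illustration, since the article does not say how addresses map to banks.

```python
NUM_BANKS = 128        # from the article
WORDS_PER_BANK = 512   # from the article
WORD_BITS = 16         # from the article

def capacity_bytes() -> int:
    """Total SRAM capacity implied by the bank organization."""
    return NUM_BANKS * WORDS_PER_BANK * WORD_BITS // 8

def bank_of(word_addr: int) -> int:
    """ASSUMPTION: consecutive words live in consecutive banks."""
    return word_addr % NUM_BANKS

def conflict_free(word_addrs) -> bool:
    """True if no two simultaneous accesses target the same bank."""
    banks = [bank_of(a) for a in word_addrs]
    return len(set(banks)) == len(banks)

unit_stride = list(range(8))                      # eight neighboring words
bank_stride = [i * NUM_BANKS for i in range(8)]   # eight words, all in bank 0

print(capacity_bytes() // 1024, conflict_free(unit_stride), conflict_free(bank_stride))
```

Under that assumed mapping, eight unit-stride reads land in eight different banks and can proceed together, while a stride equal to the bank count sends every access to the same bank. The capacity check confirms that 128 banks of 512 16-bit words total 128 kbytes.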

Rather than use one of the commercial scalar 32-bit cores from ARM, ARC, MIPS, or another source, Telairity crafted its own scalar engine, which it could optimize to control the four vector pipelines as well as perform system control operations. The quad-pipeline vector engine can operate on vectors up to 32 elements long, or on four 32-element vectors in a single-instruction/multiple-data mode so that one instruction delivers 128 results. The vector engine also operates efficiently on short vectors and can perform gather-scatter operations, which help arrange data in memory to achieve the highest computational efficiency.
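A small software model may help make the single-instruction/multiple-data and gather-scatter ideas concrete. The sketch below assumes four pipelines that each work on one 32-element vector, matching the 32-element vector registers the article describes. The function names are hypothetical and are not TVP400 instructions.

```python
PIPELINES = 4        # vector pipelines in the core
VECTOR_LENGTH = 32   # elements per vector register

def simd_add(a, b):
    """Model one vector add issued to all pipelines at once.

    a and b each hold one 32-element vector per pipeline.
    """
    return [[x + y for x, y in zip(va, vb)] for va, vb in zip(a, b)]

def gather(memory, indices):
    """Collect scattered memory words into one dense vector."""
    return [memory[i] for i in indices]

def scatter(memory, indices, values):
    """Write a dense vector back to scattered memory locations."""
    for i, v in zip(indices, values):
        memory[i] = v

a = [[p * VECTOR_LENGTH + i for i in range(VECTOR_LENGTH)] for p in range(PIPELINES)]
b = [[1] * VECTOR_LENGTH for _ in range(PIPELINES)]
results = simd_add(a, b)
result_count = sum(len(v) for v in results)   # 4 pipelines x 32 elements

memory = list(range(100, 200))
column = gather(memory, range(0, 80, 10))     # every 10th word
scatter(memory, range(0, 80, 10), [0] * len(column))
```

Here `result_count` comes to 128, the number of results the article attributes to one instruction. The gather step pulls every tenth word into a dense vector so that later arithmetic can run at full vector length, which is the kind of data arrangement the article alludes to.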

Each of the four vector pipelines can simultaneously perform two load operations and one store operation, which together account for the eight reads and four writes the crossbar switch supports. The resources in each pipeline include 16 vector registers (each contains 32 16-bit elements), two adders with 24-bit accumulators, and one multiplier-accumulator with 40-bit accumulation. An integrated 1-Mbyte ROM can be used to hold the application program, so the core can operate as a dedicated application processor when embedded in a larger chip design.
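The value of a 40-bit accumulator shows up when 16-bit products are summed over many filter taps. The sketch below models one FIR output in plain Python under the assumption of signed 16-bit samples and coefficients. It checks the running sum against the 40-bit signed range rather than modeling wraparound or saturation, because the article does not say how the hardware handles overflow.

```python
ACC_BITS = 40                          # accumulator width from the article
ACC_MAX = (1 << (ACC_BITS - 1)) - 1    # largest signed 40-bit value
ACC_MIN = -(1 << (ACC_BITS - 1))       # smallest signed 40-bit value

def fir_output(samples, coeffs):
    """One FIR result: the sum of sample * coefficient over all taps."""
    acc = 0
    for x, h in zip(samples, coeffs):
        # ASSUMPTION: operands are signed 16-bit values.
        assert -32768 <= x <= 32767 and -32768 <= h <= 32767
        acc += x * h
        # ASSUMPTION: flag overflow instead of wrapping or saturating.
        assert ACC_MIN <= acc <= ACC_MAX, "40-bit accumulator overflow"
    return acc

# Worst case for a 64-tap filter: every product is (-32768) * (-32768).
worst_case = fir_output([-32768] * 64, [-32768] * 64)
headroom_bits = (ACC_BITS - 1) - worst_case.bit_length()
```

Even in that worst case, the 64-tap sum stays inside the 40-bit range with a couple of bits to spare, so a filter of this length would need no intermediate scaling under these assumptions.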

Telairity Semiconductor Inc.
