Division, square-root, and squaring operations often take multiple clock cycles to execute, a cost engineers would like to avoid in their designs. A logarithmic computational scheme from QSigma of San Jose, Calif., lets such computations execute in a single cycle, as fast as basic single-cycle multiplications. With fast square-root, square, and divide operations available, designers can develop high-efficiency signal-processing and image-processing algorithms for more complex applications.
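The speedup rests on standard logarithm identities. Once positive operands are held as base-2 logarithms, each of these operations reduces to an addition, a subtraction, or a one-bit shift. The identities below are general facts about logarithms, not a description of QSigma's circuitry.

```latex
\begin{aligned}
\log_2(a \cdot b)   &= \log_2 a + \log_2 b \\
\log_2(a / b)       &= \log_2 a - \log_2 b \\
\log_2(a^2)         &= 2 \log_2 a \\
\log_2 \sqrt{a}     &= \tfrac{1}{2} \log_2 a
\end{aligned}
\qquad (a, b > 0)
```

In a fixed-point log representation, multiplying by 2 or by 1/2 is a one-bit shift, so squaring and square root cost no more than an addition.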
Dubbed QSMath, the logarithmic technique relies on QSigma's proprietary calculators to convert data efficiently between the logarithmic and normal numeric domains. These calculators evaluate two nonlinear functions, the base-2 logarithm and the base-2 exponential, efficiently and accurately. The method that generated them applies equally to polynomials, periodic functions such as sine and cosine, and many other functions.
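A minimal sketch of the round trip described here: convert into the base-2 log domain, operate there, and convert back. Python's `math.log2` and `2.0 ** y` stand in for QSigma's proprietary calculators; the 16-bit fractional format (`FRAC_BITS`) and all helper names are assumptions chosen for illustration. Only positive values are handled, on the assumption that a real design would carry the sign separately.

```python
import math

# Hypothetical fixed-point format: 16 fractional bits in each log-domain value.
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS

def to_log(x: float) -> int:
    """Linear -> log domain. math.log2 stands in for the log2 calculator."""
    if x <= 0:
        raise ValueError("the log domain needs a positive input")
    return round(math.log2(x) * SCALE)

def from_log(lx: int) -> float:
    """Log -> linear domain. 2.0 ** y stands in for the exp2 calculator."""
    return 2.0 ** (lx / SCALE)

# In the log domain, each operation is a single add, subtract, or shift.
def log_mul(la: int, lb: int) -> int:
    return la + lb

def log_div(la: int, lb: int) -> int:
    return la - lb

def log_square(la: int) -> int:
    return la << 1   # multiply the logarithm by 2

def log_sqrt(la: int) -> int:
    return la >> 1   # divide the logarithm by 2 (floors odd values)

if __name__ == "__main__":
    print(from_log(log_div(to_log(7.5), to_log(2.5))))  # approximately 3.0
    print(from_log(log_sqrt(to_log(9.0))))              # approximately 3.0
```

Apart from the floor in `log_sqrt`, the rounding inside `to_log` is the only error source, which is why the accuracy of the conversion calculators, rather than the arithmetic between them, sets the overall precision.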
Traditionally, these functions were approximated on bounded intervals by linear functions, polynomials, or iterative schemes such as the Newton-Raphson and CORDIC methods. All of these approaches treat the output as a function of a single input word, usually 8 to 16 bits wide.
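For contrast, the textbook Newton-Raphson reciprocal below shows why iterative schemes cost multiple cycles: each step needs two multiplies and a subtraction, and the relative error squares on each step, so accuracy depends on how many steps are run. The `48/17 - 32/17 * d` seed is the standard linear starting estimate for a divisor normalized into [0.5, 1); the function name and default iteration count are illustrative, not taken from any particular processor.

```python
def nr_reciprocal(d: float, iterations: int = 5) -> float:
    """Approximate 1/d for a divisor normalized into [0.5, 1)."""
    if not 0.5 <= d < 1.0:
        raise ValueError("normalize d into [0.5, 1) first")
    x = 48 / 17 - 32 / 17 * d      # linear seed, relative error at most 1/17
    for _ in range(iterations):
        x = x * (2.0 - d * x)      # two multiplies and one subtract per step
    return x

if __name__ == "__main__":
    # Error shrinks quadratically with each additional step.
    for steps in range(4):
        print(steps, abs(nr_reciprocal(0.75, steps) - 4 / 3))
```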
QSMath approaches approximation differently. Within the interval, it treats the input as an ordered collection of bits grouped into clusters. Clusters of bit pairs provide excellent accuracy for many functions. Each pair has four possible states (00, 01, 10, 11), which are mapped to a small set of bit multipliers. Two such sets are particularly useful: {-1/2, 0, 1/2, 1} and {0, 1/2, 1, 3/2}.
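QSigma does not say how the mapped multipliers feed its log2 and exp2 calculators, so the sketch below covers only the mechanical step described: splitting an input into bit-pair clusters and mapping each pair onto one of the two multiplier sets. All function names are hypothetical. Because both sets map a pair value d to d/2 plus a fixed offset, the mapping loses no information, and the last function rebuilds the original fraction from the multipliers (in hardware, the factors of 2 and 4 would be shifts).

```python
# The two multiplier sets from the text, indexed by pair value 00, 01, 10, 11.
SET_SIGNED = (-0.5, 0.0, 0.5, 1.0)
SET_UNSIGNED = (0.0, 0.5, 1.0, 1.5)

def bit_pairs(x: int, nbits: int) -> list[int]:
    """Split an nbits-wide unsigned input into 2-bit clusters, MSB pair first."""
    if nbits % 2 or not 0 <= x < (1 << nbits):
        raise ValueError("nbits must be even and x must fit in nbits bits")
    return [(x >> shift) & 0b11 for shift in range(nbits - 2, -1, -2)]

def to_multipliers(x: int, nbits: int, mult_set) -> list[float]:
    """Map each bit-pair cluster to its small multiplier."""
    return [mult_set[pair] for pair in bit_pairs(x, nbits)]

def fraction_from_multipliers(mults, mult_set) -> float:
    """Recover x / 2**nbits. Both sets map pair value d to d/2 + mult_set[0],
    and cluster k (MSB first) carries weight 4**-(k + 1)."""
    base = mult_set[0]
    return sum(2 * (m - base) * 4.0 ** -(k + 1) for k, m in enumerate(mults))

if __name__ == "__main__":
    x = 0b1101_0110
    print(bit_pairs(x, 8))                   # [3, 1, 1, 2]
    print(to_multipliers(x, 8, SET_SIGNED))  # [1.0, 0.0, 0.0, 0.5]
```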
Math precision ranges from 12 to 24 bits for integer operations, and the scheme also handles single-precision floating-point inputs and outputs. The circuitry QSigma developed to perform the conversions uses a very-long-instruction-word (VLIW) architecture with very low interrupt latency (see the figure). The QSMath block is available as intellectual property that can be integrated into any ASIC design to help accelerate critical computations.
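Handling single-precision inputs plausibly reduces to the bounded-interval problem: an IEEE 754 float is a mantissa times a power of two, so its base-2 logarithm is an exact integer exponent plus the log2 of a mantissa in [1, 2). The sketch below assumes such a front end, using Python's `math.frexp`; the text does not describe how QSigma handles floats. `frac_bits` is a free, hypothetical parameter, since the 12- to 24-bit range quoted above is a total precision and its split is not stated.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def log2_split(x: float) -> tuple[int, float]:
    """Return (exponent, log2 of mantissa) with the mantissa in [1, 2),
    so that log2(x) == exponent + fraction."""
    if x <= 0:
        raise ValueError("log2 needs a positive input")
    m, e = math.frexp(to_float32(x))    # x == m * 2**e with 0.5 <= m < 1
    return e - 1, math.log2(2.0 * m)    # rescale the mantissa into [1, 2)

def to_fixed_log2(x: float, frac_bits: int) -> int:
    """Fixed-point log2 with a hypothetical number of fractional bits."""
    exponent, fraction = log2_split(x)
    return (exponent << frac_bits) + round(fraction * (1 << frac_bits))

if __name__ == "__main__":
    print(log2_split(12.0))          # exponent 3, fraction log2(1.5)
    print(to_fixed_log2(8.0, 16))    # 3 << 16
```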
The VLIW architecture supports multiple data-memory partitions for multitasking without requiring caches. Multiple instances of the basic compute block can be combined for complex computations such as a radix-4 transform: four QSMath elements plus some additional circuitry perform the matrix calculations.
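A radix-4 butterfly is a 4-by-4 matrix multiply whose entries are only ±1 and ±j, which makes it a natural unit for the matrix calculations mentioned above. The sketch shows the standard decimation-in-time butterfly; the text does not say how QSigma divides this work among its four QSMath elements (one output per element is one plausible reading, not a confirmed one).

```python
def radix4_butterfly(x0: complex, x1: complex, x2: complex, x3: complex):
    """4-point DFT of (x0, x1, x2, x3), i.e. multiplication by
        [[1,  1,  1,  1],
         [1, -j, -1,  j],
         [1, -1,  1, -1],
         [1,  j, -1, -j]]
    using only additions, subtractions, and multiplication by j
    (a swap of real and imaginary parts plus a sign change)."""
    a, b = x0 + x2, x0 - x2
    c, d = x1 + x3, x1 - x3
    return (a + c, b - 1j * d, a - c, b + 1j * d)

if __name__ == "__main__":
    print(radix4_butterfly(0, 1, 0, 0))  # second matrix column: 1, -j, -1, j
```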
In a radix-4 fast Fourier transform (FFT), the QSMath solution running at 512 MHz could execute 500,000 256-point FFTs with precision to within ±1 LSB. That throughput is close to the best any DSP-based FFT solution can deliver. Higher-resolution FFTs run at throughputs comparable to those of a Texas Instruments TMS320C6415 DSP running at 600 MHz, yet the radix-4 block occupies only a fraction of the DSP chip's area.
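To put the 256-point figure in context, the sketch below is a generic recursive radix-4 decimation-in-time FFT that also counts its butterflies, checked against a direct DFT. It is a textbook formulation in floating point, not QSigma's fixed-point datapath, and the function names are illustrative.

```python
import cmath

def radix4_butterfly(x0, x1, x2, x3):
    """4-point DFT using only adds, subtracts, and multiplication by +/-j."""
    a, b = x0 + x2, x0 - x2
    c, d = x1 + x3, x1 - x3
    return (a + c, b - 1j * d, a - c, b + 1j * d)

def radix4_fft(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4.
    Returns (spectrum, number of radix-4 butterflies performed)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])], 0
    if n % 4:
        raise ValueError("length must be a power of 4")
    # Transform the four interleaved sub-sequences x[r], x[r+4], x[r+8], ...
    subs = [radix4_fft(x[r::4]) for r in range(4)]
    butterflies = sum(count for _, count in subs)
    q = n // 4
    out = [0j] * n
    for k in range(q):
        # Apply twiddle factors W_n^(r*k), then combine with one butterfly.
        t = [subs[r][0][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(4)]
        for m, value in enumerate(radix4_butterfly(*t)):
            out[k + m * q] = value
        butterflies += 1
    return out, butterflies

def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * (i * k % n) / n) for i in range(n))
            for k in range(n)]

if __name__ == "__main__":
    signal = [complex(i % 7, (3 * i) % 5) for i in range(256)]
    spectrum, butterflies = radix4_fft(signal)
    print(butterflies)  # 256
    print(max(abs(a - b) for a, b in zip(spectrum, dft(signal))))  # float rounding only
```

A 256-point transform needs log4(256) = 4 stages of 64 butterflies, 256 in all. If the quoted 500,000 figure is a per-second rate (the text does not say), a 512-MHz clock leaves 512,000,000 / 500,000 = 1,024 cycles per FFT, or 4 cycles per butterfly if butterflies were processed one at a time.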
The QSMath block runs at clock frequencies of up to 512 MHz, and the company offers Verilog, VHDL, and C models for synthesis and simulation.
Contact QSigma at (408) 979-1543 or go to www.qsigma.com. For details about the software tools, see the Forefront section at www.elecdesign.com.