Thanks to its 128-bit instruction word, the Diopsis processor can produce the real and imaginary parts of an arithmetic result simultaneously. Most other floating-point DSPs require at least two cycles for each complex-domain operation. For instance, a 1024-point complex floating-point FFT requires just 5,962 cycles on the mAgic engine, versus 14,400 cycles on a chip such as Texas Instruments’ TMS320C67.
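To put the two cycle counts in perspective, the short sketch below converts them into average cycles per butterfly. It assumes a radix-2 FFT, which the figures above do not specify, and the function names are illustrative only.

```c
#include <assert.h>

/* Number of radix-2 butterflies in an n-point FFT: (n / 2) * log2(n).
   Assumption: a radix-2 algorithm; n must be a power of two. */
unsigned radix2_butterflies(unsigned n)
{
    assert(n >= 2 && (n & (n - 1)) == 0);

    unsigned stages = 0;                 /* log2(n) */
    for (unsigned m = n; m > 1; m >>= 1)
        stages++;

    return (n / 2) * stages;
}

/* Average cycles per butterfly for a quoted total cycle count. */
double cycles_per_butterfly(unsigned cycles, unsigned n)
{
    return (double)cycles / (double)radix2_butterflies(n);
}
```

Under that assumption a 1024-point transform has 5,120 butterflies, so 5,962 cycles works out to roughly 1.16 cycles per butterfly and 14,400 cycles to roughly 2.81.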
The highly parallel mAgic VLIW DSP core contains four multipliers, three adders, and three subtracters. Two 256-location register files, each with four inputs and four outputs, store the 40-bit real and imaginary parts separately, which enables single-cycle complex arithmetic on extended-precision floating-point data. An on-chip 8-kword by 128-bit program memory for the DSP engine holds compressed program code. The DSP assembler automatically compresses program code by a factor of two to three, giving an average effective instruction density of 50 bits per stored cycle (against 128 bits uncompressed) without any loss in performance.
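The unit counts line up with the arithmetic of a single radix-2 FFT butterfly, which needs exactly four multiplies, three additions, and three subtractions. The sketch below spells that out in plain C. The type and function names are hypothetical, `double` stands in for the 40-bit format, and nothing here represents the mAgic instruction set or its library.

```c
#include <assert.h>

/* Hypothetical type for illustration: real and imaginary parts kept apart,
   as the two register files do. */
typedef struct { double re, im; } cplx;

/* Radix-2 butterfly: a' = a + w*b, b' = a - w*b.
   Operation count: 4 multiplies, 3 additions, 3 subtractions. */
void butterfly(cplx *a, cplx *b, cplx w)
{
    assert(a != 0 && b != 0);

    double p0 = w.re * b->re;            /* multiply 1 */
    double p1 = w.im * b->im;            /* multiply 2 */
    double p2 = w.re * b->im;            /* multiply 3 */
    double p3 = w.im * b->re;            /* multiply 4 */

    double tr = p0 - p1;                 /* subtraction 1: Re(w*b) */
    double ti = p2 + p3;                 /* addition 1:    Im(w*b) */

    cplx sum  = { a->re + tr, a->im + ti };   /* additions 2 and 3 */
    cplx diff = { a->re - tr, a->im - ti };   /* subtractions 2 and 3 */

    *a = sum;
    *b = diff;
}
```

On a machine with that many independent units, all ten operations can in principle be issued from one wide instruction word, whereas a core with fewer units has to spread them over several cycles.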
An embedded ARM processor works alongside the DSP core, which operates in one of two modes: “system” or “run.” In system mode, the VLIW engine halts and all of the DSP’s internal resources are mapped into the memory space of the ARM processor; the ARM controls the DSP’s DMA channel and can read and write the DSP’s local data memories and configuration registers. In run mode, the ARM has access only to the VLIW processor’s command register and a 1-kword by 40-bit dual-port shared memory.
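The access rules for the two modes can be summarized as a small model. This is a sketch only: the enum and function names are hypothetical and do not reflect the chip's actual register map, and it assumes the command register and shared memory remain reachable in system mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names modelling the two operating modes. */
typedef enum { MODE_SYSTEM, MODE_RUN } dsp_mode;

/* Hypothetical list of DSP-side resources named in the text. */
typedef enum {
    RES_COMMAND_REGISTER,
    RES_SHARED_MEMORY,       /* 1-kword by 40-bit dual-port memory */
    RES_LOCAL_DATA_MEMORY,
    RES_CONFIG_REGISTERS,
    RES_DMA_CHANNEL,
    RES_COUNT
} dsp_resource;

/* The VLIW engine halts while the chip is in system mode. */
bool dsp_is_halted(dsp_mode mode)
{
    return mode == MODE_SYSTEM;
}

/* Can the ARM reach the given DSP resource in the given mode? */
bool arm_can_access(dsp_mode mode, dsp_resource res)
{
    assert(res >= 0 && res < RES_COUNT);

    if (mode == MODE_SYSTEM)
        return true;   /* all internal resources are memory-mapped */

    /* Run mode: only the command register and the shared memory. */
    return res == RES_COMMAND_REGISTER || res == RES_SHARED_MEMORY;
}
```

In this model, a bulk transfer into the local data memories would have to be set up while the DSP is halted in system mode, and run-time exchanges would go through the shared memory.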
Each processor runs its own program, and either may act as the master. Software support for the chip includes development tools, a unified programming environment with a cycle-accurate simulator for the entire chip, and a library of 75 C-callable DSP functions.