A New Player In The 32-Bit Procesor Field

Atmel took a turn away from the pack when it designed its 8-bit AVR. Now, the company is bucking the trend toward the 32-bit ARM architecture with its AVR32 processor architecture. Needless to say, ARM and its partners don't have the 32-bit market sewn up by any means.

In fact, a range of popular 32-bit microprocessors is available, including three families from Freescale alone. The competition should be healthy. Still, the AVR32 melds system design components like DSP and single-instruction/multiple-data (SIMD) instructions, Java bytecode support, and a compact instruction set.

The AVR32 runs in standard and Java bytecode mode. In standard mode, it can execute 16- or 32-bit instructions without switching modes. Most instructions are 16 bits, which reduces code size and effectively increases the cache performance because more instructions can fit into the cache. This is one reason why the ARM Thumb and Thumb2 instruction sets have become so popular. However, the AVR32 doesn't have the mode switch overhead when the need arises for 32-bit instructions.

The AVR32 only requires a seven-stage pipeline (Fig. 1). A short pipeline reduces overhead due to stalls. It also allows for more aggressive analysis because timing constraints aren't as critical as they are with some other system architectures. As a result, features like the dynamic branch prediction can essentially implement zero-cycle-loop instructions, which are key to improving DSP performance.

Thanks to conditional return instructions, there's more inline execution versus the test/branch combination used with other architectures. Individually, the architectural finetuning may seem trivial. But combined, the features add up to greater performance. As a result, a low clock rate can perform the same function on other architectures. Also, lower clock requirements reduce power consumption. This is vital for Atmel's targeted product areas, such as portable multimedia devices.

A REGULAR ARCHITECTURE Atmel designers kept the system architecture simple. It uses a 16-register register file with a minimal number of mirrored registers for hardware context switching (Fig. 2). The AVR32 also features four levels of interrupt priority. It supports up to 64 interrupt groups and up to 32 interrupt lines per group, and each group has its own priority. This provides a very flexible interrupt control structure.

Interrupt 3, the highest-priority interrupt, mirrors a half-dozen additional registers. This allows many interrupt service routines to run without saving any additional system state. It also enables interrupts to be processed with minimal overhead.

The AVR32 includes a number of common instructions that typically take multiple instructions on other architectures. For example, certain instructions move selected blocks of registers. This is similar to instructions found in the new Texas Insturments MSP430X architecture (see "16-Bit Architecture Grows To 1 Mbyte" at www. elecdesign.com, ED Online 11528). Register-to-register block moves occur in a single cycle.

The AVR32 is a big-endian architecture. But it implements a host of pack and extract operations with a 32-bit barrel shifter that simplify little-endian support. These instructions also come in handy for structure manipulation. The processor can manipulate 64-bit values as well.

The balance of the system architecture is fairly conventional. Data and code caches provide better performance. The paged and segmented memory management unit (MMU) can handle any operating system.

However, Atmel designers still have a few tricks up their sleeves. For example, a four-entry circular buffer can hold return addresses pushed into memory. It allows the values to be used immediately from the buffer instead of being read from memory, delivering better performance. This is transparent to the application and compiler, though applications that use nonstandard returns must explicitly flush the buffer.

The DSP and SIMD sections use a straightforward design with a few interesting tweaks that increase performance, reduce overhead, and get the job done using less power. For instance, there's delayed writeback of the 48-bit temporary accumulator used in a multiply-accumulate mode.

That means each iteration of a loop only needs to load one value instead of the two typically used in other architectures. This can be employed to implement fast finite-impulse-response (FIR) filtering algorithms. The processor supports fractional multiplications with saturation, rounding, and scaling.

Likewise, SIMD support addresses common multimedia algorithms such as MPEG-4 motion compensation. MPEG-4 encoding software also uses instructions to handle operations like the sum of absolute differences. These types of operations are found in competing 32-bit multimedia architectures, but you won't see them in conventional 32-bit architectures.

THINGS TO COME Atmel just released the architecture, with chips coming within a few months. Likewise, the Java support will appear in this timeframe as a complete runtime. Development tools are still in the works, though. Expect a version with a pair of built-in Ethernet controllers and another with a single controller. Atmel already has a stable of analog and digital peripherals that likely will show up in these chips.

Developers will be able to get their hands on Atmel's STK1000, which will include AVR Studio and the open-source gcc C/C++ compiler. Linux 2.6 kernel support also will be available, but expect a number of other options.

IAR's EWAVR32 KickStart development kit provides access to an optimized C/C++ compiler and development system. IAR also will have a version of visualState, its model-driven design tool for statemachine development. Both development platforms will support the advanced DSP and SIMD instructions.

The development tools will be able to take advantage of a range of debugging features, from on-chip trace (including dual data trace channels) to half a dozen hardware code breakpoints and two data breakpoints. The code and data breakpoints can be combined for more complex real-time support. The AVR32 implements the Nexus Class 3+ debugging standard.

Initial benchmarks show the AVR32 competing very favorably with its 32-bit rivals. This is especially true for multimedia applications. The initial AVR competed very well with existing 8-bit microcontrollers, becoming a very popular, low-power alternative. The AVR32 is poised to follow in its footsteps.

Atmel Corp.
www.atmel.com