Can developers use a 32-bit architecture with a clear upgrade path even when low power and compact size are high on the list of requirements? Arm Ltd. attempts to answer that question with its Cortex M3 processor, which has a much smaller footprint and lower power consumption than Arm's earlier cores. As a result, Arm can now compete in areas that were considered out of bounds for its higher-performance processors.
By leveraging the popularity of the 32-bit Arm architecture, the company officially moves into the high-performance 8-bit space. Expect the Cortex M3 architecture (see the figure) to be used by companies that already incorporate Arm processors in off-the-shelf microcontrollers as well as custom designs.
The Cortex M3 is small and fast, but Arm didn't scrimp on features. In fact, the new architecture includes some twists that higher-end processors may adopt in the future. For example, bit-level manipulation, including bit banging of I/O lines, is common in microcontroller code, and atomic bit operations are necessary for efficient real-time operating-system (RTOS) support. The new design significantly improves bit-handling performance.
Of course, squeezing a 32-bit processor into a small package brings a few concessions. Caches disappear, and clock timing is designed to match flash-memory performance with processor-core performance.
MAKING IT SMALLER
Right away, developers will notice that the Cortex M3 lacks the 32-bit Arm instruction set, including its single-instruction multiple-data (SIMD) instructions. Instead, the processor supports only the Thumb and Thumb 2 instruction sets, which many other Arm processors support as well. The 16-bit Thumb instructions cover only the most common operations, but their compact encoding reduces program size. On the Cortex M3, Thumb instructions execute more efficiently than Thumb 2 instructions, especially when the core runs at a speed that requires one wait state. The Cortex M3 also retains the 32-bit register set common to all Arm processors, so many applications can be ported without changes. Developers using a higher-level language won't have a problem migrating to the new platform.
MAKING IT FASTER
Performance is relative. The Cortex M3 is designed to operate at or near flash-memory speed, which is about 50 MHz. The core can run faster if a wait state is introduced. This, along with the lack of a cache, makes the design slower than the higher-performance Arm architectures. However, its performance is high compared to the 8- and 16-bit processors in the Cortex M3's target market. A Harvard architecture and a three-stage pipeline provide single-cycle execution of most instructions, but with 32-bit operations and registers.
Bit handling was a deficiency in existing Arm architectures, so the Cortex M3's designers took a unique approach to single-bit manipulation. They designated an address range for bit operations, and the bus controller handles all accesses in this range: each word in an alias range maps to a single bit in the bit-band region. A read-modify-write sequence that would otherwise need separate get and set instructions becomes a single atomic operation, with no change to the instruction set.
One feature missing from most microcontrollers is a hardware divider, and division performed in software can be very compute intensive. With hardware divide, the Cortex M3 can handle divides at lower clock speeds and therefore consume less power.
A faster processor can make up for slow interrupt handling with a higher clock rate, but a core running at flash-memory speed cannot rely on that. Arm's designers therefore improved interrupt response time using several techniques. First, there are 32 vectored interrupts. Second, the interrupt vector address is passed directly to the core when an interrupt occurs, so interrupt processing can begin while the pipeline clears. Third, processor state is automatically saved and restored when an interrupt occurs. Finally, features like tail chaining and preemption let multiple interrupts be handled more efficiently.
Tail chaining applies when another interrupt is pending as the current service routine finishes. Rather than restoring the interrupted task's state and then saving it again, the processor moves directly to the next service routine with a minimal transition. Preemption occurs when a higher-priority interrupt arrives while a lower-priority routine is running: the higher-priority routine runs immediately, and the interrupted routine resumes when it completes.
MAKING IT EASIER TO DEBUG
Debugging on an 8-bit microcontroller without ICE (in-circuit emulation) hardware can be an exercise in patience. Developers using 32-bit processors have come to expect much more.
To improve the situation, many of the Cortex M3's debug features use single-wire interfaces, which helps reduce pin count. The processor may also include Arm's standard embedded trace module (ETM), whose output is supported by a number of hardware trace systems.
The debug access port (DAP) provides access to all memory and registers in the system, along with all interrupt vector information. The single-wire viewer (SWV) supplies real-time feedback without stopping or slowing the processor, so it can provide hardware profiling without program instrumentation.
The Flash Patch feature offers a way to hook into program code without modifying flash memory: selected flash-memory addresses are remapped to data RAM, so fetches from those addresses are served from RAM instead. This is handy for debugging as well as for making dynamic system changes.
The Cortex M3 also provides Arm-class breakpoint and watchpoint services. The Flash Patch feature supplies eight hardware breakpoints, and the Data Watchpoint and Trigger (DWT) unit supplies two hardware watchpoints. The Cortex M3 also supports single-stepping with or without interrupt handling.
Cortex M3 power management includes three sleep modes. Sleep Now waits for an interrupt in power-down mode. Sleep On Exit puts the system in power-down mode when an interrupt service routine ends; the processor doesn't bother restoring the interrupted task's state, because only an interrupt can end this mode. Finally, Deep Sleep mode shuts down the phase-locked loop (PLL). Sleep modes can also control clocks and peripheral power outside the core.
Arm has worked with third-party vendors to deliver operating systems, compilers, and support software such as system libraries that use only Thumb and Thumb 2 instructions. The atomic bit-manipulation support should improve RTOS performance. Developers may need to recompile their applications and libraries when targeting the Cortex M3. Third-party hardware debug vendors have also worked with Arm to support the new debugging features.
The Cortex M3 will be licensed in a fashion similar to Arm's existing portfolio of intellectual property. Expect the architecture to be implemented by third-party hardware vendors that currently have standard, off-the-shelf Arm-based processors as well as custom designs. Many standard parts will benefit from the Cortex M3's new architecture, since they target compact, low-power applications that the Thumb and Thumb 2 instruction sets readily address.
The Cortex M3 is a major change of direction for Arm. Ultimately, the design opens up 32-bit computing on a standard platform to the world of 8- and 16-bit developers.
Arm Ltd.
www.arm.com
Reducing the size of the Cortex M3 was key to taking on 8- and 16-bit microcontrollers. The processor has a three-stage pipeline designed to run with flash memory and without cache memory.
Thumb/Thumb 2 Instruction Set
The 16-bit Thumb instruction set has been popular because it reduces code size. Eliminating support for the 32-bit Arm instruction set reduces the size of the overall system. Support for the Thumb 2 extensions gives the Cortex M3 access to almost all features found in higher-end Arm architectures.
Improved Interrupt Handling
Interrupt response requires fewer cycles than on other Arm architectures. Likewise, the transitions in nested interrupt handling have been shortened, reducing latency. The Cortex M3 also adds a non-maskable interrupt.
Hardware Divide
Speeding up integer division can significantly improve the performance of applications with hefty computational requirements.
Improved Debugging
The Cortex M3 has more than just breakpoints and Arm's trace support. The single-wire viewer allows for real-time monitoring. Flash patching and sophisticated watchpoints make a developer's diagnostic job easier. And single-wire debug interfaces reduce system pin count.
Small Footprint
The Thumbnail core requires only 33k gates, almost half the 60k gates needed for the ARM7-S. The complete Cortex M3 requires 60k gates, and the optional Embedded Trace Module (ETM) adds a mere 15k gates.
Bit Manipulation
Bit manipulation for flags and peripheral control is needed for efficient microcontroller operation. The Cortex M3 adds atomic bit operations through "bit banding," where each word in an alias block of memory maps to a single bit rather than to a full word.
Low Power
The Cortex M3 uses only 0.09 mW/MHz when executing Thumb instructions, compared to an ARM7-S that consumes 0.39 mW/MHz.