Direct execution of Java byte code is possible thanks to a Java extension to the ARM processor core from ARM Ltd., Cambridge, England. Known as Jazelle, the new Java acceleration feature will initially appear in an ARM9-class product, and it's also planned for ARM10-class and ARM7-class products. Because ARM will deploy the technology in devices whose target markets require it, Jazelle will mainly go into ARMx20-type products, whose MMU provides full virtual-memory support for operating systems like Windows CE, EPOC, and Linux.
"There is a clear market need for a Java platform offering high performance comparable to JIT [just-in-time compilation], while utilizing small memories comparable to software virtual machines and offering compatibility with existing operating systems and application code," says David Cormie, CPU product manager at ARM.
"By adding a Java extension to the instruction set, ARM meets exactly this market need because the ARM architecture extensions for Java provide high performance at low cost, as well as good power efficiency," he adds. "We ran a Java-accelerated ARM9 core on an instruction set simulator and determined a Java performance of 5.5 CaffeineMarks/MHz."
Even though Jazelle adds a lot of functionality to the existing ARM core, it needs only around 20,000 additional gates. That figure is almost insignificant for a typical ARM CPU macrocell product, which also includes the cache required to support the operating system.
Only One Code
Java instructions, or byte codes, aren't like other software programs, which must be adapted to each hardware/software environment. For example, designers writing a C program have to run it through a C compiler for each target environment and operating system, such as DOS, Windows, and EPOC. This conversion takes a lot of effort, and vendors must offer, and keep in stock, a separate version of the program for every operating system.
A Java compiler translates Java source code into Java byte code. That single byte-code version is then accepted by any environment capable of running Java, where a Java virtual machine (VM) either interprets it or uses a JIT compiler to translate it into the target system's machine code.
The new Java feature can be described as hardware emulation of a Java virtual machine. To the programmer, Java appears as another mode, or state: instead of executing ARM or Thumb instructions, the core executes Java byte codes. Each byte code is decoded in the instruction decoder, turned into an ARM instruction, and executed immediately, so no JIT runtime compilation is involved.
"Over 90% of Java byte codes are executed directly by a Java-enabled core with no overhead," Cormie says. The remainder are cracked into small pieces and interpreted as short sequences of ARM instructions. Typical examples are complex operations like division, method invocation, and floating-point operations.
The hardware/software split is invisible to the programmer, the application, and the operating system. In Java mode, the machine looks like a Java processor that executes Java byte codes, and users can't tell whether a given byte code runs in hardware or in software. Java mode reuses all of the existing ARM registers, giving each one a particular function.
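To make the hardware/software split concrete, here is a minimal C sketch of a byte-code step function. The opcode values (iconst_0 through iconst_5, iadd, isub, idiv) are real JVM encodings, but which byte codes take which path is an assumption for illustration, the function names are hypothetical, and sw_idiv merely stands in for a short ARM instruction sequence. Either way, the caller sees the same stack result.

```c
#include <assert.h>
#include <stdint.h>

/* Real JVM opcode values for the byte codes modeled here. */
enum { ICONST_0 = 0x03, ICONST_5 = 0x08, IADD = 0x60, ISUB = 0x64, IDIV = 0x6C };

typedef struct {
    int32_t stack[16];  /* operand stack (no overflow checks in this sketch) */
    int     sp;         /* number of entries on the stack */
    int     hw_count;   /* byte codes taken down the "hardware" path */
    int     sw_count;   /* byte codes handed to a software sequence */
} jstate;

static void push(jstate *s, int32_t v) { s->stack[s->sp++] = v; }
static int32_t pop(jstate *s) { return s->stack[--s->sp]; }

/* Software path: stands in for a short sequence of ARM instructions. */
static void sw_idiv(jstate *s) {
    int32_t b = pop(s), a = pop(s);
    assert(b != 0); /* a real JVM would throw ArithmeticException */
    if (a == INT32_MIN && b == -1)
        push(s, a);  /* Java defines this overflow case to yield INT32_MIN */
    else
        push(s, a / b);
}

/* Execute one byte code, choosing the hardware or the software path. */
void step(jstate *s, uint8_t op) {
    if (op >= ICONST_0 && op <= ICONST_5) {
        push(s, op - ICONST_0);
        s->hw_count++;
    } else if (op == IADD || op == ISUB) {
        uint32_t b = (uint32_t)pop(s), a = (uint32_t)pop(s);
        /* Java int arithmetic wraps, so compute in unsigned. */
        push(s, (int32_t)(op == IADD ? a + b : a - b));
        s->hw_count++;
    } else if (op == IDIV) {
        sw_idiv(s);
        s->sw_count++;
    } else {
        assert(!"byte code not modeled in this sketch");
    }
}
```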
Compared with a conventional ARM core without Java capabilities, the only differences are the Java status bit and the new instruction for entering the Java state. The status bit is located in the current program status register (CPSR), which holds the condition flags as well as the mode bits. The mode bits record whether the processor is in user mode, in privileged supervisor mode, or in one of the exception modes used to handle undefined-instruction traps, interrupts, fast interrupts, and memory aborts.
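For readers unfamiliar with the mode bits, the following minimal C sketch decodes them. The 5-bit mode encodings come from the ARM Architecture Reference Manual rather than from this article, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu /* mode field: CPSR bits [4:0] */

typedef enum {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_UND = 0x1B, MODE_SYS = 0x1F
} cpsr_mode;

/* Name of the mode recorded in the CPSR, or "invalid" for unused encodings. */
const char *cpsr_mode_name(uint32_t cpsr) {
    switch (cpsr & CPSR_MODE_MASK) {
    case MODE_USR: return "user";
    case MODE_FIQ: return "fast interrupt";
    case MODE_IRQ: return "interrupt";
    case MODE_SVC: return "supervisor";
    case MODE_ABT: return "abort";
    case MODE_UND: return "undefined instruction";
    case MODE_SYS: return "system";
    default:       return "invalid";
    }
}

/* Exception modes are entered by taking an exception; user and system
   modes are not exception modes. */
int cpsr_in_exception_mode(uint32_t cpsr) {
    uint32_t m = cpsr & CPSR_MODE_MASK;
    return m == MODE_FIQ || m == MODE_IRQ || m == MODE_SVC ||
           m == MODE_ABT || m == MODE_UND;
}

/* Replace the mode field, leaving the flags and other bits untouched. */
uint32_t cpsr_set_mode(uint32_t cpsr, cpsr_mode mode) {
    assert((mode & ~CPSR_MODE_MASK) == 0); /* mode must fit in 5 bits */
    return (cpsr & ~CPSR_MODE_MASK) | (uint32_t)mode;
}
```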
The T bit within the CPSR is the Thumb bit. It records whether ARM or Thumb instructions are executed. Thumb is a second internal instruction set. Its decoder logic converts 16-bit instructions without latency into equivalent 32-bit instructions, which are then transferred to the 32-bit CPU for execution. So, Thumb is a compressed instruction set that provides better code density.
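As a concrete example of that expansion, the C sketch below handles a single Thumb format, MOV Rd, #imm8, and produces the equivalent 32-bit ARM MOVS instruction. The bit layouts follow the ARM Architecture Reference Manual; a real Thumb decoder covers every format in hardware, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Thumb format 001 00 Rd(3) imm8(8) is MOV Rd, #imm8 (sets N and Z). */
static int is_thumb_mov_imm(uint16_t insn) {
    return (insn & 0xF800u) == 0x2000u;
}

/* Expand Thumb "MOV Rd, #imm8" into the equivalent ARM "MOVS Rd, #imm8":
   cond = AL (1110), I = 1, opcode = MOV (1101), S = 1, Rn = 0, rotate = 0. */
uint32_t thumb_mov_imm_to_arm(uint16_t insn) {
    assert(is_thumb_mov_imm(insn));
    uint32_t rd   = (insn >> 8) & 0x7u;  /* Thumb low register r0-r7 */
    uint32_t imm8 = insn & 0xFFu;
    return 0xE3B00000u | (rd << 12) | imm8;
}
```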
ARM added the J bit to the CPSR. Together, the J and T bits record whether the processor is in the Java, ARM, or Thumb state. When J = T = 0, the processor is in the ARM state; when J = 0 and T = 1, it's in the Thumb state; and when J = 1 and T = 0, it's in the Java state. According to the truth table, the remaining combination, J = 1 and T = 1, is illegal.
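The truth table maps directly onto a small decoding function. This minimal C sketch assumes the bit positions used when the architecture was later documented as ARMv5TEJ (T at bit 5, J at bit 24); the article itself doesn't give them, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions (ARMv5TEJ); not stated in the article. */
#define CPSR_T_BIT (1u << 5)
#define CPSR_J_BIT (1u << 24)

typedef enum { STATE_ARM, STATE_THUMB, STATE_JAVA, STATE_ILLEGAL } cpu_state;

/* Decode the J/T truth table. */
cpu_state cpsr_state(uint32_t cpsr) {
    int j = (cpsr & CPSR_J_BIT) != 0;
    int t = (cpsr & CPSR_T_BIT) != 0;
    if (!j && !t) return STATE_ARM;
    if (!j &&  t) return STATE_THUMB;
    if ( j && !t) return STATE_JAVA;
    return STATE_ILLEGAL; /* J = 1, T = 1 */
}

/* Set J and clear T, as entering the Java state would. The caller must not
   pass a CPSR that is already in the illegal J = T = 1 combination. */
uint32_t cpsr_to_java(uint32_t cpsr) {
    assert(cpsr_state(cpsr) != STATE_ILLEGAL);
    return (cpsr | CPSR_J_BIT) & ~CPSR_T_BIT;
}
```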
When executing ARM or Thumb code, the additional logic needed for Java is inactive and introduces no additional power consumption. "When in the Java mode, the efficiency of direct hardware execution of Java byte codes leads to lower overall power consumption than with a software virtual machine or with a JIT compiler," Cormie claims.
Basically, a programmer could enter the Java state simply by writing the J bit into the program status register, but this isn't recommended. The best way is to use the Branch Exchange to Java (BXJ) instruction that's been added. It works just like calling a subroutine.
BXJ is a conditional instruction. If its condition (a zero condition, a carry condition, or any other) is false, nothing happens. If the condition is true, the branch is taken: before branching, the core stores the current program counter (PC) and sets the J bit. Because BXJ combines three operations, engineers save three program steps when the program enters the Java state. The instruction checks the condition; if it's true, it stores the current PC and loads the new one; and it sets the Java state as it takes the branch.
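The following minimal C sketch models BXJ as described above: a condition check, then storing the PC, setting J, and branching. It is a behavioral model under stated assumptions, not ARM's specification. Only four condition codes plus "always" are modeled, the J bit position is assumed, and the saved_pc field is a hypothetical stand-in for wherever the return address is kept; the instruction as shipped may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPSR_Z_BIT (1u << 30)  /* zero flag */
#define CPSR_C_BIT (1u << 29)  /* carry flag */
#define CPSR_J_BIT (1u << 24)  /* assumed position, as in ARMv5TEJ */

typedef enum { COND_EQ, COND_NE, COND_CS, COND_CC, COND_AL } cond_code;

typedef struct {
    uint32_t pc;
    uint32_t saved_pc;  /* hypothetical slot for the stored return address */
    uint32_t cpsr;
} cpu;

static int cond_passes(uint32_t cpsr, cond_code c) {
    switch (c) {
    case COND_EQ: return (cpsr & CPSR_Z_BIT) != 0;
    case COND_NE: return (cpsr & CPSR_Z_BIT) == 0;
    case COND_CS: return (cpsr & CPSR_C_BIT) != 0;
    case COND_CC: return (cpsr & CPSR_C_BIT) == 0;
    default:      return 1; /* COND_AL: always */
    }
}

/* One BXJ: if the condition holds, save the PC, set J, and branch.
   Returns 1 if the branch was taken, 0 otherwise. */
int bxj(cpu *c, cond_code cond, uint32_t target) {
    assert(c != NULL);
    if (!cond_passes(c->cpsr, cond))
        return 0;              /* condition false: nothing happens */
    c->saved_pc = c->pc;       /* store the current PC */
    c->cpsr |= CPSR_J_BIT;     /* enter the Java state */
    c->pc = target;            /* take the branch */
    return 1;
}
```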
16 Registers Visible
These are the only two changes to the ARM architecture (see the figure). Nothing else is visible to the programmer, because the machine executes Java byte codes directly and makes special use of the existing ARM registers. The ARM has 16 registers that are visible in user mode. Other registers are used for exception processing but aren't visible to the user; each exception mode has its own set of these banked registers.
The company profiled some Java applications and discovered that most Java methods (a method is a Java subroutine) use only three or four operands, so a stack depth of three or four entries is enough. Those entries can be held in registers inside the processor, which avoids much of the memory traffic caused by pushing and popping items on a memory-based stack.
"Only if the method uses more than four stack elements does the processor need to start pushing and popping from a real stack held in memory," Cormie explains. "That's why the Java extension is very efficient." The hardware completes these stack and memory operations automatically, so the programmer doesn't have to know about them.
For efficiency, ARM keeps local variable 0 in one of the ARM registers. Java applications frequently use local variable 0 as a pointer to data (in instance methods, it holds the reference to the current object), so keeping it in a register rather than in memory improves performance.
ARM also keeps other pointers in registers. One points to the exception table, which locates the ARM instruction sequences for byte codes that aren't executed directly in hardware. Others point to the Java stack, the Java local-variables area, and the constant pool. Because Java programs access these data areas constantly, holding their pointers in existing ARM registers saves memory accesses.
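The register roles described above can be pictured as a context structure. In this minimal C sketch, the field names are hypothetical and the actual ARM register numbers are deliberately left out. The point is that local variable 0 is served from its register copy, while any other local costs a load through the locals pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the values kept in ARM registers in the Java state. */
typedef struct {
    int32_t        local0;        /* register copy of local variable 0 */
    const void    *handler_table; /* locates software sequences for byte codes */
    int32_t       *stack_ptr;     /* Java operand stack in memory */
    int32_t       *locals;        /* Java local-variable area */
    const int32_t *const_pool;    /* constant pool */
} java_regs;

/* Load a local variable: index 0 comes straight from its register copy;
   any other index costs one memory load through the locals pointer. */
int32_t load_local(const java_regs *r, unsigned index, int *mem_loads) {
    if (index == 0)
        return r->local0;
    assert(r->locals != NULL);
    (*mem_loads)++;
    return r->locals[index];
}
```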
Programmers never have a reason to use or manipulate these registers, but knowing about them helps explain why the core is so efficient. A running Java program can't address them at all; according to Cormie, there is no Java byte code like "get the ARM register 6."
"Java byte codes do not manipulate ARM registers, and ARM does not intend to support programming at this level because it is not useful and could lead to someone producing an application that was not portable," he says. "The important thing to remember is that Java is designed primarily for portability." That is why Java programs are compiled to byte code rather than written for a particular processor's registers.
"Programmers manipulate a stack-based machine as its own architecture, and the Java machine uses the existing ARM hardware and registers to emulate that," Cormie notes. "That way, the design is very efficient because there's no need to put a lot of extra hardware into the machine to do this and it's not necessary to build extra registers."
Entering the Java mode is therefore exactly like calling a subroutine, and the return is fairly straightforward. The Java instruction set leaves a number of byte codes unused, and all of them are handled as exceptions. One of them serves as the means of returning to the calling program: whenever the hardware encounters it, it takes an exception because the byte code is undefined, and the exception handler recognizes it as a "return me to the calling program" instruction and performs the return.
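The return mechanism can be sketched in C as an undefined-byte-code handler. The JVM specification leaves opcodes 0xCB through 0xFD unassigned; which one ARM actually reserves for "return" isn't stated here, so RETURN_TO_CALLER = 0xCB is a hypothetical choice, as are the structure and function names. Defined byte codes are skipped rather than executed to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: 0xCB is unassigned in the JVM instruction set and stands in
   for whichever unused byte code is reserved for "return to caller". */
#define RETURN_TO_CALLER 0xCB

typedef enum { RUNNING, RETURNED } run_status;

typedef struct {
    uint32_t   jpc;        /* Java program counter (index into code) */
    uint32_t   return_pc;  /* ARM address saved on entry */
    uint32_t   arm_pc;     /* ARM PC after leaving the Java state */
    int        java_state; /* models the J bit */
    run_status status;
} core;

/* Undefined-byte-code exception handler. */
static void undefined_bytecode(core *c, uint8_t op) {
    if (op == RETURN_TO_CALLER) {
        c->java_state = 0;        /* clear J: back to the ARM state */
        c->arm_pc = c->return_pc; /* resume the calling program */
        c->status = RETURNED;
    } else {
        assert(!"other unused byte codes would be handled here");
    }
}

/* JVM opcodes 0xCB..0xFD are unassigned in the JVM specification. */
static int is_unused(uint8_t op) { return op >= 0xCB && op <= 0xFD; }

void run(core *c, const uint8_t *code, uint32_t len) {
    c->status = RUNNING;
    while (c->status == RUNNING && c->jpc < len) {
        uint8_t op = code[c->jpc++];
        if (is_unused(op))
            undefined_bytecode(c, op);  /* hardware takes an exception */
        /* defined byte codes would be executed here (omitted) */
    }
}
```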
A Java-enabled core will service interrupts and exceptions using the existing ARM model. All of the interpreted sequences are interruptible and restartable, so there is no impact on interrupt latency. Java-enabled ARM cores support all of the Sun Microsystems J2ME virtual machines (JVM, KVM, CVM) and a range of third-party virtual machines. The Java extension will offer around 5.5 CaffeineMarks/MHz; ARM quotes 1080 CaffeineMarks for a typical 200-MHz implementation. Extensions of existing ARM debugging tools will enable nonstop debugging of mixed applications running Java and C.
The first products incorporating the Java-accelerated core are expected by the end of the first quarter of 2001. For details, contact ARM Ltd. at +44 223 400 400, or go to www.arm.com.