Programming Efficiency

When I started college, design of the Intel 4004 was underway. The C programming language and UNIX operating system were being developed (unbeknownst to me). I did most of my programming in BASIC on an HP 2100 series mini-computer. The closest thing to C we had was Algol 60. I had access to an IBM 1401, a Burroughs 1700, and (via 300-baud modem) an IBM 360 running a custom operating environment.

A lot has happened in the industry since those days. By the time I graduated with one of the first CS degrees awarded by that institution, the 8-bit microprocessor was alive and well, and a number of companies, such as Imsai and Altair, were supplying usable systems based on them to the hobby market. Microsoft was a small company whose main product was a BASIC interpreter that was available for several processors.

I didn’t immediately enter the computer sector, but did start using 8-bit micros for work fairly shortly after graduation. My first project was for the motion-picture industry using a KIM 6502-based single-board computer to control 16-/35-mm sound recorders. The code was written entirely in assembly and burned into a UV-erasable EPROM. The 6502 was my preferred processor at the time, and I still can read some of the opcodes in hex. My first assembler could take a minute or more to assemble the simple program of a few kbytes. Later, I used Randy Hyde’s Lisa assembler, which was blazingly fast at 30,000 lines of source code per minute.

A company called Manx came out with an honest-to-goodness C compiler for 6502 processors a few years later, and I learned C. Again, compile times for a 10k program could take several minutes. Also, there was a bug in the compiler. If it encountered a left curly brace { without a matching right one }, it would get stuck in an infinite loop that filled the entire disk, requiring a reset to end, and leaving the storage allocated but not attached to any file. It, in turn, led to major low-level disk repair.

A frequent theme in the trade journals of the time was the shortage of programmers. It was frequently postulated that just around the next corner, anywhere from two to 10 times as much programming was going to need to be done than there were programmers to do it. The result was a search for ways to leverage that talent to be more productive. The benchmark of the time was that a competent programmer could produce 10 lines of debugged, documented code per day irrespective of the language. Thus, the higher level and more abstract the language, the more proficient the programmer. (Anyone remember PL/1, APL, SNOBOL, or LISP?). The shortage that never really materialized eventually became a tired story, and I haven’t heard it now for about 15 years, but the emphasis on productivity is still with us.

Today that trend has spawned GUI development systems, C++, and a host of interpreted languages, such as Java, in an attempt to leverage productivity. Along the way has come increasing concern about program correctness as well as security. Adding fuel to the fire, the transistor count in an IC (think processor and memory) has gone from 2300 in the 4004 to billions. Memory has jumped from a few hundred bytes (closer to 64,000 for many 8-bit CPUs) to several billion bytes. And, of course, the cost has generally gone down, even without factoring in the density.

Back to the “dark ages.” Early computers were very expensive. A mini computer cost anywhere from about $5000 to $100,000 in 1975 dollars. And like the 8-bit micros that replaced them, memory was limited to around 64 kbytes. My CS professor quoted an (anonymous) maxim that the best possible bang for the buck could be obtained by allocating half of the available memory to the most powerful interpreter once could fit (written in assembly language, of course) and the other half to the interpreted program. The limited capacity and high costs led to the creation of BASIC interpreters for micros, the FORTH language, and various other environments. The Manx C compiler I mentioned could compile directly to machine language or to an interpreted code. The interpreted code had a significant speed penalty, and memory limitations made it hard to mix the two, although it was possible.

Probably the most famous of the early interpreted environments was the UCSD Pascal compiler and P-Code interpreter. It was designed with two objectives in mind. First, 8-bit micros just weren’t inherently good at handling compiled code from high-level languages. The smallest reasonable size for an integer was 16 bits and that took multiple machine instructions to process. Often, 32-bit variables were needed and those took even more cycles. Most CPUs had few registers, so there was a lot of memory access for temporaries. Also, stack operations weren’t always adequate. Manx C for the 6502 didn’t even try to seriously use the CPU stack—it was only 256 bytes deep and any non-trivial C program needed more than that for automatic variables. Instead, they used indexed addressing to create their own stack.

The second objective of UCSD Pascal was portability. If a P-Code interpreter could be written for a new processor, along with an interface to the OS, any program written could immediately be run on it, possibly without even being recompiled. Also, the compiler and other tools would be immediately available to the new environment.

Interpreters have always extracted a speed penalty. Classically, it was assumed to be about 10:1, i.e. the program spent 90% of its time figuring out what to do and 10% doing it. Advances in both machine architecture and interpreter design have whittled that down to where a good Java implementation with JIT compilation capability is half as fast as an executable from a good native compiler.

This is the background for several of my pet peeves about the industry. It seems to me that both the resurgence of interpreted languages and the ever-increasing levels of abstraction (think virtual object inheritance and templates in C++ as a starting point) have created an environment where execution efficiency is way down the priority list; down to almost the don’t care level. I mean with gigahertz CPUs and gigabytes of memory, why should efficiency matter?

One of the more absurd examples I still clearly remember was the hype with which Sun introduced Java. Every electronic door lock in a hotel should be running Java! Just a minute. A decent Java interpreter occupies about 1 Mbyte of memory and the JIT compiler needs about 1 Mbyte of RAM. That is before the program and its data. On the other hand, a decent electronic door lock written in C probably needs about 10K of code and 2K of data. Is there something wrong with this picture? What does Java bring to the table in such a case? Yes, it might be a little faster to write. Portability really isn’t an issue with a door lock. Neither is (most likely) security against hacking. How do you hack something with no external interface?

Another industry trend that should give pause to the lack of concern about efficiency is mobile computing. To be more specific, the hardest nut to crack in mobile computing is battery life. Now battery life has a fairly linear relationship to CPU cycles. So if I write a tablet OS or app in Java, my battery will last half as long as if I write it in C! That is slightly exaggerated, on several accounts. First, a lot of apps spend a significant amount of their time making OS calls, so if the OS call is efficient, the app will inherit some of that efficiency. Secondly, battery life on a mobile device may have more to do with things like display backlighting and RF transmission than with CPU cycles. However, even though the point may be overstated, it remains true that if a programmer wants maximum battery performance, they need to consider efficiency. I have a poorly written book reader app on my gigahertz cell phone that can take five seconds to go to a place in the book and render a screen of text! That is unconscionable!

I’m not saying that abstract constructs or interpreted languages should never be used. They have their place. The biggest success enjoyed by Java has been in connection with Internet browsers. A browser can be written in any language and run on any CPU and a Java program can run on it without change. That is absolutely essential in the Internet world where the browser might be running on an x86 Windows computer, a Motorola processor on an (older) Mac, or a smartphone that likely has some flavor of ARM processor in it. What I am saying is that the path of least resistance has put too much emphasis on these techniques without doing the proper cost/benefit analysis.

I maintain that any programmer who is worthy of his salary should be able to describe the assembly-language machine code generated by any construct he uses. That doesn’t mean he has to write it in assembly. Or that he has to disassembly and examine every line of high-level code he wrote. But too many programmers out there haven’t a clue what the CPU has to do to execute the code they write, or why one method of accomplishing something might be significantly faster than another.

My next article will illustrate with a story and give some specific examples.