There are probably a lot of engineers out there like me who used to get under the hood of their cars. Some may even remember what a timing light was for. At best, you can change the oil and spark plugs in today's cars. Still, it's nice to know what's going on under the hood, even though most people won't even get to open it.
Compilers are much the same. There's little you can do to change them, although compiler options offer some control over the kinds of optimizations being performed. The more ambitious person might even look at the compiler's output using disassemblers available with most debuggers, just to see how optimizations operate.
The types of contortions employed by a compiler when dealing with an architecture not designed with a compiler in mind can turn out to be quite interesting. For example, simple optimizations can exploit the ability to load a pair of 8-bit registers with a constant using a single 16-bit load instruction. More complex optimizations may utilize selective loop unrolling to take advantage of processors that execute straight-line code more efficiently than loops. On the other hand, some architectures work better with loops if they're small enough to fit into a cache.
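To make loop unrolling concrete, here is a minimal C sketch of what the transformation looks like when written out by hand. The function names `sum_rolled` and `sum_unrolled4` are hypothetical and chosen only for illustration, as is the unroll factor of four. A real compiler picks its own factor, or declines to unroll at all, based on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: one add plus loop overhead (compare, branch) per element. */
uint16_t sum_rolled(const uint8_t *data, size_t n)
{
    uint16_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total = (uint16_t)(total + data[i]);
    }
    return total;
}

/* Same sum, unrolled by four: the loop overhead is paid once per four
 * elements, at the cost of a larger body that may no longer fit a small
 * instruction cache. The unroll factor is an assumption for illustration. */
uint16_t sum_unrolled4(const uint8_t *data, size_t n)
{
    uint16_t total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total = (uint16_t)(total + data[i]);
        total = (uint16_t)(total + data[i + 1]);
        total = (uint16_t)(total + data[i + 2]);
        total = (uint16_t)(total + data[i + 3]);
    }

    /* Handle the zero to three elements left over. */
    for (; i < n; i++) {
        total = (uint16_t)(total + data[i]);
    }
    return total;
}
```

Both functions return the same result for any input. Which one runs faster depends on the processor, which is exactly the trade-off between straight-line code and cache-friendly loops described above.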
In most cases, the optimizations that a compiler will perform suit the task at hand as well as the target processor. However, sometimes the programmer may not be giving the compiler the right source code. Take C for example. It's easy to define an integer, but the default integer size can vary, depending upon the compiler, target, and compiler settings. For instance, many C compilers for 8-bit micros use 16-bit integers as the default for an int definition. Often, an 8-bit integer is suitable for a particular algorithm. Using 16-bit integers throughout the application provides a kind of consistency, but it can result in an application that's twice the size of one with 8-bit integers. This is why a math library for an 8-bit architecture will typically have a collection of multiplies or divides optimized for different integer length combinations. A compiler will usually exploit a larger register if necessary, but it generally won't implement an algorithm using a smaller-width register on its own, because the resulting overflow or underflow could change the program's behavior.
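The point about integer width can be sketched in C with the fixed-width types from `<stdint.h>`. The two checksum functions below are hypothetical examples, not taken from any particular library. They differ only in their integer definitions, yet they return different results once the sum passes 255, which is why a compiler cannot simply swap one for the other.

```c
#include <stdint.h>

/* 8-bit version: every variable fits a single 8-bit register, and the
 * sum deliberately wraps modulo 256. */
uint8_t checksum8(const uint8_t *data, uint8_t len)
{
    uint8_t sum = 0;
    for (uint8_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Default-int version: on many 8-bit targets int is 16 bits, so each
 * add, compare, and increment may need a register pair. */
int checksum_int(const uint8_t *data, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}
```

For the two input bytes 200 and 100, `checksum8` returns 44 (300 modulo 256) while `checksum_int` returns 300. If the algorithm only needs the 8-bit result, saying so in the types gives the compiler the freedom to generate the smaller code.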
Therefore, it's imperative that programmers go with an algorithm featuring suitable data, parameter, and return definitions so that the compiler can do its job. This is particularly true for architectures that were originally designed for hand optimization or for specific algorithms ranging from bit banging to processing multimedia streams.
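As a rough illustration of suitable parameter and return definitions, the C sketch below declares exactly the widths each routine needs. The names `mul8x8` and `scale_percent` are hypothetical and do not come from any specific math library. They simply mirror the mixed-width multiply routines mentioned earlier.

```c
#include <stdint.h>

/* 8-bit by 8-bit multiply with a 16-bit result. The widest possible
 * product, 255 * 255 = 65025, fits in 16 bits, so nothing wider is
 * declared. The cast keeps the multiply unsigned when int is 16 bits. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}

/* Scale an 8-bit value by a percentage from 0 to 100 (larger values are
 * clamped). Only the intermediate product needs 16 bits; the parameters
 * and the return value stay at 8 bits. */
uint8_t scale_percent(uint8_t value, uint8_t percent)
{
    if (percent > 100) {
        percent = 100;
    }
    uint16_t product = mul8x8(value, percent); /* at most 255 * 100 = 25500 */
    return (uint8_t)(product / 100);
}
```

A call such as `scale_percent(200, 50)` yields 100. Because the widths are spelled out, a compiler for an 8-bit target is free to pick a narrow multiply routine rather than a general 16-bit by 16-bit one, although whether it actually does so depends on the compiler.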