Multicore Programming Practices Worth A Read

The Multicore Association (MCA) just released the first Multicore Programming Practices guide, which you can download for free. The MCA is also known for its array of multicore APIs.

These have been out for a while, and I wanted to give the guide a read before writing about it. It is rather extensive but well worth reading. It addresses everything from load balancing to debugging. It is not all-encompassing, but it is pretty broad in the areas it does cover. It is worth reading even if you are not into embedded multicore programming yet because it provides a good overview of the techniques being employed in the industry.

The level of detail and the number of examples vary significantly from section to section. For example, the load balancing section covers the topic and introduces top-down, bottom-up, and hybrid decomposition, but not much more.

The section on threading is more helpful. There are a number of C++ code examples. PThreads are examined closely. There is even a PThreads example for loop parallelism.
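To give a feel for what loop parallelism with PThreads looks like, here is a minimal sketch in C. It is not taken from the guide: the names `parallel_sum`, `sum_slice`, `slice_t` and the fixed `NUM_THREADS` value are assumptions made for illustration. Each thread runs the original loop body over its own contiguous range of indices and the partial results are combined after the joins.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4  /* hypothetical thread count, chosen for illustration */

/* Describes the slice of the loop that one thread owns. */
typedef struct {
    const int *data;
    size_t begin;   /* first index, inclusive */
    size_t end;     /* last index, exclusive */
    long partial;   /* partial sum written by the worker */
} slice_t;

/* Worker: runs the original loop body over its own index range only. */
static void *sum_slice(void *arg) {
    slice_t *s = (slice_t *)arg;
    long acc = 0;
    for (size_t i = s->begin; i < s->end; i++) {
        acc += s->data[i];
    }
    s->partial = acc;
    return NULL;
}

/* Splits [0, n) into NUM_THREADS contiguous chunks and sums them in parallel. */
long parallel_sum(const int *data, size_t n) {
    pthread_t threads[NUM_THREADS];
    int started[NUM_THREADS];
    slice_t slices[NUM_THREADS];
    size_t chunk = n / NUM_THREADS;
    long total = 0;

    assert(data != NULL || n == 0);

    for (int t = 0; t < NUM_THREADS; t++) {
        slices[t].data = data;
        slices[t].begin = (size_t)t * chunk;
        /* The last thread also takes the remainder of the range. */
        slices[t].end = (t == NUM_THREADS - 1) ? n : (size_t)(t + 1) * chunk;
        slices[t].partial = 0;
        started[t] = (pthread_create(&threads[t], NULL, sum_slice, &slices[t]) == 0);
        if (!started[t]) {
            /* Thread creation failed: do this slice in the calling thread. */
            sum_slice(&slices[t]);
        }
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        if (started[t]) {
            pthread_join(threads[t], NULL);
        }
        total += slices[t].partial;
    }
    return total;
}
```

Because every thread writes only to its own `slice_t`, no mutex is needed; the join is the only synchronization point. On most toolchains the file has to be built with the `-pthread` option.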

The debugging section is on the thin side. It starts with general recommendations like this:

  1. Debug a serial version of the application.
  2. Use defensive coding practices when parallelizing serial applications.
  3. Debug a parallel version executing serially.
  4. Debug a parallel version using an increasing number of parallel tasks.

It is not a bad start and a good way to approach the problem, but there is not much on how to carry out each step. Of course, that would require more target-specific tool information.
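One way to make those four steps concrete is to build the serial/parallel switch and the task count into the code itself. The sketch below is my own illustration in C with PThreads, not something from the guide; `serial_sum`, `task_sum`, `task_t`, `MAX_TASKS` and the `run_serially` flag are all hypothetical names. The serial routine doubles as the reference answer, assertions provide the defensive checks, and the same decomposition can be run in one thread or with a growing number of tasks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_TASKS 16  /* hypothetical upper bound for this sketch */

typedef struct {
    const int *data;
    size_t begin, end;  /* half-open index range [begin, end) */
    long partial;
} task_t;

/* Step 1: the serial version, kept around as the reference answer. */
long serial_sum(const int *data, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

static void *run_task(void *arg) {
    task_t *t = (task_t *)arg;
    /* Step 2: defensive check that the task was handed a sane range. */
    assert(t->begin <= t->end);
    t->partial = serial_sum(t->data + t->begin, t->end - t->begin);
    return NULL;
}

/* Steps 3 and 4: the decomposition is always the same. With run_serially
 * set, the tasks execute one after another in the calling thread;
 * otherwise each task gets its own thread. */
long task_sum(const int *data, size_t n, int num_tasks, int run_serially) {
    pthread_t threads[MAX_TASKS];
    int started[MAX_TASKS];
    task_t tasks[MAX_TASKS];
    long total = 0;

    assert(num_tasks >= 1 && num_tasks <= MAX_TASKS);

    for (int i = 0; i < num_tasks; i++) {
        tasks[i].data = data;
        tasks[i].begin = n * (size_t)i / (size_t)num_tasks;
        tasks[i].end = n * (size_t)(i + 1) / (size_t)num_tasks;
        tasks[i].partial = 0;
        started[i] = 0;
        if (!run_serially &&
            pthread_create(&threads[i], NULL, run_task, &tasks[i]) == 0) {
            started[i] = 1;
        } else {
            run_task(&tasks[i]);  /* serial mode, or thread creation failed */
        }
    }
    for (int i = 0; i < num_tasks; i++) {
        if (started[i]) {
            pthread_join(threads[i], NULL);
        }
        total += tasks[i].partial;
    }
    /* Defensive check: the ranges must cover the whole input. */
    assert(tasks[num_tasks - 1].end == n);
    return total;
}
```

A debugging session would then compare `task_sum(data, n, k, 1)` against `serial_sum(data, n)` first, and only afterwards switch `run_serially` off and raise `k` step by step.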

One section I really liked was optimization. It does get into some useful details. For example, it discusses the use of restrict with C pointers (Fig. 1). This feature was introduced in C99 (ISO/IEC 9899:1999 standard). It even notes that restrict is not a standard C++ feature, although many C/C++ compilers support it anyway.

/* File: restrict.c */
/* restrict tells the compiler that dest and src never refer to the same memory. */
void f (int * restrict dest, int * restrict src, int n) {
  int i;
  for (i = 0; i < n; i++) {
    dest[i] = src[i];
  }
}
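The flip side of restrict is that the caller has to keep the promise: passing overlapping buffers to such a function is undefined behavior. As a rough sketch of a defensive wrapper (my own illustration, not from the guide), the hypothetical `checked_copy` below asserts that the two ranges are disjoint before calling a restrict-qualified copy. The helper names are made up, and the address comparison through `uintptr_t` assumes a flat address space, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the listing above; src is const because it is only read. */
void copy_restrict(int *restrict dest, const int *restrict src, int n) {
    for (int i = 0; i < n; i++) {
        dest[i] = src[i];
    }
}

/* Returns nonzero when the two n-element ranges share no element.
 * Assumption: addresses converted to uintptr_t compare like flat offsets. */
int ranges_disjoint(const int *p, const int *q, int n) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    uintptr_t len = (uintptr_t)n * sizeof(int);
    return a + len <= b || b + len <= a;
}

/* Hypothetical debug wrapper: verify the no-alias promise, then copy. */
void checked_copy(int *dest, const int *src, int n) {
    assert(n >= 0);
    assert(ranges_disjoint(dest, src, n));
    copy_restrict(dest, src, n);
}
```

When the buffers may legitimately overlap, the standard `memmove` is the right tool instead.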

Some examples are platform specific, like the one addressing SIMD vectorization. Here it mentions AltiVec and uses an x86 SSE intrinsic library. It is a simple example, but it is easy to extrapolate to other implementations. The discussion is also general, so the comments apply to almost any SIMD environment.
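For readers who have not seen SSE intrinsics, the following is a minimal sketch of the general idea, not the guide's own listing. The function name `add_arrays` and the `HAVE_SSE` macro are assumptions for illustration; the intrinsics themselves (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`) come from the standard `xmmintrin.h` header. Four floats are processed per iteration, and a scalar loop handles the leftover elements as well as builds where SSE is not available.

```c
#include <assert.h>
#include <stddef.h>

/* Use SSE only where the compiler reports it; otherwise fall back to scalar. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Computes dst[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;

    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));

#if HAVE_SSE
    /* Vector part: four single-precision adds per iteration.
     * Unaligned loads and stores are used so no alignment is required. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar part: the remaining 0 to 3 elements, or everything without SSE. */
    for (; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The same structure (a wide main loop plus a scalar remainder loop) carries over to AltiVec or other SIMD instruction sets; only the intrinsic names and the vector width change.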

There may be an item or two for some to quibble with but overall this is an impressive collection. Don't forget to read the appendices because they are useful as well.
 
