The Explosion Of Multicores: Use Software To Level The Playing Field
As CPUs and graphics processors (GPUs) evolve, many of their design features are beginning to look remarkably similar. As a result, many of today's most common workloads will soon have a choice about where to execute. All the major hardware providers have told users to expect processors that feature increasingly non-uniform and complex memory hierarchies, rapidly increasing core (and thread) counts, and the integration of specialized acceleration units.
These new processor designs won't be friendly to legacy code bases optimized for single-threaded, uniform memory systems, or, for that matter, to programmers without the time or expertise to create tuned, processor-specific code. If we want to fully utilize these new hardware designs, something needs to change about the way we write software.
For the last 25 years, developers have been used to programming traditional CPUs: single-core processors with integrated floating-point units, shared memory, and a large uniform cache. As a result, software environments (compilers, debuggers, application platforms, and libraries) have been created to support programming and running applications on these types of CPUs.
CHANGES AHEAD
Several different trends are now converging to render the existing software infrastructure obsolete.
Luckily, cutting-edge software coming to market will let developers harness the power of these next-generation processors, without requiring a radical change in their working habits.
While engineers are already struggling to meet the software demands of quad-core processors, the spectre of massively multicore designs looms. At the Intel Developer Forum last fall, the chip giant first announced a prototype design of "Polaris," an 80-core processor with programmer-managed distributed memories and non-uniform caches.
Add to this the increasingly tight integration of GPUs and CPUs, demonstrated in AMD's Fusion project, as well as the growing movement to leverage GPUs as math coprocessors, and we can see that the obstacles new processor designs put in front of engineers will only grow.
In addition, applications designed for today's traditional single-threaded CPUs could be rendered obsolete if they can't scale to increasingly sophisticated architectures, wasting organizations' precious resources. Today's software simply isn't ready for where AMD, IBM, and Intel are taking us.
Encouragingly, the parallel architecture of multicores, one of the inherent differences between them and traditional processors, is inspiring new software approaches. These approaches let engineers not only take advantage of the added power that comes with more cores per processor, but also create applications that can scale to hundreds or even thousands of cores.
Stream programming is a data-parallel programming method compatible with distributed, explicitly managed memory. It offers far better productivity, performance, and efficiency than serial programming models, which were not designed to cope with the vastly increased parallelism of these new processors.
Using a stream programming approach, developers with traditional skills can quickly build applications with existing tools such as gcc, gdb, and Intel compilers, drawing on C, C++, and even MATLAB conventions and skills.
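To make the idea concrete, here is a minimal sketch of the stream style in Python. It is not any vendor's API: the names map_kernel, saxpy_kernel, and strip_size are hypothetical, chosen only for illustration. The kernel is a pure function of one element from each input stream, and the data is processed in fixed-size strips that stand in for the blocks a runtime might copy into a core's local memory.

```python
# Illustrative sketch only: map_kernel, saxpy_kernel and strip_size are
# hypothetical names, not part of any real stream programming product.

def saxpy_kernel(a, x, y):
    """Kernel: a pure function of one element from each input stream."""
    return a * x + y


def map_kernel(kernel, strip_size, *streams, args=()):
    """Apply `kernel` to every element of the input streams.

    The streams are processed in fixed-size strips, standing in for the
    blocks a runtime would copy into a core's local memory. Each strip is
    independent of the others, so a real runtime could hand strips to
    different cores; here they simply run one after another.
    """
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all input streams must have the same length")

    out = []
    for start in range(0, length, strip_size):
        # "Gather": copy one strip of each stream into local storage.
        local = [s[start:start + strip_size] for s in streams]
        # Compute: the kernel sees only its own elements, no shared state.
        out.extend(kernel(*args, *elems) for elems in zip(*local))
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
result = map_kernel(saxpy_kernel, 2, x, y, args=(2.0,))
print(result)
```

Because the kernel touches only the elements handed to it, the result does not depend on the strip size or on the order in which strips are processed. That independence is the property a stream runtime would rely on to spread work across many cores; this sketch runs the strips sequentially and makes no claim about performance.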
A NEW TECHNIQUE
With this model, developers can easily exploit the full potential of industry-standard multicore processors, programming a wide variety of hardware platforms with a single application-programming interface. As these hardware platforms evolve, a developer's application binary will continue to run on these new platforms, maximizing their return on software development investments.
While the challenges around multicores and the converging trends associated with the new architectures are daunting to engineers, new and innovative software technologies such as stream programming are assuaging concerns. This pioneering software holds the most promise in fully exploiting the power and performance of these converging designs, allowing engineers and organizations to propel massively multicore processors out of the realm of research and supercomputing and into general-purpose computing.