Four Keys to Successful Multicore Optimization

For many years, increases in machine vision speed came almost automatically with increasing microprocessor speeds. However, this no longer is true with multicore PC architectures, which require major software design changes to take advantage of the parallel processing capabilities.

A successful multicore strategy for machine vision can be implemented at multiple levels. Independent high-level tasks—especially those with hardware dependencies such as acquisition and I/O—can be written to run asynchronously on separate cores. This leaves the processor free to concentrate on tasks that are not blocked. Individual vision tools also can be parallelized so that they divide their processing task among several cores.

In the past, vision applications depended on advances in PC hardware performance to handle bigger and more complex applications. Improved performance resulted from faster CPUs and associated hardware improvements. But faster processors require greater and greater heat dissipation, to the point where cooling has become a limiting factor.

Manufacturers such as Intel and AMD have addressed this by moving to an approach that uses multiple processors to do the job previously done by a single processor. These processors are packaged on a single chip. Two-, four-, and eight-core processors now are common, while models with even higher core counts are being designed.

A 2-GHz dual-core processor might appear to have the same computing power as a 4-GHz single-core processor, but this is rarely true. To take full advantage of each core, software applications must be written to distribute the computation between the cores. Otherwise one core will sit idle for at least part of the time.

Optimized Software Is the Key

You cannot simply move an existing machine vision application from a single-core PC to a multicore PC and expect to see a significant performance improvement. In fact, some applications may not run any faster on a multicore machine due to operating system overhead and other inefficiencies.

Application developers and vision software vendors must rewrite their programs if they want to take advantage of multicore architectures to speed up their applications. This can be a complex task, and many algorithms do not easily lend themselves to parallel processing.

To achieve successful multicore optimization for machine vision applications, four key areas must be addressed: application optimization, vision tool optimization, tuning for overall system performance, and software portability.

Processes and Threads

The PC operating system manages programs as separate processes. Each process has an associated context, which makes it appear to the program that it owns all of the computer's resources, such as CPU, memory, and I/O. When a process is blocked, such as when it is waiting for an I/O resource, or when its time slice ends, the operating system saves the current context and swaps in another process. The operating system juggles process priorities to be as responsive as possible to a wide range of demands, most of which are invisible to you.

A multithreaded program can be written so that different sections run simultaneously and independently. This is similar to running multiple processes, but threads are much lighter weight; in particular, they share the same address space. This allows the operating system to quickly switch between them and makes it easy for them to share data when running in parallel.

Multithreading is especially well suited for multicore PCs. Those parts of a machine vision algorithm that previously ran sequentially can be partitioned into separate threads that now run in parallel on separate cores.
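
As a minimal illustration, the standard C++ sketch below computes a statistic on the two halves of a shared image buffer using two threads; the buffer and the statistic are hypothetical placeholders rather than calls from any particular vision library.

    #include <cstdint>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        // Hypothetical image buffer; both threads share the same address space.
        std::vector<uint8_t> image(1024 * 1024, 128);
        long sumLeft = 0, sumRight = 0;   // one result variable per thread, so no locking is needed

        std::thread left([&] {
            sumLeft = std::accumulate(image.begin(),
                                      image.begin() + image.size() / 2, 0L);
        });
        std::thread right([&] {
            sumRight = std::accumulate(image.begin() + image.size() / 2,
                                       image.end(), 0L);
        });

        left.join();   // wait for both halves before combining the results
        right.join();
        std::cout << "Mean pixel value: "
                  << double(sumLeft + sumRight) / image.size() << "\n";
    }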

Commercial Multithreaded Software

Writing multithreaded application code is not simple, and often there are timing dependencies that make it hard to debug in a real-world environment. It also may require the underlying machine vision libraries to be written in a re-entrant manner so that multiple threads can execute the same library code in parallel. It takes a skilled programmer to write robust multithreaded applications. For this reason, writing custom software at the application layer to take advantage of a multicore PC usually is only justified in very demanding applications.

It’s usually much more effective for machine vision users to purchase commercial software already optimized for multicore PCs. Off-the-shelf solutions may not be as efficient as custom code, but they can provide significant benefits at very low cost.

Application Optimization

Application-level software can be optimized for multicore PCs by creating separate threads for:
• Tasks with hardware dependencies, such as image acquisition, accept/reject results, and operator interaction. These threads often are designed to minimize unpredictable hardware delays. For example, the system needs to be ready to respond to a trigger event but should not delay image processing by polling the triggering hardware every few milliseconds. A rough sketch of this acquisition/processing split follows this list.
• Each camera in a multicamera application. This allows each thread to run as soon as its camera is triggered.
• Different machine vision tasks within a vision application. For example, one thread might handle part alignment while another measures critical dimensions. However, this only works if the tasks are not dependent on each other, and the benefit will be small if one task is much shorter than the other.
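
As a rough sketch of the acquisition/processing split described in the first point above, the standard C++ example below connects an acquisition thread to a processing thread through a small thread-safe queue; acquireFrame() and inspectFrame() are hypothetical stand-ins for camera and vision-tool calls, not functions from any commercial library.

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    using Frame = std::vector<uint8_t>;   // placeholder for an acquired image

    // A minimal thread-safe queue: the acquisition thread pushes frames,
    // and the processing thread blocks in pop() until one is available.
    class FrameQueue {
        std::queue<Frame> frames_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void push(Frame f) {
            {
                std::lock_guard<std::mutex> lock(m_);
                frames_.push(std::move(f));
            }
            cv_.notify_one();
        }
        Frame pop() {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [&] { return !frames_.empty(); });
            Frame f = std::move(frames_.front());
            frames_.pop();
            return f;
        }
    };

    Frame acquireFrame() { return Frame(640 * 480, 0); }   // stand-in for a camera grab
    void  inspectFrame(const Frame&) { /* run vision tools here */ }

    int main() {
        FrameQueue queue;
        const int parts = 100;                        // e.g., 100 trigger events

        std::thread acquisition([&] {
            for (int i = 0; i < parts; ++i)
                queue.push(acquireFrame());           // never waits on processing
        });
        std::thread processing([&] {
            for (int i = 0; i < parts; ++i)
                inspectFrame(queue.pop());
        });

        acquisition.join();
        processing.join();
    }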

Some commercial machine vision products build in these features. For example, Cognex VisionPro™ software can automatically create separate threads for image acquisition and vision processing. The software is designed to automatically detect the number of cores in a PC and create threads based on the number of cores available.

This type of scalability is a great advantage in multicore PCs for applications with multiple image acquisition and vision processing tasks that need to run simultaneously. It's even beneficial on single-core PCs because image acquisition does not use much CPU time and can run in parallel with image processing operations.

Vision Tool Optimization

In addition to application-level optimization, it's possible to optimize machine vision tools by parallelizing their algorithms so they use multiple cores simultaneously. However, not all vision tools can be easily parallelized. In general, parallelization is most helpful for image processing filters or other vision tools that run local operations on small regions of the image. Commonly used filters include median, Gaussian, and morphology operations.

These can be optimized by dividing the image into different pieces and assigning each one to a separate thread. The results from each thread then are combined to produce the final result (Figure 1). The final speedup depends on the algorithm and the number of cores. Because of overhead, there always will be some inefficiencies, so even a well-optimized vision tool may not run eight times faster on an eight-core PC.

Figure 1. Example of Partitioning an Image Across Multiple Threads
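
To make the partitioning in Figure 1 concrete, the following standard C++ sketch splits an image buffer into horizontal strips and applies a simple threshold to each strip on its own thread. It is an illustration rather than code from a commercial library; a neighborhood filter such as a median or Gaussian would also need a few rows of overlap at the strip boundaries.

    #include <cstdint>
    #include <functional>
    #include <thread>
    #include <vector>

    // Apply a point operation to one strip of the image.
    void thresholdStrip(std::vector<uint8_t>& pixels, size_t begin, size_t end) {
        for (size_t i = begin; i < end; ++i)
            pixels[i] = pixels[i] > 128 ? 255 : 0;
    }

    // Divide the image into numThreads strips, process each on its own thread,
    // then join the threads to combine the results.
    void parallelThreshold(std::vector<uint8_t>& pixels, unsigned numThreads) {
        if (numThreads == 0)
            numThreads = 1;
        std::vector<std::thread> workers;
        size_t stripSize = pixels.size() / numThreads;
        for (unsigned t = 0; t < numThreads; ++t) {
            size_t begin = t * stripSize;
            size_t end = (t + 1 == numThreads) ? pixels.size() : begin + stripSize;
            workers.emplace_back(thresholdStrip, std::ref(pixels), begin, end);
        }
        for (auto& w : workers)   // the "combine" step: wait for every strip
            w.join();
    }

Calling parallelThreshold(image, 4) on a four-core PC processes the four strips concurrently; the final join loop is the step that gathers the pieces back into a single result.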

Unfortunately, many vision applications spend most of their processing on tools that are much more complex than simple image processing filters. It’s not always possible to parallelize complex vision tool algorithms such as alignment. In these cases, optimizing the tool might only benefit a small portion of the algorithm.

In response, Cognex is working to optimize its most important alignment and inspection tools. For example, the PatInspect™ Tool has been redesigned so that inspection steps are divided among the available cores. Even when the percentage improvement is lower than for simple image processing filters, the overall application may benefit more since complex machine vision tools generally consume a larger portion of the overall application.

Tuning for Overall System Performance

It might seem that the fastest vision application would be one that controls every processor core in the PC and creates one thread to run on each core (Figure 2). Real-world applications are not that simple. The PC also must support operating system, machine control, and other background tasks. In practice, the optimum number of threads for the vision application may not necessarily be the same as the number of cores in the PC, and it may not make sense to assign each thread to a specific core (Figure 3).

Figure 2. A Vision Application Divided Into Three Threads Running on a Four-Core PC

Figure 3. One Vision Thread Per Core May Not Be the Optimum Choice

The only way to determine the optimum number of machine vision threads is to test it under realistic conditions. For this reason, Cognex’s CVL™ and VisionPro™ software libraries provide a simple method to set the number of threads for multicore-aware vision tools in an application. This top-level capability lets you easily tune the system for best overall performance.
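
The call for setting the thread count is specific to each library, so the sketch below uses hypothetical setVisionThreadCount() and runInspectionCycle() stubs purely to show the tuning approach: sweep the thread count and measure throughput under a realistic workload rather than assuming one thread per core is best.

    #include <chrono>
    #include <iostream>

    // Placeholder stubs; in a real application these would call the vision
    // library's configuration API and run one full acquire-and-inspect cycle.
    void setVisionThreadCount(unsigned) {}
    void runInspectionCycle() {}

    void tuneThreadCount(unsigned maxThreads, int cyclesPerTest) {
        for (unsigned n = 1; n <= maxThreads; ++n) {
            setVisionThreadCount(n);
            auto start = std::chrono::steady_clock::now();
            for (int i = 0; i < cyclesPerTest; ++i)
                runInspectionCycle();
            double seconds = std::chrono::duration<double>(
                std::chrono::steady_clock::now() - start).count();
            std::cout << n << " threads: "
                      << cyclesPerTest / seconds << " parts/s\n";
        }
    }

    int main() {
        tuneThreadCount(8, 1000);   // try 1 through 8 threads, 1000 cycles each
    }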

Software Portability

Another real-world concern is software portability from one PC to another. PC hardware changes so quickly that many vision applications will run on multiple PC models over their lifetime, either when new vision stations are deployed or when a PC needs to be replaced. The machine vision application frequently is developed on a different PC than the one on which it is deployed. Additionally, replacing PCs deployed in manufacturing lines is a constant maintenance issue.

Since the number of cores available may change over time, it's important to have a vision application that can account for any number of cores in the system. Otherwise, redeploying the existing system on a different PC may require recompilation, or worse, rewriting the application software. This can be cost-prohibitive for many companies, as development stations are modified and developers move on to other projects.

To avoid this, the vision software libraries can automatically detect the number of cores on a PC and dynamically adjust the number of threads that they create. This allows applications written for a four-core PC to run efficiently on an eight-core PC without touching the source code or recompiling. It provides downstream maintenance savings while offering the capability to upgrade performance simply by deploying the system on a PC with more cores.
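
As one simple illustration, standard C++ can report the core count at run time, which a library or application can use to size its thread pools without recompiling; the headroom policy below is just an assumption for the example.

    #include <algorithm>
    #include <iostream>
    #include <thread>

    int main() {
        unsigned cores = std::thread::hardware_concurrency();   // may return 0 if unknown
        if (cores == 0)
            cores = 1;                                           // conservative fallback

        // Leave one core of headroom for acquisition, I/O, and operating-system tasks.
        unsigned visionThreads = std::max(1u, cores - 1);

        std::cout << "Detected " << cores << " cores; using "
                  << visionThreads << " vision threads\n";
    }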

Summary

Optimizing a machine vision application for multicore PCs can be a complex process with unpredictable results. Developers need to pay close attention to achieve the best overall system performance. In particular, field testing under real-world operating conditions is the only way to fully measure system throughput.

To maximize the benefits of multicore PC technology in machine vision applications, developers should consider several key questions when evaluating machine vision software products. These should include not only obvious points, such as whether individual image processing filters have been optimized for multicore, but also factors that can significantly impact the performance of the overall application, including:
• Can the software automatically create separate acquisition and processing threads to speed system throughput and responsiveness?
• Does the software allow you to write your own multithreaded application?
• Can you tune the number of threads for best overall system performance?
• Does the software have the capability to automatically detect and adjust the number of threads, based on the number of cores, without having to rewrite the application?

By keeping these points in mind, you can maximize your options and minimize work to take full advantage of multicore PC technology.

About the Author

John Petry has worked in machine vision for 20 years and currently is the marketing manager for the Cognex Vision Software Business Unit. He has been a software developer, an engineering manager, and a product manager for a wide range of products. Mr. Petry holds five patents in machine vision and a B.S. degree from the Massachusetts Institute of Technology. Cognex, 1 Vision Dr., Natick, MA 01760, 508-650-3140, e-mail: [email protected]

May 2009
