Automotive Vision Systems Mix It Up With SIMD-MIMD Processor Architectures

In today’s automobiles, manufacturers competitively blend performance and custom features with cutting-edge safety technology. Tire-pressure monitoring, proactive rollover prevention/mitigation, adaptive headlights and/or night-vision assistance, smart air bags, and emergency response features are becoming industry standards, dramatically increasing our odds of accident survival. Similarly, sensor- or vision-based systems, such as adaptive cruise control and collision mitigation, blind-spot detection and collision warning systems, lane departure warning systems and rear-view cameras for back-up assistance, are quickly becoming mainstream.

In the cost- and risk-sensitive automotive market, vision-based and image-recognition systems are likely to experience strong growth over the coming years due to their cost advantages and multifunction capabilities. For example, cost-effective vision-based systems can offer road user/object classification in addition to highly precise detection, making them ideally suited for blind-spot detection, emergency braking and lane departure warning. More advanced systems can analyze traffic lights or read traffic signs, resulting in even further protection or providing helpful navigation information for the driver. And, with automobile manufacturers integrating radio detection and ranging (radar) or light detection and ranging (lidar) for sensing in addition to vision-based sensors, it is clear that enhancements to the existing vision-processing environment are required. In particular, support for running multiple vision- or sensor-based applications simultaneously is key.

To accommodate these requirements, several issues surrounding the vision processor must be addressed, including true real-time performance and power efficiency, as well as software flexibility to account for different applications, a range of recognition targets, and changing lighting and weather conditions. While parallelization through single-instruction/multiple-data (SIMD) architectures has sped up the initial processing of an image, these processors have limited efficiency for the final image-processing steps, which are mainly serialized or require floating-point arithmetic. A newer alternative is a processor that offers efficient SIMD processing and can also be reconfigured on the fly. This enables sequential processing in multiple-instruction/multiple-data (MIMD) operation, providing an efficient and cost-effective way to run multiple applications simultaneously and deliver more comprehensive driver-assistance information.

The Alternatives – SIMD, MIMD or Mixed-Mode

Vision or image-recognition tasks require large amounts of data-level parallelism and real-time responses. SIMD elements are recognized as the most efficient for algorithms that can be designed with a highly parallel structure. The number of operations that can be done in parallel is theoretically limited only by the number of blocks available for processing. SIMD processors can manipulate large amounts of data in a highly efficient manner, enabling implementation of operations in software that conventional digital signal processors (DSPs) find cumbersome. For these reasons alone, the SIMD option can be the best choice for some applications to maximize performance without raising total cost. As a result, a SIMD environment is advantageous for image processing steps that require the same operations to be done on groups of pixels simultaneously. For typical image processing, SIMD is used for image scaling, image filtering and basic image-detection functions.
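To make the idea of "the same operation on groups of pixels" concrete, the following is a minimal sketch in plain Python rather than code for any particular vision processor. The function names, the 3-tap filter, and the threshold of 128 are assumptions chosen for illustration. Each list position stands in for one SIMD lane: every lane executes the identical arithmetic, only on different data.

```python
def box_filter_3(row):
    """3-tap horizontal mean filter: identical arithmetic at every pixel position."""
    # Replicate the edge pixels so every position has two neighbours.
    padded = [row[0]] + list(row) + [row[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) // 3
            for i in range(len(row))]


def threshold_lanes(pixels, threshold):
    """One compare-and-select applied to every pixel (one 'lane' per pixel)."""
    return [255 if p >= threshold else 0 for p in pixels]


# Hypothetical 8-pixel scanline; on a SIMD device each pixel would map to a lane.
row = [10, 20, 200, 220, 30, 15, 240, 250]
smoothed = box_filter_3(row)            # image filtering step
mask = threshold_lanes(smoothed, 128)   # basic detection step
print(smoothed)
print(mask)
```

Because no lane's result depends on a decision made in another lane, a single instruction stream is enough to drive all of them, which is what makes this class of work a natural fit for SIMD hardware.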

As mentioned, SIMD processors have limited efficiency for the final image-processing steps that are either mainly serialized or require floating-point arithmetic. Additionally, when compared with general-purpose processors, the reduced amount of control circuitry of a highly parallel SIMD processor reveals a flexibility gap.

While MIMD processors have their shortcomings for processing data in parallel, they are very well suited for processing steps that can be parallelized on the thread level, but not on the data level. In a typical image-recognition application, the primary filtering and detection will identify areas of the image that may contain useful information. The subsequent processing of each area, however, will depend on the result of the primary detection. MIMD architecture enables handling of multiple areas in parallel, with each area processed by algorithms that are tailored toward the potential detection target. As an example, an area identified as a potential traffic light would be processed for confirmation through its shape and its color, while an area identified as a potential road object would be processed for classification of the object (for example, vehicle vs. bicycle vs. pedestrian). As some of these algorithms are based on floating-point arithmetic, using floating-point units (FPUs) can significantly improve their processing times.
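The thread-level dispatch described above can be sketched as follows. This is an illustrative model only: the region fields, handler names, and aspect-ratio thresholds are hypothetical, and Python threads are used to show the program structure, not to reproduce the parallel speed of MIMD hardware. The point is that each candidate area runs a different routine, selected by the result of the primary detection.

```python
from concurrent.futures import ThreadPoolExecutor


def confirm_traffic_light(region):
    """Hypothetical confirmation: tall shape plus a signal-like dominant color."""
    tall = region["height"] >= 2 * region["width"]
    lit = region["dominant_color"] in {"red", "yellow", "green"}
    return {"id": region["id"], "kind": "traffic_light", "confirmed": tall and lit}


def classify_road_object(region):
    """Hypothetical classifier using a floating-point aspect ratio."""
    aspect = region["width"] / region["height"]
    if aspect > 1.2:
        label = "vehicle"
    elif aspect > 0.6:
        label = "bicycle"
    else:
        label = "pedestrian"
    return {"id": region["id"], "kind": "road_object", "label": label}


# One routine per detection target; each thread may run a different one.
HANDLERS = {
    "traffic_light": confirm_traffic_light,
    "road_object": classify_road_object,
}


def dispatch(region):
    return HANDLERS[region["candidate"]](region)


def process_candidates(candidates, workers=4):
    """Process every candidate area concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dispatch, candidates))


candidates = [
    {"id": 0, "candidate": "traffic_light", "width": 10, "height": 30,
     "dominant_color": "red"},
    {"id": 1, "candidate": "road_object", "width": 60, "height": 40},
    {"id": 2, "candidate": "road_object", "width": 20, "height": 60},
]
results = process_candidates(candidates)
for r in results:
    print(r)
```

Unlike the per-pixel case, the control flow here differs from area to area, so a single broadcast instruction stream would leave most lanes idle; independent instruction streams avoid that.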

Developed to eliminate the issues faced when using a pure SIMD architecture, modern mixed-mode solutions provide a robust platform for vision-processing applications. These devices can be used for parallel preprocessing of image data in SIMD mode and then reconfigured on the fly to process the different execution threads of the application sequentially in MIMD/multiprocessor mode, combining efficiency with a high degree of flexibility. This dynamic reconfiguration is made possible by a slight increase in logic circuit size, which enables a device to process each stage of a vision-processing algorithm in the mode of operation that suits it best.

The reconfiguration enables the core to operate as a SIMD processor (with N processing elements), as a MIMD multiprocessor (with N/4 processing units), or as a mixed-mode processor (with N/2 processing elements and N/8 processing units). The hardware reconfiguration involves combining four processing elements and their memory blocks to act as a complete processing unit. Each of these processing units therefore acts as an independent processor core, using the internal memory of its processing elements as program and data cache. In addition, each of these processing units includes its own FPU, providing the desired acceleration for floating-point routines.

As shown in Figure 1, in the mixed mode, half of the processing elements are combined in groups of four to act as processing units, while the other half of the processing elements operate as SIMD elements.
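The resource split in each mode follows directly from the four-elements-per-unit grouping, and can be checked with a short calculation. The sketch below is not a device API; the function name and the returned field names are assumptions made for illustration. The value N = 128 is the element count quoted for the IMAPCAR2 later in this article.

```python
ELEMENTS_PER_UNIT = 4  # four processing elements combine into one processing unit


def configuration(n_elements, mode):
    """Return the SIMD element and processing-unit counts for a given mode."""
    if n_elements % (2 * ELEMENTS_PER_UNIT) != 0:
        raise ValueError("n_elements must be a multiple of 8 for an even split")
    if mode == "simd":
        return {"simd_elements": n_elements, "processing_units": 0}
    if mode == "mimd":
        return {"simd_elements": 0,
                "processing_units": n_elements // ELEMENTS_PER_UNIT}
    if mode == "mixed":
        half = n_elements // 2  # half stay SIMD, half are regrouped
        return {"simd_elements": half,
                "processing_units": half // ELEMENTS_PER_UNIT}
    raise ValueError(f"unknown mode: {mode}")


for mode in ("simd", "mimd", "mixed"):
    print(mode, configuration(128, mode))
```

For N = 128 this gives 128 SIMD elements in SIMD mode, 32 processing units in MIMD mode, and 64 SIMD elements alongside 16 processing units in mixed mode, matching the N, N/4, and N/2 plus N/8 figures above.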

Flexibility is the Key

Because of its heavy on-chip parallelism, an automotive-grade, mixed-mode vision processor, such as the IMAPCAR2® image processor from NEC Electronics, can provide better performance than DSPs while operating at a fraction of the DSP’s operating frequency. With just a few instructions, the 128 SIMD processing elements become 32 processing units, each with its own data and instruction memory and FPU, resulting in a 32-processing-unit multiprocessor system on a single chip. Because the lower operating frequency reduces overall power consumption, designers using some of today’s mixed-mode architectures are able to minimize the additional cost for cooling and power management. Thus, a mixed-mode processing solution can deliver high performance at low cost while reducing processing time when compared with a sequential processor, parallel processor or multicore processor, as seen in Figure 2.

In a pure SIMD environment, the control processor broadcasts parallel instructions to the processing elements and executes the sequential part of the application, including control of peripherals. In mixed-mode solutions, the control processor has three main tasks: broadcasting instructions to the processing elements, controlling the instruction flow, and performing address calculations efficiently. The control processor must therefore have an improved very-long-instruction-word (VLIW) structure that supports multiple instructions per clock cycle, improving sequential processing. State-of-the-art mixed-mode processing solutions extend the data handled by each of the 128 processing elements from the traditional 8 bits to 16 bits, and increase the internal parallel execution in the VLIW from four to six instructions, enabling faster processing speeds.

Such an execution environment is particularly beneficial for vision-processing algorithms that can be parallelized because of the redundancy in the computation they involve. Other algorithms that require a more sequential approach can be parallelized to an extent using data restructuring. However, the time spent on data restructuring may sometimes exceed the time required for the actual computation on a serial machine. A mixed-mode solution attempts to optimize parallel as well as sequential programming on the same chip. As a result, high-performance, mixed SIMD/MIMD-mode processors can handle a range of image-processing functionality in a single chip, from image input via video capture, through image processing and segmentation, to object recognition, as seen in Figure 3.

Conclusion

In today’s automobiles, multiple sensor- or vision-based systems, such as adaptive cruise control and collision mitigation, blind-spot detection and collision-warning systems, lane-departure warning systems and rear-view cameras for backup assistance, are moving into the mainstream. To meet the stringent price, real-time performance and power-consumption requirements of the automotive market, designers of automotive vision-processing systems should consider moving to a reconfigurable mixed SIMD/MIMD-mode architecture. The ability of a mixed-mode processor to dynamically switch between modes facilitates the efficient implementation of vision-processing and recognition algorithms in real time while also offering the required flexibility at a low cost. The addition of reconfiguration to mixed or multiprocessor modes facilitates the recognition stage of the algorithm, thereby eliminating the drawbacks of a pure SIMD architecture. Thus, these solutions provide a processor design that favors both data and control parallelism in an integrated automotive application environment. These cores can also interface with other sensors, like radar and lidar, with the help of external circuits. With support for both SIMD and MIMD operations, mixed-mode processors provide flexibility and performance for the effective implementation of advanced safety systems that simultaneously combine multiple applications with different computation requirements.

About the Author

Jens Eltze is a Principal Technical Application Engineer, Automotive Strategic Business Unit at NEC Electronics America, Inc. He received a master’s degree in electrical engineering from the University of Karlsruhe, Germany. He can be reached at [email protected] .