Distributed processing offers performance advantages, but programming it in a scalable fashion can be a challenge. Enterprise FPGA platforms like Xilinx’s Alveo and Intel’s PAC are popular because they support parallel processing in programmable hardware. Unfortunately, programming FPGAs isn’t an easy task either, and partitioning a system across a number of chips is even more difficult.
CacheQ has designed a heterogeneous distributed acceleration system that can target a range of platforms from arrays of processors to FPGAs (see figure). The company’s QCC Acceleration Platform is a development environment that handles these heterogeneous computing resources. It’s designed to provide orders-of-magnitude performance improvement while significantly reducing development time.
CacheQ targets a variety of platforms from GPGPUs to FPGAs.
“Demand for hardware acceleration beyond x86 is tremendous,” says Clay Johnson, chief executive officer and co-founder of CacheQ Systems. “Our goal is to simplify high-performance data center and edge-computing application development. The QCC Acceleration Platform meets that goal and will enable new solutions across a variety of applications, including life sciences, financial trading, government, oil and gas exploration, and industrial IoT.”
The platform is based on the CacheQ virtual machine (CQVM). Applications are compiled using an LLVM-based compiler that generates the CacheQ target language (CTL). Compiling to an intermediate form in this way is typical of LLVM-based compilers, including popular C/C++ versions. The CTL, in turn, can be used to target a particular platform like an FPGA. A system can also be run through the virtual-machine optimizer and partitioner.
Partitioning is important for large applications that will span multiple chips, systems, or even data centers. It can be a challenging problem because of timing, memory, and pipeline considerations. Many developers do this manually, but a tool that handles it automatically or under user-configuration control makes repartitioning easy. Thanks to this approach, custom many-port pooled memory architectures can be included, significantly improving overall system performance. FPGAs lend themselves to such a technique.
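To make the partitioning problem concrete, here is a minimal sketch in Python. It is not CacheQ's algorithm, which the company has not described in this detail. It is a simple greedy heuristic under assumed inputs: the task names, resource costs, link weights, and device capacities are all hypothetical. Each task goes to the device that has room for it and already holds the tasks it communicates with most.

```python
def partition(tasks, edges, capacities):
    """Greedy placement (illustrative only, not CacheQ's method).

    tasks:      {task_name: resource_cost}, in abstract units
    edges:      {(task_a, task_b): communication_weight}
    capacities: {device_name: resource_capacity}
    """
    used = {dev: 0 for dev in capacities}
    placement = {}
    # Place the largest tasks first so they are not squeezed out later.
    for task, cost in sorted(tasks.items(), key=lambda kv: (-kv[1], kv[0])):
        best, best_affinity = None, -1
        for dev, cap in capacities.items():
            if used[dev] + cost > cap:
                continue  # this device has no room left for the task
            # Traffic between this task and tasks already on this device.
            affinity = sum(
                w for (a, b), w in edges.items()
                if (a == task and placement.get(b) == dev)
                or (b == task and placement.get(a) == dev)
            )
            if affinity > best_affinity:
                best, best_affinity = dev, affinity
        if best is None:
            raise ValueError(f"no device has room for {task}")
        used[best] += cost
        placement[task] = best
    return placement


def cut_cost(edges, placement):
    """Total communication weight that crosses a device boundary."""
    return sum(w for (a, b), w in edges.items()
               if placement[a] != placement[b])


# Hypothetical four-stage pipeline and two small devices.
tasks = {"load": 2, "filter": 3, "fft": 4, "store": 1}
edges = {("load", "filter"): 10, ("filter", "fft"): 8, ("fft", "store"): 2}

placement = partition(tasks, edges, {"fpga0": 6, "fpga1": 6})
print(placement, cut_cost(edges, placement))

# Repartitioning for a different target is just another call.
single = partition(tasks, edges, {"big": 10})
print(single, cut_cost(edges, single))
```

The second call shows why an automated partitioner is convenient: changing the target hardware means changing one argument rather than reworking the placement by hand. A production tool would also have to weigh the timing, memory, and pipeline constraints mentioned above, which this sketch ignores.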
CacheQ needn’t target a homogeneous system. In fact, most systems will be heterogeneous in nature, making partitioning and support issues like memory management critically important. Issues like pointer support, memory-pool management, and multiport memory designs come into play, leading to a rather complex solution. Repartitioning a system can change how these pieces work together and how they’re structured. Manual programming would be prone to errors, whereas the compiler and tool approach used here handles these automatically, enabling changes to be made and quickly evaluated.