Deep-learning workloads demand enormous numbers of tensor-arithmetic operations. To support real-time execution, memory and processor performance must meet targets beyond what standard software-driven architectures can deliver. This motivates designs built around FPGA hardware accelerators that perform parallelized, heavily pipelined tensor arithmetic. To avoid pipeline stalls, data must be in the right place, at the right time, and in the right format. Learn how FPGA-based orchestration hardware prevents accelerator pipeline stalls and allows the accelerator to operate at peak efficiency.
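
One common form this orchestration takes is double buffering: while the compute pipeline works through one on-chip tile of data, the next tile is staged in parallel, so the pipeline never waits on external memory. The C++ sketch below illustrates the schedule only; the names (`load_tile`, `compute_tile`, `TILE`) are hypothetical, and a real FPGA design would express the same overlap in RTL or HLS with the load stage implemented by a DMA engine rather than sequential software.

```cpp
// Minimal sketch of the ping-pong (double-buffer) orchestration pattern that
// keeps a tensor-arithmetic pipeline fed. All names are illustrative.
#include <array>
#include <cstddef>
#include <cstdio>

constexpr std::size_t TILE = 8;          // elements per tile fed to the pipeline
using Tile = std::array<float, TILE>;

// Stage 1: stream the next tile into on-chip memory (models a DMA burst).
void load_tile(const float* src, Tile& buf) {
    for (std::size_t i = 0; i < TILE; ++i) buf[i] = src[i];
}

// Stage 2: the compute pipeline consumes a tile that is already on-chip,
// so it never stalls waiting on external memory.
float compute_tile(const Tile& buf) {
    float acc = 0.0f;
    for (float v : buf) acc += v * v;    // stand-in for tensor arithmetic
    return acc;
}

int main() {
    float input[4 * TILE];
    for (std::size_t i = 0; i < 4 * TILE; ++i) input[i] = static_cast<float>(i);

    Tile ping, pong;                     // two on-chip buffers
    float total = 0.0f;

    load_tile(&input[0], ping);          // prologue: prefetch the first tile
    for (std::size_t t = 0; t < 4; ++t) {
        Tile& current = (t % 2 == 0) ? ping : pong;
        Tile& next    = (t % 2 == 0) ? pong : ping;
        // In hardware, the next load and the current compute run concurrently;
        // the sequential calls here only show the schedule, not the timing.
        if (t + 1 < 4) load_tile(&input[(t + 1) * TILE], next);
        total += compute_tile(current);
    }
    std::printf("checksum: %f\n", total);
    return 0;
}
```

The key property is that the load of tile t+1 is issued before the compute on tile t, so by the time the pipeline advances, its input is already on-chip in the right place and format, which is exactly the stall-avoidance role the orchestration hardware plays.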

