On the left side below is the ANSI-C software code that makes the call to the C Hough function.
The input to the Hough function is the image to be processed (TileIn). The output of the Hough function is the location of the most predominant line in the image (HoughMax, RhoMax, ThetaMax). The column on the right shows the corresponding Handel-C code to call the Handel-C Hough function.
The single call to the Hough function is replaced with NUM_PAR_HOUGH Hough calls that run in parallel, each doing a part of the overall processing. This performs the Hough processing NUM_PAR_HOUGH times faster, at the cost of more internal FPGA memory being used. Software cannot do this kind of true parallel processing. Note that TileIn is a multi-port memory, allowing simultaneous reads and a write in a single clock cycle. A new input (iCnt) is required to tell the Hough function which part of the processing to do.
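As a rough illustration of how iCnt could select a share of the work, here is a minimal ANSI-C sketch. It is not the original code: the name hough_part, the HoughResult struct, the tile size, the number of angles, the fixed-point sine/cosine tables, and the choice to split the work by angle are all assumptions made for this example.

```c
#define TILE_SIZE      16   /* assumed tile width and height in pixels */
#define NUM_THETA      8    /* assumed number of angle steps (22.5 degrees each) */
#define NUM_PAR_HOUGH  4    /* number of partial Hough calls */
#define RHO_OFFSET     32   /* shifts negative rho values into array range */
#define NUM_RHO        (2 * RHO_OFFSET)

/* Hypothetical result record: vote count and line position. */
typedef struct { int hough_max, rho_max, theta_max; } HoughResult;

/* cos and sin of 0, 22.5, ..., 157.5 degrees, scaled by 256 (assumed tables). */
static const int COS_Q8[NUM_THETA] = { 256, 237, 181, 98, 0, -98, -181, -237 };
static const int SIN_Q8[NUM_THETA] = { 0, 98, 181, 237, 256, 237, 181, 98 };

/* Partial Hough transform over a row-major tile. Part i_cnt handles the
   angle indices i_cnt, i_cnt + NUM_PAR_HOUGH, i_cnt + 2*NUM_PAR_HOUGH, ... */
HoughResult hough_part(const unsigned char *tile, int i_cnt)
{
    HoughResult best = { 0, 0, 0 };

    for (int t = i_cnt; t < NUM_THETA; t += NUM_PAR_HOUGH) {
        int acc[NUM_RHO] = { 0 };          /* vote counts for this angle */

        for (int y = 0; y < TILE_SIZE; y++) {
            for (int x = 0; x < TILE_SIZE; x++) {
                if (tile[y * TILE_SIZE + x]) {
                    /* rho = x*cos + y*sin, rounded, offset to stay non-negative */
                    int idx = (x * COS_Q8[t] + y * SIN_Q8[t]
                               + RHO_OFFSET * 256 + 128) / 256;
                    acc[idx]++;
                }
            }
        }

        /* Keep the strongest bin seen so far in this part. */
        for (int r = 0; r < NUM_RHO; r++) {
            if (acc[r] > best.hough_max) {
                best.hough_max = acc[r];
                best.rho_max   = r - RHO_OFFSET;
                best.theta_max = t;
            }
        }
    }
    return best;
}
```

Calling hough_part with i_cnt = 0 to NUM_PAR_HOUGH - 1 covers every angle exactly once. In C the calls run one after another; the Handel-C version would run them at the same time, each against its own memory.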
To process in parallel, NUM_PAR_HOUGH duplicate copies of the input image are required, as we need to do NUM_PAR_HOUGH concurrent reads of the input image. The preprocessing takes a few thousand clock cycles for each input image, but saves hundreds of thousands of clock cycles (or more, depending on the size of NUM_PAR_HOUGH) due to the parallel Hough processing. So it’s well worth it! Note that this preprocessing is pipelined and has one clock cycle of latency.
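The duplication step can be pictured with the following C sketch. The names replicate_tile and tile_copy and the tile size are assumptions made for illustration; the original preprocessing code may be organized differently.

```c
#define TILE_PIXELS    4096  /* assumed tile size (64 x 64 pixels) */
#define NUM_PAR_HOUGH  4     /* number of parallel Hough units */

/* One private copy of the input tile per parallel Hough unit. */
static unsigned char tile_copy[NUM_PAR_HOUGH][TILE_PIXELS];

/* Copy the input tile into every private memory.
   Each pixel is read once and written to all copies. */
void replicate_tile(const unsigned char *tile_in)
{
    for (int addr = 0; addr < TILE_PIXELS; addr++) {
        unsigned char pixel = tile_in[addr];     /* single read of the source */

        /* In C these writes happen one after another; in hardware they
           could all happen in the same clock cycle, one per memory. */
        for (int i = 0; i < NUM_PAR_HOUGH; i++)
            tile_copy[i][addr] = pixel;
    }
}
```

With one pixel handled per clock cycle, the cost of this step grows with the number of pixels in the tile rather than with NUM_PAR_HOUGH, which fits the few-thousand-cycle figure quoted above for a tile of this assumed size.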
Because the parallel Hough processing returns NUM_PAR_HOUGH predominant lines (one for each parallel process), we must find the most predominant line of these in a post-processing step. This post-processing takes only NUM_PAR_HOUGH clock cycles, adding almost no time to the overall processing time.
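A minimal C sketch of this selection step is shown here. The HoughResult struct and the name select_best are hypothetical; only the idea of one comparison per partial result comes from the text.

```c
#define NUM_PAR_HOUGH 4   /* number of parallel Hough units */

/* Hypothetical result record returned by each partial Hough call. */
typedef struct { int hough_max, rho_max, theta_max; } HoughResult;

/* Pick the strongest line among the partial results.
   The loop runs NUM_PAR_HOUGH times, one comparison per partial result. */
HoughResult select_best(const HoughResult *part)
{
    HoughResult best = { 0, 0, 0 };

    for (int i = 0; i < NUM_PAR_HOUGH; i++) {
        if (part[i].hough_max > best.hough_max)
            best = part[i];   /* keep the vote count and its rho/theta together */
    }
    return best;
}
```

If two partial results tie, this sketch keeps the one with the lower index; the original code may resolve ties differently.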