Hardware accelerators, including graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), have been made increasingly popular by the likes of Microsoft for accelerating code that is well suited to parallel execution. Let’s take a closer look at why FPGAs are so popular, how OpenCL improves the development process, and how SoftServe ran a benchmark on a Cyclone V FPGA.
FPGAs are now treated as promising hardware accelerators for modern AI, ML, and big data solutions, mainly because of their highly parallel nature and their ability to process data in place, which substantially reduces the processing feedback time and the infrastructure required. Moreover, FPGAs can be thought of as a kind of “soft” hardware that can be re-tailored for a specific task whenever the need arises.
Until quite recently, FPGAs were programmed in the hardware description languages Verilog and VHDL. The development process was long and restricted to a relatively small community of hardware engineers. But emerging toolkits are making FPGA programming possible in C/C++. This reduces development time while dramatically enlarging the developer community. One such toolkit is the Open Computing Language (OpenCL).
Using OpenCL for FPGA programming is a two-stage process:
- Code is written in the OpenCL language (based on C, with some restrictions and extensions)
- A vendor-specific toolkit takes the compiled OpenCL code as input and produces the final FPGA image as output
The first stage essentially mirrors conventional C/C++ programming. The second stage requires a sophisticated tool to load the developed code onto the FPGA. This process is carried out without developer assistance.
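As a rough illustration of the first stage, the sketch below shows what a small OpenCL-style kernel can look like. The kernel name `scale_samples`, the host helper `run_on_host`, and the shim macros are assumptions made for this example and are not taken from SoftServe’s code. The shims exist only so that the same source also compiles as plain C on a host, where a loop stands in for the work-items that an OpenCL runtime would launch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only: when this file is not built by an OpenCL C compiler,
   define the OpenCL qualifiers away and emulate get_global_id(). */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
static size_t emulated_id;  /* index of the work-item being emulated */
static size_t get_global_id(unsigned int dim) { (void)dim; return emulated_id; }
#endif

/* Hypothetical kernel: each work-item multiplies one sample by `gain`. */
__kernel void scale_samples(__global const int *in, __global int *out, int gain)
{
    size_t i = get_global_id(0);
    out[i] = in[i] * gain;
}

#ifndef __OPENCL_VERSION__
/* Host-side stand-in for a kernel launch: one call per work-item. */
static void run_on_host(const int *in, int *out, size_t n, int gain)
{
    assert(in != NULL && out != NULL);
    for (emulated_id = 0; emulated_id < n; ++emulated_id)
        scale_samples(in, out, gain);
}
#endif
```

On a host, calling `run_on_host(in, out, 4, 3)` with `in = {1, 2, 3, 4}` fills `out` with `{3, 6, 9, 12}`. On an FPGA, the vendor toolkit from the second stage would turn the kernel body into hardware instead.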
FPGA programming using OpenCL is advantageous because it follows a conventional programming methodology, but keep in mind that learning to use the toolkit’s capabilities takes time. The good news is that the effort is well worth it, given the potential return on investment.
Benchmarking with Cyclone V FPGA
Our objective was to solve practical problems using an FPGA SoC. We chose Intel’s Cyclone V FPGA because it is integrated into Terasic’s DE10-Nano board, which is widely used in university programs and is therefore relevant to expanding the FPGA developer community.
SoftServe chose the task of calculating a user’s heart rate based on readings taken only from the forehead region using a laptop’s integrated camera.
Two challenges led to this task selection:
- Porting from a laptop to a much smaller form-factor device (an FPGA SoC) and solving an intrinsically non-parallel problem, because these are the challenges that FPGAs solve best.
- Initially, the FPGA lost to the CPU on performance. However, we applied several optimization tricks related to programming mathematical operations, as well as to general code structure, including:
  - Replacing all floating-point operations with fixed-point ones
  - Replacing multiplications and divisions with additions wherever possible
  - Unrolling loops
  - Optimizing data exchange between the CPU and the FPGA
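The first three tricks in the list can be sketched in plain C. This is a minimal illustration under stated assumptions, not SoftServe’s actual code: the unsigned Q16.16 format, the type `uq16_16`, and the helper names `uq_from_int`, `uq_mul`, `ramp_by_addition`, and `sum_unrolled4` are all hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed format: unsigned Q16.16 (16 integer bits, 16 fractional bits). */
typedef uint32_t uq16_16;
#define UQ_FRAC_BITS 16
#define UQ_ONE ((uq16_16)1u << UQ_FRAC_BITS)

/* Convert a small non-negative integer to Q16.16. */
static uq16_16 uq_from_int(uint32_t x)
{
    assert(x <= 0xFFFFu);  /* larger values would not fit in 16 integer bits */
    return x << UQ_FRAC_BITS;
}

/* Fixed-point multiply: widen to 64 bits, then drop the extra fraction bits. */
static uq16_16 uq_mul(uq16_16 a, uq16_16 b)
{
    return (uq16_16)(((uint64_t)a * (uint64_t)b) >> UQ_FRAC_BITS);
}

/* Multiplication replaced by addition: out[i] = i * step is produced
   with a running accumulator instead of a multiply per element. */
static void ramp_by_addition(uint32_t *out, size_t n, uint32_t step)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = acc;
        acc += step;
    }
}

/* Loop unrolled by four: four independent partial sums per iteration,
   followed by a short loop for the leftover elements. */
static uint32_t sum_unrolled4(const uint32_t *x, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i)
        s0 += x[i];
    return s0 + s1 + s2 + s3;
}
```

For example, `uq_mul(uq_from_int(3), UQ_ONE / 2)` yields 98304, which is 1.5 in Q16.16. The independent partial sums in `sum_unrolled4` matter on an FPGA because each one can map to its own adder, so the four additions can proceed side by side rather than waiting on a single accumulator.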
Eventually, the FPGA performed the pure calculations 50% faster than the CPU.
OpenCL is a promising and powerful tool for programming FPGAs. It substantially reduces FPGA development time and opens FPGA development to a wider community of programmers.