
by Taras Kurnytskyi

FPGA Development (Using OpenCL)

Hardware accelerators such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) have been made increasingly popular by companies such as Microsoft for accelerating code that is well suited to parallel execution. Let’s take a closer look at why FPGAs are so popular, how OpenCL can improve the development process, and how SoftServe benchmarked an Intel Cyclone V FPGA.

Why FPGA?

FPGAs are now seen as promising hardware accelerators for modern AI, ML, and big data solutions, mainly because of their highly parallel nature and their ability to process data in place, which substantially reduces processing response time and the infrastructure required. Moreover, FPGAs can be thought of as a kind of “soft” hardware that can be reconfigured for a specific task whenever the need arises.

Until quite recently, FPGAs were programmed in hardware description languages (HDLs) such as Verilog and VHDL. The development process was long and restricted to a relatively small community of hardware engineers. However, the emergence of high-level toolkits is making it possible to program FPGAs in C/C++. This reduces development time while dramatically widening the developer community. One such option is the Open Computing Language (OpenCL).

OpenCL

Using OpenCL for FPGA programming is a two-stage process:

  1. Code is written in the OpenCL C language (essentially a subset of C with extensions for parallel programming)
  2. The FPGA vendor’s toolkit takes the OpenCL code as input and compiles it into the final FPGA image

The first stage essentially mirrors conventional C/C++ programming. The second stage requires a sophisticated toolchain that turns the code into an FPGA image and loads it onto the FPGA. This stage runs automatically, without developer intervention.
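The first stage can be pictured with a small, hypothetical example. The sketch below is written in plain C rather than OpenCL C so that it compiles with any C compiler; the names `scale_add_item` and `launch_scale_add` are illustrative assumptions, not code from the SoftServe project. The comments show how the same body would look as an OpenCL C kernel using the `__kernel` and `__global` qualifiers and the built-in `get_global_id(0)`, and the loop in `launch_scale_add` only emulates, on a CPU, the launch of one work-item per index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Body of a hypothetical data-parallel kernel: out = a * x + y.
 * In OpenCL C it would be written roughly as
 *
 *   __kernel void scale_add(__global const int *x, __global const int *y,
 *                           __global int *out, int a)
 *   {
 *       size_t gid = get_global_id(0);
 *       out[gid] = a * x[gid] + y[gid];
 *   }
 *
 * Here the work-item index is passed explicitly so the code stays plain C.
 */
void scale_add_item(size_t gid, const int32_t *x, const int32_t *y,
                    int32_t *out, int32_t a)
{
    out[gid] = a * x[gid] + y[gid];
}

/* CPU emulation of launching global_size work-items. On an FPGA, the
 * second-stage toolchain instead compiles the kernel into hardware, and
 * the host program launches it through the OpenCL host API (for example
 * clEnqueueNDRangeKernel). */
void launch_scale_add(size_t global_size, const int32_t *x, const int32_t *y,
                      int32_t *out, int32_t a)
{
    assert(x != NULL && y != NULL && out != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        scale_add_item(gid, x, y, out, a);
}
```

For example, with `x = {1, 2, 3, 4}`, `y = {10, 20, 30, 40}` and `a = 3`, calling `launch_scale_add(4, x, y, out, 3)` fills `out` with `{13, 26, 39, 52}`. Because each index is computed independently, the same body can be replicated or pipelined in hardware without changes to its logic.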

FPGA programming with OpenCL is attractive because it follows a conventional programming methodology, but keep in mind that learning to take full advantage of the toolkit’s capabilities takes time. The good news is that the potential return on investment makes the effort well worth it.

Benchmarking with Cyclone V FPGA

Our objective was to solve practical problems using an FPGA SoC. We chose Intel’s Cyclone V FPGA because it is integrated into Terasic’s DE10-Nano SoC board, which is widely used in university programs and is therefore relevant to expanding the FPGA developer community.

SoftServe chose the task of calculating a user’s heart rate based on readings taken only from the forehead region using a laptop’s integrated camera.

Two challenges shaped this choice:

  1. Porting from a laptop to a much smaller form-factor device (the FPGA SoC), and solving an intrinsically non-parallel problem, even though parallel workloads are what FPGAs handle best.
  2. Initially, the FPGA was slower than the CPU. To close the gap, we applied several optimization tricks related to both mathematical operations and general code structure, including:
       • Replacing all floating-point operations with fixed-point arithmetic
       • Replacing multiplications/divisions with additions wherever possible
       • Loop unrolling
       • Optimizing data exchange between the CPU and FPGA
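The first three optimizations in the list can be illustrated with a short C sketch. This is not the project's code: the Q16.16 fixed-point format and the helper names `fx_from_int`, `fx_mul`, `mul_by_10`, and `fx_dot_unrolled` are hypothetical, chosen only to show the techniques. `mul_by_10` shows one common way to replace a multiplication by a constant, using shifts and an addition. When writing OpenCL C for Intel FPGAs, a loop can also be unrolled by the compiler with `#pragma unroll` instead of by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits and 16 fractional bits.
 * Assumed format for illustration; the article does not name one. */
typedef int32_t fx_t;
#define FX_FRAC_BITS 16
#define FX_ONE ((fx_t)1 << FX_FRAC_BITS)

/* Convert a small integer to Q16.16 (only -32768..32767 fits). */
fx_t fx_from_int(int32_t v)
{
    assert(v >= -32768 && v <= 32767);
    return (fx_t)(v * FX_ONE);
}

/* Fixed-point multiply: the 64-bit product carries 32 fractional bits,
 * so shifting right by 16 brings it back to Q16.16. Right-shifting a
 * negative value is implementation-defined in ISO C; GCC and Clang use
 * an arithmetic shift. */
fx_t fx_mul(fx_t a, fx_t b)
{
    return (fx_t)(((int64_t)a * b) >> FX_FRAC_BITS);
}

/* Strength reduction: x * 10 == (x << 3) + (x << 1), i.e. two shifts
 * and one addition instead of a multiplier. Unsigned arithmetic keeps
 * the result identical to x * 10 modulo 2^32. */
uint32_t mul_by_10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Fixed-point dot product with the main loop manually unrolled by 4.
 * With four independent accumulators, consecutive multiply-adds no
 * longer depend on each other, so hardware (or a CPU) can overlap them.
 * Full 64-bit products are summed and rescaled once at the end to avoid
 * rounding at every step; the sum is assumed to fit in 64 bits. */
fx_t fx_dot_unrolled(const fx_t *a, const fx_t *b, size_t n)
{
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += (int64_t)a[i] * b[i];
        s1 += (int64_t)a[i + 1] * b[i + 1];
        s2 += (int64_t)a[i + 2] * b[i + 2];
        s3 += (int64_t)a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)          /* leftover elements when n % 4 != 0 */
        s0 += (int64_t)a[i] * b[i];
    return (fx_t)((s0 + s1 + s2 + s3) >> FX_FRAC_BITS);
}
```

For instance, `fx_mul(fx_from_int(3), FX_ONE / 2)` returns 98304, which is 1.5 in Q16.16, and `mul_by_10(7)` returns 70. Keeping the intermediate products at full 64-bit width trades a wider adder for accuracy; a narrower format would save FPGA resources at the cost of precision.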

In the end, we made the pure calculations run 50% faster on the FPGA than on the CPU.

Conclusion

OpenCL is a promising and powerful tool for programming FPGAs. It substantially reduces FPGA development time and opens FPGA development to a wider community of programmers.