Fixed-function hardware accelerators are specialised hardware units designed to carry out a particular task faster than is possible in software running on a general-purpose CPU. They have been researched for many years, and it is well understood that they can improve the performance and energy efficiency of a system. However, integrating hardware accelerators into an SoC can be challenging, as it requires additional design, verification and implementation effort. It also involves developing a new programming model through which the host program interacts with the accelerator.
The Arm Coherent Accelerator Interface (ACAI) framework, developed within Arm Research, addresses the challenges of integrating and programming accelerators, enabling researchers to focus on SoC design space exploration and accelerator research. We collaborated with the University of Michigan and Xilinx Inc. to port the ACAI framework to the Xilinx Zynq Ultrascale+ MPSoC [1] platform, and benchmarked accelerators using the framework to demonstrate its benefits. This successful collaboration led to a presentation at the 2018 Design Automation Conference (DAC), in the 'IP Track' category, as well as plans to make the ACAI framework widely available for research purposes.
Figure 1 shows the logical overview of the ACAI framework. This maps onto the existing components of the Xilinx Zynq Ultrascale+ MPSoC platform, which offers the capability to add logic onto the FPGA to interact with the rest of the SoC.
Figure 1: Logical overview of the ACAI framework
The ACAI software component comprises software libraries, which assist with job creation, scheduling and dispatch, and Linux device drivers, which enable accelerator virtualization (sharing across different applications). The framework also supports user-mode job dispatch, which simplifies application development and enables fine-grained task acceleration.
The ACAI hardware component offers cache coherency and virtual addressing services to the accelerator, via standard AXI interfaces for both the data and configuration paths. This enables researchers to use Xilinx HLS tools to synthesise hardware accelerators with standard interfaces and integrate them into the framework.
Figure 2: Zynq Ultrascale+ MPSoC ZCU102 Evaluation Kit
ACAI was ported to the Zynq Ultrascale+ MPSoC ZCU102 Evaluation Kit using Vivado 2017 tools. Figure 2 shows the interfaces and configuration setup used to integrate ACAI into the FPGA.
We integrated two accelerators into the ACAI framework, benchmarked their performance and calculated speedups against a software version of the same algorithm running on a single Cortex-A53 core in the Zynq Ultrascale+ MPSoC platform.
For this example, the Xilinx LogiCORE tool was used to generate a 1D FFT hardware accelerator for single-precision complex numbers, with AXI interfaces. The generated accelerator supports burst and pipelined modes. In burst mode, the accelerator completes the FFT calculation for the current job before starting the next job. In pipelined mode, the accelerator is able to process data from the next job while the current one is still being processed. The estimated speedup is around 3x to 10x against the FFTW software library running on the CPU.
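To make the difference between the two modes concrete, here is a minimal timing sketch in Python. It is not taken from ACAI or the LogiCORE documentation: the split of each job into load, compute and store phases, the function names and the cycle counts are all assumptions chosen purely for illustration.

```python
def burst_time(n_jobs, load, compute, store):
    """Total time when each job fully completes before the next one starts.

    load, compute and store are hypothetical per-job phase durations
    (for example, in clock cycles).
    """
    return n_jobs * (load + compute + store)


def pipelined_time(n_jobs, load, compute, store):
    """Total time when the phases of consecutive jobs overlap.

    Assumed model: the first job pays the full latency to fill the
    pipeline; each later job adds only the duration of the slowest phase.
    """
    if n_jobs == 0:
        return 0
    return (load + compute + store) + (n_jobs - 1) * max(load, compute, store)


def mode_speedup(n_jobs, load, compute, store):
    """Ratio of burst time to pipelined time for the same workload."""
    return burst_time(n_jobs, load, compute, store) / pipelined_time(
        n_jobs, load, compute, store
    )
```

Under this simplified model the two modes take the same time for a single job, and the benefit of pipelining grows with the number of back-to-back jobs. Real figures depend on the accelerator configuration and memory system, so they would need to be measured on the hardware.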
Figure 3: RGB2YUV accelerator speedup compared to software implementation
Xilinx HLS tools were used to synthesize the RGB2YUV accelerator. We generated a base version of the accelerator design, which demonstrated good speedups, and then optimized it further by pipelining the load, compute and store stages. The estimated speedups are around 2x to 6x against a simple RGB2YUV algorithm running on the CPU.
The Arm Research Collaboration and Enablement team is dedicated to supporting research and academia through collaborative research and easy access to cutting-edge Arm and partner technologies. We believe that the ACAI framework will prove useful to researchers around the world, and so as part of our Enablement roadmap we will be making ACAI RTL available for research purposes. If you think it will be useful to your project, please submit a proposal with the relevant details using the link below. ACAI will also be made widely available as encrypted IP for deployment on Xilinx MPSoC Ultrascale FPGAs in the first half of 2019.
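For readers unfamiliar with the conversion, the following is a minimal sketch of what a simple software RGB2YUV routine might look like. It is not the baseline used in our benchmarks: the BT.601-style coefficients, the floating-point arithmetic and the function names are assumptions made for illustration.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV.

    Uses the commonly quoted BT.601 luma weights and chroma scale
    factors (an assumption; other standards use different values).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of R, G, B
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v


def convert_frame(pixels):
    """Convert a sequence of (r, g, b) tuples, one pixel at a time."""
    return [rgb_to_yuv(r, g, b) for (r, g, b) in pixels]
```

Each pixel is independent of the others, which is what makes the algorithm a natural fit for a pipelined accelerator: one pixel can be loaded while another is computed and a third is stored.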
Submit a Proposal
There is growing interest in hardware accelerators amongst researchers, especially in the Machine Learning and Artificial Intelligence fields. The ACAI framework, together with its HLS compatibility, offers researchers a readily available platform, so that they can focus on accelerator and system design research.
For more information, please get in touch:
Contact the ACAI team