AI data-processing workloads at the edge are transforming use cases and user experiences. Arm’s third-generation Ethos-U85 NPU helps meet the needs of future edge AI use cases. Ethos-U85 is the highest-performing Ethos NPU, addressing the growing demand for running advanced AI inference workloads at the edge, including transformer-based networks such as large language models (LLMs).
Arm also offers reference designs. For example, the Corstone-320 IoT reference design platform integrates Ethos-U85, among other components, to accelerate and simplify the chip development cycle. The reference design platform also includes a Fixed Virtual Platform (FVP), which simulates the entire system and enables cutting-edge embedded software development and neural network deployment for Ethos-U85.
The example code in this technical blog post is tested on the Corstone-320 FVP. For more information and insights about Ethos-U85, the Corstone-320 reference design platform, and Arm FVPs, please visit Arm.com or developer.arm.com.
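If you would like to experiment with the FVP directly, outside of the helper scripts used later in this post, launching a bare-metal image is roughly as sketched below. This is an illustrative example: the application path is hypothetical, and you can list the model's full set of options with the --help and --list-params flags.

# Launch the Corstone-320 FVP with a bare-metal application image (path is illustrative)
FVP_Corstone_SSE-320 -a ./build/my_app.axf --stat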
What do you get when you combine China’s premier open-source deep learning platform with Arm? Rocket fuel for innovation.
Arm has a long-standing partnership with Baidu. Together we accelerate the development of transformative edge AI solutions such as PaddlePaddle on embedded devices.
Through the partnership, Arm has worked with Baidu to deploy nine classical PaddleLite vision models on the Ethos-U85 NPU.
To date, the list of supported models includes:
The Arm-Examples GitHub repository provides a full development environment with six example use cases. In this blog post, we show one example workflow deploying the “ch_ppocr_mobile_v2.0_rec” model (for an OCR use case) on the Ethos-U85 NPU. We also note considerations for deploying other common models. For detailed technical guidance, please see the deployment guide for each model in the repository.
Before you begin, please ensure that your environment meets the following requirements:
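The full requirements for each model are listed in its deployment guide. As a quick sanity check, a minimal sketch based only on the tools this walkthrough uses (Python 3.9, Git, and Git LFS) is shown below.

# Confirm the tools used in the following steps are installed
python3.9 --version   # Python 3.9 is used to create the virtual environment
git --version         # Git is needed to clone the example repository
git lfs version       # Git LFS is needed to pull the large model and test assets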
Create a Python virtual environment for model training or deployment. Please note that some other models in the repository may require different training and deployment virtual environments because of their fine-tuning processes. For more details, please refer to the deployment guide for each model in the repository.
# Create virtual environment with Python 3.9
python3.9 -m venv ppocr_rec
source ppocr_rec/bin/activate
cd ppocr_rec
Download the example code from GitHub and install the required packages.
# Download example source code
git clone https://github.com/Arm-Examples/Paddle-on-Ethos-U.git
cd Paddle-on-Ethos-U
git lfs pull

# Configure inference environment
bash install.sh
Download the PaddleLite model.
# Download ppocr_rec model
wget -O ./model_zoo/PpocrRec_infer_int8/ch_ppocr_mobile_v2.0_rec_slim_opt.nb \
    https://paddleocr.bj.bcebos.com/dygraph_v2.0/lite/ch_ppocr_mobile_v2.0_rec_slim_opt.nb
Use the conversion tool in the repository (write_model.py) to convert the model. The conversion process includes three main steps:
a) Convert the PaddleLite model (a file with the .nb extension) into an intermediate representation (IR) graph (a .json file). The IR file is automatically stored in the same directory as the input PaddleLite model. (Known limitation: the --out_dir option has no effect in this conversion step.)
# Convert .nb model into IR file (.json file)
python ./readnb/write_model.py --model_path ./model_zoo/PpocrRec_infer_int8/ch_ppocr_mobile_v2.0_rec_slim_opt.nb --out_dir .

# "g_ch_ppocr_mobile_v2.0_rec_slim_opt.json" is generated in the same directory as the input model file
b) Manually adjust the intermediate representation (IR) model. Because the required adjustments are scattered throughout the IR file, we provide a model patch that applies them in one step and improves the developer experience.
# Modify the IR file with patch quickly. You could also do this modification manually.
patch -p0 model_zoo/PpocrRec_infer_int8/g_ch_ppocr_mobile_v2.0_rec_slim_opt.json < readnb/test_asset/ppocr_rec/g_ch_ppocr_rec.patch
c) Optionally, use the conversion script again to convert the adjusted IR model into a TOSA graph, then compile it with the Vela compiler (ethos-u-vela) provided by Arm for Ethos-U. For more details about the Vela compiler, please check its introduction on PyPI or the technical documentation on Arm Developer. You can also skip this conversion step, because it is performed automatically as part of Step 5.
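For reference, a manual Vela invocation looks roughly like the sketch below. This is not the repository's exact command: the input file name is hypothetical, and the --accelerator-config value must match the Ethos-U85 configuration (number of MAC units) you target; run vela --help to see the options supported by your installed version.

# Install the Vela compiler (assumes a release with Ethos-U85 support)
pip install ethos-u-vela

# Compile the converted network for an Ethos-U85 target (file name is illustrative)
vela ch_ppocr_mobile_v2.0_rec_slim_opt.tosa \
    --accelerator-config ethos-u85-256 \
    --output-dir ./vela_out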
Build the OCR application and check the inference results.
# Run inference
bash paddle_verify.sh -m ppocr_rec -p ./model_zoo/PpocrRec_infer_int8/test.jpg
An example of the test result is as follows:
telnetterminal0: Listening for serial connection on port 5000
telnetterminal1: Listening for serial connection on port 5001
telnetterminal5: Listening for serial connection on port 5002
telnetterminal2: Listening for serial connection on port 5003

handles.inputs->count is 1
input tensor scratch_addr address 0x7c11f840
input shapes 122880
copy input data into scratch_addr
handles.outputs->io[x] shapes is 655360
output tensor output_addr address 0x7c1bf840
output shapes 655360
output bin [0x7c1bf840 655360]
handles.outputs->count is 1
Shape : 655360
Rec Reuslut: Confidence: 0.966813
============ NPU Inferences : 1 ============
Profiler report, CPU cycles per operator:
ethos-u : cycle_cnt : 2083105832 cycles
Operator(s) total: 574619648 CPU cycles
Inference runtime: -987073648 CPU cycles total
NOTE: CPU cycle values and ratio calculations require FPGA and identical CPU/NPU frequency
Inference CPU ratio: 100.00
Inference NPU ratio: 0.00
cpu_wait_for_npu_cntr : 574619648 CPU cycles
Ethos-U PMU report:
ethosu_pmu_cycle_cntr : 2083105832
ethosu_pmu_cntr0 : 479
ethosu_pmu_cntr1 : 21
ethosu_pmu_cntr2 : 118511
ethosu_pmu_cntr3 : 0
ethosu_pmu_cntr4 : 592
Ethos-U PMU Events:[ETHOSU_PMU_SRAM_RD_DATA_BEAT_RECEIVED, ETHOSU_PMU_SRAM_WR_DATA_BEAT_WRITTEN, ETHOSU_PMU_EXT_RD_DATA_BEAT_RECEIVED, ETHOSU_PMU_EXT_WR_DATA_BEAT_WRITTEN, ETHOSU_PMU_NPU_IDLE]
============ Measurements end ============
Running Model Exit Successfully
Application exit code: 0.
Info: /OSCI/SystemC: Simulation stopped by user.
[run_fvp] Simulation complete, 0
Dump to out_tensors.bin
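The run also dumps the raw output tensor to out_tensors.bin. If you want to peek at it, the snippet below is a minimal sketch: it assumes the dump contains int8 values (suggested by the 655360-byte size reported above) and that NumPy is available in the virtual environment; adjust the dtype and any reshaping to the model's actual output layout.

# Optional: inspect the dumped output tensor (dtype below is an assumption)
python3 - <<'EOF'
import numpy as np

data = np.fromfile("out_tensors.bin", dtype=np.int8)
print("elements:", data.size)
print("first 16 values:", data[:16])
EOF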
Deploying PaddlePaddle models on Arm-based edge AI devices comes down to optimizing the model, preparing the software stack, and choosing the right hardware. Together, these steps let you deploy AI applications at the edge for fast, efficient inference close to where the data is generated.
Learn more about deploying AI models onto Arm-based edge AI hardware with our IoT Learning Paths:
Arm Developer Learning Paths