Some Questions About Vela Compiler Report

AAAAA8877 over 3 years ago

Hi,

I recently trained two models and used the Vela compiler to estimate their performance.

Here are their model architectures:

(I changed the right model's middle layer from "Standard Convolution (Conv2D_16x3x3x16 (NHWC))" to "Depthwise Separable Convolution (DepthwiseConv2D_1x3x3x16 (NHWC) + Conv2D_16x1x1x16 (NHWC))".)
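For reference, here is a rough MAC count for just the swapped layer. It is a sketch assuming stride 1, "same" padding, and a hypothetical 32x32 feature map (the actual input size is not given in the post). The Vela report counts the whole model, so its overall ratio also includes the unchanged layers.

```python
def conv2d_macs(h, w, k, c_in, c_out):
    """MACs of a standard k x k convolution, stride 1, 'same' padding."""
    return h * w * k * k * c_in * c_out


def depthwise_macs(h, w, k, c):
    """MACs of a k x k depthwise convolution with depth multiplier 1."""
    return h * w * k * k * c


# Hypothetical feature map size; the post does not state it.
H, W, C = 32, 32, 16

# Conv2D_16x3x3x16
standard = conv2d_macs(H, W, 3, C, C)
# DepthwiseConv2D_1x3x3x16 followed by Conv2D_16x1x1x16
separable = depthwise_macs(H, W, 3, C) + conv2d_macs(H, W, 1, C, C)

print(standard, separable, standard / separable)
```

Because both counts scale with H*W, the per-layer ratio (about 5.76x) does not depend on the assumed feature map size.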

I compiled them with the same system config and memory mode.

(I selected the ethos-u55-128 accelerator and added the --optimise Performance option.)

(System Config)

(Memory Mode)

Finally, here are their Vela reports:

(Model with all Standard Conv)

(Model with Depthwise Seperable Conv)

I compared them and have some questions:

1. In "Total SRAM Bandwidth", why is the value for the model with depthwise conv larger than for the model with all standard conv?

Does this mean depthwise conv transfers the feature map between external SRAM and the NPU's internal SRAM more frequently?
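One possible contributor (an assumption, not something the Vela report confirms) is that splitting one layer into two operators creates an intermediate tensor that has to be written out and then read back. A rough byte count, assuming int8 activations, a hypothetical 32x32x16 feature map, each operator reading its input once and writing its output once, and ignoring weights and any buffering or cascading the compiler may apply:

```python
def feature_map_bytes(h, w, c, bytes_per_elem=1):
    """Size of an NHWC activation tensor (int8 by default)."""
    return h * w * c * bytes_per_elem


# Hypothetical feature map size; the post does not state it.
H, W, C = 32, 32, 16
fm = feature_map_bytes(H, W, C)

# Standard conv: read the input, write the output.
standard_traffic = 2 * fm
# Depthwise + pointwise: read input, write intermediate,
# read intermediate back, write output.
separable_traffic = 4 * fm

print(fm, standard_traffic, separable_traffic)
```

Under these simplifying assumptions the activation traffic for that layer doubles, which would push "Total SRAM Bandwidth" up even though the MAC count goes down.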

2. In "Neural network macs", the count for the depthwise conv model is about 3.5 times lower than for the standard conv model, but "NPU cycles" only drops from 2814460 to 2535830.
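Putting the two ratios side by side makes the gap explicit. This uses only the numbers quoted from the two reports:

```python
mac_reduction = 3.5          # "Neural network macs" ratio quoted in the question
cycles_standard = 2814460    # NPU cycles, model with all standard conv
cycles_separable = 2535830   # NPU cycles, model with depthwise separable conv

cycle_speedup = cycles_standard / cycles_separable
cycle_saving = (cycles_standard - cycles_separable) / cycles_standard

print(f"MACs: {mac_reduction:.2f}x fewer")
print(f"NPU cycles: {cycle_speedup:.3f}x fewer ({cycle_saving:.1%} saved)")
```

So a roughly 3.5x cut in MACs translates into only about a 1.11x cut (about 9.9%) in estimated NPU cycles.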

It seems that switching to depthwise conv does not improve the model's inference time very much.

I read the Ethos-U55 NPU Technical Reference Manual (https://developer.arm.com/documentation/102420/0200/?lang=en) and found the table in topic 4.8, "Operator and Performance".

In this table, depthwise conv only uses 16 MACs per cycle.

So I think the reason is that DepthwiseConv has lower MAC utilization than standard Conv. Are there other reasons?
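A simple compute-bound model illustrates the utilization argument. It is a sketch under several assumptions: 128 MACs/cycle peak for standard and 1x1 convolution on ethos-u55-128, the 16 MACs/cycle depthwise figure from the TRM table, full utilization otherwise, no memory stalls, and a hypothetical 32x32x16 layer. Real Vela estimates also account for memory bandwidth and per-operator overheads.

```python
import math

STD_MACS_PER_CYCLE = 128  # assumption: ethos-u55-128 peak for Conv2D
DW_MACS_PER_CYCLE = 16    # depthwise figure quoted from the TRM table


def ideal_cycles(macs, macs_per_cycle):
    """Lower-bound cycle count if the MAC array were fully utilized."""
    return math.ceil(macs / macs_per_cycle)


# Hypothetical feature map size; the post does not state it.
H, W, C = 32, 32, 16
std_macs = H * W * 3 * 3 * C * C   # Conv2D_16x3x3x16
dw_macs = H * W * 3 * 3 * C        # DepthwiseConv2D_1x3x3x16
pw_macs = H * W * 1 * 1 * C * C    # Conv2D_16x1x1x16

std_cycles = ideal_cycles(std_macs, STD_MACS_PER_CYCLE)
sep_cycles = (ideal_cycles(dw_macs, DW_MACS_PER_CYCLE)
              + ideal_cycles(pw_macs, STD_MACS_PER_CYCLE))

print(std_macs / (dw_macs + pw_macs))  # MAC reduction for the layer
print(std_cycles / sep_cycles)         # ideal cycle reduction for the layer
```

Under these assumptions the layer does 5.76x fewer MACs but needs only about 1.64x fewer cycles, because the depthwise part alone accounts for 9216 of the 11264 separable cycles.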