What does "MACs (8x8)" in the Ethos-U55 specification mean?

chong_9 over 5 years ago

Hi,

I read the Ethos-U55 specification and I can't understand what "MACs (8x8): 32, 64, 128, 256" means.

Can anyone help me with a more detailed explanation?

Thanks.

Sandeep Singh over 5 years ago

Multiply-accumulate operation (MAC): computations in a neural network involve a multiplication followed by an addition, and are therefore referred to as multiply-accumulate operations (MACs). The 32/64/128/256 configurations of the U55 mean that the respective configuration supports 32, 64, 128 or 256 8x8 MACs per clock cycle (cc). Note that the higher the MAC configuration, the greater the number of DPUs (dot product units) and adders, and the larger the area, power, etc.
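
As a rough illustration of what a single 8x8 MAC is, here is a minimal Python sketch: an 8-bit value is multiplied by an 8-bit value and the product is added to a wider accumulator. The function name `dot_int8`, the signed 8-bit range check and the sample values are assumptions made for illustration; this models the arithmetic only, not how the Ethos-U55 hardware is organised.

```python
def dot_int8(activations, weights):
    """Dot product of two 8-bit vectors; returns (result, number_of_macs)."""
    if len(activations) != len(weights):
        raise ValueError("vectors must have the same length")
    acc = 0   # wide accumulator (assumption: wider than 8 bits so it cannot overflow here)
    macs = 0  # how many multiply-accumulate steps were performed
    for a, w in zip(activations, weights):
        # Assumption: signed 8-bit operands, chosen only for this sketch.
        if not (-128 <= a <= 127 and -128 <= w <= 127):
            raise ValueError("inputs must fit in signed 8 bits")
        acc += a * w  # one MAC: one multiply plus one add
        macs += 1
    return acc, macs


result, macs = dot_int8([1, -2, 3, 4], [5, 6, -7, 8])
print(result, macs)  # prints: 4 4
```

A configuration that supports N MACs per cycle would, in the ideal case, perform N of the `acc += a * w` steps in each clock cycle.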

chong_9 over 5 years ago in reply to Sandeep Singh

I'm not sure this is right, but my understanding is that one "8x8 MACs/cc" means there are 8 multiply-accumulate circuits operating in parallel in one clock cycle, and that each multiply-accumulate circuit takes two 8-bit numbers as inputs.

I also looked at the Cortex-M55 specification, which lists multiply-accumulate (MAC) throughput of up to 2 x 32-bit MACs/cycle, 4 x 16-bit MACs/cycle and 8 x 8-bit MACs/cycle. The user can configure the MAC operation.

Could you please explain "8x8 MACs/cc" in more detail?

Thanks.

Sandeep Singh over 5 years ago in reply to chong_9

Let me explain in more detail, taking the 256 MAC configuration as an example.

The 256 MAC configuration can do 256 8x8 (8-bit by 8-bit) multiplications per clock cycle (cc). A MAC counts as two operations (multiply + add).

Suppose you have a network with an 8-bit IFM (input feature map), for example:

Input: 10 * 10 * 3

Convolution: 128 * 3 * 3 * 3 (3x3x3 (HxWxD) kernel and 128 filters), stride 1, valid padding (same padding would give a 10 * 10 * 128 output instead)

Output shape: 8 * 8 * 128

MACs = 8 * 8 * 128 * 3 * 3 * 3 = 221184

Now, 256 8x8 MACs can happen in one clock cycle, so you would need 221184 / 256 = 864 cc to compute these MACs.
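
The same arithmetic can be sketched in Python for all four configurations. The helper names `conv_macs` and `ideal_cycles` are made up for this example, and the result is an idealised lower bound that assumes every MAC unit is busy on every cycle.

```python
import math


def conv_macs(out_h, out_w, out_c, k_h, k_w, in_c):
    """MACs for a convolution: one k_h*k_w*in_c dot product per output element."""
    return out_h * out_w * out_c * k_h * k_w * in_c


def ideal_cycles(total_macs, macs_per_cycle):
    """Idealised cycle count, assuming full utilisation of the MAC array."""
    return math.ceil(total_macs / macs_per_cycle)


total = conv_macs(out_h=8, out_w=8, out_c=128, k_h=3, k_w=3, in_c=3)
print(total)  # prints: 221184

for config in (32, 64, 128, 256):
    print(config, ideal_cycles(total, config))
# prints:
# 32 6912
# 64 3456
# 128 1728
# 256 864
```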

Now, if your IFM is 16-bit, we can do 128 16x8 multiplications per cc, which means that we now need 221184 / 128 = 1728 cc for these MAC operations.
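
A minimal sketch of this halving, under the assumption stated above that a 16-bit IFM halves the number of multiplications per cycle. The function name `cycles_for_ifm_width` is hypothetical, and only the 8-bit and 16-bit cases are modelled.

```python
def cycles_for_ifm_width(total_macs, macs_per_cycle_8bit, ifm_bits):
    """Idealised cycles for a given IFM width, assuming full MAC utilisation."""
    if ifm_bits == 8:
        rate = macs_per_cycle_8bit        # e.g. 256 8x8 MACs per cycle
    elif ifm_bits == 16:
        rate = macs_per_cycle_8bit // 2   # e.g. 128 16x8 MACs per cycle
    else:
        raise ValueError("only 8-bit and 16-bit IFMs are modelled here")
    return -(-total_macs // rate)         # ceiling division


total = 8 * 8 * 128 * 3 * 3 * 3           # 221184 MACs from the example above
print(cycles_for_ifm_width(total, 256, 8))   # prints: 864
print(cycles_for_ifm_width(total, 256, 16))  # prints: 1728
```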

You can compute the figures for other configurations similarly.

Note that there are other parameters, such as the number of DPUs per configuration, and to get this maximum performance we need to fill all the DPUs inside the MAC unit. Generally, the width, height and depth should be multiples of the microblock size, which is 2x2x8 for Z-256. You can refer to the Arm U55 TRM for details.
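
To see why multiples of the microblock size matter, here is a rough Python sketch that rounds a shape up to the next multiple of 2x2x8 and reports what fraction of the padded work is useful. It is only an illustration: the helper names are made up, and treating the microblock as (height, width, depth) applied to the output shape is an assumption, so check the TRM for the actual mapping.

```python
def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def padded_shape(shape, microblock=(2, 2, 8)):
    """Round each dimension up to the microblock size (assumed order: H, W, D)."""
    return tuple(round_up(d, m) for d, m in zip(shape, microblock))


def utilisation(shape, microblock=(2, 2, 8)):
    """Fraction of the padded volume that holds real data."""
    padded = padded_shape(shape, microblock)
    useful = shape[0] * shape[1] * shape[2]
    total = padded[0] * padded[1] * padded[2]
    return useful / total


print(padded_shape((8, 8, 128)), utilisation((8, 8, 128)))
# prints: (8, 8, 128) 1.0
print(padded_shape((7, 7, 10)), utilisation((7, 7, 10)))
# prints: (8, 8, 16) 0.478515625
```

Under these assumptions the 8 * 8 * 128 output from the example fills every microblock, while a 7 * 7 * 10 shape would leave roughly half of the padded volume idle.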