Expected Increase in Throughput for Int8 vs FP32 Multiplication

FabianSchuetze over 1 year ago

What increase in throughput can I expect on my device from changing a sequence of

```
fmla v1.4s, v1.4s, v1.4s
```

to

```
mla v1.16b, v1.16b, v1.16b
```

My device consists of X3, A715 and A510 processors.

In profiling peakflops I got a ~2x increase in throughput. I would have expected a 4x increase.
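The 4x expectation comes from lane counts: a 128-bit vector register holds four 32-bit lanes (`.4s`) or sixteen 8-bit lanes (`.16b`). A minimal sketch of that arithmetic follows. It assumes the ~2x figure counts element operations per second; the implied per-instruction rate is only one possible reading and is not confirmed anywhere in this thread.

```python
# Sketch of the lane arithmetic behind the expected 4x figure.
# Assumption: the reported ~2x speedup counts element operations per second.

VECTOR_BITS = 128  # width of one Advanced SIMD (NEON) vector register


def lanes(element_bits: int, vector_bits: int = VECTOR_BITS) -> int:
    """Number of elements of the given width that fit in one register."""
    return vector_bits // element_bits


fp32_lanes = lanes(32)  # fmla on .4s operands
int8_lanes = lanes(8)   # mla on .16b operands
lane_ratio = int8_lanes / fp32_lanes

observed_speedup = 2.0  # the ~2x figure reported in the post
# If lanes were the only difference, the speedup would equal lane_ratio.
# The shortfall can be expressed as a relative instruction rate.
implied_issue_ratio = observed_speedup / lane_ratio

print(f"FP32 lanes per register: {fp32_lanes}")
print(f"Int8 lanes per register: {int8_lanes}")
print(f"Lane ratio (int8 / FP32): {lane_ratio}")
print(f"Implied mla-to-fmla instruction rate: {implied_issue_ratio}")
```

Under that assumption, a 2x result would be consistent with the int8 `mla` sequence completing about half as many instructions per unit time as the `fmla` sequence on the measured core.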

Is there any matrix-multiplication-related instruction on Arm with which I can expect a 4x increase in throughput by using int8 data types (possibly with a widening accumulator)?


Zhifei Yang over 1 year ago in reply to FabianSchuetze
Great to hear that you have findings for the throughput increase.

I don't have a datasheet for the S9 tablet at hand, but there is a generic way to check which Arm CPU cores a device has.

On Android or a Linux-like OS, you can run the command "cat /proc/cpuinfo". An example of its output is quoted below.

Please check the "CPU part" field. Once you know the core type of each CPU ID, you can try to map it to the socket ID.

Cortex-X3 part number is 0xD4E.

Cortex-A715 part number is 0xD4D.

Cortex-A510 part number is 0xD46.

<quote>

# cat /proc/cpuinfo
processor : 0
BogoMIPS : 26.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 1
BogoMIPS : 26.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

</quote>
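The lookup described above can be sketched in Python. This is a minimal illustration, not an official tool: the helper names `parse_cpuinfo` and `describe` are hypothetical, the part-number table contains only the three values quoted in this reply, and `SAMPLE` is a shortened stand-in for the output shown above. The sketch also reports the `asimddp` and `i8mm` feature flags, which indicate support for the int8 dot-product and int8 matrix-multiply instructions (both accumulate into 32-bit lanes). Whether those instructions reach a 4x speedup on these cores is not established in this thread.

```python
import re

# Part numbers quoted in the reply above; valid for implementer 0x41 (Arm) only.
ARM_IMPLEMENTER = 0x41
ARM_PARTS = {0xD4E: "Cortex-X3", 0xD4D: "Cortex-A715", 0xD46: "Cortex-A510"}

# Shortened sample; on a device, use open("/proc/cpuinfo").read() instead.
SAMPLE = """\
processor : 0
Features : fp asimd asimddp i8mm
CPU implementer : 0x41
CPU part : 0xd46

processor : 1
Features : fp asimd asimddp i8mm
CPU implementer : 0x41
CPU part : 0xd46
"""


def parse_cpuinfo(text: str) -> list:
    """Split /proc/cpuinfo text into one dict of fields per processor block."""
    cpus = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines without a "key : value" shape
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            cpus.append(fields)
    return cpus


def describe(fields: dict) -> dict:
    """Map one processor block to a core name and int8-related feature flags."""
    implementer = int(fields.get("CPU implementer", "0"), 16)
    part = int(fields.get("CPU part", "0"), 16)
    features = set(fields.get("Features", "").split())
    if implementer == ARM_IMPLEMENTER and part in ARM_PARTS:
        core = ARM_PARTS[part]
    else:
        core = f"unknown (part 0x{part:03x})"
    return {
        "cpu": int(fields["processor"]),
        "core": core,
        "dotprod": "asimddp" in features,  # int8 dot product (SDOT/UDOT)
        "i8mm": "i8mm" in features,        # int8 matrix multiply (SMMLA etc.)
    }


results = [describe(cpu) for cpu in parse_cpuinfo(SAMPLE)]
for entry in results:
    print(entry)
```

On a device with mixed core types, the per-CPU core names show which CPU IDs a benchmark should be pinned to when comparing the X3, A715 and A510 cores separately.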