Expected Increase in Throughput for Int8 vs FP32 Multiplication

What increase in throughput can I expect on my device from changing a sequence of

```

fmla  v1.4s, v1.4s, v1.4s

```

to

```

mla  v1.16b, v1.16b, v1.16b  

```

?

My device consists of Cortex-X3, Cortex-A715 and Cortex-A510 processors.

When profiling peakflops I got a ~2x increase in throughput. I would have expected a 4x increase.

Is there any matrix-multiplication-related instruction on Arm with which I can expect a 4x increase in throughput by using int8 data types (possibly with a widening accumulator)?
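
The 4x expectation above follows from lane counts alone: a 128-bit register holds 4 fp32 elements (`.4s`) or 16 int8 elements (`.16b`). A minimal sketch of that arithmetic is below. The issue rates in it are hypothetical placeholders, not measured or documented figures for any of these cores; they only illustrate how a 4x lane advantage could shrink to 2x if the integer instruction issued at half the rate of the floating-point one.

```python
# Lane-count arithmetic for one 128-bit Advanced SIMD register.
VECTOR_BITS = 128


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one 128-bit vector."""
    return VECTOR_BITS // element_bits


def relative_throughput(bits_a: int, rate_a: float, bits_b: int, rate_b: float) -> float:
    """Elements per cycle of B divided by elements per cycle of A.

    rate_a and rate_b are instructions issued per cycle. They are
    HYPOTHETICAL inputs here, not values taken from any optimization guide.
    """
    return (lanes(bits_b) * rate_b) / (lanes(bits_a) * rate_a)


fp32_lanes = lanes(32)  # fmla v.4s  works on 4 lanes
int8_lanes = lanes(8)   # mla  v.16b works on 16 lanes

# Same issue rate for both instructions: the full lane ratio.
same_rate = relative_throughput(32, 1.0, 8, 1.0)

# HYPOTHETICAL: the int8 instruction issues at half the rate of the fp32 one.
half_rate = relative_throughput(32, 2.0, 8, 1.0)

print(fp32_lanes, int8_lanes, same_rate, half_rate)
```

Whether a difference in issue rate is really what explains the measured ~2x is an assumption; checking it would need the per-core instruction throughput tables.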

Parents
  • Great to hear that you have some findings on the throughput increase.

    I don't have an S9 tablet datasheet at hand, but there is a generic way to check the Arm CPU type.

    On Android or any Linux-like OS, you can run the command "cat /proc/cpuinfo". An example is shown below.

    Check the CPU part number of each entry. Once you know the CPU type of each CPU id, you can try to map it to the Socket ID.

    • Cortex-X3 part number is 0xD4E.   
    • Cortex-A715 part number is 0xD4D.
    • Cortex-A510 part number is 0xD46.

    <quote>

    # cat /proc/cpuinfo
    processor : 0
    BogoMIPS : 26.00
    Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 bti
    CPU implementer : 0x41
    CPU architecture: 8
    CPU variant : 0x0
    CPU part : 0xd46
    CPU revision : 2

    processor : 1
    BogoMIPS : 26.00
    Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 bti
    CPU implementer : 0x41
    CPU architecture: 8
    CPU variant : 0x0
    CPU part : 0xd46
    CPU revision : 2

    </quote>
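
The lookup described above can be sketched as a small script that maps each `processor` entry to a core name using the three part numbers listed in this reply. The function names, the `ARM_PARTS` table and the shortened `SAMPLE` text are illustrative assumptions, not part of any existing tool.

```python
# Part numbers as listed in the reply above (CPU implementer 0x41 = Arm).
ARM_PARTS = {0xD4E: "Cortex-X3", 0xD4D: "Cortex-A715", 0xD46: "Cortex-A510"}

# Shortened stand-in for real /proc/cpuinfo output (Features line trimmed).
SAMPLE = """\
processor : 0
Features : fp asimd asimddp i8mm
CPU implementer : 0x41
CPU part : 0xd46

processor : 1
Features : fp asimd asimddp i8mm
CPU implementer : 0x41
CPU part : 0xd46
"""


def parse_cpuinfo(text):
    """Return {processor_index: {"part": int or None, "features": set}}."""
    cores = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a "key : value" shape
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
            cores[current] = {"part": None, "features": set()}
        elif current is not None and key == "CPU part":
            cores[current]["part"] = int(value, 16)
        elif current is not None and key == "Features":
            cores[current]["features"] = set(value.split())
    return cores


def core_name(part):
    """Translate a part number to a core name, if it is in the table."""
    if part is None:
        return "unknown"
    return ARM_PARTS.get(part, "unknown (0x%x)" % part)


cores = parse_cpuinfo(SAMPLE)
names = {idx: core_name(info["part"]) for idx, info in cores.items()}
has_dotprod = all("asimddp" in info["features"] for info in cores.values())
has_i8mm = all("i8mm" in info["features"] for info in cores.values())
print(names, has_dotprod, has_i8mm)
```

On a real device, pass the contents of `/proc/cpuinfo` instead of `SAMPLE`. In the Linux feature naming, `asimddp` advertises the Advanced SIMD dot product instructions (SDOT/UDOT) and `i8mm` the int8 matrix multiply instructions (such as SMMLA); both accumulate int8 products into 32-bit lanes. Whether they deliver a 4x speedup over `fmla` on these cores is not established in this thread.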
