Mali T760MP4 OpenCL performance issue

Hi,

I am using an RK3288 SoC and have forced the Mali T760MP4 to run at 600 MHz. I am testing performance with the "clpeak" program from GitHub, but "clpeak" always reports that the Mali runs at 200 MHz, not 600 MHz.

(1) The OS is TinkerOS_Debian V1.8. It can be downloaded from dlcdnet.asus.com/.../20170417-tinker-board-linaro-stretch-alip-v1.8.zip

(2) clpeak can be downloaded from https://github.com/krrishnarraj/clpeak

(3) linaro@linaro-alip:/proc$ cat /sys/class/misc/mali0/device/devfreq/ffa30000.gpu/cur_freq
600000000

I got the following results. It seems the Mali T760 MP4 has very poor performance. What is wrong with the Mali T760 MP4?

I also found that while the Mali T760 is running, the Linux API "clock_t clock(void);" always returns a wrong value, but "gettimeofday()" returns the correct time. Is this the reason why "clpeak" generates a wrong performance report?


-----------------------------------------------------------------

Platform: ARM Platform
  Device: Mali-T760
    Driver version  : 1.2 (Linux ARM)
    Compute units   : 4
    Clock frequency : 200 MHz

    Global memory bandwidth (GBPS)
      float   : 2.90
      float2  : 4.60
      float4  : 4.74
      float8  : 3.94
      float16 : 3.61

    Single-precision compute (GFLOPS)
      float   : 12.94
      float2  : 5.93
      float4  : 5.95
      float8  : 31.21
      float16 : 7.04

    half-precision compute (GFLOPS)
      half   : 2.89
      half2  : 6.14
      half4  : 14.32
      half8  : 13.91
      half16 : 18.97

    Double-precision compute (GFLOPS)
      double   : 1.66
      double2  : 1.55
      double4  : 15.70
      double8  : 15.46
      double16 : 15.26

    Integer compute (GIOPS)
      int   : 2.61
      int2  : 6.10
      int4  : 6.71
      int8  : 7.50
      int16 : 30.89

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 3.86
      enqueueReadBuffer          : 1.38
      enqueueMapBuffer(for read) : 1237.03
        memcpy from mapped ptr   : 1.37
      enqueueUnmap(after write)  : 2350.57
        memcpy to mapped ptr     : 1.34

    Kernel launch latency : 74.72 us

-----------------------------------------------------------------

Thank you

-Jack