Why are Mali textures slower than buffers?

I tested the performance of Mali textures (cl_image) and found that they are slower than buffers (cl_mem).

My GPU is a Mali G76.

I expected textures to be faster than buffers, for example when doing bilinear filtering.

However, my tests show that on the G76 textures are about 10%-20% slower than buffers. The test format is RGBA.

I don't know why.

Can anyone explain what is going on?

Also, is there a standard benchmark program for this?

Parents
  • 1. You mean the two performance counters you pointed out should be quite high, but they are very low in my report.

    Your bytes-per-access value is high, so it is in line with expectations for a downscale.

    . I do not understand the "access pattern". I don't think I can specify an access pattern in OpenCL; could you explain it further?

    Correct, you can't control it. But buffers and textures may have different memory layouts, and so have different access patterns.

        1). How do I calculate them if my program is OpenCL?

    You can't. 

    You will get at least one Job per compute dispatch, but may get more as the driver generates small jobs for some management activities. 

    Tasks are somewhat meaningless to an application developer. For compute workloads a task is some multiple of the workgroup size, but the exact scaling is chosen by the driver and depends on the system configuration. 

      2). I run only one kernel in both the buf_style and texture_style programs, but the reports show 3 and 4 non-fragment jobs instead of 1.

    As above, you will get at least one Job per fragment workload, but may get more. 
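
To illustrate the remark above that buffers and textures may have different memory layouts, here is a minimal Python sketch. It is not a description of the real Mali texture layout, which this thread does not specify. It assumes a hypothetical 4x4 block-tiled layout, 4-byte RGBA8 texels, and 64-byte cache lines, and counts how many cache lines the same set of texel reads touches under each layout. All names (`linear_offset`, `tiled_offset`, `cache_lines_touched`) are made up for this example.

```python
# Illustrative only: TILE, CACHE_LINE and BYTES_PER_TEXEL are assumptions,
# and the tiled scheme is hypothetical, not the actual Mali texture layout.
BYTES_PER_TEXEL = 4   # RGBA8 (assumed)
CACHE_LINE = 64       # bytes (assumed)
TILE = 4              # hypothetical tile edge, in texels


def linear_offset(x, y, width):
    """Byte offset of texel (x, y) in a row-major buffer."""
    return (y * width + x) * BYTES_PER_TEXEL


def tiled_offset(x, y, width):
    """Byte offset of texel (x, y) when texels are stored as TILE x TILE blocks.

    Assumes width is a multiple of TILE.
    """
    tiles_per_row = width // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    within_tile = (y % TILE) * TILE + (x % TILE)
    return (tile_index * TILE * TILE + within_tile) * BYTES_PER_TEXEL


def cache_lines_touched(offset_fn, coords, width):
    """Number of distinct cache lines hit by reading the given texels."""
    return len({offset_fn(x, y, width) // CACHE_LINE for x, y in coords})


WIDTH = 1024
row_walk = [(x, 0) for x in range(16)]                       # 16 texels along one row
block_walk = [(x, y) for y in range(4) for x in range(4)]    # one 4x4 neighbourhood

for name, coords in (("row of 16", row_walk), ("4x4 block", block_walk)):
    print(name,
          "linear:", cache_lines_touched(linear_offset, coords, WIDTH),
          "tiled:", cache_lines_touched(tiled_offset, coords, WIDTH))
```

Under these assumed parameters, reading 16 texels along a row touches 1 cache line in the linear layout and 4 in the tiled one, while reading a 4x4 neighbourhood is the reverse. Which layout wins therefore depends on how the kernel walks the data, which is one way a texture path could end up slower than a buffer path for a particular kernel.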
