Mali texture performance is worse than buffer performance. Why?

I tested the performance of Mali textures (cl_image) and found that it is worse than that of buffers (cl_mem).

My GPU is a Mali-G76.

I expected textures to be faster than buffers, for example for bilinear filtering.

However, my test shows that textures on the G76 are about 10%-20% slower than buffers. My test format is RGBA.

I don't know why.

Could anyone explain the reason?

Also, is there a standard benchmark program?

Parents
  • Hi Peter, first of all, thanks for your reply.

    I don't think I understand your analysis correctly:

    1. Do you mean that the two performance counters you pointed out should be quite high, and that my very low values are abnormal? Is that right?

        Furthermore, what are the normal values?

    2. I don't understand "access pattern". I don't think I can specify an access pattern in OpenCL. Could you explain it further?

    3. I don't understand these two performance counters: Non-fragment tasks (unit: tasks) and Non-fragment jobs (unit: jobs).

    buf_style:
    Non-fragment tasks(unit: tasks): 562500
    Non-fragment jobs(unit: jobs): 300
    
    texture_style:
    Non-fragment tasks(unit: tasks): 562500
    Non-fragment jobs(unit: jobs): 400

        1). How are they calculated if my program is OpenCL?

        2). I run only one kernel in both the buf_style and the texture_style programs, but the Non-fragment jobs in the reports are 3 and 4 per run, instead of 1.

    buf_style:
    Non-fragment jobs(unit: jobs): 300
    
    texture_style:
    Non-fragment jobs(unit: jobs): 400

    I ran both tests 100 times, so the reports show 300 and 400. Why not 100?

     

        

Children
  • 1. Do you mean that the two performance counters you pointed out should be quite high, and that my very low values are abnormal?

    Your bytes-per-access value is high, so it is in line with expectations for a downscale.

    2. I don't understand "access pattern". I don't think I can specify an access pattern in OpenCL. Could you explain it further?

    Correct, you can't control it. But buffers and textures may have different memory layouts, and so have different access patterns.
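
To illustrate why the layout can matter, here is a minimal Python sketch that compares a row-major buffer with a block-tiled layout. The 4x4 tile, the 64-byte cache line, and the 1024-pixel width are assumptions chosen for illustration only; the actual Mali texture layout is not described in this thread and may differ.

```python
# Illustrative only: the block-tiled layout below is a hypothetical stand-in,
# not the real Mali texture layout.
BYTES_PER_PIXEL = 4   # RGBA8, as in the test described in the question
CACHE_LINE = 64       # assumed cache line size in bytes
TILE = 4              # hypothetical tile edge, in pixels
WIDTH = 1024          # assumed image width, in pixels


def linear_offset(x, y, width=WIDTH):
    """Byte offset of pixel (x, y) in a row-major buffer."""
    return (y * width + x) * BYTES_PER_PIXEL


def tiled_offset(x, y, width=WIDTH):
    """Byte offset of pixel (x, y) when pixels are stored in TILE x TILE blocks."""
    tiles_per_row = width // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    within_tile = (y % TILE) * TILE + (x % TILE)
    return (tile_index * TILE * TILE + within_tile) * BYTES_PER_PIXEL


def cache_lines_touched(offset_fn, pixels):
    """Number of distinct cache lines covered by a set of pixel reads."""
    return len({offset_fn(x, y) // CACHE_LINE for x, y in pixels})


# Two access patterns: an aligned 4x4 neighbourhood and 16 pixels along a row.
block = [(x, y) for y in range(4) for x in range(4)]
row = [(x, 0) for x in range(16)]

block_linear = cache_lines_touched(linear_offset, block)
block_tiled = cache_lines_touched(tiled_offset, block)
row_linear = cache_lines_touched(linear_offset, row)
row_tiled = cache_lines_touched(tiled_offset, row)

print("4x4 block: linear", block_linear, "tiled", block_tiled)
print("16-pixel row: linear", row_linear, "tiled", row_tiled)
```

Under these assumptions the 4x4 neighbourhood touches fewer cache lines in the tiled layout, while the 16-pixel row touches fewer in the linear one, so which resource type wins depends on how the kernel walks the image.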

        1). How are they calculated if my program is OpenCL?

    You can't. 

    You will get at least one Job per compute dispatch, but may get more as the driver generates small jobs for some management activities. 

    Tasks are somewhat meaningless to an application developer. For compute workloads a task is some multiple of the workgroup size, but the exact scaling is chosen by the driver and depends on the system configuration. 

      2). I run only one kernel in both the buf_style and the texture_style programs, but the Non-fragment jobs in the reports are 3 and 4 per run, instead of 1.

    As above, you will get at least one job per compute dispatch, but may get more.
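
As a worked example with the numbers reported in the question, the counters can be normalized by the number of runs. This is only a small Python sketch; the assumption of exactly one compute dispatch per run is taken from the question, and the variable names are illustrative.

```python
RUNS = 100              # each test was run 100 times
DISPATCHES_PER_RUN = 1  # assumption from the question: one kernel per run

# Counter totals as reported in the question.
counters = {
    "buf_style": {"tasks": 562500, "jobs": 300},
    "texture_style": {"tasks": 562500, "jobs": 400},
}

summary = {}
for name, totals in counters.items():
    jobs_per_run = totals["jobs"] / RUNS
    tasks_per_run = totals["tasks"] / RUNS
    summary[name] = {
        "jobs_per_run": jobs_per_run,
        "tasks_per_run": tasks_per_run,
        # Jobs beyond the kernel dispatch itself.
        "extra_jobs_per_run": jobs_per_run - DISPATCHES_PER_RUN,
    }

for name, values in summary.items():
    print(name, values)
```

Each run therefore accounts for 3 or 4 jobs, of which one would be the kernel dispatch; the remainder is consistent with the small driver-generated jobs mentioned above.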