
On Mali-G76, 2D bilinear filtering takes 2 cycles per 4-sample quad. Does that mean read_imagef() and write_imagef() take the same number of cycles for 2D bilinear filtering?

Hi,

While reading the Mali-G76 documentation, especially the material on its texture unit, I came across a question about its performance for 2D bilinear interpolation. The documentation says that the best-case performance on Mali-G76 (bilinear filtered samples) is 0.5 cycles per sample. It also says that this counter increments for every texture filtering issue cycle, that some instructions take more than one cycle because of multi-cycle data access and filtering operations, and that the costs per 4-sample quad are: (i) 2D bilinear filtering takes two cycles; (ii) 2D trilinear filtering takes four cycles; (iii) 3D bilinear filtering takes four cycles; and (iv) 3D trilinear filtering takes eight cycles. So my question is whether the OpenCL functions read_imagef() and write_imagef() get the same texture-unit performance for 2D bilinear interpolation.
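The two figures quoted above describe the same rate: 2 cycles per 4-sample quad is 2 / 4 = 0.5 cycles per sample, i.e. 2 bilinear samples per cycle. The small C sketch below just does that conversion for each row of the quoted cost list. The table values come from the question; the names (`FilterCost`, `cycles_per_sample`, etc.) are hypothetical, and it assumes fully populated quads with the texture unit as the only bottleneck.

```c
#include <assert.h>
#include <stdio.h>

/* A quad is a group of 4 samples processed together by the texture unit. */
enum { SAMPLES_PER_QUAD = 4 };

typedef struct {
    const char *mode;
    int cycles_per_quad;
} FilterCost;

/* Per-quad costs for Mali-G76 as quoted in the question. */
static const FilterCost kG76Costs[] = {
    {"2D bilinear", 2},
    {"2D trilinear", 4},
    {"3D bilinear", 4},
    {"3D trilinear", 8},
};

/* Cycles per individual sample, assuming every quad is fully populated
   and nothing other than filtering limits throughput. */
static double cycles_per_sample(int cycles_per_quad) {
    assert(cycles_per_quad > 0);
    return (double)cycles_per_quad / SAMPLES_PER_QUAD;
}

/* Prints cycles/sample and samples/cycle for each filtering mode. */
static void print_g76_costs(void) {
    for (size_t i = 0; i < sizeof kG76Costs / sizeof kG76Costs[0]; ++i) {
        double cps = cycles_per_sample(kG76Costs[i].cycles_per_quad);
        printf("%-13s %.2f cycles/sample, %.2f samples/cycle\n",
               kG76Costs[i].mode, cps, 1.0 / cps);
    }
}
```

Calling `print_g76_costs()` reports 0.50 cycles/sample for 2D bilinear, matching the quoted best case.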

Thanks.

  • Hi xwentian, 

    If you use imageLoad/Store to access the data, then yes, that sounds correct (albeit the bilinear filtering would be done manually by the application in shader code in this case).

    It's worth noting that the imageLoad/Store path is slower than texture reads and framebuffer writes on newer Mali GPUs. Mali-G77 can load 4 bilinear texture() samples per clock, vs 1 imageLoad() sample per clock. For simple filtering, it is therefore recommended to use textures for input, and framebuffers for outputs when possible.
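A rough back-of-envelope version of that comparison, using only the two Mali-G77 rates quoted above: if a manual bilinear filter needs a 2x2 footprint (4 imageLoad() calls) and the load path delivers 1 sample per clock, it produces at most 0.25 filtered samples per clock, against 4 per clock from texture(). This C sketch assumes peak rates, one texel per load, no texel sharing between threads, and no other bottleneck, so treat the resulting ratio as an idealized estimate rather than a measured figure; the names are hypothetical.

```c
#include <assert.h>

/* Peak rates quoted in the reply for Mali-G77 (samples per clock). */
static const double kTextureBilinearPerClock = 4.0; /* texture()   */
static const double kImageLoadPerClock = 1.0;       /* imageLoad() */

/* A manual bilinear filter reads a 2x2 texel footprint. */
enum { TEXELS_PER_BILINEAR = 4 };

/* Filtered output samples per clock when filtering manually via loads,
   assuming every texel costs one load and nothing else limits throughput. */
static double manual_bilinear_per_clock(double loads_per_clock) {
    assert(loads_per_clock > 0.0);
    return loads_per_clock / TEXELS_PER_BILINEAR;
}

/* Idealized throughput ratio of the texture path over manual filtering. */
static double texture_speedup(void) {
    return kTextureBilinearPerClock /
           manual_bilinear_per_clock(kImageLoadPerClock);
}
```

Under these assumptions `texture_speedup()` evaluates to 16, which is why the texture path is the better fit for simple filtering.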

    HTH, 
    Pete
