On Mali-G76, 2D bilinear filtering takes 2 cycles per 4-sample quad. Does this mean that read_imagef() and write_imagef() take the same number of cycles for 2D bilinear filtering?

Hi,

While reading the material on the Mali-G76, especially on its texture unit, I ran into a question about its performance for 2D bilinear interpolation. The material says that the best-case performance on Mali-G76 (bilinear filtered samples) is 0.5 cycles per sample. It also gives some related details: this counter increments for every texture filtering issue cycle, some instructions take more than one cycle because of multi-cycle data access and filtering operations, and the costs per 4-sample quad are: (i) 2D bilinear filtering takes two cycles; (ii) 2D trilinear filtering takes four cycles; (iii) 3D bilinear filtering takes four cycles; and (iv) 3D trilinear filtering takes eight cycles. So my question is whether the OpenCL functions read_imagef() and write_imagef() have the same performance when using the texture unit for 2D bilinear interpolation.
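
As a quick sanity check of the figures quoted above, the per-quad costs can be turned into per-sample costs with a few lines of C. This is only a sketch of the arithmetic: the enum and function names are made up for illustration, and the doubling rule (3D doubles the cost, trilinear doubles it again) is just a compact way to reproduce the four quoted numbers, not a statement about how the hardware works.

```c
#include <assert.h>

/* Hypothetical names, used only to tabulate the costs quoted in the post. */
enum tex_dim { TEX_2D, TEX_3D };
enum tex_filter { FILTER_BILINEAR, FILTER_TRILINEAR };

/* Filtering issue cycles per 4-sample quad. */
int cycles_per_quad(enum tex_dim dim, enum tex_filter filter)
{
    assert(dim == TEX_2D || dim == TEX_3D);
    assert(filter == FILTER_BILINEAR || filter == FILTER_TRILINEAR);

    int cycles = 2;                    /* 2D bilinear: two cycles per quad */
    if (dim == TEX_3D)
        cycles *= 2;                   /* 3D bilinear: four cycles */
    if (filter == FILTER_TRILINEAR)
        cycles *= 2;                   /* trilinear doubles the cost again */
    return cycles;
}

/* A quad holds four samples, so divide the per-quad cost by four. */
double cycles_per_sample(enum tex_dim dim, enum tex_filter filter)
{
    return cycles_per_quad(dim, filter) / 4.0;
}
```

For 2D bilinear filtering this gives 2 / 4 = 0.5 cycles per sample, which matches the best-case figure quoted above.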

Thanks.

0 xwentian over 3 years ago in reply to Peter Harris

Hi, Pete

Thanks for your reply. I understand the way to use OpenCL to implement the bilinear interpolation and its maximum performance. As for your suggestion, I think you are recommending that the input image be kept in texture memory, a read-only on-chip memory space for the GPU threads, rather than the common approach of putting the input image data in local memory. Is my understanding of your recommendation correct?

0 xwentian over 3 years ago in reply to xwentian

BTW, the input data to be interpolated is slightly larger than 3840x2160, and the result is a 4K NV21 image. This interpolation is one step in an image stabilization application on Android.
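
For a rough idea of the scale, here is a back-of-envelope estimate of the texture filtering issue cycles for one such frame, using the 2-cycles-per-quad figure from the original question. It is a sketch under several assumptions that are not confirmed anywhere in this thread: one bilinear lookup per output luma pixel, one lookup per interleaved VU pair in the half-resolution chroma plane, and samples packed into full quads. The function names are hypothetical, and the total is for the whole frame, so it would be spread over however many texture units the device has.

```c
#include <assert.h>
#include <stdint.h>

/* Issue cycles for width x height bilinear samples at 2 cycles per quad. */
uint64_t bilinear_issue_cycles(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    uint64_t samples = (uint64_t)width * height;
    uint64_t quads = (samples + 3) / 4;    /* round up a partial quad */
    return quads * 2;
}

/* NV21: full-resolution Y plane plus an interleaved VU plane that is
 * half resolution in each dimension (assumed: one lookup per VU pair). */
uint64_t nv21_filter_cycles(uint32_t width, uint32_t height)
{
    uint64_t luma = bilinear_issue_cycles(width, height);
    uint64_t chroma = bilinear_issue_cycles(width / 2, height / 2);
    return luma + chroma;
}
```

Under these assumptions, `nv21_filter_cycles(3840, 2160)` comes to 4,147,200 cycles for luma plus 1,036,800 for chroma, or 5,184,000 filtering issue cycles per frame in total.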

0 Peter Harris over 3 years ago in reply to xwentian

Yes, don't copy read-only data into local memory - there is no dedicated local memory in the Mali hardware, so this just wastes cycles.
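
To illustrate the point, a kernel along the following lines reads the filtered value straight from the image object, with no staging through local memory. This is a minimal sketch and not code from this thread: the kernel name `warp_luma`, the `offset` argument, and the host-side names `warp_kernel_src` and `warp_kernel_src_len` are all made up for illustration, and a real stabilization warp would compute its source coordinate differently. The OpenCL C source is kept as a C string, as a host program would pass it to clCreateProgramWithSource().

```c
#include <string.h>

/* OpenCL C kernel source embedded as a host-side string.
 * All names (warp_luma, offset, bilinear) are hypothetical. */
const char *const warp_kernel_src =
    "__constant sampler_t bilinear = CLK_NORMALIZED_COORDS_FALSE |\n"
    "                                CLK_ADDRESS_CLAMP_TO_EDGE |\n"
    "                                CLK_FILTER_LINEAR;\n"
    "\n"
    "__kernel void warp_luma(__read_only image2d_t src,\n"
    "                        __write_only image2d_t dst,\n"
    "                        float2 offset)\n"
    "{\n"
    "    int2 pos = (int2)(get_global_id(0), get_global_id(1));\n"
    "    if (pos.x >= get_image_width(dst) ||\n"
    "        pos.y >= get_image_height(dst))\n"
    "        return;\n"
    "    /* Sample at the texel centre, shifted by the offset. */\n"
    "    float2 coord = convert_float2(pos) + 0.5f + offset;\n"
    "    /* Filtered read straight from the image; no local-memory copy. */\n"
    "    float4 texel = read_imagef(src, bilinear, coord);\n"
    "    /* Unfiltered write: integer coordinates, no sampler. */\n"
    "    write_imagef(dst, pos, texel);\n"
    "}\n";

/* Length of the source, e.g. for the lengths argument of
 * clCreateProgramWithSource(). */
size_t warp_kernel_src_len(void)
{
    return strlen(warp_kernel_src);
}
```

Note that only read_imagef() takes a sampler; write_imagef() takes integer coordinates and no sampler, so any bilinear filtering happens on the read side.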