
Texture Cache on Mali-G76

Hi,

I am trying to use the texture cache for non-texture data (a binary tree structure) with OpenCL on Mali-G76, since L1 seems very slow for random accesses to the tree data.

As I understand it, data allocated with the OpenCL API clCreateImage is served from the texture cache.

So I placed the tree data structures in an Image2D area allocated with that API.

This approach has significantly improved performance so far.

Currently I lay the structures out in raster-scan order within the Image2D area.
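
To make the raster-scan placement concrete, here is a minimal host-side sketch of how a linear node index could map to a texel coordinate. The names `texel_coord` and `raster_coord` are hypothetical, and it assumes one tree node per texel, which the post does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Texel coordinate inside a 2D image (hypothetical helper type). */
typedef struct { uint32_t x, y; } texel_coord;

/* Raster-scan placement: node i lands in row i / width, column i % width.
 * Assumes one node per texel. */
static texel_coord raster_coord(uint32_t node_index, uint32_t width)
{
    assert(width > 0);
    texel_coord c = { node_index % width, node_index / width };
    return c;
}
```

A kernel would apply the same arithmetic to turn a node index into the coordinate it passes to the image read.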

In addition, someone suggested that I try Z-ordering for better texture cache (TC) performance:

https://en.wikipedia.org/wiki/Z-order_curve

So my question is: can performance be improved by laying the data structure out in Z-order?
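
For reference, a software Z-order (Morton) placement could look like the sketch below. It is only an illustration of the curve from the Wikipedia link, not something known to help on Mali. The function names are hypothetical, and it assumes one node per texel, coordinates that fit in 16 bits, and a square power-of-two image if the layout is to be dense.

```c
#include <assert.h>
#include <stdint.h>

/* Texel coordinate inside a 2D image (hypothetical helper type). */
typedef struct { uint32_t x, y; } texel_coord;

/* Keep the even-numbered bits of v and pack them into the low 16 bits. */
static uint32_t compact_bits(uint32_t v)
{
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

/* Spread the low 16 bits of v onto the even-numbered bit positions. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Z-order placement: even bits of the index form x, odd bits form y. */
static texel_coord zorder_coord(uint32_t node_index)
{
    texel_coord c = { compact_bits(node_index), compact_bits(node_index >> 1) };
    return c;
}

/* Inverse mapping, e.g. for filling the image on the host. */
static uint32_t zorder_index(uint32_t x, uint32_t y)
{
    assert(x <= 0xFFFFu && y <= 0xFFFFu);
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

With this mapping, indices 0 to 3 fill one 2x2 block, indices 0 to 15 fill one 4x4 block, and so on, so nodes with nearby indices stay close in both x and y.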

I have also heard that some GPUs support Z-ordering in hardware; in that case, Z-ordering in kernel code may be pointless or even make things worse.

I could not find any Mali documentation on the texture cache, so any information about it would be much appreciated.

Regards

  • Hi,

    It's hard to give a generic answer to that question as ultimately it's the access patterns in your application that will determine whether a different data layout would help performance.

    Mali GPUs have got support for something similar to Z-Order but this is not exposed via OpenCL at the moment. If you know the locality patterns of your accesses and how they map to thread locality, it is possible to use raster order from the HW's point of view (which is the only ordering you have access to anyway) but interleave your data in such a way that you get better locality when accessing with the expected pattern. This is a technique that is commonly used in kernels optimised for Mali GPUs.

    Out of curiosity, what is the algorithm/application you're implementing? If you're willing/able to share your code, we can take a look at the specifics but, otherwise, I'm afraid that's as much as I'll be able to say without additional information.

    Hope this helps.

    Regards,

    Kévin
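
One way to read the interleaving suggestion in the reply above is a tiled layout: the image is still addressed in raster order, but consecutive node indices are packed into small square tiles so that they sit close together in both dimensions. The sketch below is an assumption-laden illustration, not the technique used in Arm's own kernels. The name `tiled_coord` and the tile size are hypothetical, it assumes one node per texel, and the best tile size for the Mali-G76 texture cache is not documented in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Texel coordinate inside a 2D image (hypothetical helper type). */
typedef struct { uint32_t x, y; } texel_coord;

/* Tiled placement: each run of tile*tile consecutive nodes fills one
 * tile x tile block; blocks themselves are laid out in raster order.
 * Assumes the image width is a multiple of the tile size. */
static texel_coord tiled_coord(uint32_t node_index, uint32_t width, uint32_t tile)
{
    assert(tile > 0 && width >= tile && width % tile == 0);

    uint32_t per_tile      = tile * tile;
    uint32_t tiles_per_row = width / tile;
    uint32_t tile_id       = node_index / per_tile;  /* which block */
    uint32_t within        = node_index % per_tile;  /* offset inside it */

    texel_coord c;
    c.x = (tile_id % tiles_per_row) * tile + within % tile;
    c.y = (tile_id / tiles_per_row) * tile + within / tile;
    return c;
}
```

Whether this beats plain raster order depends on how the tree is traversed, so it would need to be measured against the actual access pattern.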
