Multiple render targets and tilebuffer

According to several blog posts, Mali GPUs use 16x16 tiles for tile-based rendering. I was wondering how many tiles one shader core can actually keep in on-chip memory when multiple render targets are in use. Presuming that the GL driver reports a maximum of 4 draw buffers and color attachments per FBO, can I safely assume that the shader core is actually working on four 16x16 tiles that fit into on-chip RAM, or is the access serialized by masking the writes and executing the shader several times?

  • For the more recent GPUs (anything since Mali-T760), it is more about the total bytes per tile than the number of pixels per tile. We recommend aiming for a maximum of 128 bits per pixel of storage to get the best tile pipelining.

    Those 128 bits could be made up of MRT surfaces, wider color formats, or multisampling. More than 128 bits per pixel will work, but it starts to cause some slowdowns and loss of efficiency.

    > or is the access serialized by masking the writes and execute the shader several times

    That would be pretty horrific for performance; the biggest benefit of MRT compared to individual passes is the ability to share triangle setup, shader data fetches, and overlapping computation.
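
To make the 128 bits per pixel guideline from the reply concrete, here is a minimal sketch that adds up the color storage for an MRT setup. The helper names (`FORMAT_BITS`, `tile_bits_per_pixel`, `within_budget`) are hypothetical and not part of any Mali or GL API. It assumes that only color attachments count toward the budget and that multisampling multiplies the per-pixel storage by the sample count; neither assumption is confirmed in the thread.

```python
# Hypothetical lookup table: bits per pixel for a few common color formats.
FORMAT_BITS = {
    "RGBA8": 32,
    "RGB10_A2": 32,
    "RG16F": 32,
    "RGBA16F": 64,
    "RGBA32F": 128,
}

# Recommended maximum from the reply above (bits of storage per pixel).
BUDGET_BITS = 128


def tile_bits_per_pixel(attachments, samples=1):
    """Sum the color storage per pixel over all render targets.

    Assumption: each sample stores a full copy of every attachment,
    so the total scales linearly with the sample count.
    """
    return sum(FORMAT_BITS[fmt] for fmt in attachments) * samples


def within_budget(attachments, samples=1):
    """Return True if the configuration stays at or below the guideline."""
    return tile_bits_per_pixel(attachments, samples) <= BUDGET_BITS


if __name__ == "__main__":
    configs = [
        (["RGBA8"] * 4, 1),        # four 32-bit targets
        (["RGBA16F"] * 4, 1),      # four 64-bit targets
        (["RGBA8"], 4),            # single target with 4x multisampling
    ]
    for attachments, samples in configs:
        bits = tile_bits_per_pixel(attachments, samples)
        status = "ok" if within_budget(attachments, samples) else "over budget"
        print(f"{attachments} x{samples}: {bits} bits/pixel ({status})")
```

Under these assumptions, four RGBA8 targets land exactly on the guideline, while four RGBA16F targets need twice that, which is the case where the reply says rendering still works but loses some efficiency.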
