According to several blog posts, the Mali GPU uses 16x16 tiles for its tile-based rendering. I was wondering how many tiles one shader core can actually keep in on-chip memory when multiple render targets are used. Assuming the GL driver reports that the maximum number of draw buffers and color attachments for an FBO is 4, can I safely assume that the shader core is actually working on four 16x16 tiles that fit into on-chip RAM, or is the access serialized by masking the writes and executing the shader several times?
For the more recent GPUs (anything since Mali-T760) it is more about the total bytes per tile than the number of pixels per tile. We recommend aiming for a maximum of 128 bits per pixel of storage to get the best tile pipelining. Those 128 bits could be made up of MRT surfaces, wider color formats, or multi-sampling. More than 128 bits per pixel will work, but it starts to cause some slowdowns and loss of efficiency.

> or is the access serialized by masking the writes and execute the shader several times

That would be pretty horrific for performance; the biggest benefit of MRT compared to individual passes is the ability to share triangle setup, shader data fetches, and overlapping computation.
It is a dedicated tile buffer memory, but the number of pixels it holds varies with the per-pixel storage requirements, so there isn't a single fixed number of in-flight tiles it can contain. If you are within the 128 bits/pixel recommendation then there are "enough" tiles that the shader core stays busy.
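To check a given MRT setup against the 128 bits per pixel recommendation, the arithmetic can be sketched roughly as below. This is only an illustration, not how the driver or hardware accounts for storage: the helper names (`color_bits_per_pixel`, `within_budget`) are made up for this example, the format table lists only the nominal sizes of a few common OpenGL ES color formats, and it assumes that multi-sampling simply multiplies the color storage by the sample count. Depth and stencil storage are not counted.

```python
# Nominal bits per pixel for a few common OpenGL ES color formats.
FORMAT_BITS = {
    "GL_RGBA8": 32,
    "GL_RGB10_A2": 32,
    "GL_R11F_G11F_B10F": 32,
    "GL_RGBA16F": 64,
    "GL_RGBA32F": 128,
}

# The per-pixel storage recommendation quoted in the reply above.
RECOMMENDED_BITS_PER_PIXEL = 128


def color_bits_per_pixel(formats, samples=1):
    """Sum the color storage per pixel over all attachments.

    Assumption: every sample stores a full copy of each attachment,
    so MSAA multiplies the total by the sample count.
    """
    return samples * sum(FORMAT_BITS[f] for f in formats)


def within_budget(formats, samples=1):
    """Return True if the setup stays within the recommended budget."""
    return color_bits_per_pixel(formats, samples) <= RECOMMENDED_BITS_PER_PIXEL


if __name__ == "__main__":
    setups = [
        (["GL_RGBA8"] * 4, 1),    # four 32-bit render targets
        (["GL_RGBA16F"] * 4, 1),  # four 64-bit render targets
        (["GL_RGBA8"], 4),        # one 32-bit target with 4x MSAA
    ]
    for formats, samples in setups:
        bits = color_bits_per_pixel(formats, samples)
        verdict = "within" if within_budget(formats, samples) else "over"
        print(f"{len(formats)} target(s), {samples}x samples: {bits} bits/pixel ({verdict} budget)")
```

Under these assumptions, four `GL_RGBA8` attachments come to exactly 128 bits per pixel and fit the recommendation, while four `GL_RGBA16F` attachments come to 256 bits per pixel and would fall into the "works, but less efficient" case.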