Hi there,
I am playing around with the Mali-G78 in a Pixel 6a and Streamline.
As a trial, I want to copy the contents of a 64 MB buffer into another one. Here is my shader code:
```
#version 450

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

layout(constant_id = 0) const int kBufferSizeElements = (64*1024*1024)/4;

layout(set = 0, binding = 0) buffer InputBuffer {
    uint input_buffer[kBufferSizeElements];
};

layout(set = 0, binding = 1) buffer OutputBuffer {
    uint output_buffer[kBufferSizeElements];
};

void main() {
    output_buffer[gl_GlobalInvocationID.x] = input_buffer[gl_GlobalInvocationID.x];
}
```
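For reference, the dispatch arithmetic this shader implies can be sketched on the host side as below. This is only a sketch and assumes a 1D dispatch with one invocation per element, which the post does not state. The helper names `elementCount` and `workgroupCount` are hypothetical; the constants are taken from the shader.

```cpp
#include <cstdint>

// Values taken from the shader above.
constexpr std::uint64_t kBufferSizeBytes  = 64ull * 1024 * 1024;  // 64 MiB buffer
constexpr std::uint32_t kElementSizeBytes = 4;                    // one GLSL uint
constexpr std::uint32_t kLocalSizeX       = 16;                   // local_size_x

// Number of uint elements in the buffer (kBufferSizeElements in the shader).
// Hypothetical helper, not part of any API.
constexpr std::uint64_t elementCount(std::uint64_t bytes, std::uint32_t elemSize) {
    return bytes / elemSize;
}

// Workgroups needed so that every element gets one invocation, rounded up.
// Assumption: 1D dispatch, one invocation copies exactly one element.
constexpr std::uint64_t workgroupCount(std::uint64_t elements, std::uint32_t localSize) {
    return (elements + localSize - 1) / localSize;
}

// Compile-time check of the numbers for this particular shader.
static_assert(elementCount(kBufferSizeBytes, kElementSizeBytes) == 16777216ull,
              "64 MiB / 4 B = 16,777,216 elements");
static_assert(workgroupCount(16777216ull, kLocalSizeX) == 1048576ull,
              "16,777,216 / 16 = 1,048,576 workgroups");
```

Under these assumptions, the copy needs 1,048,576 workgroups of 16 invocations each. The Vulkan specification only guarantees 65,535 for `maxComputeWorkGroupCount[0]`, so it is worth checking the limit the device actually reports before dispatching that many groups in X.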
When I use Streamline to check the GPU counters, I observe the following:
Generally, can you explain how memory should be accessed for best performance?
And finally, a short question on the datasheet: is the "FP32 operations/cycle" figure per arithmetic unit or per core?
opm said: Is this documented or explained somewhere in more detail?
I'm not aware of anything more detailed (but I'm not sure there is much more detail to give either ;)