Load/Store unit and 16-bit arithmetic values from malioc are not as expected

I profiled my shaders and found that the Load/Store (LS) unit value is extremely large, so I simplified the shader and ran some tests.

Environment: Mali-G715, glslc from the latest Vulkan SDK.
#version 450
#define LENGTH 512 // 1024
layout(set = 0, binding = 0, std140) mediump uniform ubo0 {
    mediump vec4 data[LENGTH];
} _ubo0;
layout(set = 0, binding = 1, std140) mediump uniform ubo1 {
    mediump vec4 data[LENGTH];
} _ubo1;
layout(location = 0) out mediump vec4 outColor;
void main()
{
    outColor = vec4(0);
    for (int i = 0; i < LENGTH; i++)
    {
        outColor += _ubo0.data[i];
    }
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
The profile result:
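Uniform count | Length per uniform | LS   | 16-bit arithmetic | Uniform registers | Function
--------------|--------------------|------|-------------------|-------------------|-------------
1             | 512                | 0.00 | N/A               | 2 (3% used)       | main
1             | 1024               | 2.00 | 0.0               | 2 (3% used)       | main
2             | 512                | 0.0  | N/A               | 2 (3% used)       | main
2             | 1024               | 4.00 | 0.0               | 2 (3% used)       | main
2             | 512                | 4.00 | 0.0               | 2 (3% used)       | confusedMain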
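
For reference, here is what the two array lengths mean in bytes. This is only a sketch of the arithmetic, assuming the std140 rule that each element of a vec4 array occupies 16 bytes. The constant names are made up for illustration and are not part of my test shader, and main() exists only so the snippet compiles on its own.

```glsl
#version 450
// std140 layout: a vec4 array has a 16-byte element stride.
const int BYTES_PER_VEC4 = 16;                      // illustrative name
const int UBO_BYTES_512  = 512  * BYTES_PER_VEC4;   //  8192 bytes =  8 KiB
const int UBO_BYTES_1024 = 1024 * BYTES_PER_VEC4;   // 16384 bytes = 16 KiB

layout(location = 0) out mediump vec4 outColor;
void main()
{
    // Uses both constants so they are not stripped: 16384 / 8192 = 2.0.
    outColor = vec4(float(UBO_BYTES_1024) / float(UBO_BYTES_512));
}
```

So in the main rows, LS is zero when each UBO is 8 KiB and non-zero when each UBO is 16 KiB.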

The results seem to indicate that the LS value is related to the size of the UBO. However, when I tried the following code, the results confused me.

So I have some questions about the results above.

Q1: Does the size of the UBO really affect LS? Could it be that there is a special cache inside the chip, but due to the limited cache size, a large UBO increases LS?

Q2: Why do different UBO sizes give different 16-bit arithmetic results?

Q3: Why did different calculation orders produce different results in the example above?

Q4: Why does the UBO size affect 16-bit arithmetic?
