Does anyone know how many vector registers the Mali-T860 GPU has? I looked through the Mali GPU reference (ARM® Mali GPU OpenCL Version 3.3 Developer Guide), but it doesn't mention the number of vector registers.
IIRC, the T860 has 1024 128-bit registers per shader core.
May I ask where you found this detailed info?
Also, how many vector registers can each thread use?
This blog covers register allocation for the Mali-T860 core: https://community.arm.com/graphics/b/blog/posts/arm-mali-compute-architecture-fundamentals
Summary: you get either 4, 8, or 16 registers per thread, but increasing the register count gives you a proportional reduction in thread count.
In general, try to keep programs at 4 or 8 registers; 16 is possible and may not cost too much if your kernel is heavily dominated by arithmetic operations, but it comes with a sizable performance penalty if it isn't.
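To put rough numbers on that trade-off, here is a small Python sketch. It assumes the 1024 registers per shader core quoted from memory earlier in this thread, and it assumes a kernel's register need is rounded up to the next allocation size; the function names are made up for illustration and are not part of any Mali tool or API.

```python
# Assumption: 1024 128-bit registers per shader core, the figure quoted
# from memory earlier in this thread (not checked against a datasheet).
REGISTERS_PER_CORE = 1024

# Per-thread allocation sizes mentioned in the summary above.
ALLOCATION_SIZES = (4, 8, 16)


def allocation_for(registers_needed):
    """Return the smallest allocation size that fits the kernel.

    Rounding up to the next size is an assumption made for this sketch.
    """
    for size in ALLOCATION_SIZES:
        if registers_needed <= size:
            return size
    raise ValueError("kernel needs more than 16 registers per thread")


def max_threads_per_core(registers_needed):
    """Threads that fit in the register file at the given register need."""
    return REGISTERS_PER_CORE // allocation_for(registers_needed)


if __name__ == "__main__":
    for needed in (4, 8, 16):
        threads = max_threads_per_core(needed)
        print(f"{needed:2d} registers/thread: {threads} threads per core")
```

Under those assumptions, each step up in allocation size halves the number of threads a core can keep in flight, which is why the jump to 16 only pays off when the kernel is mostly arithmetic.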
HTH, Pete