Does Mali support 8-bit integer vector operations that avoid the overflow issue, the way scalar operations do?
For example:
I tested this on a Mali-G72.
With a scalar operation:
--------------------------------
uchar a = 255;
uchar b = 255;
int c = a + b;
This stores 510 in c.
But with a vector operation:
uchar4 a = {255, 255, 255, 255};
uchar4 b = {255, 255, 255, 255};
int4 c = a + b;
it prints the wrong answer.
So my questions are:
1. Scalar operations use general-purpose registers, which are 32 bits wide. That's why the scalar operation gives the correct result. Am I right?
2. Why don't vector operations support automatic widening like scalar operations? Do they not use general-purpose registers the way scalar operations do?
3. I heard the G52 supports int8 operations. Does that mean the G52 has 8-bit vector registers that would resolve the second case above?
Hello Unarmed guy, as it's been a few days with no responses here, I'm moving this across to our Graphics & Multimedia forum, where there is more discussion of Mali. Many thanks, Georgia
Unarmed guy said: Scalar operation uses general purpose register and it is 32bit register. That's why scalar operation results correctly. Am i right?
How the hardware works is really irrelevant; this is simply how the language specification defines the behavior.
Just as in "normal" C programming, integer scalar types that are smaller than an int are promoted to int when an operation is performed on them. (Search for "integer promotion" in the OpenCL C spec.)
uchar a = 255;
uchar b = 255;
int c = a + b;
... is effectively:
uchar a = 255;
uchar b = 255;
int c = ((int)a) + ((int)b);
Unarmed guy said: 2. Why does Vector operation not support auto cast like scalar operation ? Does it not support general purpose register like in scalar operation?
... because the specification says so. See section "6.2.1 Implicit Conversions"; it explicitly states:
"Implicit conversions between built-in vector data types are disallowed".
To be honest, I'm surprised the code compiles at all: the conversion from a uchar4 sum to an int4 result is an implicit conversion, so I would expect it to generate a compile error.
HTH, Pete
To answer your third question about Mali-G52: it adds a dedicated vector instruction for 8-bit integer dot products, which effectively provides a cross-lane FMA for machine-learning kernels. The instruction behaves as if all of the multiplication intermediates are 32 bits wide, so the result is not clipped.
See the following OpenCL extension for usage information in OpenCL kernels:
https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_integer_dot_product.txt
Cheers, Pete