Hi all,
Recently, I have been trying to implement neural networks on one of my Cortex-M devices. However, I don't understand what "int_width" in arm_nn_activations_direct_q7 means. The documentation defines it as "bit-width of the integer part, assume to be smaller than 3", but that is still unclear to me. Could someone give me an example? I am trying to port neural networks from other DL frameworks like TensorFlow and PyTorch. Many thanks!!
Thanks for the explanation! Now I understand how the Q format actually works.
The note in "arm_nntables.c" says "input is 3.x format, i.e, range of [-8, 8)". So I guess we need to right-shift the input into q3.4 format rather than q4.3? (ARM uses TI's Q format.)
TI's q3.4 is the same as AMD's q4.4 (AMD's notation counts the sign bit as one of the integer bits, TI's does not). For int_width == 3, no shifting is necessary. For the example given in the previous post, the conversion is from q7 (in TI notation) to q3.4 (in TI notation). So yes, the input must be converted to q3.4 (in TI notation) before it can be used to index into the table.
So from a quantization point of view, int_width is a scale factor that remaps the float values into [-8, 8), I suppose?
As I understand it, q7_t is a single data type that can be used as any of q7, q1.6, q2.5, or q3.4 (all in TI notation). By default, q7_t means q7 (in TI notation). To distinguish between these uses of q7_t, they introduced the extra parameter int_width: q7_t with int_width == 3 is q3.4 (TI), q7_t with int_width == 0 is q7 (TI), and so on. They could have created distinct types such as q7_t, q1_6_t, q2_5_t, q3_4_t, etc. to make the Q format explicit, i.e. the API could have been designed a bit better.
So how can we tell, from the library, whether the output of the previous layer is q1.6, q2.5, or q3.4?
I am unable to answer that question. The caller of arm_nn_activations_direct_q7 must know the format/range of the input and so must select the int_width value appropriately.