This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Incorrect compile for convert float to short/int16

Hi,
the following code for STM32F4 (Cortex-M4):

float fZero= 0.;
float fInfinity;
short sTestPlus, sTestNeg;
int main( void){
  fInfinity= 1.f/fZero;
  sTestPlus= fInfinity;
  fInfinity= -1.f/fZero;
  sTestNeg= fInfinity;
  while(1);
}

should result in the values sTestPlus= 0x7FFF and sTestNeg= 0x8000. Instead it produces 0xFFFF and 0x0000.

The reason is, that the compiler uses the signed 32bit convert (leading to the correct result 0x7FFFFFFF and 0x80000000), and then just cuts out the lower short, which is really poor.

As I understand, it is no problem with Cortex M4 to do float to signed16 using the VCVT command in a more sophisticated way (specifying 16bit).

Is there a way to get this done?

Parents
  • I would possibly accept such behaviour

    Do you really have a choice?

    this is clearly not acceptable (from my point of view ...)

    As you may have noticed, nobody on this thread agrees with you on this one.

    (as there is a nice command to convert float to int16)

    You mean CPU instruction, I presume. I looked it up: there is no instruction to convert float to int16. There is one to convert float to int32, however.
    By the way, you may be interested in the way programmers from ARM perform this task:

    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
    


    This is taken from CMSIS DSP library.

Reply
  • I would possibly accept such behaviour

    Do you really have a choice?

    this is clearly not acceptable (from my point of view ...)

    As you may have noticed, nobody on this thread agrees with you on this one.

    (as there is a nice command to convert float to int16)

    You mean CPU instruction, I presume. I looked it up: there is no instruction to convert float to int16. There is one to convert float to int32, however.
    By the way, you may be interested in the way programmers from ARM perform this task:

    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
    


    This is taken from CMSIS DSP library.

Children
  • Hi Mike,
    thanks for the hint with the __ssat/__usat intrinsic - this comes very near to what I want:

    s1= __ssat(f1, 16);
    s2= __ssat(f2, 16);
    

    delivers the correct results:

    This is the disassembly with the __ssat macro:

      VLDR          s0,[r0,#0x04]
      VCVT.S32.F32  s0,s0
      VMOV          r1,s0
      SSAT          r1,#16,r1
      STRH          r1,[r0,#0x00]
    

    This would be the optimum disassembly - using the VCVT command with S16:

      VLDR          s0,[r0,#0x04]
      VCVT.S16.F32  s0,s0
      VMOV          r1,s0
      STRH          r1,[r0,#0x00]
    

    So the __ssat macro creates one assembly line more than the optimum - this is really ok for me (as I anyway use it only one time in my time-critical loop - just at the end of the float calculation).

    You write, that there should be no direct CPU command to convert from float to int16/short, but this is not correct to my understanding of Cortex M4 - see my last email. It should be perfectly possible to do it all in one VCVT command (see VCVT in the "optimum" code snippet above) - so it would be really nice if Keil would present an intrinsic for this very powerful VCVT command (... or - this would be super-great - the possibility to do use inline assembly for Cortex M4 ...).

    (further of course I would suggest to fix the problem, that "i=f" produces a wrong conversion for int8/int16/... - it is a bit disturbing/ "a bad surprise" if this works without problems for int32, but then you run into problems as soon as you use int8 or int16).

  • "(further of course I would suggest to fix the problem, that "i=f" produces a wrong conversion for int8/int16/... - it is a bit disturbing/ "a bad surprise" if this works without problems for int32, but then you run into problems as soon as you use int8 or int16)."

    And how many thousands - or tens of thousands - or even hundreds of thousands (dare I say million?) of programmers have stumbled on this:

    uint8_t ii;
    
    for (ii = 0; ii < 256; ii++) {
        ...
    }
    

    Yes, people will always suffer "a bad surprise" when they make use of char or short variables together with numbers larger that what fits in that numeric range, and they make specific expectations about the outcome (or just totally ignores that there can even be an issue).

    When mixing variables with different numeric range, it is you, the developer, who have to take the consequences of assuming what the outcome will be when the smaller variable is too small in relation to the numeric value of the larger variable.

    There are quite a number of paragraphs in the language standard discussing mixing of variable types of different size, mixing of signed/unsigned etc. Have you read the language standard? Did you understand it? Did you realize that the language standard forms a contract? Do you think end users of a programming language are in the position to renegotiate a contract? Because in the end, your request is that Keil should add a secondary contract with additional promises outside of what is covered by the language standard. And you formulate it as if it is a direct error of Keil to not have this additional contract available, and already signed.

  • This would be the optimum disassembly - using the VCVT command with S16 ... VCVT.S16.F32 S0, S0

    The manual says there is no such instruction. See for yourself:

    infocenter.arm.com/.../index.jsp

    However, the manual does say that there is an instruction to convert from F32 to fixed-point S16:

    infocenter.arm.com/.../index.jsp

    Perhaps, this instruction could be used to convert to an S16 integer?

    VCVT.S16.F32 S0, S0, #0
    

    I'm not sure since I haven't done much fixed-point math recently.

  • Yes, I assume the 2nd version of VCVT should work.

    (But I did not test in assembly myself).

  • further of course I would suggest to fix the problem, that "i=f" produces a wrong conversion for int8/int16/...

    So how many more times will you have to be told that that is no problem, because there is no wrong conversion in those cases you tested, before you finally get it?

    The only wrong thing there is your expectation about what that code should do.