This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Incorrect compile for convert float to short/int16

Hi,
the following code for STM32F4 (Cortex-M4):

float fZero= 0.;
float fInfinity;
short sTestPlus, sTestNeg;
int main( void){
  fInfinity= 1.f/fZero;
  sTestPlus= fInfinity;
  fInfinity= -1.f/fZero;
  sTestNeg= fInfinity;
  while(1);
}

should result in the values sTestPlus= 0x7FFF and sTestNeg= 0x8000. Instead it produces 0xFFFF and 0x0000.

The reason is, that the compiler uses the signed 32bit convert (leading to the correct result 0x7FFFFFFF and 0x80000000), and then just cuts out the lower short, which is really poor.

As I understand, it is no problem with Cortex M4 to do float to signed16 using the VCVT command in a more sophisticated way (specifying 16bit).

Is there a way to get this done?

Parents
  • The infinite value is not the crucial thing here - perhaps I was a bit to radical in my first code snippet - very sorry.

    You will get the same illegal behaviour of float to int16 (of course float to int - thank you for pointing this out), if you take any number above the int16 range. You could e. g. write the following:

    float f1= 65537.;
    float f2= -65538.;
    short s1, s2;
    int main( void){
      s1= f1;
      s2= f2;
      while( 1){}
    }
    

    Now s1 ends up as 0x0001 (because 65537= 0x10001) and s2 as 0xFFFE (because -65538= 0xFFFEFFFE).

    This is neither correct nor makes much sense.

    I would possibly accept such behaviour, if the processor would have no intrinsic command to convert float to int16, but in this case of Cortex M4 this is clearly not acceptable (from my point of view ...). (as there is a nice command to convert float to int16).

Reply
  • The infinite value is not the crucial thing here - perhaps I was a bit to radical in my first code snippet - very sorry.

    You will get the same illegal behaviour of float to int16 (of course float to int - thank you for pointing this out), if you take any number above the int16 range. You could e. g. write the following:

    float f1= 65537.;
    float f2= -65538.;
    short s1, s2;
    int main( void){
      s1= f1;
      s2= f2;
      while( 1){}
    }
    

    Now s1 ends up as 0x0001 (because 65537= 0x10001) and s2 as 0xFFFE (because -65538= 0xFFFEFFFE).

    This is neither correct nor makes much sense.

    I would possibly accept such behaviour, if the processor would have no intrinsic command to convert float to int16, but in this case of Cortex M4 this is clearly not acceptable (from my point of view ...). (as there is a nice command to convert float to int16).

Children
  • You still talk in terms of "illegal behaviour" for something the language standard specifically calls "undefined". Your personal views can't just be updated into hard laws for the compiler vendors to follow.

  • I would possibly accept such behaviour

    Do you really have a choice?

    this is clearly not acceptable (from my point of view ...)

    As you may have noticed, nobody on this thread agrees with you on this one.

    (as there is a nice command to convert float to int16)

    You mean CPU instruction, I presume. I looked it up: there is no instruction to convert float to int16. There is one to convert float to int32, however.
    By the way, you may be interested in the way programmers from ARM perform this task:

    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
    


    This is taken from CMSIS DSP library.

  • The infinite value is not the crucial thing here

    What gave you the impression that anybody thought it was?

    This is neither correct nor makes much sense.

    For what feels like the 100th time: you're wrong. It is correct, because your code is causing undefined behaviour. Which means that every possible outcome is correct. It's impossible to be incorrect here.

    The thing that makes no sense here is your insistence to rate your personal opinion above everything else, including the definition of the programming language. And this is not the first time you behaved like that. Are you really that incapable of gathering insight from earlier experience?

  • Hi Mike,
    thanks for the hint with the __ssat/__usat intrinsic - this comes very near to what I want:

    s1= __ssat(f1, 16);
    s2= __ssat(f2, 16);
    

    delivers the correct results:

    This is the disassembly with the __ssat macro:

      VLDR          s0,[r0,#0x04]
      VCVT.S32.F32  s0,s0
      VMOV          r1,s0
      SSAT          r1,#16,r1
      STRH          r1,[r0,#0x00]
    

    This would be the optimum disassembly - using the VCVT command with S16:

      VLDR          s0,[r0,#0x04]
      VCVT.S16.F32  s0,s0
      VMOV          r1,s0
      STRH          r1,[r0,#0x00]
    

    So the __ssat macro creates one assembly line more than the optimum - this is really ok for me (as I anyway use it only one time in my time-critical loop - just at the end of the float calculation).

    You write, that there should be no direct CPU command to convert from float to int16/short, but this is not correct to my understanding of Cortex M4 - see my last email. It should be perfectly possible to do it all in one VCVT command (see VCVT in the "optimum" code snippet above) - so it would be really nice if Keil would present an intrinsic for this very powerful VCVT command (... or - this would be super-great - the possibility to do use inline assembly for Cortex M4 ...).

    (further of course I would suggest to fix the problem, that "i=f" produces a wrong conversion for int8/int16/... - it is a bit disturbing/ "a bad surprise" if this works without problems for int32, but then you run into problems as soon as you use int8 or int16).

  • "(further of course I would suggest to fix the problem, that "i=f" produces a wrong conversion for int8/int16/... - it is a bit disturbing/ "a bad surprise" if this works without problems for int32, but then you run into problems as soon as you use int8 or int16)."

    And how many thousands - or tens of thousands - or even hundreds of thousands (dare I say million?) of programmers have stumbled on this:

    uint8_t ii;
    
    for (ii = 0; ii < 256; ii++) {
        ...
    }
    

    Yes, people will always suffer "a bad surprise" when they make use of char or short variables together with numbers larger that what fits in that numeric range, and they make specific expectations about the outcome (or just totally ignores that there can even be an issue).

    When mixing variables with different numeric range, it is you, the developer, who have to take the consequences of assuming what the outcome will be when the smaller variable is too small in relation to the numeric value of the larger variable.

    There are quite a number of paragraphs in the language standard discussing mixing of variable types of different size, mixing of signed/unsigned etc. Have you read the language standard? Did you understand it? Did you realize that the language standard forms a contract? Do you think end users of a programming language are in the position to renegotiate a contract? Because in the end, your request is that Keil should add a secondary contract with additional promises outside of what is covered by the language standard. And you formulate it as if it is a direct error of Keil to not have this additional contract available, and already signed.

  • This would be the optimum disassembly - using the VCVT command with S16 ... VCVT.S16.F32 S0, S0

    The manual says there is no such instruction. See for yourself:

    infocenter.arm.com/.../index.jsp

    However, the manual does say that there is an instruction to convert from F32 to fixed-point S16:

    infocenter.arm.com/.../index.jsp

    Perhaps, this instruction could be used to convert to an S16 integer?

    VCVT.S16.F32 S0, S0, #0
    

    I'm not sure since I haven't done much fixed-point math recently.

  • Yes, I assume the 2nd version of VCVT should work.

    (But I did not test in assembly myself).

  • further of course I would suggest to fix the problem, that "i=f" produces a wrong conversion for int8/int16/...

    So how many more times will you have to be told that that is no problem, because there is no wrong conversion in those cases you tested, before you finally get it?

    The only wrong thing there is your expectation about what that code should do.