This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Incorrect compile for convert float to short/int16

Hi,
the following code for STM32F4 (Cortex-M4):

float fZero= 0.;
float fInfinity;
short sTestPlus, sTestNeg;
int main( void){
  fInfinity= 1.f/fZero;
  sTestPlus= fInfinity;
  fInfinity= -1.f/fZero;
  sTestNeg= fInfinity;
  while(1);
}

should result in the values sTestPlus= 0x7FFF and sTestNeg= 0x8000. Instead it produces 0xFFFF and 0x0000.

The reason is, that the compiler uses the signed 32bit convert (leading to the correct result 0x7FFFFFFF and 0x80000000), and then just cuts out the lower short, which is really poor.

As I understand, it is no problem with Cortex M4 to do float to signed16 using the VCVT command in a more sophisticated way (specifying 16bit).

Is there a way to get this done?

Parents
  • For int32 the VCVT function used does exactly what I would expect:
    -inf gives 0x80000000
    and
    +inf gives 0x7FFFFFFF.

    This is a well defined behaviour, and also very useful. The VCVT command takes only 1 cycle. If together with every VCVT I would have to check for such infinity cases, I would spoil about 5-20 further cycles - this would blow up the code quite appreciably.

    Generally, including +-INF into the number range is a very big benefit if floating point calcualations are done. It was a major point in introduction of the IEEE floating point stardard to define a clear behaviour of such INF values in several situations. This then of course also requires a sound convert to fix point.

    The Cortex M4 Generic User Guide shows, that float to fix convert is handled by Cortex M4 very carefully - it has very careful options:

    VCVT{cond}.Td.F32 Sd, Sd, #fbits
    VCVT{cond}.F32.Td Sd, Sd, #fbits
    

    Td can be S16, U16, S32, U32. To my tests, the compiler currently uses only S32 and U32, but for short/int16 variables this is not very helpful - then you get illegal results as soon as the floating point value is above the number range - which of course easily can happen, especially for short/int16. Creating such illegal results from my point of view is a compiler error.

    Further there is this nice possibility of "fbits" - this additionally can be very useful to restrict the number range further. I did not find any possibility in the help file, how to use this command efficiently from the C compiler - as there is also no inline assembler available for STM32F4, I do not come further at this point. From my point of view, it would be very useful to create an intrinsic function for this powerful command.

    PS: Mike, I only use the division by zero to get these infinity values in the floating point registers - you could possibly also use other methods. (but I do not really understand your posts ...).

Reply
  • For int32 the VCVT function used does exactly what I would expect:
    -inf gives 0x80000000
    and
    +inf gives 0x7FFFFFFF.

    This is a well defined behaviour, and also very useful. The VCVT command takes only 1 cycle. If together with every VCVT I would have to check for such infinity cases, I would spoil about 5-20 further cycles - this would blow up the code quite appreciably.

    Generally, including +-INF into the number range is a very big benefit if floating point calcualations are done. It was a major point in introduction of the IEEE floating point stardard to define a clear behaviour of such INF values in several situations. This then of course also requires a sound convert to fix point.

    The Cortex M4 Generic User Guide shows, that float to fix convert is handled by Cortex M4 very carefully - it has very careful options:

    VCVT{cond}.Td.F32 Sd, Sd, #fbits
    VCVT{cond}.F32.Td Sd, Sd, #fbits
    

    Td can be S16, U16, S32, U32. To my tests, the compiler currently uses only S32 and U32, but for short/int16 variables this is not very helpful - then you get illegal results as soon as the floating point value is above the number range - which of course easily can happen, especially for short/int16. Creating such illegal results from my point of view is a compiler error.

    Further there is this nice possibility of "fbits" - this additionally can be very useful to restrict the number range further. I did not find any possibility in the help file, how to use this command efficiently from the C compiler - as there is also no inline assembler available for STM32F4, I do not come further at this point. From my point of view, it would be very useful to create an intrinsic function for this powerful command.

    PS: Mike, I only use the division by zero to get these infinity values in the floating point registers - you could possibly also use other methods. (but I do not really understand your posts ...).

Children
  • but I do not really understand your posts

    My point is that the behavior of your code isn't defined by the C standard. It's not defined by the compiler reference manual either, as far as I can see.
    You may have found some compiler features that can be used in your code. But since you are relying on undocumented behaviour, you'll have to verify that the undocumented features are still there every time you upgrade the compiler or use a different library (microlib?) or make other significant changes to you build environment.

  • But conversion from int to float is nothing mystical - why should this be not documented?

    At least - if I should be able to use Cortex M4 efficiently, it would be necessary that there is some inline/intrinsic possibility to use this VCVT command with its full power.

  • But conversion from int to float is nothing mystical - why should this be not documented?

    Well, I don't know. Why don't you ask Keil? It's their documentation, after all. Or maybe I didn't dig deep enough.

    At least - if I should be able to use Cortex M4 efficiently, it would be necessary that there is some inline/intrinsic possibility to use this VCVT command with its full power.

    Definitely. Sounds like a feature request that should be directed to Keil.

  • But conversion from int to float is nothing mystical

    It's also not not what we've been talking about here. That was float-to-int, not int-to-float.

    - why should this be not documented?

    Why should it be? Per the language definition your code causes undefined behaviour. That means the compiler itself can crash, or generate code that returns a different, random number every time, and still be 100% correct, without the need to document anything.

    Yes, ARM could have decided to define a particular behaviour in this cause, and maybe they did. In that case, and only then, they should have documented this decision. But the fact that nobody seems to have found any such documentation would seem to indicate that no such decision was made.

    Summary: this code is buggy. It relies on unwarranted assumptions. For the ARM compiler, these assumptions actually are incorrect, so the bug became noticeable.

  • This is a well defined behaviour

    So you think. But who, do you think, defined it? And who put them in the position to be defininig such things?

    then you get illegal results as soon as the floating point value is above the number range

    You've been told several times already that this assessment is incorrect. No, those results are not illegal. They're not even wrong. They're well inside the allowed range of results --- because by definition of the C programming langauge, that range is infinite. The only thing wrong here is your belief that you know what the result should be.

  • The infinite value is not the crucial thing here - perhaps I was a bit to radical in my first code snippet - very sorry.

    You will get the same illegal behaviour of float to int16 (of course float to int - thank you for pointing this out), if you take any number above the int16 range. You could e. g. write the following:

    float f1= 65537.;
    float f2= -65538.;
    short s1, s2;
    int main( void){
      s1= f1;
      s2= f2;
      while( 1){}
    }
    

    Now s1 ends up as 0x0001 (because 65537= 0x10001) and s2 as 0xFFFE (because -65538= 0xFFFEFFFE).

    This is neither correct nor makes much sense.

    I would possibly accept such behaviour, if the processor would have no intrinsic command to convert float to int16, but in this case of Cortex M4 this is clearly not acceptable (from my point of view ...). (as there is a nice command to convert float to int16).

  • You still talk in terms of "illegal behaviour" for something the language standard specifically calls "undefined". Your personal views can't just be updated into hard laws for the compiler vendors to follow.

  • I would possibly accept such behaviour

    Do you really have a choice?

    this is clearly not acceptable (from my point of view ...)

    As you may have noticed, nobody on this thread agrees with you on this one.

    (as there is a nice command to convert float to int16)

    You mean CPU instruction, I presume. I looked it up: there is no instruction to convert float to int16. There is one to convert float to int32, however.
    By the way, you may be interested in the way programmers from ARM perform this task:

    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
    


    This is taken from CMSIS DSP library.

  • The infinite value is not the crucial thing here

    What gave you the impression that anybody thought it was?

    This is neither correct nor makes much sense.

    For what feels like the 100th time: you're wrong. It is correct, because your code is causing undefined behaviour. Which means that every possible outcome is correct. It's impossible to be incorrect here.

    The thing that makes no sense here is your insistence to rate your personal opinion above everything else, including the definition of the programming language. And this is not the first time you behaved like that. Are you really that incapable of gathering insight from earlier experience?

  • Hi Mike,
    thanks for the hint with the __ssat/__usat intrinsic - this comes very near to what I want:

    s1= __ssat(f1, 16);
    s2= __ssat(f2, 16);
    

    delivers the correct results:

    This is the disassembly with the __ssat macro:

      VLDR          s0,[r0,#0x04]
      VCVT.S32.F32  s0,s0
      VMOV          r1,s0
      SSAT          r1,#16,r1
      STRH          r1,[r0,#0x00]
    

    This would be the optimum disassembly - using the VCVT command with S16:

      VLDR          s0,[r0,#0x04]
      VCVT.S16.F32  s0,s0
      VMOV          r1,s0
      STRH          r1,[r0,#0x00]
    

    So the __ssat macro creates one assembly line more than the optimum - this is really ok for me (as I anyway use it only one time in my time-critical loop - just at the end of the float calculation).

    You write, that there should be no direct CPU command to convert from float to int16/short, but this is not correct to my understanding of Cortex M4 - see my last email. It should be perfectly possible to do it all in one VCVT command (see VCVT in the "optimum" code snippet above) - so it would be really nice if Keil would present an intrinsic for this very powerful VCVT command (... or - this would be super-great - the possibility to do use inline assembly for Cortex M4 ...).

    (further of course I would suggest to fix the problem, that "i=f" produces a wrong conversion for int8/int16/... - it is a bit disturbing/ "a bad surprise" if this works without problems for int32, but then you run into problems as soon as you use int8 or int16).

  • "(further of course I would suggest to fix the problem, that "i=f" produces a wrong conversion for int8/int16/... - it is a bit disturbing/ "a bad surprise" if this works without problems for int32, but then you run into problems as soon as you use int8 or int16)."

    And how many thousands - or tens of thousands - or even hundreds of thousands (dare I say million?) of programmers have stumbled on this:

    uint8_t ii;
    
    for (ii = 0; ii < 256; ii++) {
        ...
    }
    

    Yes, people will always suffer "a bad surprise" when they make use of char or short variables together with numbers larger that what fits in that numeric range, and they make specific expectations about the outcome (or just totally ignores that there can even be an issue).

    When mixing variables with different numeric range, it is you, the developer, who have to take the consequences of assuming what the outcome will be when the smaller variable is too small in relation to the numeric value of the larger variable.

    There are quite a number of paragraphs in the language standard discussing mixing of variable types of different size, mixing of signed/unsigned etc. Have you read the language standard? Did you understand it? Did you realize that the language standard forms a contract? Do you think end users of a programming language are in the position to renegotiate a contract? Because in the end, your request is that Keil should add a secondary contract with additional promises outside of what is covered by the language standard. And you formulate it as if it is a direct error of Keil to not have this additional contract available, and already signed.

  • This would be the optimum disassembly - using the VCVT command with S16 ... VCVT.S16.F32 S0, S0

    The manual says there is no such instruction. See for yourself:

    infocenter.arm.com/.../index.jsp

    However, the manual does say that there is an instruction to convert from F32 to fixed-point S16:

    infocenter.arm.com/.../index.jsp

    Perhaps, this instruction could be used to convert to an S16 integer?

    VCVT.S16.F32 S0, S0, #0
    

    I'm not sure since I haven't done much fixed-point math recently.

  • Yes, I assume the 2nd version of VCVT should work.

    (But I did not test in assembly myself).

  • further of course I would suggest to fix the problem, that "i=f" produces a wrong conversion for int8/int16/...

    So how many more times will you have to be told that that is no problem, because there is no wrong conversion in those cases you tested, before you finally get it?

    The only wrong thing there is your expectation about what that code should do.