
Bug in driver (Android SGS 2 I9100)

Note: This was originally posted on 3rd June 2011 at http://forums.arm.com

Hello,

I'm an Android developer and have bought the SGS 2 I9100 (Mali-400 GPU). I have developed a tool called
GPUBench, tested it on this device, and found a bug which causes many artefacts.

In a shader, sin and cos are correct between -6.14 and 6.14; when the angle is large, the results are
wrong.

I have created an APK which shows the problem:
And here is the source code:
How can the driver be fixed on Android? Is that the job of ARM or of Samsung?

Thanks

twitter: ellis2323
mail: laurent.mallet at_ gmail.com
  • Note: This was originally posted on 7th June 2011 at http://forums.arm.com

    Hi Laurent,

    I'm certain the root cause is actually a precision issue, so let's understand the implementation details first.

    I don't know whether you're familiar with how floating point numbers are represented in computers, but in case not, here's a reminder.

    In a 32-bit floating point number, the 1st bit is a sign bit (0 for positive, 1 for negative numbers). The next 8 bits are the exponent - more on this in a moment, but it's what allows a 32-bit floating point number to cover a dynamic range from 1.4x10^-45 to 3.4x10^38.

    The last 23 bits are the significand (also known as mantissa), holding the actual bits which will make up the number to be represented.

    In the mobile space, it's often not necessary to support 32-bit precision as 16-bit will be enough accuracy, and helps with performance and power consumption. The OpenGL-ES 2.0 specification is designed with this in mind, and the OpenGL-ES Shading Language specification (available on www.khronos.org) talks in section 4.5 about Precision and Precision Qualifiers. For the fragment language it is required that the supported range is +-16,384 with precision of 1 part in 1024.

    16-bit floating point numbers are enough to meet this precision in the fragment processor, so Mali chooses to use these in the fragment shader for the performance and power consumption reasons above. The format here is 1 sign bit, 5 exponent bits and 10 significand bits. This means the representable dynamic range is 3.0x10^-5 to 6.6x10^4.

    The way the specification is worded, this is mediump. Implementations are not required to support highp, and may substitute mediump. Also, implementations which do implement highp are allowed to ignore requests for mediump and stay in highp, which may explain why you saw no difference using that keyword on other platforms.

    Interestingly, due to the way floating point numbers work, the larger the number the less precision you have available in the smallest fractional part. This is true of 32-bit or 16-bit floating point numbers - you only have as many bits as the significand holds to represent the data.
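    This effect is easy to see with Python's struct module, which supports the IEEE 754 half-precision format via the 'e' format code. Rounding a value through 16-bit storage shows the step size growing with the magnitude of the number (a sketch of the format's behaviour, not of the Mali hardware itself; to_half is my own helper name):

```python
import struct

def to_half(v):
    """Round a Python float to the nearest 16-bit half-precision value."""
    return struct.unpack('<e', struct.pack('<e', v))[0]

# Near zero, 0.01 survives 16-bit storage almost intact
# (off by roughly 0.02%)...
print(to_half(0.01))
# ...but near 19, representable values are 1/64 apart,
# so 19.01 rounds all the way to 19 + 1/64:
print(to_half(19.01))  # 19.015625
```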

    The actual calculation for generating the floating point number from the sign, exponent and significand includes an implicit 'hidden' bit - a '1' is always prefixed to the significand before it is scaled by 2^exponent. The exponent is also stored with a bias added (15 in the case of 16-bit floats).

    Taking exp to be the 5-bit exponent, s[0] to represent the leftmost, most significant bit and s[9] to represent the rightmost, least significant bit, the equation becomes:

    float = (1 + s[0] * 2^-1 + s[1] * 2^-2 + s[2] * 2^-3 + ... + s[9] * 2^-10) * 2^(exp - 15)

    So, to represent 1.0f in 16-bit float format we need a sign bit of 0, then an exponent of 0, which with the bias added becomes 0 + 15 = 15. Lastly, there is no fractional part to this number, so the significand is all zeroes (because of the implicit '1'). Combined together this becomes:

    0 01111 0000000000 = 0x3c00 in hex
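    This encoding can be checked from Python, whose struct module supports half precision via the 'e' format code (the helper names below are my own):

```python
import struct

def float_to_half_bits(value):
    """Encode a float as its 16-bit half-precision bit pattern."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

def half_bits_to_float(bits):
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

print(hex(float_to_half_bits(1.0)))  # 0x3c00, matching the derivation above
print(half_bits_to_float(0x3C00))    # 1.0
```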

    For your example, when the time variable in your program has reached 19, we can picture this in binary:


    16 8 4 2 1
    1 0 0 1 1


    This consumes 5 of the significand bits, meaning there are only 5 left to represent the fractional part of the number (the result of the atan() you have added to 19). But we must remember the leading 1 doesn't need to be stored in the significand, so actually we need 4 bits for the integer part and therefore get 6 bits for the fractional part.

    So the smallest increment that can be represented in addition to 19 is 2^-6 = 1/64.

    The significand (with its prefixed '1' bit) would need to be multiplied by 2^4 to become 19, so the exponent will be 4 (plus the bias of 15), which coincidentally is also 19.

    So this number in 16-bit float format is, in binary:

    0 10011 0011000001 = 0x4cc1 in hex

    Sure enough, this number represents 19.015625, which is 19 + 1/64.
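    The same bit-level check (using Python's half-precision 'e' struct code again) confirms both the value and the 1/64 step:

```python
import struct

def half_bits_to_float(bits):
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

print(half_bits_to_float(0x4CC1))  # 19.015625, i.e. 19 + 1/64
# 0x4CC0 is 19.0 exactly; one significand bit is a step of 2^-6:
print(half_bits_to_float(0x4CC1) - half_bits_to_float(0x4CC0))  # 0.015625
```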

    What we're seeing is that the higher mTime is allowed to climb (it increments at 0.01 per frame, so at 58 FPS it grows by 0.58 per second), the less accuracy there is left available at the lowest end of the number. Since the number is used to calculate an angle, I'd recommend doing a modulo in the increment function in the Java code. That way, we can keep the number in a smaller range (-PI to +PI, for instance). It's useful to use -PI here, since we get the sign bit for free: sin() won't care whether the range is 0 to 2PI or -PI to +PI, but using the sign bit in this way gives us one more bit of precision at the low end, rather than spending that bit going from 3.14 to 6.28.

    Since the result of atan() is also in the range -PI to +PI, this means the new range of inputs to sin() etc. is -2PI to +2PI. Since 2PI is about 6.25 (.25 makes the sums easier in binary as it's 1/4), we can see the new representation would be:

    16 8 4 2 1
    0 0 1 1 0


    We see we've saved 2 bits off the top of the significand - in fact it's 3 because of the hidden implicit '1', so the integer part of our stored significand only needs 2 bits. This now leaves us 8 bits for the fractional part, meaning the smallest increment to 2PI we can hold is 1/256 - better than the 1/64 of before.

    So 6.25 (110.01 in binary, i.e. 1.1001 x 2^2) would need this significand: 1001000000

    The exponent would be 2 (+15 = 17), so the whole number becomes:

    0 10001 1001000000 = 0x4640 in hex

    And, if we wanted to represent the smallest increment we can do to 6.25, we can set the bottom bit of the significand: 0x4641, which is 6.25 + 1/256 = 6.25390625. This represents an increment of only 0.06% (6 parts in 10000).
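    Again, this can be confirmed from Python's half-precision support:

```python
import struct

def half_bits_to_float(bits):
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

print(half_bits_to_float(0x4640))  # 6.25
print(half_bits_to_float(0x4641))  # 6.25390625 = 6.25 + 1/256
```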

    When I make my proposed change to your program, I no longer see degradation over time:


       mTime += 0.01;
       if (mTime >= 3.14) mTime -= 6.28; // wrap into the range -PI to +PI

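    The same wrapping idea can be sketched in Python for clarity (wrap_angle is a hypothetical helper; math.remainder returns the IEEE remainder, which lands in [-PI, +PI] for a divisor of 2*PI). The wrap leaves sin() unchanged up to rounding while keeping the magnitude small:

```python
import math

def wrap_angle(t):
    """Wrap an ever-growing angle into [-pi, pi]."""
    return math.remainder(t, 2.0 * math.pi)

t = 1234.567            # an angle that has drifted far from zero
w = wrap_angle(t)
assert -math.pi <= w <= math.pi
# sin() gives the same answer for both, up to float rounding:
assert abs(math.sin(t) - math.sin(w)) < 1e-9
print(w)
```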

    However, that fragment shader is generally quite expensive. Our Mali Offline Shader Compiler (available to download on the portal) shows it compiles to a 7 cycle shader. To give some context, our 1080p-capable stereoscopic space racing game True//Force uses fewer cycles for its shaders:

    http://blogs.arm.com...d-by-mali-gpus/

    Admittedly our shaders were heavily optimised, but the point is that doing all the calculation in the fragment shader is probably not the best way to achieve a rotozoomer. I understand it may be trying to benchmark fragment shader performance, but perhaps a better test would be a realistic effect, like specular highlighted bumpmapped fragments. If I were trying to achieve a rotozoom effect, I'd probably use texture matrices in the vertex shader, where the precision is higher, then let those coordinates be interpolated between the vertices for the fragment shader.

    In summary:

    • Floating point precision varies from platform to platform - the OpenGL-ES API gives the ability to query what is supported.
    • Fragment shader precision of mediump (16-bit floats) conforms to the Khronos specification.
    • Developers should write code that is aware of the specified precision of fragment shaders.
    • Values close to zero maintain more fractional bits than large numbers.
    • Fractional bits are important for trigonometric arguments, so wrapping angles to be close to zero may be advisable.
    Hope this helps, Pete