Bug in driver (Android SGS 2 I9100)

Note: This was originally posted on 3rd June 2011 at http://forums.arm.com

Hello,

I'm an Android developer and have bought the SGS 2 I9100 (Mali-400 GPU). I have developed a tool called GPUBench, tested it on this device, and found a bug which causes many artefacts.

In shaders, sin and cos are only correct between -6.14 and 6.14; when the angle is larger, the results are wrong.

I have created an APK which shows the problem:

http://bit.ly/ikF3rz

And here is the source code:

http://bit.ly/jb9H6c

How can the driver be fixed on Android? Is that ARM's job or Samsung's?

Thanks

twitter: ellis2323
mail: laurent.mallet at_ gmail.com

Doug Day over 11 years ago

Note: This was originally posted on 3rd June 2011 at http://forums.arm.com

Hi Laurent,

Thank you for your forum post. I've downloaded your sample code and run it on a device. After examining your code I can see that the problems are indeed occurring in your fragment shader. However, I do not believe the issue is with the atan(), sin() and cos() functions but rather an issue of precision.

In your fragment shader you are requesting high precision floating point ("precision highp float;") and again repeating (unnecessarily) the "highp" qualifier on some of your vectors. The OpenGL ES Shading Language specification states that for fragment shaders, high precision is optional and only a minimum of 16-bit precision is required; applications must be written with this in mind. To check whether high precision is supported in the fragment shader, you should check for the OES_fragment_precision_high extension in glGetString(GL_EXTENSIONS). I also noticed that in your Java code you are incrementing your time value (which is passed to the shader) indefinitely without sanity checks; this will cause any system to "break" eventually, regardless of the precision used. Floating point gives dynamic range rather than precision; the best way to get what you want is to normalize your input.
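Doug's extension check can be sketched in Java as below. This is a minimal sketch assuming the extension string has already been fetched; on Android it would typically come from `GLES20.glGetString(GLES20.GL_EXTENSIONS)`, where extension names carry the usual `GL_` prefix. The class name `PrecisionCheck` is hypothetical. Matching whole space-separated tokens, rather than using `String.contains`, avoids false positives from names that merely start with the same text.

```java
/** Hypothetical helper: looks for one extension name in a GL extension string. */
final class PrecisionCheck {
    static final String FRAGMENT_HIGHP = "GL_OES_fragment_precision_high";

    /** Returns true only for an exact match against one space-separated token. */
    static boolean hasExtension(String extensions, String name) {
        if (extensions == null) {
            return false;
        }
        for (String token : extensions.trim().split("\\s+")) {
            if (token.equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative string only; a real device reports its own list.
        String sample = "GL_OES_rgb8_rgba8 GL_OES_depth24 GL_OES_packed_depth_stencil";
        System.out.println("highp in fragment shaders: " + hasExtension(sample, FRAGMENT_HIGHP));
    }
}
```

If the check fails, the shader should be written to behave acceptably at mediump rather than relying on a highp request being honoured.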

Special care needs to be taken when writing applications for OpenGL ES 2.0 devices; they do not have the high level of precision of desktop OpenGL.

Can I ask what you are trying to achieve? Maybe we could help you find a better solution.

Best regards,

Doug

Laurent Mallet over 11 years ago

Note: This was originally posted on 3rd June 2011 at http://forums.arm.com

Hello Doug,

You have the source code: changing to mediump doesn't fix anything. sin/cos aren't correct after 3 rotations (the time value is then near 19). I haven't seen any restriction on angle values.
What I'm trying to do is a simple rotozoom. It works on every other mobile GPU but not on the Mali-400.

I have written a benchmark tool which is a port of Shadertoy, and it uses sin and cos intensively with values greater than 20...
You can try it: GPUBench. Many effects look broken on the Mali-400. If you add mod(angle, 6.29), the result is OK, so I'm sure there is a problem which is not precision-related.
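Laurent's workaround relies on GLSL's `mod(x, y)`, which the GLSL ES specification defines as `x - y * floor(x / y)`. The Java sketch below models that definition in double precision only; it does not reproduce the Mali-400's 16-bit arithmetic, and the class name `GlslMod` is hypothetical. It also shows two side effects of the workaround: unlike Java's `%`, the result is never negative for a positive `y`, and wrapping by 6.29 rather than 2PI shifts the phase by about 0.0068 radians per wrap.

```java
/** Models GLSL's mod(x, y) = x - y * floor(x / y) in double precision. */
final class GlslMod {
    static double mod(double x, double y) {
        return x - y * Math.floor(x / y);
    }

    public static void main(String[] args) {
        double angle = 19.0;
        // Java's % truncates toward zero, so it differs from GLSL mod for negative inputs.
        System.out.println("-1 % 6.29           = " + (-1.0 % 6.29));
        System.out.println("mod(-1, 6.29)       = " + mod(-1.0, 6.29));
        // Wrapping by 6.29 instead of 2*PI leaves a small phase offset after each wrap.
        System.out.println("sin(19)             = " + Math.sin(angle));
        System.out.println("sin(mod(19, 6.29))  = " + Math.sin(mod(angle, 6.29)));
        System.out.println("sin(mod(19, 2*PI))  = " + Math.sin(mod(angle, 2 * Math.PI)));
    }
}
```

At an angle of 19, three wraps by 6.29 move the result about 0.02 radians away from the exact value, which is small but not zero.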

My code has been tested on many mobile GPUs:
- PowerVR 530/535 (iPhone 3GS / iPhone 4 / iPad 1)
- PowerVR 540 (SGS / Galaxy Tab)
- PowerVR 543 (iPad 2)
- Adreno 200 (N1)
- Adreno 205 (HTC Desire HD)
- GeForce ULP (Xoom, LG Optimus 2X...)
...

Pete over 11 years ago
Note: This was originally posted on 7th June 2011 at http://forums.arm.com

Hi Laurent,

I'm certain the root cause is actually a precision issue, so let's understand the implementation details first.

I don't know whether you're familiar with how floating point numbers are represented in computers, but in case not, here's a reminder.

In a 32-bit floating point number, the first bit is the sign bit (0 for positive, 1 for negative numbers). The next 8 bits are the exponent - more on this in a moment, but it's what allows a 32-bit floating point number to cover a dynamic range from 1.4x10^-45 to 3.4x10^38.

The last 23 bits are the significand (also known as mantissa), holding the actual bits which will make up the number to be represented.
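To make the layout concrete, this Java sketch pulls the three fields out of a 32-bit float with `Float.floatToIntBits`. The class name is hypothetical; the bias of 127 and the `Float.MIN_VALUE` / `Float.MAX_VALUE` limits are standard IEEE 754 single-precision facts rather than anything Mali-specific.

```java
/** Splits a 32-bit IEEE 754 float into its sign, biased exponent and stored significand. */
final class Float32Fields {
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;          // 1 bit
    }

    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF; // 8 bits, stored with a bias of 127
    }

    static int significand(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;      // 23 bits; the leading 1 of a normal value is not stored
    }

    public static void main(String[] args) {
        float x = 19.0f;
        System.out.printf("%s: sign=%d exponent=%d (unbiased %d) significand=0x%06X%n",
                x, sign(x), biasedExponent(x), biasedExponent(x) - 127, significand(x));
        // Smallest positive (subnormal) and largest finite float: the range quoted above.
        System.out.println(Float.MIN_VALUE + " .. " + Float.MAX_VALUE);
    }
}
```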

In the mobile space, 32-bit precision is often unnecessary: 16-bit gives enough accuracy and helps with performance and power consumption. The OpenGL ES 2.0 specification is designed with this in mind, and the OpenGL ES Shading Language specification (available on www.khronos.org) covers Precision and Precision Qualifiers in section 4.5. For the fragment language, the required minimum range is ±16,384 with a relative precision of 1 in 1024.

16-bit floating point numbers are enough to meet this precision in the fragment processor, so Mali chooses to use these in the fragment shader for the performance and power consumption reasons above. The format here is 1 sign bit, 5 exponent bits and 10 significand bits. This means the representable dynamic range is 3.0x10^-5 to 6.6x10^4
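The claim that 16-bit floats meet the fragment-language minimum can be checked with a few lines of arithmetic. This Java sketch assumes IEEE 754 binary16 conventions (bias 15, with the all-ones exponent reserved for infinity and NaN); the thread does not say whether Mali's internal format follows those conventions exactly, and `MediumpCheck` is a hypothetical name.

```java
/** Checks the 1/5/10-bit float format against the minimums quoted from the GLSL ES spec. */
final class MediumpCheck {
    static final int EXPONENT_BITS = 5;
    static final int SIGNIFICAND_BITS = 10;
    static final int BIAS = 15;

    /** Largest finite value: all-ones significand with the largest non-reserved exponent. */
    static double maxFinite() {
        int maxUnbiased = ((1 << EXPONENT_BITS) - 2) - BIAS;          // 30 - 15 = 15
        double maxSignificand = 2.0 - Math.pow(2, -SIGNIFICAND_BITS); // 1.1111111111 in binary
        return maxSignificand * Math.pow(2, maxUnbiased);
    }

    /** Relative precision: one unit in the last significand place, relative to 1.0. */
    static double relativePrecision() {
        return Math.pow(2, -SIGNIFICAND_BITS);
    }

    public static void main(String[] args) {
        System.out.println("max finite      = " + maxFinite());
        System.out.println("covers +-16384  = " + (maxFinite() >= 16384));
        System.out.println("precision       = 1/" + (int) (1 / relativePrecision()));
    }
}
```

Under these assumptions the largest finite value is 65504 (the 6.6x10^4 above, rounded) and the relative precision is exactly 1 in 1024, so the format just meets the precision requirement and comfortably covers the range.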

The way the specification is worded, this is mediump. Implementations are not required to support highp, and may substitute mediump. Also, implementations which do implement highp are allowed to ignore requests for mediump and stay in highp, which may explain why you saw no difference using that keyword on other platforms.

Interestingly, due to the way floating point numbers work, the larger the number the less precision you have available in the smallest fractional part. This is true of 32-bit or 16-bit floating point numbers - you only have as many bits as the significand holds to represent the data.
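The loss of fractional precision with magnitude can be tabulated. This Java sketch assumes the 16-bit format above with 10 stored significand bits and inputs in the normal range (roughly 6.1x10^-5 up to 65504); it computes the gap between neighbouring 16-bit values near `x` as 2^(e - 10), where `e` is the unbiased exponent, and compares it with the 32-bit gap from `Math.ulp`. `Spacing` and `halfUlp` are hypothetical names.

```java
/** Spacing between adjacent representable values near x, for 16-bit and 32-bit floats. */
final class Spacing {
    /** 16-bit spacing near a positive normal x: 2^(unbiased exponent - 10). */
    static double halfUlp(double x) {
        return Math.scalb(1.0, Math.getExponent(x) - 10);
    }

    public static void main(String[] args) {
        double[] samples = {0.5, 1.0, 3.14, 6.25, 19.0, 100.0, 1000.0};
        for (double x : samples) {
            System.out.printf("x=%7.2f  16-bit step=%.8f  32-bit step=%.10f%n",
                    x, halfUlp(x), (double) Math.ulp((float) x));
        }
    }
}
```

Each doubling of the magnitude doubles the step in both formats; the 16-bit step simply starts 8192 times (2^13) coarser.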

The actual calculation for generating the floating point number from the sign, exponent and significand includes an implicit "hidden" bit: a "1" is always prefixed to the significand before it is scaled by 2^exponent. Also, the exponent is stored with a bias added to it (15 in the case of 16-bit floats).

Taking exp to be the 5-bit stored exponent, s[0] to be the leftmost (most significant) significand bit and s[9] the rightmost (least significant) bit, the equation for a positive number becomes:

float = (1 + s[0] * 2^-1 + s[1] * 2^-2 + s[2] * 2^-3 ... + s[9] * 2^-10) * 2^(exp - 15)
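The formula can be evaluated literally. This hypothetical `HalfDecode` class follows it term by term (with the sign applied at the end) and rejects exponent fields of 0 and 31, which the formula does not cover (zero and subnormals, infinity and NaN in IEEE 754 binary16). Running it on the bit patterns worked through in this post is a quick way to check the hand calculations.

```java
/** Evaluates the 16-bit formula: sign, 5-bit exponent (bias 15), 10-bit significand s[0..9]. */
final class HalfDecode {
    static double decode(int bits) {
        int sign = (bits >>> 15) & 0x1;
        int exp = (bits >>> 10) & 0x1F;
        int significand = bits & 0x3FF;
        if (exp == 0 || exp == 0x1F) {
            // Zero, subnormals, infinity and NaN follow different rules; not covered here.
            throw new IllegalArgumentException("not a normal 16-bit value: 0x" + Integer.toHexString(bits));
        }
        double value = 1.0; // the implicit "hidden" leading 1
        for (int i = 0; i < 10; i++) {
            int s = (significand >>> (9 - i)) & 0x1; // s[0] is the leftmost (most significant) bit
            value += s * Math.pow(2, -(i + 1));
        }
        value *= Math.pow(2, exp - 15);
        return sign == 0 ? value : -value;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {0x3c00, 0x4cc1, 0x4640, 0x4641}) {
            System.out.printf("0x%04x -> %s%n", bits, decode(bits));
        }
    }
}
```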

So, to represent 1.0f in 16-bit float format, we need a sign bit of 0, then an exponent of 0, which with the bias added becomes 0 + 15 = 15. Lastly, there is no fractional part to this number, so the significand is all zeroes (because of the implicit "1"). Combined together this becomes:

0 01111 0000000000 = 0x3c00 in hex
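Going the other way, from a number to its 16-bit pattern, can be sketched as below. This is an illustration under stated assumptions rather than how any particular GPU converts values: it handles only normal values between 2^-14 and 65504, rounds to nearest with ties to even (via `Math.rint`), and the class name `HalfEncode` is hypothetical.

```java
/** Converts a normal-range value to a 16-bit pattern (1 sign, 5 exponent, 10 significand bits). */
final class HalfEncode {
    static int encode(double x) {
        int sign = x < 0 ? 1 : 0;
        double a = Math.abs(x);
        if (!(a >= 0x1p-14 && a <= 65504.0)) {
            // Zero, subnormals, overflow and NaN are deliberately out of scope for this sketch.
            throw new IllegalArgumentException("outside the normal 16-bit range: " + x);
        }
        int e = Math.getExponent(a);                          // unbiased exponent, -14..15 here
        double scaled = a / Math.scalb(1.0, e);               // significand with hidden bit, in [1, 2)
        long frac = (long) Math.rint((scaled - 1.0) * 1024);  // 10 stored bits, ties to even
        if (frac == 1024) {                                   // rounding carried into the next power of two
            frac = 0;
            e += 1;
        }
        return (sign << 15) | ((e + 15) << 10) | (int) frac;
    }

    public static void main(String[] args) {
        for (double v : new double[] {1.0, 19.015625, 6.25, 19.005}) {
            System.out.printf("%-10s -> 0x%04x%n", v, encode(v));
        }
    }
}
```

With these choices, `encode(19.005)` returns `0x4cc0`, the same pattern as 19 itself, because 0.005 is less than half of the 1/64 step available at that magnitude.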

For your example, when the time variable in your program has reached 19, we can picture this in binary:

19 in binary, against place values 16 8 4 2 1: 10011 (16 + 2 + 1)

This consumes 5 of the significand bits, meaning there are only 5 left to represent the fractional part of the number (the result of the atan() you have added to 19). But we must remember that the leading 1 doesn't need to be stored in the significand, so we actually need only 4 bits for the integer part and therefore get 6 bits for the fractional part.

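So, the smallest increment that can be represented in addition to 19 is 2^-6 = 1/64 = 0.015625, giving 19 + 1/64 the significand 1.0011000001 in binary (the leading "1" is the hidden bit).

The significand (with its prefixed "1" bit) would need to be multiplied by 2^4 to become 19, so the exponent will be 4; with the bias of 15 added, the stored exponent is 19, which coincidentally matches the number itself.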

So this number is represented in 16-bit float format as:

0 10011 0011000001 = 0x4cc1 in hex

Sure enough, this number represents 19.015625, which is 19 + 1/64.

What we're seeing is that the higher mTime is allowed to climb (it increments by 0.01 per frame, so at 58 FPS it grows by 0.58 per second), the less accuracy is left at the lowest end of the number. Since the number is used to calculate an angle, I'd recommend doing a modulo in the increment function in the Java code. That way, we keep the number in a small range (-PI to +PI, for instance). -PI is a useful lower bound because we get the sign bit for free: sin() won't care whether the input is 0 to 2PI or -PI to +PI, but using the sign bit this way gives us one more bit of precision at the low end, rather than spending one more bit on going from 3.14 to 6.28.

Since the result of atan() is also in the range -PI to +PI, this means the new range of inputs to sin() etc. is -2PI to +2PI. Since 2PI is about 6.25 (.25 makes the sums easier in binary as it's 1/4), we can see the new representation would be:

6 in binary, against the same place values 16 8 4 2 1: 00110 (4 + 2)

Compared with the 5 bits written out for 19, the two leading zeros disappear and the hidden implicit "1" removes a third bit, so the integer part of our significand needs only 2 stored bits. This leaves 8 bits for the fractional part, meaning the smallest increment to 2PI we can hold is 1/256, better than the 1/64 of before.

So 6.25 would need the significand 1.1001000000 in binary (6.25 / 4 = 1.5625), where the leading "1" is the hidden bit and is not stored.

The exponent would be 2 (+15 = 17) so the whole number becomes:

0 10001 1001000000 = 0x4640 in hex

And if we wanted to represent the smallest increment we can make to 6.25, we can set the bottom bit of the significand: 0x4641, which is 6.25 + 1/256 = 6.25390625. This represents an increment of only 0.06% (6 parts in 10000).

When I make my proposed change to your program, I no longer see degradation over time:

mTime += 0.01; if(mTime >= 3.14) mTime -= 6.28; // Modulo between -PI to +PI
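The effect of this change can be simulated on the CPU. The sketch below runs both update rules for about one minute of frames at 58 FPS and records the largest 16-bit step size the time value passes through, using a 2^(e - 10) spacing rule that assumes 16-bit normals. It keeps Pete's 3.14 / 6.28 constants (`Math.PI` would work equally well); the class and method names are hypothetical.

```java
/** Compares the unbounded and the wrapped per-frame time update. */
final class TimeWrap {
    /** The suggested update: add 0.01, then wrap back into [-3.14, 3.14). */
    static float advanceWrapped(float t) {
        t += 0.01f;
        if (t >= 3.14f) {
            t -= 6.28f;
        }
        return t;
    }

    /** The original update, which lets the value grow without bound. */
    static float advanceUnbounded(float t) {
        return t + 0.01f;
    }

    /** 16-bit step size near x (10 stored significand bits; meaningful for normal values). */
    static double halfUlp(double x) {
        return Math.scalb(1.0, Math.getExponent(Math.abs(x)) - 10);
    }

    public static void main(String[] args) {
        float wrapped = 0f;
        float unbounded = 0f;
        double worstWrapped = 0;
        double worstUnbounded = 0;
        int frames = 58 * 60; // about one minute at 58 FPS
        for (int i = 0; i < frames; i++) {
            wrapped = advanceWrapped(wrapped);
            unbounded = advanceUnbounded(unbounded);
            worstWrapped = Math.max(worstWrapped, halfUlp(wrapped));
            worstUnbounded = Math.max(worstUnbounded, halfUlp(unbounded));
        }
        System.out.printf("unbounded t=%.2f, worst 16-bit step 1/%.0f%n", unbounded, 1 / worstUnbounded);
        System.out.printf("wrapped   t=%.2f, worst 16-bit step 1/%.0f%n", wrapped, 1 / worstWrapped);
    }
}
```

Over that minute the unbounded value passes 32, where the 16-bit step is 1/32, while the wrapped value never leaves [-3.14, 3.14), where the step is at most 1/512.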

However, that fragment shader is generally quite expensive. Our Mali Offline Shader Compiler (available to download on the portal) shows it compiles to a 7-cycle shader. To give some context, our 1080p-capable stereoscopic space racing game True//Force uses fewer cycles for its shaders:

http://blogs.arm.com...d-by-mali-gpus/

Admittedly our shaders were heavily optimised, but the point is that doing all the calculation in the fragment shader is probably not the best way to build a rotozoomer. I understand you may be trying to benchmark fragment shader performance, but perhaps a better test would be a realistic effect, like bump-mapped fragments with specular highlights. If I were trying to achieve a rotozoom effect, I'd probably use texture matrices in the vertex shader, where the precision is higher, and let those coordinates be interpolated between the vertices for the fragment shader.
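A rough sketch of the vertex-side approach: the Java code below computes a rotation-and-scale matrix in double precision, keeps the angle in [-PI, PI] with `Math.IEEEremainder`, and applies it to the four corners of a full-screen quad, so the fragment shader only receives interpolated texture coordinates. The class `RotoZoom`, its methods and the corner ordering are hypothetical; the same four matrix values could instead be uploaded as a `mat2` uniform for the vertex shader (for example with `GLES20.glUniformMatrix2fv`, after converting to OpenGL's column-major order).

```java
/** Hypothetical CPU-side rotozoom: texture coordinates for the four corners of a quad. */
final class RotoZoom {
    /** Rotation-and-scale matrix, row-major {m00, m01, m10, m11}. */
    static double[] matrix(double angle, double zoom) {
        double a = Math.IEEEremainder(angle, 2 * Math.PI); // keep the angle in [-PI, PI]
        double c = Math.cos(a) * zoom;
        double s = Math.sin(a) * zoom;
        return new double[] {c, -s, s, c};
    }

    /** Maps corners (-1,-1), (1,-1), (-1,1), (1,1) to UVs around the texture centre (0.5, 0.5). */
    static float[] cornerUVs(double angle, double zoom) {
        double[] m = matrix(angle, zoom);
        double[][] corners = {{-1, -1}, {1, -1}, {-1, 1}, {1, 1}};
        float[] uv = new float[8];
        for (int i = 0; i < 4; i++) {
            double x = corners[i][0] * 0.5;
            double y = corners[i][1] * 0.5;
            uv[2 * i] = (float) (0.5 + m[0] * x + m[1] * y);
            uv[2 * i + 1] = (float) (0.5 + m[2] * x + m[3] * y);
        }
        return uv;
    }
}
```

Because the angle is reduced in double precision on the CPU and only eight small coordinates reach the GPU, the time value can grow indefinitely without affecting the fragment shader's precision; with a repeating texture wrap mode, coordinates outside [0, 1] tile the texture.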

In summary:

- Floating point precision varies from platform to platform; the OpenGL ES API gives the ability to query what is supported.
- Fragment shader precision of mediump (16-bit floats) conforms to the Khronos specification.
- Developers should write code that is aware of the specified precision of fragment shaders.
- Values close to zero maintain more fractional bits than large numbers.
- Fractional bits are important for trigonometric arguments, so wrapping angles to be close to zero may be advisable.
Hope this helps, Pete

Laurent Mallet over 11 years ago

Note: This was originally posted on 8th June 2011 at http://forums.arm.com

Hello Pete,

Thanks for your response, I understand it. Why doesn't the Mali-400 implement highp? Is it not available in the GPU, or is it a limitation of the drivers?

Ellis,

Pete over 11 years ago

Note: This was originally posted on 8th June 2011 at http://forums.arm.com

Hi,

Designing a GPU for use in a mobile phone is a big challenge - we have to balance the need for great graphics performance with the need for great battery life. There are many trade-offs to be made in the areas of power consumption, memory bandwidth, cost, etc.

We designed the Mali-400 MP pixel processor hardware to be both fast and power efficient; we believe that "mediump" is the right level of precision for the fragment shaders. Note that the vertex processor does support "highp", where we believe more precision may be needed.

To achieve the best power-saving, these precision trade-offs need to be made in the hardware. So although the Mali-400 MP has a very flexible software architecture, this is not something that we could change in the driver.

Hope this helps, Pete