
Many Issues with Mali-400 on A10 (Linux)

Note: This was originally posted on 17th August 2012 at http://forums.arm.com

Hi,

I am facing many problems with the Mali-400. One of the big issues is that the library is closed source, so I can't change anything.

I am at an advanced stage of product development but am stuck on many issues.

Point 1.

I was using the Ericsson Texture Compression Mipmap example and trying to work with a normal RGBA texture.

Here is my load texture code.

/* Load just base level texture data. */
GL_CHECK(glGenTextures(1, &textureID));
GL_CHECK(glBindTexture(GL_TEXTURE_2D, textureID));
unsigned char *textureData = NULL;
Texture::loadData(texturePath.c_str(), &textureData);
    
GL_CHECK(glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0, GL_RGBA, GL_UNSIGNED_BYTE, textureData));    

Here it just renders a black area.

Just by referring to the RotoZoom program and through trial and error, I figured out that it works if I add:

/* Set texture mode. */
GL_CHECK(glGenerateMipmap(GL_TEXTURE_2D));

after glTexImage2D.

Is glGenerateMipmap mandatory in order to use a texture?
Besides, there is not any sample program which loads a basic RGBA texture.

Point 2.
Besides, after 2 or 3 tests the system gives segmentation faults continuously, even with a previously working program.

The same program which was working before does not work now, and works again after a system restart.

Point 3.
eglSwapBuffers is consuming 50% CPU, and I can't find any way to render directly to the framebuffer. The Mali documentation says it's platform specific.
I can't understand this: the framebuffer interface is fixed across all of Linux, so how could it be platform specific?
Everybody knows most vendors do not provide support for anything like this.

Point 4.
http://www.arm.com/products/multimedia/mali-graphics-hardware/mali-400-mp.php
the page mentions 30M tri/s and up to 1.1G pix/s at 275 MHz.

Correct me if I am wrong: 1.1G pix/s means filling pixels of the screen with colour or texture.

1.1 * 1000 * 1000 * 1000 / 1920 / 1080 / 60 = 8.84 layers

That means if I render full HD @ 60 FPS, I should be able to fill the full screen 8.84 times.

I am using a simple triangle program which fills the screen, rendering 8 times per frame; performance then drops to 14 FPS.

That is about 4 times slower than the spec, and on top of that the GPU is running at 320 MHz, not 275 MHz.

Point 5.

A very simple triangle program is not working in GLES1, so I am bound to use only GLES2.
Many places on the internet mention that GLES1 is faster than GLES2.
There is not any sample program for GLES1, and the documentation talks about a GLES1 emulator, but there is no GLES1 emulator on the Mali website.


Point 6.

What is the process to get the Mali-400 DDK source?
Is it possible to get it, even by spending some money?


Any clarification on the above points would help us.

Thanks in advance.


Regards

Piyush Verma
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com

    Hi Piyush,

    for point 1, as you have discovered, if OpenGL-ES considers the texture "incomplete" it will instead render black texels. In this case, the glTexParameteri() mode for GL_TEXTURE_MIN_FILTER has probably been left on its default of GL_NEAREST_MIPMAP_LINEAR - in which case OpenGL-ES will consider the texture incomplete unless you load *all* mipmap levels, or generate them automatically using the glGenerateMipmap() call as you have done.

    Hope this clarifies what was happening. Cheers, Pete
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com

    Hi Piyush,

    for point 2 I think more information would be needed to understand the problem. Can you post a stack backtrace of the segmentation fault? Can you reduce the problem to a small code snippet and share the steps to reproduce?

    Cheers, Pete
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com

    Hi Piyush,

    for point 3 - how are you measuring this? Because the Mali GPU uses a deferred architecture, it is not unusual to see a seemingly large amount of time in eglSwapBuffers(). The way the deferred architecture works, the driver collects all of the draw commands during the frame, but just adds them to a queue - it does not do actual rendering at the time of the glDraw*() call like an immediate mode renderer would. Instead, when the end of the frame is reached as indicated by eglSwapBuffers(), the driver will then work out the required drawing operations and execute them.

    Could this be what you are seeing?

    HTH, Pete
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com

    Hi Piyush,

    regarding point 5 - have you seen the Mali OpenGL-ES 1.1 emulator here?

    http://www.malideveloper.com/developer-resources/tools/opengl-es-11-emulator.php

    If you can post your OpenGL-ES code that isn't working perhaps we can help you debug it.

    I'm not sure the statement about OpenGL-ES 1.1 vs 2.0 being faster is necessarily true - it depends what effects you are trying to achieve. Some effects will be much harder when forced to use the fixed functionality of 1.1 when you could write a much more efficient custom shader yourself.

    Can you explain more about what you are trying to achieve?

    HTH, Pete
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com

    Hi Piyush,

    for point 6, no - ARM would only expect to license the Mali-400 Driver Development Kit source code to a Mali-400 silicon licensee.

    It should not be necessary to have the OpenGL-ES implementation source code in order to write or debug applications using the OpenGL-ES API - this is part of the point of abstracting the graphics operations into a standards-body specified API.

    HTH, Pete
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com


    Got it thanks.

    Was it really available there before, or did you upload it? :rolleyes:


    I'm afraid it was there all along :-)
  • Note: This was originally posted on 6th September 2012 at http://forums.arm.com


    Hi Piyush,

    for point 3 - how are you measuring this? Because the Mali GPU uses a deferred architecture, it is not unusual to see a seemingly large amount of time in eglSwapBuffers(). The way the deferred architecture works, the driver collects all of the draw commands during the frame, but just adds them to a queue - it does not do actual rendering at the time of the glDraw*() call like an immediate mode renderer would. Instead, when the end of the frame is reached as indicated by eglSwapBuffers(), the driver will then work out the required drawing operations and execute them.

    Could this be what you are seeing?

    HTH, Pete



    Hello, regarding your answer to point 3 I have a question: here is our usual process to render a scene (not speaking about multithreading for now):

    while (gameloop)
    {
        1 - Do all GL commands according to the previous game behavior update
        2 - Do game behavior update
        3 - Call eglSwapBuffers
    };

    We call eglSwapBuffers (3) after the game behavior update (2) to let the GPU work during that time (as we have sent all GL commands before, in (1)).

    Does this mean that on Mali-400 this method is not a good one? What would you advise?

    Thank you,
  • Note: This was originally posted on 6th September 2012 at http://forums.arm.com

    Like most embedded GPUs Mali only starts rendering when you call eglSwapBuffers - it's more power efficient to buffer things up and submit it all to the hardware in one big batch, so what you are actually doing is forcing a big gap between your GL commands and the hardware starting.

    The following is far more common

    while( true )
        Do game behavior update
        Call glClear ( COLOR | DEPTH | STENCIL )
        Do all GL commands according to the game behavior update
        Call eglSwapBuffers


    Remember eglSwapBuffers is asynchronous and doesn't actually swap anything, it just tells the driver "I'm done with this window surface". The actual window system update happens "later" under driver control.
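
    The loop above might look like this in code (a sketch only; update_game() and draw_scene() are hypothetical application functions, and display/surface come from the usual EGL setup):

```c
/* Sketch of the recommended frame loop. Assumes EGL and GLES2 are
 * already initialised; update_game() and draw_scene() are hypothetical
 * application functions. */
for (;;) {
    update_game();                        /* CPU-side work first        */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT |
            GL_STENCIL_BUFFER_BIT);
    draw_scene();                         /* queue this frame's draws   */
    eglSwapBuffers(display, surface);     /* "done with this surface"   */
}
```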
  • Note: This was originally posted on 6th September 2012 at http://forums.arm.com


    Like most embedded GPUs Mali only starts rendering when you call eglSwapBuffers - it's more power efficient to buffer things up and submit it all to the hardware in one big batch, so what you are actually doing is forcing a big gap between your GL commands and the hardware starting.

    The following is far more common

    while( true )
        Do game behavior update
        Call glClear ( COLOR | DEPTH | STENCIL )
        Do all GL commands according to the game behavior update
        Call eglSwapBuffers


    Remember eglSwapBuffers is asynchronous and doesn't actually swap anything, it just tells the driver "I'm done with this window surface". The actual window system update happens "later" under driver control.


    Hello, thank you for your reply.

    If I understand correctly: eglSwapBuffers takes the GL commands I have issued and passes them to the GPU, which handles these commands asynchronously. Then, if the next time I call eglSwapBuffers the previous list of GL commands has not yet finished drawing on the GPU, eglSwapBuffers will wait until it is finished; this is why we can see a large amount of time consumed by eglSwapBuffers.

    Is that right?

    Regards,
  • Note: This was originally posted on 6th September 2012 at http://forums.arm.com

    Yep, sounds like you've got it.

    The eglSwapBuffers wait time varies - it depends on the level of N-buffering supported by the window system. We only need to wait if we are running ahead of the windowing system and so do not have a buffer to render to. You don't want to run hundreds of frames ahead of the hardware (latency is bad and you just use a lot of memory), so most windowing systems rate limit the driver stack so it is only a few frames ahead of what is on screen; that rate limiting is what causes us to wait in eglSwapBuffers in most cases.
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com

    Hi Piyush,

    regarding point 4, I expect the numbers assume the best possible scenario. For instance, they may assume an infinitely fast memory bus which is able to consume the output from the Mali. In the real world, perhaps your device has become memory-bandwidth saturated before the Mali has reached peak pixel output? What is the memory bandwidth of your device?

    Also, I imagine things like color depth and Z buffer come into play - to achieve the maximum throughput you would probably ensure the depth buffer is disabled, and that no writes are happening to it to consume extra cycles. Similarly, the color depth would be configured to the minimum to maximise output bus efficiency.

    What configuration are you using for your depth and color buffers? What method are you using to try rendering 8 layers on top of each other?

    Cheers, Pete
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com


    Hi Piyush,

    for point 1, as you have discovered, if OpenGL-ES considers the texture "incomplete" it will instead render black texels. In this case, the glTexParameteri() mode for GL_TEXTURE_MIN_FILTER has probably been left on its default of GL_NEAREST_MIPMAP_LINEAR - in which case OpenGL-ES will consider the texture incomplete unless you load *all* mipmap levels, or generate them automatically using the glGenerateMipmap() call as you have done.

    Hope this clarifies what was happening. Cheers, Pete


    First, thank you very much Pete for all the quick replies. I was very surprised by such a quick response.

    So, in conclusion, even if I don't need mipmaps I need to call glGenerateMipmap(), right?
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com


    Hi Piyush,

    regarding point 5 - have you seen the Mali OpenGL-ES 1.1 emulator here?

    http://www.malidevel...11-emulator.php

    If you can post your OpenGL-ES code that isn't working perhaps we can help you debug it.

    I'm not sure the statement about OpenGL-ES 1.1 vs 2.0 being faster is necessarily true - it depends what effects you are trying to achieve. Some effects will be much harder when forced to use the fixed functionality of 1.1 when you could write a much more efficient custom shader yourself.

    Can you explain more about what you are trying to achieve?

    HTH, Pete


    Thanks Pete,

    I don't have the earlier program, but I will write it once again and post it here later.

    My use case is multiple digital photo frames on a single screen, with no use of a 3D engine - similar to cocos2d.
    Each photo frame will change pictures with animation effects.
    There will also be video playback at the same time in a partial area, where decoded frames will be rendered as textures.

    In my case a texture is not going to be rendered again, so it may not need mipmaps.


    Thanks & Regards

    Piyush Verma
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com


    Hi Piyush,

    regarding point 5 - have you seen the Mali OpenGL-ES 1.1 emulator here?

    http://www.malidevel...11-emulator.php

    If you can post your OpenGL-ES code that isn't working perhaps we can help you debug it.

    I'm not sure the statement about OpenGL-ES 1.1 vs 2.0 being faster is necessarily true - it depends what effects you are trying to achieve. Some effects will be much harder when forced to use the fixed functionality of 1.1 when you could write a much more efficient custom shader yourself.

    Can you explain more about what you are trying to achieve?

    HTH, Pete


    Got it thanks.

    Was it really available there before, or did you upload it? :rolleyes:

    Thanks :rolleyes:
  • Note: This was originally posted on 17th August 2012 at http://forums.arm.com


    So, in conclusion, even if I don't need mipmaps I need to call glGenerateMipmap(), right?


    Not quite - if you don't want to use mipmaps, you can use glTexParameteri() to set GL_TEXTURE_MIN_FILTER to either GL_NEAREST (no filtering at all, so lower visual quality but fast) or GL_LINEAR (no mipmap levels, but 4 texels will be sampled and averaged, so higher visual quality).
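
    As a minimal sketch (assuming a valid GL context, with textureID and textureData as in the original post):

```c
/* Sketch: avoid mipmaps by selecting a non-mipmapped minification
 * filter before drawing. Assumes a valid GL context; textureID and
 * textureData are as in the original post. */
glBindTexture(GL_TEXTURE_2D, textureID);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, textureData);
/* No glGenerateMipmap() needed: with GL_LINEAR the texture is
 * "complete" with only the base level loaded. */
```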

    However, typically using mipmaps is advisable - they mean the GPU can make better use of the limited amount of texture cache available. When a large texture is drawn on a small triangle, only sparse points on the texture will be sampled even though the cache brings in blocks of adjacent texels - so the cache will be filled with data which won't be used again, defeating the point of the cache. When using mipmaps, the GPU selects the most appropriate mipmap level to use, and the texels sampled will be much more likely to be adjacent - leading to cache hits and improved performance, memory bandwidth usage and power consumption.

    HTH, Pete