MALI-400 : eglCreateImageKHR, EGL_GL_TEXTURE_2D_KHR and updating textures with the CPU

Hello everybody, I'm currently struggling with the system named in the title. Is there a *full* code sample for Linux somewhere that creates an EGLImage for a texture and demonstrates how to update it with the CPU?

The reference documentation seems OK, but eglGetError keeps telling me I don't know what I'm doing.

I won't post my various attempts here because they don't work and therefore have no value for the reader, but I've been roaming the web and trying things for a while.

Cheers, Tramb

  • Hi tramboi,

    We don't have a complete code sample for this. If you can show us a code snippet and explain what you are trying to achieve, we could perhaps help you that way. The good news is that the extensions which add support for eglCreateImageKHR and EGL_GL_TEXTURE_2D_KHR are supported on Mali-400 devices.

    Also, if you haven't already seen how to use eglimage on linux platform, I would recommend having a look; maybe it will help.

    /Wasim

  • Hello Wasim and thanks for the help,

    I've already seen the thread you linked, but I don't have any header declaring mali_egl_image_lock_ptr in my toolchain. I can see these functions with IDA in my libMali.so, but I have (of course) no idea about the prototypes.

    Can you confirm that I have to use this mali_egl interface (through dynamic linking and pointer casting, maybe)? If so, I'd need the signatures.

    I found other code on the Internet and tried eglLockSurfaceKHR, as well as eglQuerySurface with EGL_BITMAP_POINTER_KHR and EGL_BITMAP_PITCH_KHR, but without success.

    I'm quite confused about the direction to take.

    To sum up the bigger picture, I'm updating a texture from CPU every frame (no choice there) and I'm trying to avoid the costly texture swizzling in glTexSubImage2D.

    I didn't find a way to specify a linear texture, which would alleviate the cost.

    I didn't find a way to upload a pre-swizzled texture (which I could prepare earlier myself, NEONized and multithreaded, if I knew or reverse-engineered the swizzling pattern).

    And last, hence my questions, I didn't find a way to emulate PBO-style operation, working in place and doing the fence synchronization myself, which would (I guess) be the best option for excellent Mali performance.

    (I'm quite used to low-level programming so OpenGL and even more so OpenGL ES 2 is always a struggle to fight higher level abstraction cost )

    Cheers,

    Bertrand

  • I'm working on an R16 board (Cortex-A7 with a Mali-400), without X11 support (fbdev instead).

    No libGLES_mali.so here

    # strings libMali.so | grep 'r[0-9]p[0-9]-'

    1.4 Linux-r4p0-00rel0

                Mali online shader compiler r4p0-00rel0 [Revision 96995].
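
    For anyone who wants to do the same check from code instead of the shell, here is a minimal sketch that pulls the release tag out of a string such as the two lines above. It reuses the r<digit>p<digit>- pattern from the grep command; the function name findDriverRelease is made up for this example, and matching up to the next whitespace is an assumption about how the tag is laid out.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first token that starts with the pattern used in the grep
// command (r<digit>p<digit>-), or an empty string if none is found.
// Extending the match up to the next whitespace is an assumption about
// the tag format, not something documented by the driver.
std::string findDriverRelease(const std::string& text) {
  static const std::regex pattern("r[0-9]p[0-9]-\\S*");
  std::smatch match;
  return std::regex_search(text, match, pattern) ? match.str() : std::string();
}
```

    Fed the strings output shown above, this would be expected to pick out the r4p0-00rel0 tag from either line.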

    I know the swizzle to be costly (and CPU-bound) because of my profiling with DS-5 Streamline.

    The offenders are _mali_convert_tex8_l_to_tex8_b if I do my palette lookup on the GPU and _mali_convert_tex32_l_to_tex32_b in the other case.

    I expect this to be the swizzling code (l_to_b => linear to block?) but it might be a wrong assumption.

    I'll just be rendering a full-screen quad every frame (maybe scaled with bilinear filtering), so I'm not sure that the swizzling is worth it (and if it is, I could probably do it with a lower latency than the implementation in libMali.so by using several cores and NEON code).
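
    To make "linear to block" concrete, here is a minimal sketch of what such a conversion could look like. The 16x16 tile size and the row-major order inside each tile are assumptions picked purely for illustration; the actual Mali layout is not public and is probably different. The name linearToBlock is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tile edge, in pixels. Not the real Mali block size.
constexpr std::size_t kTile = 16;

// Copies a linear (row-major) 32-bit image into a tiled layout in which each
// kTile x kTile block is stored contiguously, tiles ordered left to right and
// top to bottom. Width and height must be multiples of kTile in this sketch.
std::vector<std::uint32_t> linearToBlock(const std::vector<std::uint32_t>& src,
                                         std::size_t width, std::size_t height) {
  assert(width % kTile == 0 && height % kTile == 0);
  assert(src.size() == width * height);
  std::vector<std::uint32_t> dst(src.size());
  const std::size_t tilesPerRow = width / kTile;
  for (std::size_t y = 0; y < height; ++y) {
    for (std::size_t x = 0; x < width; ++x) {
      const std::size_t tileIndex = (y / kTile) * tilesPerRow + (x / kTile);
      const std::size_t inTile = (y % kTile) * kTile + (x % kTile);
      dst[tileIndex * kTile * kTile + inTile] = src[y * width + x];
    }
  }
  return dst;
}
```

    A loop of this shape reads the source linearly but scatters its writes, which is the kind of access pattern that splitting the work by tile across cores, or vectorizing it, would aim to improve.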

    We're just shipping on one definite platform so I'm definitely willing to specialize stuff to hit 60Hz and hardcode the swizzling pattern.

    I thought PBO was not an option because we have only OpenGL ES 2.0, which doesn't support it.

    But the main goal would be to have 0 copy and generate the texture in-place with the CPU, with double-buffering and fences to synchronize all this.

    When I first tried to address the problem I noticed that the following extensions are provided:

    EGL_KHR_image

    EGL_KHR_image_base

    EGL_KHR_gl_texture_2D_image

    EGL_KHR_reusable_sync

    EGL_KHR_fence_sync

    EGL_KHR_lock_surface

    EGL_KHR_lock_surface2

    and it seemed to me they were there to address exactly my problem, and in a fairly portable way at that, without using the mali_ namespace.
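
    A side note on checking this list from code: several of these names are prefixes of others (EGL_KHR_image and EGL_KHR_image_base, EGL_KHR_lock_surface and EGL_KHR_lock_surface2), so a plain substring search can report an extension that is not actually there. A minimal sketch of a whole-token check on the space-separated string returned by eglQueryString(display, EGL_EXTENSIONS); the helper name hasExtension is made up for this example.

```cpp
#include <cassert>
#include <cstring>

// Returns true if `name` appears as a complete, space-delimited token in
// `extensions`, so that "EGL_KHR_lock_surface" does not match
// "EGL_KHR_lock_surface2".
bool hasExtension(const char* extensions, const char* name) {
  assert(extensions != nullptr && name != nullptr);
  const std::size_t nameLength = std::strlen(name);
  if (nameLength == 0) return false;
  const char* cursor = extensions;
  while ((cursor = std::strstr(cursor, name)) != nullptr) {
    const bool startsToken = (cursor == extensions) || (cursor[-1] == ' ');
    const char after = cursor[nameLength];
    const bool endsToken = (after == ' ') || (after == '\0');
    if (startsToken && endsToken) return true;
    cursor += nameLength;  // skip this partial match and keep searching
  }
  return false;
}
```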

    I just can't for the life of me connect the dots, and that's why I'm asking for help.

    (I have looked at the dma stuff but EXT_image_dma_buf_import is not available for me)

    Thanks for reading all this !

    Bertrand

  • If dma_buf is not supported in your BSP, the only other way is to use UMP.

    I will have to double-check whether the magic code I have will work for that version of the driver, but before I do that, can you tell me if your driver is built with UMP?

  • I'm not sure, I'll ask the system team but their first gut feeling is "I don't think so".

  • I still don't have the answer. We contacted our supplier, i.e. your partner, to get some answers.

    When you speak about dma_buf vs. UMP, are you thinking about the way to update device memory?

    Is there a way to query the driver to know how glTexImage2D does it? I was thinking the upload was done through DMA, after the swizzling is done on the CPU, but EXT_image_dma_buf_import is not exposed.

  • Hi tramboi,

    Going the dynamic linking route is not a good solution, but could you provide the output of the following if you have access to libMali.so?

    strings libGLES_mali.so | grep 'r[0-9]p[0-9]-' or strings libMali.so | grep 'r[0-9]p[0-9]-'

    Which device are you using and where did you get the BSP from?

    Pre-swizzled textures are too platform-specific, so I would avoid that. Also, the swizzle patterns are not open.

    Why do you think texture swizzle is costly? Remember that even if the texture is mapped 1:1 onto the screen, you will still benefit from swizzle if you are using it multiple times in the frame. It's also very useful if you are drawing the texture in an arbitrary orientation.

    If you don't have anything else to do while the upload happens, then I understand; otherwise, why are PBOs not an option?

    Here are a few things you could try in the meantime. If your BSP has X11 support, you could do a zero-copy upload of image data to OpenGL ES via pixmaps. You will need to create an EGLImage from the pixmap and then a texture from the EGLImage. Your CPU code will have to find a way to fill the pixmap (not sure if this is possible).

    Another thing you could try is to use a Linux dma_buf file descriptor to create an EGLImage and then a texture. More details: https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt. AFAIK this is also a zero-copy operation.

    HTH,

    Wasim

  • The UMP driver is usually built as a kernel module, so if you check whether it's loaded in your kernel, you will get an answer. I am actually trying to see whether the technique mentioned in How to share texture memory between CPU/GPU for firefly's/rk3288 fbdev Mali-T764 will work with UMP or not.

    tramboi wrote:

    Is there a way to query the driver to know how glTexImage2D does it ?

    I don't think so.

  • Here is the kind of boilerplate code that I'd like to get working, put together from lots of disparate information on the web.

    Except that locking/unlocking fails.

    #include <cassert>
    #include <cinttypes>
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <ctime>
    
    
    #define EGL_EGLEXT_PROTOTYPES
    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #define GL_GLEXT_PROTOTYPES
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>
    
    
    #include <linux/fb.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    
    
    struct fbdev_window nativeWindow = {
      .width = 1280,
      .height = 720,
    };
    
    
    int main(int argc, char* argv []) {
      auto display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
      assert(display != EGL_NO_DISPLAY);
    
    
      {
        EGLint major = 0;
        EGLint minor = 0;
        auto success = eglInitialize(display, &major, &minor);
        assert(success);
        printf("EGL Initialize: %d.%d\n", major, minor);
      }
    
    
      printf("EGL Version: \"%s\"\n", eglQueryString(display, EGL_VERSION));
      printf("EGL Vendor: \"%s\"\n", eglQueryString(display, EGL_VENDOR));
      printf("EGL Extensions: \"%s\"\n", eglQueryString(display, EGL_EXTENSIONS));
    
    
      EGLConfig config = {};
      {
        EGLint configCount = 0;
        EGLint configAttributes [] = {
          EGL_RED_SIZE, 8,
          EGL_GREEN_SIZE, 8,
          EGL_BLUE_SIZE, 8,
          EGL_ALPHA_SIZE, 8,
          EGL_BUFFER_SIZE, 32,
          EGL_STENCIL_SIZE, 0,
          EGL_DEPTH_SIZE, 0,
          EGL_SAMPLES, 4,
          EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
          EGL_SURFACE_TYPE, EGL_WINDOW_BIT | EGL_PIXMAP_BIT,
          EGL_NONE
        };
        eglChooseConfig(display, configAttributes, &config, 1, &configCount);
      }
    
    
      static EGLint const contextAttributes [] = {
      EGL_CONTEXT_CLIENT_VERSION,
      2,
      EGL_NONE
      };
      auto context =
        eglCreateContext(display, config, EGL_NO_CONTEXT, contextAttributes);
      assert(context != EGL_NO_CONTEXT);
    
    
      static EGLint const windowAttributes [] = {EGL_NONE};
      auto surface =
        eglCreateWindowSurface(display, config, &nativeWindow, windowAttributes);
      assert(surface != EGL_NO_SURFACE);
    
    
      GLint width = 0;
      {
        auto success = eglQuerySurface(display, surface, EGL_WIDTH, &width);
        assert(success);
      }

      GLint height = 0;
      {
        auto success = eglQuerySurface(display, surface, EGL_HEIGHT, &height);
        assert(success);
      }
      printf("Surface size: %dx%d\n", width, height);

      {
        auto success = eglMakeCurrent(display, surface, surface, context);
        assert(success);
      }
    
    
      printf("GL Vendor: \"%s\"\n", glGetString(GL_VENDOR));
      printf("GL Renderer: \"%s\"\n", glGetString(GL_RENDERER));
      printf("GL Version: \"%s\"\n", glGetString(GL_VERSION));
      printf("GL Extensions: \"%s\"\n", glGetString(GL_EXTENSIONS));
    
    
      auto vertShader = glCreateShader(GL_VERTEX_SHADER);
      assert(vertShader);
    
    
      static char const* vertShaderSource = "attribute vec4 aPosition;    \n"
                                          "attribute vec4 aColor;      \n"
                                          "                            \n"
                                          "varying vec4 vColor;        \n"
                                          "                            \n"
                                          "void main()                  \n"
                                          "{                            \n"
                                          "    vColor = aColor;        \n"
                                          "    gl_Position = aPosition; \n"
                                          "}                            \n";
      glShaderSource(vertShader, 1, &vertShaderSource, NULL);
      glCompileShader(vertShader);
    
    
      {
        GLint status = GL_FALSE;
        glGetShaderiv(vertShader, GL_COMPILE_STATUS, &status);

        {
          GLint length = 0;
          glGetShaderiv(vertShader, GL_INFO_LOG_LENGTH, &length);

          if (length > 0) {
            auto logBuffer = new char[length];
            memset(logBuffer, 0, length);
            glGetShaderInfoLog(vertShader, length, NULL, logBuffer);
            printf("Vertex shader log:\n%s\n", logBuffer);
            delete [] logBuffer;
          }

          assert(status == GL_TRUE);
        }
      }
    
    
      auto fragShader = glCreateShader(GL_FRAGMENT_SHADER);
      assert(fragShader);
    
    
      static char const* fragShaderSource =
    #ifndef NOEGLIMAGE
          "#extension GL_OES_EGL_image_external : require\n"
    #endif
          "precision mediump float;    \n"
          "                            \n"
          "varying vec4 vColor;        \n"
    #ifdef NOEGLIMAGE
          "uniform sampler2D uTexture;  \n"
    #else
          "uniform samplerExternalOES uTexture;  \n"
    #endif
          "uniform float uWidth;        \n"
          "uniform float uHeight;      \n"
          "                            \n"
          "void main()                  \n"
          "{                            \n"
          "    vec2 coord = gl_FragCoord.xy;\n"
          "    coord.x /= uWidth;\n"
          "    coord.y /= uHeight;\n"
          "    gl_FragColor = texture2D(uTexture, coord);  \n"
          "}                            \n";
      glShaderSource(fragShader, 1, &fragShaderSource, NULL);
      glCompileShader(fragShader);
    
    
      {
        GLint status = GL_FALSE;
        glGetShaderiv(fragShader, GL_COMPILE_STATUS, &status);

        {
          GLint length = 0;
          glGetShaderiv(fragShader, GL_INFO_LOG_LENGTH, &length);

          if (length > 0) {
            auto logBuffer = new char[length];
            memset(logBuffer, 0, length);
            glGetShaderInfoLog(fragShader, length, NULL, logBuffer);
            printf("Fragment shader log:\n%s\n", logBuffer);
            delete [] logBuffer;
          }

          assert(status == GL_TRUE);
        }
      }
    
    
      auto program = glCreateProgram();
      assert(program);
    
    
      glAttachShader(program, vertShader);
      glAttachShader(program, fragShader);
    
    
      glBindAttribLocation(program, 0, "aPosition");
      glBindAttribLocation(program, 1, "aColor");
    
    
      glLinkProgram(program);
    
    
      {
        GLint status = GL_FALSE;
        glGetProgramiv(program, GL_LINK_STATUS, &status);

        {
          GLint length = 0;
          glGetProgramiv(program, GL_INFO_LOG_LENGTH, &length);

          if (length > 0) {
            auto logBuffer = new char[length];
            memset(logBuffer, 0, length);
            // Program objects need glGetProgramInfoLog, not glGetShaderInfoLog.
            glGetProgramInfoLog(program, length, NULL, logBuffer);
            printf("Program log:\n%s\n", logBuffer);
            delete [] logBuffer;
          }

          assert(status == GL_TRUE);
        }
      }
    
    
      glUseProgram(program);
      assert(glGetError() == GL_NO_ERROR);
    
    
      glClearColor(0.2, 0.2, 0.2, 1.0);
      assert(glGetError() == GL_NO_ERROR);
    
    
      static GLfloat const aPositions [] = {
        -0.8f, -0.8f, 0.0f, 1.0f,
         0.0f,  0.8f, 0.0f, 1.0f,
         0.8f, -0.8f, 0.0f, 1.0f
      };
      glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, aPositions);
      glEnableVertexAttribArray(0);
      assert(glGetError() == GL_NO_ERROR);
    
    
      static GLfloat const aColors [] = {
        1.0f, 0.0f, 0.0f, 1.0f,
        0.0f, 1.0f, 0.0f, 1.0f,
        0.0f, 0.0f, 1.0f, 1.0f
      };
      glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 0, aColors);
      glEnableVertexAttribArray(1);
      assert(glGetError() == GL_NO_ERROR);
    
    
      GLuint texture = 0;
      glGenTextures(1, &texture);
      glActiveTexture(GL_TEXTURE0);
      assert(glGetError() == GL_NO_ERROR);
    
    
      glUniform1i(glGetUniformLocation(program, "uTexture"), 0);
      glUniform1f(glGetUniformLocation(program, "uWidth"), width);
      glUniform1f(glGetUniformLocation(program, "uHeight"), height);
      assert(glGetError() == GL_NO_ERROR);
    
    
      static uint8_t textureData[4 * 4][4] = {
        {0x00, 0x00, 0x00, 0xff},
        {0x00, 0x00, 0x7f, 0xff},
        {0x00, 0x7f, 0x00, 0xff},
        {0x00, 0x7f, 0x7f, 0xff},

        {0x00, 0xff, 0x00, 0xff},
        {0x00, 0xff, 0x7f, 0xff},
        {0x7f, 0x00, 0x00, 0xff},
        {0x7f, 0x00, 0x7f, 0xff},

        {0x7f, 0x7f, 0x00, 0xff},
        {0x7f, 0x7f, 0x7f, 0xff},
        {0x7f, 0xff, 0x00, 0xff},
        {0x7f, 0xff, 0x7f, 0xff},

        {0xff, 0x00, 0x00, 0xff},
        {0xff, 0x00, 0x7f, 0xff},
        {0xff, 0x7f, 0x00, 0xff},
        {0xff, 0x7f, 0x7f, 0xff},
      };
    
    
      fbdev_pixmap pixmap = {};
    
    
      pixmap.height = 4;
      pixmap.width = 4;
      pixmap.bytes_per_pixel = 4;
      pixmap.buffer_size = 32;
      pixmap.red_size = 8;
      pixmap.green_size = 8;
      pixmap.blue_size = 8;
      pixmap.alpha_size = 8;
      pixmap.luminance_size = 0;
      pixmap.flags = static_cast<fbdev_pixmap_flags>(FBDEV_PIXMAP_DEFAULT) /* |
        FBDEV_PIXMAP_EGL_MEMORY | FBDEV_PIXMAP_SUPPORTS_UMP |
                    FBDEV_PIXMAP_ALPHA_FORMAT_PRE | FBDEV_PIXMAP_COLORSPACE_sRGB | FBDEV_PIXMAP_DMA_BUF */;
      pixmap.data = reinterpret_cast<unsigned short*>(textureData);
      pixmap.format = 0;
    
    
      EGLint const imageAttributes [] = {
        EGL_IMAGE_PRESERVED_KHR, EGL_TRUE,
        EGL_NONE
      };
      auto image = eglCreateImageKHR(display,
                                     EGL_NO_CONTEXT,
                                     EGL_NATIVE_PIXMAP_KHR,
                                     reinterpret_cast<EGLClientBuffer>(&pixmap),
                                     imageAttributes);
      assert(eglGetError() == EGL_SUCCESS);
    
    
      glBindTexture(GL_TEXTURE_EXTERNAL_OES, texture);
      glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, image);
      glTexParameterf(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
      glTexParameterf(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
      assert(glGetError() == GL_NO_ERROR);
    
    
      // Doesn't work
      /*
      EGLint bitmapAdress, bitmapPitch;
      eglQuerySurface(display, image, EGL_BITMAP_POINTER_KHR, &bitmapAdress);
      assert(eglGetError() == EGL_SUCCESS);
      eglQuerySurface(display, image, EGL_BITMAP_PITCH_KHR, &bitmapPitch);
      assert(eglGetError() == EGL_SUCCESS);
      printf("Address: %x\nPitch %d\n", bitmapAdress, bitmapPitch);
      */
    
    
      eglSwapInterval(display, 1);
      glViewport(0, 0, width, height);
      assert(glGetError() == GL_NO_ERROR);
    
    
      while (1) {
        glClear(GL_COLOR_BUFFER_BIT);
        glDrawArrays(GL_TRIANGLES, 0, 3);
        assert(glGetError() == GL_NO_ERROR);

        eglSwapBuffers(display, surface);
        assert(eglGetError() == EGL_SUCCESS);

        EGLint lockAttribList[] = {
          EGL_LOCK_USAGE_HINT_KHR,
          EGL_WRITE_SURFACE_BIT_KHR,
          EGL_NONE
        };

        // Doesn't work
        EGLBoolean lockResult = eglLockSurfaceKHR(display, image, lockAttribList);
        assert(eglGetError() == EGL_SUCCESS);
        assert(lockResult);

        // update the texture data
        for (auto i = 0; i < 4; ++i) {
          for (auto x = 0; x < 4; ++x) {
            for (auto y = 0; y < 4; ++y) {
              textureData[4 * x + y][i]++;
            }
          }
        }

        EGLBoolean unlockResult = eglUnlockSurfaceKHR(display, image);
        assert(eglGetError() == EGL_SUCCESS);
        assert(unlockResult);

        assert(glGetError() == GL_NO_ERROR);
        glBindTexture(GL_TEXTURE_EXTERNAL_OES, 0);
        glBindTexture(GL_TEXTURE_EXTERNAL_OES, texture);
        assert(glGetError() == GL_NO_ERROR);
        assert(eglGetError() == EGL_SUCCESS);
        //glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, 0);
        assert(glGetError() == GL_NO_ERROR);
        assert(eglGetError() == EGL_SUCCESS);
        glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, image);
        assert(glGetError() == GL_NO_ERROR);
        assert(eglGetError() == EGL_SUCCESS);
      }
    
    
      eglDestroyImageKHR(display, image);
    
    
      return 0;
    }
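
    One possible reason the lock call in the listing above compiles yet fails at run time: in the Khronos headers both EGLSurface and EGLImageKHR are plain void*, while EGL_KHR_lock_surface declares eglLockSurfaceKHR as taking an EGLSurface, so an image handle passes type checking unnoticed. Whether that is the whole story on this driver is not established in this thread. A minimal sketch of wrapper types that would turn such a mix-up into a compile error; SurfaceHandle, ImageHandle and lockSurface are hypothetical names, and the real EGL call is only mentioned in a comment so that the sketch builds without the EGL headers.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins for the real handles, which are both plain void* in EGL.
struct SurfaceHandle { void* raw; };
struct ImageHandle { void* raw; };

// Hypothetical wrapper: only a SurfaceHandle is accepted. A real version
// would forward surface.raw to eglLockSurfaceKHR(display, surface.raw, attribs)
// and return its result; here it only reports whether the handle is non-null.
bool lockSurface(SurfaceHandle surface) {
  return surface.raw != nullptr;
}

// Because the two wrappers are unrelated types, lockSurface(ImageHandle{...})
// no longer compiles, unlike the raw void* handles.
static_assert(!std::is_convertible<ImageHandle, SurfaceHandle>::value,
              "an image handle must not be usable where a surface is expected");
```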
    
  • It's a bit crazy that nobody ever does this kind of thing on Linux.

    Well, I'll just give up and revert to the slow-as-hell glTexImage2D.

    (and thanks for at least trying, Wasim ! Cheers)

  • Apologies for the late reply. I have been away for the last 2 weeks. I am still looking at this, and I need some help from our driver engineers. As soon as I hear back I will update you.

  • Hi Tramboi,

    I am Wasim's colleague from the driver team. I saw you mentioned that it doesn't work when using eglLockSurfaceKHR. Do you mean there is an EGL error reported?

  • Sorry for not answering sooner but the project is shipping and I had to abandon experimentation for safe stuff.

    FYI, we settled on a one-frame latency so that we can asynchronously glTexSubImage2D image n-1 while computing image n with a pthread. It doesn't give good parallelism (because it needs both massive CPU processing and GL context access), but it gave us big benefits.
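
    For readers who want the shape of that scheme, here is a minimal sketch under a few assumptions: std::thread stands in for the pthread, the worker does the computation while the calling thread does the upload (the split could equally be the other way round), and uploadFrame is a placeholder where a real program would call glTexSubImage2D on the thread that owns the GL context. All function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// Placeholder for the CPU-side generation of frame `index`.
void computeFrame(Frame& frame, std::uint8_t index) {
  for (auto& byte : frame) byte = index;
}

// Placeholder for the upload; returns the first byte it would have uploaded.
std::uint8_t uploadFrame(const Frame& frame) {
  return frame.empty() ? 0 : frame.front();
}

// Runs `frameCount` iterations. While a worker thread computes frame n into
// one buffer, the calling thread uploads frame n-1 from the other buffer.
std::vector<std::uint8_t> runPipeline(std::uint8_t frameCount,
                                      std::size_t frameBytes) {
  Frame buffers[2] = {Frame(frameBytes), Frame(frameBytes)};
  std::vector<std::uint8_t> uploaded;
  for (std::uint8_t n = 0; n < frameCount; ++n) {
    Frame& current = buffers[n % 2];
    const Frame& previous = buffers[(n + 1) % 2];
    std::thread worker(computeFrame, std::ref(current), n);
    if (n > 0) uploaded.push_back(uploadFrame(previous));  // frame n-1
    worker.join();  // frame n is complete before the buffers swap roles
  }
  return uploaded;
}
```

    The two threads never touch the same buffer within an iteration, so the join at the end of each frame is the only synchronization this sketch needs; the price is the one frame of latency mentioned above.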

    More specific stuff in a private message to you

  • Seems I can't PM you. Whatever. Here it goes, in public.

    _mali_convert_tex32_l_to_tex32_b would really need a bit of effort. Just replacing the swizzle tables with NEON code gave us a 20% win for in-cache processing. And the original implementation doesn't output pixels in a linear way. I don't know much about write-combining on ARM SoCs (maybe there is documentation somewhere, next to the EGL sample), but it seems a bit risky regarding throughput.

  • Hi tramboi, by policy, Direct Messages can only be sent to users who are following you. This is by design, to prevent spam, unsolicited messages, etc.