We are currently migrating an embedded application from a Mali 400MP2 Utgard platform to one with a Mali T720 Midgard GPU. The application uses the following (probably fairly common) mali_egl_image* code to achieve zero-copy update of a texture:
EGLImageKHR eglImage = eglCreateImageKHR( display, EGL_NO_CONTEXT, EGL_NATIVE_PIXMAP_KHR, (EGLClientBuffer)(&fbPixMap), NULL );
glEGLImageTargetTexture2DOES( GL_TEXTURE_2D, eglImage );
...
mali_egl_image *mimg = mali_egl_image_lock_ptr( eglImage );
unsigned char *buffer = mali_egl_image_map_buffer( mimg, attribs_rgb );
// update buffer here
mali_egl_image_unmap_buffer( mimg, attribs_rgb );
mali_egl_image_unlock_ptr( eglImage );
These mali_egl_image_* functions do not appear to be available in the mali_midgard driver we received from our chip vendor.
Our application is written in C with, apart from the above, standard OpenGL ES 2.0 calls.
What would be the equivalent approach for updating a texture directly (i.e. not using glTexSubImage2D()) with the T720 Midgard driver? Thankfully the above code exists in a single function that is called from many places, so ideally a direct replacement would be fantastic!
It appears you found an Utgard extension to speed up your image access that was never publicly released! So it is probably distinctly uncommon. It took a little tracking down, but I found someone who remembers its implementation.
The extension existed to work around Utgard-era issues (a slow bus, and slow memcpy translation into GPU memory formats) that have long since been solved. This means that most of the fixes that will work on Midgard will also be faster.
EGL Pixmaps are now very out of date, and pixmaps in general are best avoided in favour of a more GPU-friendly format. So rather than importing a pixmap surface handle (or creating a pixmap surface) and then promoting it to a GLES texture through the EGL Image interface, you're better off either replacing the EGL Pixmap/EGL Image setup with a normal texture, or (depending on what the app is doing) using eglCreateWindowSurface to render to a window surface instead of a pixmap surface.
Then replace the Mali EGL Image interface block with glTexSubImage2D. More specifically: if you don't need direct access and just want to modify the image without reallocating it, glTexSubImage2D is best. If you don't care whether it is reallocated and want to replace all the image data, glTexImage2D is faster still, as you avoid any dependencies on the existing data.
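As a sketch of what a drop-in replacement for the mali_egl_image block might look like (the function names update_texture/replace_texture are my own, and this assumes the texture was already allocated at full size with glTexImage2D; it needs a current GLES context, so it can't run standalone):

```c
#include <GLES2/gl2.h>

/* Hypothetical replacement for the mali_egl_image lock/map/unmap/unlock
 * block: update an existing texture's contents in place.  Assumes the
 * texture was allocated earlier with glTexImage2D at (w, h) as RGB565,
 * and that `pixels` is a full frame in the same format. */
static void update_texture(GLuint tex, GLsizei w, GLsizei h, const void *pixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    /* Same size and format as the original allocation, so the driver
     * updates the existing storage without reallocating it. */
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_RGB, GL_UNSIGNED_SHORT_5_6_5, pixels);
}

/* If the whole image is being replaced and reallocation is acceptable,
 * glTexImage2D avoids any dependency on the previous contents. */
static void replace_texture(GLuint tex, GLsizei w, GLsizei h, const void *pixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, w, h, 0,
                 GL_RGB, GL_UNSIGNED_SHORT_5_6_5, pixels);
}
```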
The Mali EGL Image interface does have a locking scheme; if the app relies on that, you may need some EGL fences in its place.
Hope this solves your issue, it's been a very interesting history lesson finding this out!
Thanks for the information. Using vanilla glTex* operations would be preferable from a support, understanding, and compatibility standpoint.
If I recall correctly, our Utgard observations gave us the impression that the image was held in an intermediate "queue" before eventually landing in the GPU texture. This unpredictability and slow transfer led to the direct Pixmap solution, so the app knew when it was safe to reuse the source image buffer and could achieve a 60 fps frame rate.
It is important for the app to know immediately that the image transfer is complete. Can we rely on the assumption that once the glTex[Sub]Image has returned, the image will have been converted (if necessary, RGB565 to RGBA8888 for example) and safely written into the GPU texture?
I'm not sure if the EGL locking contributed to the performance and stability of the app, I guess we'll see!
As we want the fastest transfer we'll switch over to glTexImage, with a healthy bit of profiling so we can get a handle on the performance.
When glTexImage returns it probably hasn't finished, but unless you're doing some very clever multi-context work, the GL driver will make sure it has before your next use of the texture, so you can act as if it has.
We have a full-frame image generation thread running at full throttle that only wants to wait for the display thread to take each new image from it, and for that to complete in the shortest time possible. The display thread renders each new image full-frame while the generator builds the next, so isn't doing too much. Each new image is generated by applying deltas to the previous image, so the generator cannot flip between two buffers, which would have solved this timing issue!
What we achieved with the Utgard zero-copy functions was a single copy into the texture that occurred with predictable measurable time, and thus minimal interruption of the generation thread.
Based on what you've said, I wonder if we could either call glFinish straight after the glTexImage2D, or make a full copy of the image data before calling glTexImage2D, so the generator can get on with producing the next frame as soon as the copy is done. The latter would give the completion time predictability we need, but increase CPU load presumably.....
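The "full copy before glTexImage2D" idea could be sketched as below. This is an illustration only: the staging_copy name is invented here, and it assumes an RGB565 frame (2 bytes per pixel); the render thread would own freeing the staging buffer once the upload is known complete.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: duplicate the generator's frame into a staging buffer so the
 * generator can start on the next frame as soon as memcpy returns,
 * giving a predictable hand-off time at the cost of extra CPU work.
 * RGB565 is 2 bytes per pixel. */
static void *staging_copy(const void *frame, size_t width, size_t height)
{
    size_t bytes = width * height * 2;   /* RGB565: 16 bits per pixel */
    void *staging = malloc(bytes);
    if (staging)
        memcpy(staging, frame, bytes);
    /* The render thread would now pass `staging` to glTexImage2D and
     * free it once the upload has completed. */
    return staging;
}
```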
Understanding a little of what glTexImage2D is doing would help - is the data transfer within glTexImage2D() CPU bound (ie the driver does the transfer into texture memory using ARM/Neon instructions) or via a GPU/DMA transfer?
glFinish will block until it is complete, so that will work.
For further info: the GPU prefers images in "cache optimal" tiling, which dramatically speeds up GPU sampling. But when we import an image in GLES we use whatever tiling the image was created with - which may well be linear tiling. (If we create the image, we use "cache optimal").
When glTexImage2D is called, the image will be converted from the linear input data to whatever tiling the image uses. If that is also linear, it is a simple memcpy that will have completed by the time glTexImage2D returns. If it is cache optimal, the conversion can take time and will be deferred, i.e. not done by the time glTexImage2D returns.
Conversion usually happens on GPU, but can be CPU depending on image size / format / GPU version. It's generally like a GPU render from the linear input to the cache optimal stored image.
Hope that helps,
Thanks Ben, that's really helpful.
Our source images are raster (either RGB565 or RGBA8888) and we are now "uploading" to the GPU image/texture using the familiar:
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGB, 800, 600, 0, GL_RGB, GL_UNSIGNED_SHORT_5_6_5, image );
In the previous Utgard version we copied and converted the RGB565 image straight into the mapped Mali texture as BGRA8888 using an optimised Neon function, which was our only option really. Another part of the application (the UI) updates parts of textures, and now uses glTexSubImage2D where it too previously used the direct Mali texture mapping trick.
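For reference, the per-pixel work that Neon routine performed can be expressed in scalar C as below. This is just an illustration of the format expansion, not the optimised path; the channel order shown is RGBA, whereas the Utgard path wrote BGRA (swap the first and third destination bytes for that ordering).

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar reference for an RGB565 -> RGBA8888 expansion.  Each 5- or
 * 6-bit channel is widened to 8 bits by replicating its top bits into
 * the low bits, so full-scale input maps to full-scale output. */
static void rgb565_to_rgba8888(const uint16_t *src, uint8_t *dst, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        uint16_t p  = src[i];
        uint8_t  r5 = (p >> 11) & 0x1F;
        uint8_t  g6 = (p >> 5)  & 0x3F;
        uint8_t  b5 =  p        & 0x1F;
        dst[4 * i + 0] = (uint8_t)((r5 << 3) | (r5 >> 2)); /* 5 -> 8 bits */
        dst[4 * i + 1] = (uint8_t)((g6 << 2) | (g6 >> 4)); /* 6 -> 8 bits */
        dst[4 * i + 2] = (uint8_t)((b5 << 3) | (b5 >> 2)); /* 5 -> 8 bits */
        dst[4 * i + 3] = 0xFF;                             /* opaque alpha */
    }
}
```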
From what you have said, I wonder if we are falling foul of some inefficiencies. Does letting the Mali infrastructure copy and convert the image result in better and/or more optimal performance (resulting in textures stored as "cache optimal" perhaps)?
We want the GPU workload to be efficient, but at the same time require the source image to be copied for the render thread as fast as possible. I assume there will need to be a balance.
Yes I guess you've got a decision on which is the most important - the fast image copy, or fast image access thereafter. Keeping it linear will be very fast to copy (and it is what your Utgard version did if you want consistency), but you now have that potential much faster access if you create/convert the image as "cache optimal".
A colleague has pointed out that using glSync between your 2 threads reading from and writing to the image will be better than a full glFinish.
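On an ES 2.0 context, "glSync" presumably means a fence sync object: glFenceSync exists from ES 3.0, while on ES 2.0 the equivalent is the EGL_KHR_fence_sync extension. A sketch of the EGL route follows; it assumes the extension is present (check the extension string first) and needs a live EGLDisplay, so it cannot run standalone.

```c
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Sketch: fence the texture upload instead of issuing a full glFinish.
 * Assumes EGL_KHR_fence_sync is available; the function pointers must
 * be fetched through eglGetProcAddress as for any EGL extension. */
static PFNEGLCREATESYNCKHRPROC     pCreateSync;
static PFNEGLCLIENTWAITSYNCKHRPROC pClientWaitSync;
static PFNEGLDESTROYSYNCKHRPROC    pDestroySync;

static void wait_for_upload(EGLDisplay dpy)
{
    if (!pCreateSync) {
        pCreateSync     = (PFNEGLCREATESYNCKHRPROC)eglGetProcAddress("eglCreateSyncKHR");
        pClientWaitSync = (PFNEGLCLIENTWAITSYNCKHRPROC)eglGetProcAddress("eglClientWaitSyncKHR");
        pDestroySync    = (PFNEGLDESTROYSYNCKHRPROC)eglGetProcAddress("eglDestroySyncKHR");
    }
    /* Insert a fence after the glTexImage2D call, then block only until
     * the commands issued before the fence (the upload) have completed,
     * rather than draining the whole pipeline as glFinish would. */
    EGLSyncKHR sync = pCreateSync(dpy, EGL_SYNC_FENCE_KHR, NULL);
    pClientWaitSync(dpy, sync, EGL_SYNC_FLUSH_COMMANDS_BIT_KHR, EGL_FOREVER_KHR);
    pDestroySync(dpy, sync);
}
```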
Fantastic. We'll look into glSync.
So that we can test and benchmark each image variant, and know what we're testing, could you confirm my understanding? :
If, as in our previous Utgard version, we create the image with eglCreateImageKHR() with EGL_NATIVE_PIXMAP_KHR and glEGLImageTargetTexture2DOES(), we will get a linear image?
If, as we're doing right now in the Midgard version, we create the image using glTexImage2D, we will get a "cache optimised" image?
If I've got that right, then what happens if we issue a glTexImage2D to replace the contents of an eglCreateImageKHR-created image? Does it discard all of the internal image attributes (such as the fact it is linear) and create a new "cache optimised" image, or will it merely reallocate the storage, keeping the attributes and "linear" layout?
Thanks for taking the time to help us out with this - it is really important for us to understand, besides being very interesting.
I've clarified with the driver team, and glTexImage2D counts as a create rather than an import, so yes, it will change to "cache optimal" every time. glTexSubImage2D will not change the tiling, so if you want to keep linear you would need to use that.
As to how to get linear - if the allocator is the DDK it will be cache optimal. If the allocator is external and the image is imported it will be whatever the external allocator uses. For example, Android Gralloc will always allocate images using linear tiling whenever the image is host visible.
If you do it the old way with pixmaps it will end up linear as I understand it, yes.
Ben, that's all been tremendously helpful. Thank you.