Mali-G71 GPU glReadPixels slow compared to Linux desktop equivalent for Default Framebuffer

This question is very similar to https://community.arm.com/developer/tools-software/graphics/f/discussions/43630/mali-400-mp2-glreadpixels-alternatives; however, my use case is a cross-platform, user-space compositing window manager that works via IPC shared memory.

Note that on the Linux desktop, EGL and GLESv2 are used, while on Android, EGL and GLESv3 are used, as the CMake configuration below shows.

add_subdirectory(testBuilder)

testBuilder_add_source(GLIS src/glis/backup/backup.cpp)
testBuilder_add_source(GLIS src/glis/backup/types/framebuffer.cpp)
testBuilder_add_source(GLIS src/glis/backup/types/renderbuffer.cpp)
testBuilder_add_source(GLIS src/glis/backup/types/texture.cpp)
testBuilder_add_source(GLIS src/glis/backup/types/program.cpp)
testBuilder_add_source(GLIS src/glis/compositor/compositor.cpp)
testBuilder_add_source(GLIS src/glis/font/font.cpp)
testBuilder_add_source(GLIS src/glis/internal/fps.cpp)
testBuilder_add_source(GLIS src/glis/internal/internal.cpp)
testBuilder_add_source(GLIS src/glis/internal/log.cpp)
testBuilder_add_source(GLIS src/glis/ipc/ashmem.cpp)
testBuilder_add_source(GLIS src/glis/ipc/ipc.cpp)
testBuilder_add_source(GLIS src/glis/ipc/serializer.cpp)
testBuilder_add_source(GLIS src/glis/ipc/server_core.cpp)
testBuilder_add_source(GLIS src/glis/ipc/shm.cpp)

if (ANDROID)
testBuilder_add_library(GLIS GLESv3)
testBuilder_add_library(GLIS android)
testBuilder_add_library(GLIS log)
elseif(UNIX)
testBuilder_add_source(GLIS src/glis/internal/xdg-shell-protocol.c)
testBuilder_add_library(GLIS GLESv2)
testBuilder_add_library(GLIS pthread)
testBuilder_add_library(GLIS X11)
testBuilder_add_library(GLIS wayland-client)
testBuilder_add_library(GLIS wayland-egl)
endif()

testBuilder_add_library(GLIS Magnum::Magnum)
testBuilder_add_library(GLIS Magnum::GL)
testBuilder_add_library(GLIS freetype)
testBuilder_add_library(GLIS glm)
testBuilder_add_library(GLIS EGL)
testBuilder_add_library(GLIS WinKernel)
testBuilder_build_shared_library(GLIS)

On Linux (Ubuntu 20.04), glReadPixels usually takes:
for the Default Framebuffer, around 20 ms to 30 ms
for an FBO+Texture, around 10 ms to 14 ms

On Android, glReadPixels usually takes:
for the Default Framebuffer, around 500 ms to 700 ms (averaging between 590 ms and 610 ms)
for an FBO+Texture, around 9 ms to 14 ms

I think the performance on desktop is reasonable (though I assume I would get around 5 to 10 fps with this, as I am only testing single one-off transfers and not a constant transfer such as rendering a rotating cube).
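As a rough sanity check on these timings, a per-transfer time can be converted into an upper bound on frame rate. This is only a sketch: it assumes the blocking readback is the sole per-frame cost and that transfers are not pipelined, and the helper name `max_fps_from_ms` is made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: upper bound on frames per second when every frame
// pays `ms` milliseconds of blocking readback and nothing else.
// e.g. 25 ms per readback gives at most 40 fps.
double max_fps_from_ms(double ms) {
    assert(ms > 0.0);  // a zero or negative duration has no meaningful rate
    return 1000.0 / ms;
}
```

By this measure the 20 ms to 30 ms desktop readback still leaves room for tens of frames per second, while a 600 ms readback caps the rate below 2 fps.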

How could I improve pixel transfer from the Default Framebuffer?

This is my code for the texture transfer:

void GLIS::GLIS_upload_texture(GLIS_CLASS &GLIS, size_t &window_id, GLuint &texture_id,
                               GLint texture_width, GLint texture_height) {
    LOG_INFO("uploading texture");
    GLIS_SwapBuffers(GLIS);
    GLIS_Sync_GPU();
    GLIS_INTERNAL_SHARED_MEMORY.slot.command.store_int8_t(GLIS_SERVER_COMMANDS.texture);
    GLIS_INTERNAL_SHARED_MEMORY.slot.additional_data_0.type_int64_t.store_int64_t(texture_width);
    GLIS_INTERNAL_SHARED_MEMORY.slot.additional_data_1.type_int64_t.store_int64_t(texture_height);
    GLIS_INTERNAL_SHARED_MEMORY.slot.additional_data_2.type_size_t.store_size_t(window_id);
    auto s2 = now_ms();
    glReadPixels(0, 0, texture_width, texture_height, GL_RGBA, GL_UNSIGNED_BYTE,
                 GLIS_INTERNAL_SHARED_MEMORY.slot.texture.load_ptr());
    auto e2 = now_ms();
    LOG_INFO("glReadPixels completed in %ld milliseconds", e2-s2);
    GLIS_error_to_string_GL("glReadPixels");
    GLIS_sync_server("GLIS_upload_texture", window_id);
    LOG_INFO("uploaded texture");
}
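
The `now_ms()` helper used for the timing is not shown here. For reference, a minimal sketch of what such a helper might look like, assuming it returns milliseconds from a monotonic clock; `now_ms_sketch` and `time_call_ms` are hypothetical names and not part of GLIS.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for now_ms(): milliseconds from a monotonic clock.
// steady_clock never jumps backwards, so differences are safe to log.
int64_t now_ms_sketch() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
}

// Times any callable the same way the upload function brackets glReadPixels.
template <typename F>
int64_t time_call_ms(F &&f) {
    const int64_t start = now_ms_sketch();
    f();
    const int64_t end = now_ms_sketch();
    assert(end >= start);  // holds because the clock is monotonic
    return end - start;
}
```

Because glReadPixels into client memory only returns once the pixels are available, an interval measured this way can include any rendering that was still pending, not just the copy itself.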

And this is a simple client application that uses the default framebuffer for rendering:

//
// Created by konek on 8/18/2019.
//

#include <glis/glis.hpp>

GLIS_CLASS G;

GLIS glis;

int main() {
    int W = GLIS_COMMON_WIDTH;
    int H = GLIS_COMMON_HEIGHT;
    if (glis.GLIS_setupOffScreenRendering(G, W, H)) {
        glClearColor(1.0f, 0.0f, 1.0f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);
        LOG_INFO("creating window %d", 0);
        size_t win_id1 = glis.GLIS_new_window(0, 0, W, H);
        LOG_INFO("window id: %zu", win_id1);
        SERVER_LOG_TRANSFER_INFO = true;
        glis.GLIS_upload_texture(G, win_id1, W, H);
        LOG_INFO("created window %d", 0);
        LOG_INFO("Cleaning up");
        glis.GLIS_destroy_GLIS(G);
        LOG_INFO("Destroyed sub Compositor GLIS");
        LOG_INFO("Cleaned up");
    }
    return 0;
}



And this is the log output of my application:

smallville7123@smallville7123-MacBookPro:~/AndroidCompositor$ ./test_GLIS.sh DefaultFramebuffer
SERVER: SOCKET_SERVER : name was not supplied, conflicts are likely to happen
requesting 8778472 bytes (8.37 Megabytes) of memory
creating a new memfd
fd 5 is valid and should be able to be opened
region created with 8778472 size
CLIENT: SOCKET_SERVER : connecting to server
CLIENT: SOCKET_SERVER : connected to server
CLIENT: SOCKET_SERVER : writing message to fd 6
CLIENT: SOCKET_SERVER : send 8/8 size
CLIENT: SOCKET_SERVER : Wrote 8 bytes of data in 0 milliseconds (Total sent: 8 bytes of data)
CLIENT: SOCKET_SERVER : writing message to fd 6
CLIENT: SOCKET_SERVER : send 73/73 size
CLIENT: SOCKET_SERVER : Wrote 73 bytes of data in 0 milliseconds (Total sent: 81 bytes of data)
CLIENT: SOCKET_SERVER : reading message from fd 6
CLIENT: SOCKET_SERVER : reading message from fd 6
CLIENT: SOCKET_SERVER : recv 8/8 size
CLIENT: SOCKET_SERVER : Obtained 8 bytes of data in 8 milliseconds (Total obtained: 89 bytes of data)
CLIENT: SOCKET_SERVER : reading message from fd 6
CLIENT: SOCKET_SERVER : reading message from fd 6
CLIENT: SOCKET_SERVER : recv 126/126 size
CLIENT: SOCKET_SERVER : Obtained 126 bytes of data in 1 milliseconds (Total obtained: 215 bytes of data)
CLIENT: SOCKET_SERVER : closing connection to server
CLIENT: SOCKET_SERVER : closed connection to server
CLIENT: SOCKET_SERVER : Returning response
CLIENT: SOCKET_SERVER : setting name to 0
CLIENT: 0 : set name to 0
connecting to keep alive server
CLIENT: 0 : connecting to server
CLIENT: 0 : connected to server
waiting for GLIS_INIT_SHARED_MEMORY (for window id 18446744073709551615) to complete on server side
GLIS_INIT_SHARED_MEMORY (for window id 18446744073709551615) has completed on server side
Initializing
Initializing display
EGL initialized with version 1.4
EGL_CLIENT_APIS: OpenGL_ES
EGL_VENDOR: Android
EGL_VERSION: 1.4 Android META-EGL
EGL_EXTENSIONS: EGL_KHR_get_all_proc_addresses EGL_ANDROID_presentation_time EGL_KHR_swap_buffers_with_damage EGL_ANDROID_get_native_client_buffer EGL_ANDROID_front_buffer_auto_refresh EGL_ANDROID_get_frame_timestamps EGL_EXT_surface_SMPTE2086_metadata EGL_EXT_surface_CTA861_3_metadata EGL_KHR_image EGL_KHR_image_base EGL_EXT_image_gl_colorspace EGL_KHR_gl_colorspace EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_KHR_create_context EGL_KHR_config_attribs EGL_KHR_surfaceless_context EGL_EXT_create_context_robustness EGL_ANDROID_image_native_buffer EGL_KHR_wait_sync EGL_ANDROID_recordable EGL_KHR_partial_update EGL_EXT_pixel_format_float EGL_KHR_mutable_render_buffer EGL_EXT_protected_content EGL_IMG_context_priority EGL_KHR_no_config_context
Initialized display
Debug mode enabled
Initializing configuration
Initialized configuration
Initializing surface
creating pixel buffer surface
Initialized surface
Initializing context
Initialized context
Switching to context
GL_VENDOR: ARM
GL_RENDERER: Mali-G71
GL_VERSION: OpenGL ES 3.2 v1.r16p0-01rel0.###other-sha0123456789ABCDEF0###
GL_SHADING_LANGUAGE_VERSION: OpenGL ES GLSL ES 3.20
GL_EXTENSIONS: GL_EXT_debug_marker GL_ARM_rgba8 GL_ARM_mali_shader_binary GL_OES_depth24 GL_OES_depth_texture GL_OES_depth_texture_cube_map GL_OES_packed_depth_stencil GL_OES_rgb8_rgba8 GL_EXT_read_format_bgra GL_OES_compressed_paletted_texture GL_OES_compressed_ETC1_RGB8_texture GL_OES_standard_derivatives GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_EGL_image_external_essl3 GL_OES_EGL_sync GL_OES_texture_npot GL_OES_vertex_half_float GL_OES_required_internalformat GL_OES_vertex_array_object GL_OES_mapbuffer GL_EXT_texture_format_BGRA8888 GL_EXT_texture_rg GL_EXT_texture_type_2_10_10_10_REV GL_OES_fbo_render_mipmap GL_OES_element_index_uint GL_EXT_shadow_samplers GL_OES_texture_compression_astc GL_KHR_texture_compression_astc_ldr GL_KHR_texture_compression_astc_hdr GL_KHR_texture_compression_astc_sliced_3d GL_KHR_debug GL_EXT_occlusion_query_boolean GL_EXT_disjoint_timer_query GL_EXT_blend_minmax GL_EXT_discard_framebuffer GL_OES_get_program_binary GL_OES_texture_3D GL_EXT_texture_storage GL_EXT_multisampled_render_to_texture GL_EXT_multisampled_render_to_texture2 GL_OES_surfaceless_context GL_OES_texture_stencil8 GL_EXT_shader_pixel_local_storage GL_ARM_shader_framebuffer_fetch GL_ARM_shader_framebuffer_fetch_depth_stencil GL_ARM_mali_program_binary GL_EXT_sRGB GL_EXT_sRGB_write_control GL_EXT_texture_sRGB_decode GL_EXT_texture_sRGB_R8 GL_EXT_texture_sRGB_RG8 GL_KHR_blend_equation_advanced GL_KHR_blend_equation_advanced_coherent GL_OES_texture_storage_multisample_2d_array GL_OES_shader_image_atomic GL_EXT_robustness GL_EXT_draw_buffers_indexed GL_OES_draw_buffers_indexed GL_EXT_texture_border_clamp GL_OES_texture_border_clamp GL_EXT_texture_cube_map_array GL_OES_texture_cube_map_array GL_OES_sample_variables GL_OES_sample_shading GL_OES_shader_multisample_interpolation GL_EXT_shader_io_blocks GL_OES_shader_io_blocks GL_EXT_tessellation_shader GL_OES_tessellation_shader GL_EXT_primitive_bounding_box GL_OES_primitive_bounding_box GL_EXT_geometry_shader 
GL_OES_geometry_shader GL_ANDROID_extension_pack_es31a GL_EXT_gpu_shader5 GL_OES_gpu_shader5 GL_EXT_texture_buffer GL_OES_texture_buffer GL_EXT_copy_image GL_OES_copy_image GL_EXT_shader_non_constant_global_initializers GL_EXT_color_buffer_half_float GL_EXT_color_buffer_float GL_EXT_YUV_target GL_OVR_multiview GL_OVR_multiview2 GL_OVR_multiview_multisampled_render_to_texture GL_KHR_robustness GL_KHR_robust_buffer_access_behavior GL_EXT_draw_elements_base_vertex GL_OES_draw_elements_base_vertex GL_EXT_protected_textures GL_EXT_buffer_storage GL_EXT_external_buffer GL_EXT_EGL_image_array
Switched to context
Enabling debug callbacks
Enabled debug callbacks
Obtaining surface width and height
Obtained surface width and height
Initialized
creating window 0
waiting for GLIS_new_window (for window id 18446744073709551615) to complete on server side
GLIS_new_window (for window id 18446744073709551615) has completed on server side
window id: 3
uploading texture
glReadPixels completed in 631 milliseconds
waiting for GLIS_upload_texture (for window id 3) to complete on server side
uploaded texture
created window 0
Cleaning up
Uninitializing
Disabling debug callbacks
Disabled debug callbacks
Switching context to no context
Uninitializing context
GLIS_upload_texture (for window id 3) has completed on server side
Uninitializing surface
Uninitializing display
Setting display to no display
Uninitialized
Destroyed sub Compositor GLIS
Cleaned up
smallville7123@smallville7123-MacBookPro:~/AndroidCompositor$

  • I think it's going to be hard to provide any specific recommendations here without more information and/or a reproducer. Something is definitely "wrong" with a glReadPixels call taking almost a second.

    * What platform are you running on?

    * Have you tried profiling with something like our Streamline profiler to see where the time is going (CPU, GPU, thread synchronization)?


    Kind regards, 
    Pete
