Mali-G71 GPU: glReadPixels is slow compared to the Linux desktop equivalent for the default framebuffer

This question is very similar to https://community.arm.com/developer/tools-software/graphics/f/discussions/43630/mali-400-mp2-glreadpixels-alternatives. However, my use case is a cross-platform, user-space compositing window manager that transfers pixels via IPC shared memory.

Note that on Linux desktop EGL and GLESv2 are used, while on Android EGL and GLESv3 are used, as the build configuration shows:

add_subdirectory(testBuilder)

testBuilder_add_source(GLIS src/glis/backup/backup.cpp)
testBuilder_add_source(GLIS src/glis/backup/types/framebuffer.cpp)
testBuilder_add_source(GLIS src/glis/backup/types/renderbuffer.cpp)
testBuilder_add_source(GLIS src/glis/backup/types/texture.cpp)
testBuilder_add_source(GLIS src/glis/backup/types/program.cpp)
testBuilder_add_source(GLIS src/glis/compositor/compositor.cpp)
testBuilder_add_source(GLIS src/glis/font/font.cpp)
testBuilder_add_source(GLIS src/glis/internal/fps.cpp)
testBuilder_add_source(GLIS src/glis/internal/internal.cpp)
testBuilder_add_source(GLIS src/glis/internal/log.cpp)
testBuilder_add_source(GLIS src/glis/ipc/ashmem.cpp)
testBuilder_add_source(GLIS src/glis/ipc/ipc.cpp)
testBuilder_add_source(GLIS src/glis/ipc/serializer.cpp)
testBuilder_add_source(GLIS src/glis/ipc/server_core.cpp)
testBuilder_add_source(GLIS src/glis/ipc/shm.cpp)

if (ANDROID)
testBuilder_add_library(GLIS GLESv3)
testBuilder_add_library(GLIS android)
testBuilder_add_library(GLIS log)
elseif(UNIX)
testBuilder_add_source(GLIS src/glis/internal/xdg-shell-protocol.c)
testBuilder_add_library(GLIS GLESv2)
testBuilder_add_library(GLIS pthread)
testBuilder_add_library(GLIS X11)
testBuilder_add_library(GLIS wayland-client)
testBuilder_add_library(GLIS wayland-egl)
endif()

testBuilder_add_library(GLIS Magnum::Magnum)
testBuilder_add_library(GLIS Magnum::GL)
testBuilder_add_library(GLIS freetype)
testBuilder_add_library(GLIS glm)
testBuilder_add_library(GLIS EGL)
testBuilder_add_library(GLIS WinKernel)
testBuilder_build_shared_library(GLIS)

On Linux (Ubuntu 20.04), glReadPixels usually takes:
for the default framebuffer, around 20 ms to 30 ms
for an FBO + texture, around 10 ms to 14 ms

On Android, glReadPixels usually takes:
for the default framebuffer, around 500 ms to 700 ms (averaging between 590 ms and 610 ms)
for an FBO + texture, around 9 ms to 14 ms

I think the desktop performance is reasonable (though I assume I would only get around 5 to 10 fps in practice, as I am only testing single one-off transfers and not a continuous transfer such as rendering a rotating cube).

How can I improve pixel transfer from the default framebuffer?

This is my code for the texture transfer:
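As a rough check on the frame-rate guess above, a readback time can be converted into an upper bound on frames per second. This is only a minimal sketch: it counts the glReadPixels time alone and ignores rendering and IPC, and fps_from_ms is a hypothetical helper, not part of GLIS.

```cpp
#include <cassert>

// Upper bound on frames per second when every frame spends
// `ms_per_frame` milliseconds in the readback alone.
// Rendering and IPC costs are deliberately ignored here.
inline double fps_from_ms(double ms_per_frame) {
    assert(ms_per_frame > 0.0);  // a zero or negative duration is meaningless
    return 1000.0 / ms_per_frame;
}
```

With the timings reported above, 20 ms to 30 ms allows at most about 50 to 33 fps, while 600 ms allows fewer than 2 fps.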

void GLIS::GLIS_upload_texture(GLIS_CLASS &GLIS, size_t &window_id, GLuint &texture_id,
                               GLint texture_width, GLint texture_height) {
    LOG_INFO("uploading texture");
    // Swap buffers and synchronise with the GPU before the readback.
    GLIS_SwapBuffers(GLIS);
    GLIS_Sync_GPU();
    // Fill in the shared-memory command slot: command, texture size, target window.
    GLIS_INTERNAL_SHARED_MEMORY.slot.command.store_int8_t(GLIS_SERVER_COMMANDS.texture);
    GLIS_INTERNAL_SHARED_MEMORY.slot.additional_data_0.type_int64_t.store_int64_t(texture_width);
    GLIS_INTERNAL_SHARED_MEMORY.slot.additional_data_1.type_int64_t.store_int64_t(texture_height);
    GLIS_INTERNAL_SHARED_MEMORY.slot.additional_data_2.type_size_t.store_size_t(window_id);
    // Read the pixels directly into the shared-memory texture slot and time the call.
    auto s2 = now_ms();
    glReadPixels(0, 0, texture_width, texture_height, GL_RGBA, GL_UNSIGNED_BYTE,
                 GLIS_INTERNAL_SHARED_MEMORY.slot.texture.load_ptr());
    auto e2 = now_ms();
    LOG_INFO("glReadPixels completed in %ld milliseconds", e2-s2);
    GLIS_error_to_string_GL("glReadPixels");
    // Wait for the server to process the uploaded texture.
    GLIS_sync_server("GLIS_upload_texture", window_id);
    LOG_INFO("uploaded texture");
}
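now_ms() is not shown in the snippet above. For anyone wanting to reproduce the measurement, a minimal sketch of an equivalent helper could look like the following, assuming a monotonic clock is acceptable. The names now_ms_sketch and time_ms are hypothetical and not part of GLIS, and the format specifier passed to LOG_INFO has to match whatever type the real now_ms() returns.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for now_ms(): milliseconds read from a
// monotonic clock, so differences are unaffected by wall-clock changes.
inline long long now_ms_sketch() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
               steady_clock::now().time_since_epoch()).count();
}

// Runs `work` once and returns the elapsed time in milliseconds.
template <typename F>
long long time_ms(F &&work) {
    const long long start = now_ms_sketch();
    work();
    const long long end = now_ms_sketch();
    assert(end >= start);  // steady_clock never goes backwards
    return end - start;
}
```

The glReadPixels call could then be wrapped in a lambda and passed to time_ms, with the result printed using "%lld".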

And this is a simple client application that uses the default framebuffer for rendering:

//
// Created by konek on 8/18/2019.
//

#include <glis/glis.hpp>

GLIS_CLASS G;

GLIS glis;

int main() {
    int W = GLIS_COMMON_WIDTH;
    int H = GLIS_COMMON_HEIGHT;
    if (glis.GLIS_setupOffScreenRendering(G, W, H)) {
        glClearColor(1.0f, 0.0f, 1.0f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);
        LOG_INFO("creating window %d", 0);
        size_t win_id1 = glis.GLIS_new_window(0, 0, W, H);
        LOG_INFO("window id: %zu", win_id1);
        SERVER_LOG_TRANSFER_INFO = true;
        glis.GLIS_upload_texture(G, win_id1, W, H);
        LOG_INFO("created window %d", 0);
        LOG_INFO("Cleaning up");
        glis.GLIS_destroy_GLIS(G);
        LOG_INFO("Destroyed sub Compositor GLIS");
        LOG_INFO("Cleaned up");
    }
    return 0;
}



And this is the log output of my application:

smallville7123@smallville7123-MacBookPro:~/AndroidCompositor$ ./test_GLIS.sh DefaultFramebuffer
SERVER: SOCKET_SERVER : name was not supplied, conflicts are likely to happen
requesting 8778472 bytes (8.37 Megabytes) of memory
creating a new memfd
fd 5 is valid and should be able to be opened
region created with 8778472 size
CLIENT: SOCKET_SERVER : connecting to server
CLIENT: SOCKET_SERVER : connected to server
CLIENT: SOCKET_SERVER : writing message to fd 6
CLIENT: SOCKET_SERVER : send 8/8 size
CLIENT: SOCKET_SERVER : Wrote 8 bytes of data in 0 milliseconds (Total sent: 8 bytes of data)
CLIENT: SOCKET_SERVER : writing message to fd 6
CLIENT: SOCKET_SERVER : send 73/73 size
CLIENT: SOCKET_SERVER : Wrote 73 bytes of data in 0 milliseconds (Total sent: 81 bytes of data)
CLIENT: SOCKET_SERVER : reading message from fd 6
CLIENT: SOCKET_SERVER : reading message from fd 6
CLIENT: SOCKET_SERVER : recv 8/8 size
CLIENT: SOCKET_SERVER : Obtained 8 bytes of data in 8 milliseconds (Total obtained: 89 bytes of data)
CLIENT: SOCKET_SERVER : reading message from fd 6
CLIENT: SOCKET_SERVER : reading message from fd 6
CLIENT: SOCKET_SERVER : recv 126/126 size
CLIENT: SOCKET_SERVER : Obtained 126 bytes of data in 1 milliseconds (Total obtained: 215 bytes of data)
CLIENT: SOCKET_SERVER : closing connection to server
CLIENT: SOCKET_SERVER : closed connection to server
CLIENT: SOCKET_SERVER : Returning response
CLIENT: SOCKET_SERVER : setting name to 0
CLIENT: 0 : set name to 0
connecting to keep alive server
CLIENT: 0 : connecting to server
CLIENT: 0 : connected to server
waiting for GLIS_INIT_SHARED_MEMORY (for window id 18446744073709551615) to complete on server side
GLIS_INIT_SHARED_MEMORY (for window id 18446744073709551615) has completed on server side
Initializing
Initializing display
EGL initialized with version 1.4
EGL_CLIENT_APIS: OpenGL_ES
EGL_VENDOR: Android
EGL_VERSION: 1.4 Android META-EGL
EGL_EXTENSIONS: EGL_KHR_get_all_proc_addresses EGL_ANDROID_presentation_time EGL_KHR_swap_buffers_with_damage EGL_ANDROID_get_native_client_buffer EGL_ANDROID_front_buffer_auto_refresh EGL_ANDROID_get_frame_timestamps EGL_EXT_surface_SMPTE2086_metadata EGL_EXT_surface_CTA861_3_metadata EGL_KHR_image EGL_KHR_image_base EGL_EXT_image_gl_colorspace EGL_KHR_gl_colorspace EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_KHR_create_context EGL_KHR_config_attribs EGL_KHR_surfaceless_context EGL_EXT_create_context_robustness EGL_ANDROID_image_native_buffer EGL_KHR_wait_sync EGL_ANDROID_recordable EGL_KHR_partial_update EGL_EXT_pixel_format_float EGL_KHR_mutable_render_buffer EGL_EXT_protected_content EGL_IMG_context_priority EGL_KHR_no_config_context
Initialized display
Debug mode enabled
Initializing configuration
Initialized configuration
Initializing surface
creating pixel buffer surface
Initialized surface
Initializing context
Initialized context
Switching to context
GL_VENDOR: ARM
GL_RENDERER: Mali-G71
GL_VERSION: OpenGL ES 3.2 v1.r16p0-01rel0.###other-sha0123456789ABCDEF0###
GL_SHADING_LANGUAGE_VERSION: OpenGL ES GLSL ES 3.20
GL_EXTENSIONS: GL_EXT_debug_marker GL_ARM_rgba8 GL_ARM_mali_shader_binary GL_OES_depth24 GL_OES_depth_texture GL_OES_depth_texture_cube_map GL_OES_packed_depth_stencil GL_OES_rgb8_rgba8 GL_EXT_read_format_bgra GL_OES_compressed_paletted_texture GL_OES_compressed_ETC1_RGB8_texture GL_OES_standard_derivatives GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_EGL_image_external_essl3 GL_OES_EGL_sync GL_OES_texture_npot GL_OES_vertex_half_float GL_OES_required_internalformat GL_OES_vertex_array_object GL_OES_mapbuffer GL_EXT_texture_format_BGRA8888 GL_EXT_texture_rg GL_EXT_texture_type_2_10_10_10_REV GL_OES_fbo_render_mipmap GL_OES_element_index_uint GL_EXT_shadow_samplers GL_OES_texture_compression_astc GL_KHR_texture_compression_astc_ldr GL_KHR_texture_compression_astc_hdr GL_KHR_texture_compression_astc_sliced_3d GL_KHR_debug GL_EXT_occlusion_query_boolean GL_EXT_disjoint_timer_query GL_EXT_blend_minmax GL_EXT_discard_framebuffer GL_OES_get_program_binary GL_OES_texture_3D GL_EXT_texture_storage GL_EXT_multisampled_render_to_texture GL_EXT_multisampled_render_to_texture2 GL_OES_surfaceless_context GL_OES_texture_stencil8 GL_EXT_shader_pixel_local_storage GL_ARM_shader_framebuffer_fetch GL_ARM_shader_framebuffer_fetch_depth_stencil GL_ARM_mali_program_binary GL_EXT_sRGB GL_EXT_sRGB_write_control GL_EXT_texture_sRGB_decode GL_EXT_texture_sRGB_R8 GL_EXT_texture_sRGB_RG8 GL_KHR_blend_equation_advanced GL_KHR_blend_equation_advanced_coherent GL_OES_texture_storage_multisample_2d_array GL_OES_shader_image_atomic GL_EXT_robustness GL_EXT_draw_buffers_indexed GL_OES_draw_buffers_indexed GL_EXT_texture_border_clamp GL_OES_texture_border_clamp GL_EXT_texture_cube_map_array GL_OES_texture_cube_map_array GL_OES_sample_variables GL_OES_sample_shading GL_OES_shader_multisample_interpolation GL_EXT_shader_io_blocks GL_OES_shader_io_blocks GL_EXT_tessellation_shader GL_OES_tessellation_shader GL_EXT_primitive_bounding_box GL_OES_primitive_bounding_box GL_EXT_geometry_shader 
GL_OES_geometry_shader GL_ANDROID_extension_pack_es31a GL_EXT_gpu_shader5 GL_OES_gpu_shader5 GL_EXT_texture_buffer GL_OES_texture_buffer GL_EXT_copy_image GL_OES_copy_image GL_EXT_shader_non_constant_global_initializers GL_EXT_color_buffer_half_float GL_EXT_color_buffer_float GL_EXT_YUV_target GL_OVR_multiview GL_OVR_multiview2 GL_OVR_multiview_multisampled_render_to_texture GL_KHR_robustness GL_KHR_robust_buffer_access_behavior GL_EXT_draw_elements_base_vertex GL_OES_draw_elements_base_vertex GL_EXT_protected_textures GL_EXT_buffer_storage GL_EXT_external_buffer GL_EXT_EGL_image_array
Switched to context
Enabling debug callbacks
Enabled debug callbacks
Obtaining surface width and height
Obtained surface width and height
Initialized
creating window 0
waiting for GLIS_new_window (for window id 18446744073709551615) to complete on server side
GLIS_new_window (for window id 18446744073709551615) has completed on server side
window id: 3
uploading texture
glReadPixels completed in 631 milliseconds
waiting for GLIS_upload_texture (for window id 3) to complete on server side
uploaded texture
created window 0
Cleaning up
Uninitializing
Disabling debug callbacks
Disabled debug callbacks
Switching context to no context
Uninitializing context
GLIS_upload_texture (for window id 3) has completed on server side
Uninitializing surface
Uninitializing display
Setting display to no display
Uninitialized
Destroyed sub Compositor GLIS
Cleaned up
smallville7123@smallville7123-MacBookPro:~/AndroidCompositor$
