Hi,
We are working with Mali-400 driver r3p2-01rel0 on Exynos4412, under Linux/X11.
base: BUILD=RELEASE ARCH=arch_011_udd PLATFORM=default_7a TRACE=0 THREAD= GEOM= CORES=MALI400 USING_MALI400=1 TARGET_CORE_REVISION=0x0101 TOPLEVEL_REPO_URL=Linux-r3p2-01rel0 REVISION=Linux-r3p2-01rel0 CHANGED_REVISION=Linux-r3p2-01rel0 REPO_URL=Linux-r3p2-01rel0 BUILD_DATE=Fri Jan 11 14:58:31 UTC 2013 CHANGE_DATE=Linux-r3p2-01rel0 TARGET_TOOLCHAIN=gcc HOST_TOOLCHAIN=gcc TARGET_TOOLCHAIN_VERSION=gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) HOST_TOOLCHAIN_VERSION=gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) TARGET_SYSTEM=gcc-arm-linux-gnueabihf HOST_SYSTEM=gcc-arm-linux-gnueabihf CPPFLAGS= CUSTOMER=internal VARIANT=mali400-r3p2-gles11-gles20-linux-ump-x11 HOSTLIB=direct INSTRUMENTED=FALSE USING_MRI=FALSE MALI_TEST_API= UDD_OS=linux
The Mali README explains that Mali must be integrated with the display controller driver of the host system. We're trying to do just that. In this case, the display driver is exynos-drm, which uses DRI2. We require this over fbdev for the ability to change resolutions dynamically (via KMS), for perfect vblank synchronization, and to reduce the amount of CPU copying in order to get GPU rendering results on the screen.
I am starting this with the xf86-video-armsoc driver (which is authored by ARM) and I integrate it with Mali as follows: for each new GEM buffer created, I obtain a UMP secure ID for that memory and store it in the DRI2 buffer name for that allocation. This should be all that is needed, but unfortunately Mali does not seem to adhere to the basic DRI2 standards, which means that this doesn't work. The 2 main problems are:
1. Mali sometimes calls SwapBuffers without calling GetBuffers first.
2. Mali writes to the old front buffer immediately after SwapBuffers returns, while that buffer is still on the screen.
For the first problem, a double-buffered DRI2 rendering client should always call DRI2GetBuffers in order to get the back buffer before starting to draw. While I can see that the command sequence is often GetBuffers SwapBuffers GetBuffers SwapBuffers... I also often see cases where it does GetBuffers SwapBuffers SwapBuffers SwapBuffers... This confuses the buffer reuse logic in the DRI2 implementation in the X server and results in the client and server disagreeing about which buffers are front and back at a given time.
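To make the first problem concrete, here is a minimal sketch of how a request trace could be checked for swaps that were not preceded by GetBuffers. It assumes the trace has already been reduced to a plain list of request names; the function name `find_unpaired_swaps` and the two sample traces are hypothetical, and parsing a real protocol trace is not covered.

```python
def find_unpaired_swaps(requests):
    """Return indices of SwapBuffers requests that were not preceded by a
    GetBuffers request since the previous swap (or since the start)."""
    unpaired = []
    have_back_buffer = False  # True once GetBuffers was seen for this frame
    for index, name in enumerate(requests):
        if name == "GetBuffers":
            have_back_buffer = True
        elif name == "SwapBuffers":
            if not have_back_buffer:
                unpaired.append(index)
            # Per the rule described above, a fresh GetBuffers is expected
            # before the next frame is drawn.
            have_back_buffer = False
    return unpaired


# Simplified traces matching the two patterns described above.
well_behaved = ["GetBuffers", "SwapBuffers", "GetBuffers", "SwapBuffers"]
observed = ["GetBuffers", "SwapBuffers", "SwapBuffers", "SwapBuffers"]

print(find_unpaired_swaps(well_behaved))  # []
print(find_unpaired_swaps(observed))      # [2, 3]
```

The second trace flags every swap after the first, which is the pattern that leaves client and server disagreeing about front and back.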
For the second issue, I confirmed the problem by checksumming the buffers at different points. The old front buffer is reused as soon as the X driver's ScheduleSwap() function returns, which does not indicate that the swap has completed. The buffer is still on the screen for a while longer. But Mali draws to it right away, resulting in a nasty visual glitch that is corrected a moment later when the swap completes.
Mali seems to have a bit of a fundamental misunderstanding of SwapBuffers. I already saw in the thread "Re: Mali deadlock with X server grab" that Mali appears to create a dedicated thread in order to call SwapBuffers, which seems bizarre as SwapBuffers is an asynchronous operation (with completion later notified by the BufferSwapComplete event). But from this and the behaviour observed above, I guess the Mali developers have misunderstood and implemented it as purely synchronous, i.e. it is a blocking function, and after it returns, the old front buffer is available for immediate reuse. I also see this comment in xf86-video-mali:
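The checksum approach can be sketched roughly as follows. This is only an illustration of the idea, not the code that was actually used: the buffer is a hypothetical `bytearray` standing in for the pixel memory, CRC32 is an arbitrary choice of checksum, and the two sample points are simulated inline rather than hooked into the X driver.

```python
import zlib


def checksum(buffer):
    """CRC32 of the buffer contents; any cheap checksum would do here."""
    return zlib.crc32(bytes(buffer))


def written_while_on_screen(at_schedule_return, at_swap_complete):
    """True if the buffer changed between the two sample points."""
    return at_schedule_return != at_swap_complete


# Hypothetical stand-in for the old front buffer's pixel memory.
old_front = bytearray(b"\x10" * 64)

crc_after_schedule = checksum(old_front)  # sampled when ScheduleSwap() returns
old_front[0:4] = b"\xff\xff\xff\xff"      # client renders into it too early
crc_at_completion = checksum(old_front)   # sampled when the swap completes

print(written_while_on_screen(crc_after_schedule, crc_at_completion))  # True
```

If the two checksums differ, something wrote to the buffer while it was still being scanned out.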
/*
* MaliDRI2ScheduleSwap is the implementation of DRI2SwapBuffers, this function
* should wait for vblank event which will trigger registered event handler.
* Event handler will do FLIP/SWAP/BLIT according to event type.
*
* Current DRM doesn't support vblank well, so this function just do FLIP/
* SWAP/BLIT directly, according to drawable information.
*/
Perhaps I could help here in regard to the comment "Current DRM doesn't support vblank well". It certainly has not been a problem for other drivers, so I'm sure we could find a solution for Mali too.
Making ScheduleSwap purely synchronous (as the xf86-video-mali implementation suggests) kills performance, as it causes the X server to block without processing any requests until a vblank occurs. Also on the calling side, I observed in the "Mali deadlock with X server grab" thread that the rendering client also blocks waiting for the response, while holding a global Mali lock.
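To illustrate the asynchronous behaviour I would expect, here is a toy model, not real X server or Mali code. The class `SwapModel`, its method names, and the buffer names "A" and "B" are all hypothetical, and attaching the released buffer to the BufferSwapComplete event is a simplification made only for this sketch.

```python
class SwapModel:
    """Toy model of an asynchronous swap; not real X server code."""

    def __init__(self):
        self.front = "A"     # buffer currently scanned out
        self.pending = None  # buffer queued to become front at next vblank
        self.events = []     # events delivered to the client

    def schedule_swap(self, back):
        """Queue the flip and return immediately, without waiting for vblank."""
        self.pending = back

    def vblank(self):
        """Complete a queued swap, if any, and notify the client."""
        if self.pending is not None:
            released = self.front
            self.front, self.pending = self.pending, None
            self.events.append(("BufferSwapComplete", released))

    def can_render_to(self, name):
        """A buffer is safe to draw into only if it is neither on screen
        nor queued to go on screen."""
        return name != self.front and name != self.pending


model = SwapModel()
model.schedule_swap("B")
print(model.can_render_to("A"))  # False: A is still on the screen
model.vblank()
print(model.can_render_to("A"))  # True: the swap completed and A was released
print(model.events)              # [('BufferSwapComplete', 'A')]
```

The point is that the server never blocks in `schedule_swap`, and the client treats the completion event, not the return of the call, as permission to reuse the old front buffer.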
Could this be improved in future Mali-400 versions?
Has the situation changed at all in R4P0?
Thanks.
Daniel
Hi dsd,
I have this feedback from the driver team:
Customer reported:
I am starting this with the xf86-video-armsoc driver (which is authored by ARM) and I integrate it with Mali as follows: for each new GEM buffer created, I obtain a UMP secure ID for that memory and store it in the DRI2 buffer name for that allocation. This should be all that is needed, but unfortunately Mali does not seem to adhere to the basic DRI2 standards, which means that this doesn't work.
So I would like to clarify that the xf86-video-armsoc driver should work with exynos DRM in the Linux kernel; we have verified that the latest exynos DRM works with some internal patches. This means we don't obtain any UMP secure ID now.
Steps to integrate xf86-video-armsoc and exynos DRM:
1. Build libDRM
    git clone git://anongit.freedesktop.org/mesa/drm libdrm
    cd libdrm
    ./autogen.sh --prefix=/usr --enable-exynos-experimental-api
    make && make install
2. Build xf86-video-armsoc
    git clone /scratch/git/xf86-video-armsoc.git/
    ./autogen.sh --with-drmmode=exynos --prefix=/usr
    make && make install
3. Build Mali.ko
    cd trunk/src/devicedrv/mali
    KDIR=/work/x11/kernel_x11_odroidx2_3.8.13.7/ CROSS_COMPILE=arm-linux-gnueabihf- ARCH=arm TARGET_PLATFORM=odroidq USING_OS_MEMORY=1 USING_MMU=1 USING_PMM=1 USING_MALI_RUN_TIME_PM=0 BUILD=debug make -j4
4. Build mali-400 DDK
    cd trunk
    VARIANT="mali400-gles11-gles20-linux-x11-no_profiling-dma_buf-max_pp_split_count_4-rgb_is_xrgb" CONFIG=debug TARGET_PLATFORM=odroidq CROSS_COMPILE=arm-linux-gnueabihf- make suites -j4
Note that we removed -ump- from VARIANT and added -dma_buf-, and removed USING_UMP=1 when building Mali.ko.
Let me know if this helps. I'm thinking step 4 might need hardkernel to rebuild the DDK for you. I'm asking a few more questions internally and will reply again if anything comes up.
Thanks,
Chris
Thanks!
I'm not clear how switching from UMP to DMA-BUF will solve the behaviour I reported, unless libMali drastically changes its internal behaviour w.r.t. SwapBuffers depending on whether UMP or DMA-BUF is chosen.
Nevertheless, I certainly need to give that configuration a try, and it is exciting to hear that Mali-400 has dmabuf support that will also help things later down the line. Thanks a lot for providing the build instructions too.
Yes, it requires someone (hardkernel?) to build the DDK for us, as detailed. Fingers crossed...
Now I understand better: R4P0 added dma-buf support in this context. So it is indeed likely that SwapBuffers behaviour was overhauled at the same time. Nice. We're expecting to receive R4P0 binaries from hardkernel next month.
Thanks! r4p0 does seem to fix the problem where the old front buffer was written to immediately after SwapBuffers returns; now it waits for the event. Great!
I haven't had a chance to confirm if it fixes the other problem (where it sometimes called SwapBuffers without calling GetBuffers first) but I will check that soon. I am also worried that the dedicated SwapBuffers thread will continue to cause problems, but maybe we can regard it as working for today...
Thanks for the feedback, glad that it's solved some of your issues. Please let us know if you confirm any outstanding issues and I can push these back to the driver team. It sounds like they might be aware of them, but it helps to get them prioritised if we can show that people are asking for them.