
Bad interaction with DRI2 for vsync

Hi,

We are working with Mali-400 driver r3p2-01rel0 on Exynos4412, under Linux/X11.

base: BUILD=RELEASE ARCH=arch_011_udd PLATFORM=default_7a TRACE=0 THREAD= GEOM= CORES=MALI400 USING_MALI400=1 TARGET_CORE_REVISION=0x0101 TOPLEVEL_REPO_URL=Linux-r3p2-01rel0 REVISION=Linux-r3p2-01rel0 CHANGED_REVISION=Linux-r3p2-01rel0 REPO_URL=Linux-r3p2-01rel0 BUILD_DATE=Fri Jan 11 14:58:31 UTC 2013 CHANGE_DATE=Linux-r3p2-01rel0 TARGET_TOOLCHAIN=gcc HOST_TOOLCHAIN=gcc TARGET_TOOLCHAIN_VERSION=gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)  HOST_TOOLCHAIN_VERSION=gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)  TARGET_SYSTEM=gcc-arm-linux-gnueabihf HOST_SYSTEM=gcc-arm-linux-gnueabihf CPPFLAGS= CUSTOMER=internal VARIANT=mali400-r3p2-gles11-gles20-linux-ump-x11 HOSTLIB=direct INSTRUMENTED=FALSE USING_MRI=FALSE MALI_TEST_API= UDD_OS=linux

The Mali README explains that Mali must be integrated with the display controller driver of the host system. We're trying to do just that. In this case, the display driver is exynos-drm, which uses DRI2. We require this over fbdev for the ability to change resolutions dynamically (via KMS), for perfect vblank synchronization, and to reduce the amount of CPU copying in order to get GPU rendering results on the screen.

I am starting this with the xf86-video-armsoc driver (which is authored by ARM) and I integrate it with Mali as follows: for each new GEM buffer created, I obtain a UMP secure ID for that memory and store it in the DRI2 buffer name for that allocation. This should be all that is needed, but unfortunately Mali does not seem to adhere to the basic DRI2 standards, which means that this doesn't work. The 2 main problems are:

  1. The current half-drawn back buffer is often the one that ends up displayed on the screen.
  2. Right after scheduling a page flip, the "old" front buffer (which is still displayed on-screen until the next vblank) seems to be modified by Mali even before the driver has posted a DRI2BufferSwapComplete event (I confirmed this by checksumming the buffers at different points in the pipeline).

For the first problem, a double-buffered DRI2 rendering client should always call DRI2GetBuffers in order to get the back buffer before starting to draw. While I can see that the command sequence is often GetBuffers SwapBuffers GetBuffers SwapBuffers... I also often see cases where it does GetBuffers SwapBuffers SwapBuffers SwapBuffers... This confuses the buffer reuse logic in the DRI2 implementation in the X server and results in the client and server disagreeing about which buffers are front and back at a given time.
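To make the failure mode concrete, here is a minimal sketch of one way such a disagreement can arise. It is a toy model with two buffers, not the real X server buffer reuse logic, and every name in it (toy_server, toy_get_buffers and so on) is hypothetical rather than part of DRI2 or Mali. It assumes the server simply exchanges front and back on each swap, and that a client that skips GetBuffers keeps drawing into whatever it was last told was the back buffer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: two buffers, named 0 and 1. None of these names are
 * real DRI2 or Mali API; they are hypothetical and exist for illustration. */
struct toy_server {
    int front;  /* buffer currently scanned out */
    int back;   /* buffer the server expects the client to draw into */
};

struct toy_client {
    int cached_back;  /* buffer the client believes is the back buffer */
};

/* Stand-in for DRI2GetBuffers: the client learns the current back buffer. */
static void toy_get_buffers(const struct toy_server *s, struct toy_client *c)
{
    c->cached_back = s->back;
}

/* Stand-in for DRI2SwapBuffers: the server exchanges front and back. */
static void toy_swap_buffers(struct toy_server *s)
{
    int tmp = s->front;
    s->front = s->back;
    s->back = tmp;
}

/* Render `frames` frames and return how many times the client drew into
 * the buffer that the server considered to be the front buffer. */
static int toy_count_front_draws(int frames, bool refresh_every_frame)
{
    struct toy_server s = { .front = 0, .back = 1 };
    struct toy_client c;
    int front_draws = 0;

    assert(frames >= 0);
    toy_get_buffers(&s, &c);  /* initial query, done in both modes */
    for (int i = 0; i < frames; i++) {
        if (refresh_every_frame)
            toy_get_buffers(&s, &c);
        if (c.cached_back == s.front)
            front_draws++;  /* client is drawing into the on-screen buffer */
        toy_swap_buffers(&s);
    }
    return front_draws;
}
```

In this model, calling toy_count_front_draws with refresh_every_frame set to true never draws into the front buffer, while the GetBuffers SwapBuffers SwapBuffers... pattern ends up drawing into the on-screen buffer on every second frame.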

For the second issue, I confirmed the problem by checksumming the buffers at different points. The old front buffer is reused as soon as the X driver's ScheduleSwap() function returns, which does not indicate that the swap has completed. The buffer is still on the screen for a while longer. But Mali draws to it right away, resulting in a nasty visual glitch that is corrected a moment later when the swap completes.
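The ownership rule I would expect can be sketched as a small state machine. This is a simplified model under my own assumptions, not code from any real driver: the names (toy_swapchain, toy_schedule_swap, toy_vblank) are hypothetical, there are only two buffers, and toy_vblank stands in for the point at which a real driver would post the swap-complete event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; a toy model, not real DRI2 or Mali API. */
enum toy_buf_state {
    TOY_BUF_FREE,     /* client may render into it */
    TOY_BUF_QUEUED,   /* flip scheduled, waiting for vblank */
    TOY_BUF_SCANOUT   /* currently displayed */
};

struct toy_swapchain {
    enum toy_buf_state state[2];
    int front;    /* index of the buffer on screen */
    int pending;  /* index of the queued buffer, or -1 if none */
};

static void toy_init(struct toy_swapchain *sc)
{
    sc->state[0] = TOY_BUF_SCANOUT;
    sc->state[1] = TOY_BUF_FREE;
    sc->front = 0;
    sc->pending = -1;
}

/* Stand-in for ScheduleSwap: queues the flip and returns immediately.
 * The old front buffer is still in SCANOUT state when this returns. */
static bool toy_schedule_swap(struct toy_swapchain *sc, int back)
{
    assert(back == 0 || back == 1);
    if (sc->pending != -1 || sc->state[back] != TOY_BUF_FREE)
        return false;
    sc->state[back] = TOY_BUF_QUEUED;
    sc->pending = back;
    return true;
}

/* Stand-in for the vblank handler, where the flip takes effect and the
 * swap-complete event would be posted. Only here is the old front freed. */
static void toy_vblank(struct toy_swapchain *sc)
{
    if (sc->pending == -1)
        return;
    sc->state[sc->front] = TOY_BUF_FREE;
    sc->state[sc->pending] = TOY_BUF_SCANOUT;
    sc->front = sc->pending;
    sc->pending = -1;
}

/* A client should only render into a buffer that is FREE. */
static bool toy_can_render(const struct toy_swapchain *sc, int buf)
{
    return sc->state[buf] == TOY_BUF_FREE;
}
```

Between toy_schedule_swap returning and toy_vblank running, neither buffer is renderable in this model, which is exactly the window in which I see the old front buffer being modified.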

Mali seems to have a fundamental misunderstanding of SwapBuffers. I already saw in the thread "Mali deadlock with X server grab" that Mali appears to create a dedicated thread in order to call SwapBuffers, which seems bizarre as SwapBuffers is an asynchronous operation (with completion later notified by the BufferSwapComplete event). But from this and the behaviour observed above, I guess Mali developers have misunderstood and implemented it as purely synchronous - i.e. it is a blocking function, and after it returns, the old front buffer is available for immediate reuse. I also see this comment in xf86-video-mali:

/*
 * MaliDRI2ScheduleSwap is the implementation of DRI2SwapBuffers, this function
 * should wait for vblank event which will trigger registered event handler.
 * Event handler will do FLIP/SWAP/BLIT according to event type.
 *
 * Current DRM doesn't support vblank well, so this function just do FLIP/
 * SWAP/BLIT directly, according to drawable information.
 */

Perhaps I could help here in regard to the comment "Current DRM doesn't support vblank well" - it certainly has not been a problem for other drivers, I'm sure we could find a solution for Mali too.

Making ScheduleSwap purely synchronous (as the xf86-video-mali implementation suggests) kills performance, as it causes the X server to block without processing any requests until a vblank occurs. Also on the calling side, I observed in "Mali deadlock with X server grab" that the rendering client also blocks waiting for the response, while holding a global Mali lock.
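As a rough illustration of the cost, the sketch below counts how long a single-threaded server takes to drain a request queue when a swap blocks until vblank versus when it is merely queued. Everything here is an assumption made for illustration: the names are hypothetical, time is measured in arbitrary ticks, each request costs one tick to dispatch, and the vblank period of 16 ticks is picked arbitrarily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a single-threaded server loop. */
enum toy_req { TOY_REQ_OTHER, TOY_REQ_SWAP };

#define TOY_VBLANK_PERIOD 16u  /* arbitrary; vblank fires at multiples of this */

/* First vblank tick strictly after `now`. */
static unsigned toy_next_vblank(unsigned now)
{
    return (now / TOY_VBLANK_PERIOD + 1u) * TOY_VBLANK_PERIOD;
}

/* Return the tick at which the last request has been dispatched.
 * With blocking_swap, the server idles until vblank after each swap;
 * otherwise the swap is queued and completion is delivered later
 * without stalling the loop. */
static unsigned toy_drain(const enum toy_req *reqs, size_t n, bool blocking_swap)
{
    unsigned now = 0;

    assert(reqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        now += 1u;  /* one tick to dispatch any request */
        if (reqs[i] == TOY_REQ_SWAP && blocking_swap)
            now = toy_next_vblank(now);  /* nothing else runs meanwhile */
    }
    return now;
}
```

For a queue of one swap followed by three unrelated requests, the deferred variant finishes at tick 4 and the blocking variant at tick 19 in this model, because the unrelated requests sit behind the stalled swap.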

Could this be improved in future Mali-400 versions?

Has the situation changed at all in R4P0?

Thanks.

Daniel

  • Hi Daniel,

    I've created a ticket internally as this will need a look by the driver team. Is there any reason you are using r3p2 over r4p0, ignoring the integration issue you are currently seeing on r3p2?

    Also, are you a Mali licensee, or a 3rd party working on the Origen development board for that SoC?

    Thanks,

    Chris

  • Hi Chris,

    Thanks for passing this on.

    At the moment the provider of the board we're working with (Hardkernel) does not offer R4P0. We are also looking at switching to a different board/SoC but the provider there has never built Mali for X11/Linux before so it is taking a really long time... And without so much as a changelog even publicly available, it's not even remotely clear if it's worth our effort to become concerned with R4P0.

    This situation is really painful. I'd be interested in learning more about the intended design of this licensing model.

    As I understand it, the DDK licenses are intended for SoC manufacturers. It is done this way as each SoC manufacturer must modify the DDK to produce a libMali/libEGL/libGLESv2 that is specific to the SoC - this is why ARM do not release those binaries.

    However, the SoC manufacturers generally do not care much about Linux/X11 so any third parties (like us) that want to try and do something interesting here are really constrained. And such third parties would not benefit from purchasing whatever license to the DDK, because the DDK must be customized for the SoC, which requires knowledge that is specific to the manufacturer of that specific SoC, which we do not have.

    Any clarification appreciated. Would love to work with a smoother model that allows us to run ARM's latest developments on existing hardware, without having to rely on other parties. Also, does the picture change if we later move to developing our own board (but again with a pre-existing SoC)?

  • No clarification needed, that's essentially how it works.

    ARM are aware of the issues faced by developers such as yourself, and we are actively working to improve the situation. For example we have recently released a guide and driver set on malideveloper.arm.com for the Chromebook in an effort to get developers up and running with an FBDEV/X11 Linux environment on T604.

    For what it's worth, HardKernel are ostensibly one of the best dev-board companies out there when it comes to software enablement/support, so I would recommend you also hit their forums with any issues you are facing.

    If you were to move to developing your own board with a pre-existing SoC, then there are likely many options available to you, for example licensing the DDK from ARM, or sub-licensing from your vendor. In the latter case, they would ideally provide you with a pre-integrated DDK. This is a conversation we can revisit if you ever move that way.

    Hope this helps,

    Chris