I have been using both the X11 and fbdev drivers (r4p0) on an Odroid-XU3 with Mali-T628, with the kernel driver integrated by the vendor (Hardkernel) and the binary blob from the ARM developer website.
The performance was really bad - especially on fbdev, where framerates were less than half of those on X11. On X11, framerates were worse than on an Odroid-U3 with Mali-400.
Now with the release of the r5p0 drivers, early tests show a similar situation, unfortunately.
On r4p0 Mali-T628 X11, I get just under 200 fps in es2_gears, and a glmark2 score of ~55. I've seen the results for Mali-T764 on the RK3288 board from http://pastebin.com/Qzrh51Yv and they are similarly bad.
Is this something you are aware of?
On Mali-400, I get ~260 fps in es2_gears, and a glmark2 score of ~60. fbdev and x11 performance is similar.
But I would even be happy with these numbers in fbdev. es2_gears does not work on fbdev, but some of the apps I've been trying showed exceptionally poor framerates (15 in fbdev vs. 60 in X11) and were unusable.
On the forums, I remember seeing a similar post with a benchmark on Arndale Octa or Chromebook (can't remember exactly, T6xx in any case), where framerates were also lower with fbdev drivers for the same app.
Is there any comment on the performance of fbdev vs. x11? Or the lack of 3D performance seen in the Txxx from various vendors?
> Is there any comment on the performance of fbdev vs. x11? Or the lack of 3D performance seen in the Txxx from various vendors?
The fbdev driver we provide is designed for hardware verification and new platform bring up - it's generally designed for simplicity and reliability rather than performance - so it really isn't optimized. My guess is something in the reference integration is serializing frames so the GPU pipeline drains - I can raise this with our BSP team, but the general assumption is that most people will use a real windowing system (X11, Android Surface Flinger, etc).
I'm not entirely sure what the X11 integration with the host platform looks like on these boards - it generally gets customized by most vendors shipping commercial products.
I'm surprised that the fbdev drivers are positioned as a hardware verification tool, because this isn't explicitly mentioned in the docs. What is more concerning is that I would expect the drivers to perform well enough to at least showcase the capabilities of the GPU, not the other way round. For example, I have customers who need to see a prototype of their application running on their chosen SoC to get a feel for performance and to determine power/cooling requirements. It's very difficult to tell the end customer that we can't give them performance metrics.
Regarding the fbdev drivers, in our experience they are useful because:
1. Qt supports fbdev and is still used for embedded application development.
2. X11 is a heavyweight stack and doesn't perform particularly well on ARM (for numerous reasons). In some circumstances we rewrite the application to use fbdev to lower CPU usage, with the added benefit of less heat/power.
3. Some of your competitors provide fbdev support, which is useful when customers are comparing the features of different SoCs.
Thanks for your explanation. I think it's the same on Odroid-C1. I will try this right away.
Yes, all good points - I see the need, I'm just explaining what to expect out of the fbdev drivers we currently ship on malideveloper.com.
That said, to be honest I'm really surprised that the fbdev drivers are as slow as they are - so I will definitely be raising this with our BSP team.
Kind regards, Pete
Did you manage to get a response from the BSP team?
We've run some benchmarks to compare the r4p1 and r5p0 drivers for Mali-T60x, T62x and T76x and found increased performance in all cases, so the results reported here are surprising. On Firefly with r5p0, we've seen the triangle SDK sample running at over 100 fps in fbdev mode. Could you please describe which system you have running on Firefly that shows only 52 fps?
Then regarding fbdev vs X11 performance, I believe the main problem here is that there is typically no zero-copy support available for fbdev so the user-side GPU driver has to keep calling memcpy to copy the contents of the GPU output buffer into the display framebuffer. Some kernels and framebuffer drivers have support for DMA-BUF but it's not that common. X11 typically uses the Direct Rendering Manager (DRM) which has better standard support for zero-copy (i.e. the GPU writes directly into the display framebuffer). This means that an off-screen benchmark should give the same results with fbdev and X11, but the on-screen performance will hit the bottleneck of memcpy on fbdev. It's worth noting that X11 is fairly heavy so it will also degrade the fps score compared to off-screen. Whenever available, fbdev with zero-copy is typically the fastest solution.
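To put the memcpy cost described above in perspective, here is a back-of-envelope sketch in Python. The 1920x1080 resolution, 4 bytes per pixel, 60 fps target and 1000 MB/s sustained copy rate are assumptions chosen for illustration, not figures measured in this thread, and both function names are hypothetical.

```python
def copy_bandwidth_mb_s(width, height, bytes_per_pixel, fps):
    """MB/s (10**6 bytes) written when every frame is copied once."""
    return width * height * bytes_per_pixel * fps / 1e6


def max_fps_for_copy_rate(width, height, bytes_per_pixel, copy_rate_mb_s):
    """Upper bound on fps if the copy alone runs at copy_rate_mb_s."""
    frame_bytes = width * height * bytes_per_pixel
    return copy_rate_mb_s * 1e6 / frame_bytes


if __name__ == "__main__":
    # Assumed display: 1080p, 32-bit pixels, 60 fps target.
    needed = copy_bandwidth_mb_s(1920, 1080, 4, 60)
    print(f"copy traffic at 60 fps: {needed:.1f} MB/s written")

    # Assumed sustained copy rate of 1000 MB/s (hypothetical figure).
    ceiling = max_fps_for_copy_rate(1920, 1080, 4, 1000)
    print(f"fps ceiling from the copy alone: {ceiling:.1f}")
```

Note that a copy both reads and writes each byte, so the traffic on the memory bus is roughly twice the "written" figure, and the GPU and display controller compete for the same bandwidth.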
Regarding Pete's earlier comment, for pure GPU driver validation purposes, the display integration is a non-issue.
We're looking into how to fix fbdev, but it will always depend on the platform as on each platform a different framebuffer driver will need to implement a DMA-BUF exporter mechanism.
Back to the original question, could you please describe your Firefly set-up (OS, kernel version...) and maybe try to run some off-screen benchmarks?
Many thanks for the response. I'm currently testing against the Chromium (3.14) kernel on the Firefly. Given you've seen a higher frame rate, I suspect the issue may lie with the KMS driver implementation, because I'm also seeing high CPU usage. The 3.10 kernel uses CONFIG_FB_ROCKCHIP, which is a simple framebuffer driver. I'll see if I can patch the 3.10 kernel and give that a go.
I'm surprised that DMA-BUF isn't used, but as you point out it's probably going to be SoC-specific. I would assume the Android Mali drivers are quite similar to fbdev; do they use DMA?
Thanks, so we're at least using different kernels. We've run our benchmarks using this Firefly 3.10 kernel branch:
I don't think either of them is using DMA-BUF in fbdev, but the 3.14 Chromium kernel should definitely have it enabled in X11 as that's what Chrome OS uses, or at least has used until now. It would still be good to know what really causes this fps difference in fbdev. We might run our benchmarks again with 3.14 at some point, but please also let us know if you try 3.10 on your side and get different results. Also, as fbdev uses memcpy, which takes a fair amount of CPU time and memory bandwidth, if the CPU is busy doing anything else at the same time then this may indirectly impact the graphics performance.
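As a rough way to see how much CPU time a per-frame copy takes on a given board, something like the following Python sketch could be run on the target. It only times a plain buffer-to-buffer copy in ordinary cached memory; a real framebuffer may be mapped differently and behave worse, so treat the result as an optimistic ceiling. The function name and the default 1080p, 4 bytes per pixel frame size are assumptions for illustration.

```python
import time


def measure_copy_fps(width=1920, height=1080, bytes_per_pixel=4, frames=20):
    """Time `frames` full-frame copies and return copies per second."""
    frame_bytes = width * height * bytes_per_pixel
    src = bytearray(frame_bytes)   # stands in for the GPU output buffer
    dst = bytearray(frame_bytes)   # stands in for the display framebuffer

    start = time.perf_counter()
    for _ in range(frames):
        dst[:] = src               # one full-frame copy
    elapsed = max(time.perf_counter() - start, 1e-9)
    return frames / elapsed


if __name__ == "__main__":
    fps = measure_copy_fps()
    print(f"copy-only ceiling: {fps:.0f} frames/s on one core")
```

Running it once on an idle system and once while another CPU-heavy task is active gives a feel for how background load can eat into the copy budget.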
Android kernels provide the ION framework which essentially does the same thing as DMA-BUF to share a buffer between the display and GPU drivers. In principle, any modern kernel used in a production Android device will have ION enabled. Some may use DMA-BUF, but none of them would want to use software memcpy... The Mali user-side drivers are built for a specific windowing system, and the difference is mainly about how to set up the zero-copy by sharing the display buffer with the GPU driver. For example, with DMA-BUF this is typically achieved by passing a file descriptor from the display driver to the GPU driver via user-space.
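The "pass a file descriptor via user-space" step can be illustrated with a small Python sketch. This is not DMA-BUF itself: a temporary file stands in for the shared buffer (a real DMA-BUF file descriptor would come from a kernel driver), and the "display" and "GPU" roles, the 4096-byte size and the function name are all hypothetical. It does show the general mechanism: one side owns a buffer, hands its file descriptor over a Unix socket, and both sides then see the same memory without a copy. It needs Python 3.9 or later on a Unix-like system.

```python
import mmap
import os
import socket
import tempfile

BUF_SIZE = 4096  # hypothetical buffer size for the demo


def share_buffer_demo():
    """Pass a buffer's fd over a Unix socket; write on one side, read on the other."""
    # "Display" side: allocate a buffer and map it.
    backing = tempfile.TemporaryFile()
    backing.truncate(BUF_SIZE)
    display_view = mmap.mmap(backing.fileno(), BUF_SIZE)

    display_sock, gpu_sock = socket.socketpair()
    try:
        # Hand the file descriptor to the "GPU" side (SCM_RIGHTS underneath).
        socket.send_fds(display_sock, [b"buf"], [backing.fileno()])
        _msg, fds, _flags, _addr = socket.recv_fds(gpu_sock, 16, 1)

        # "GPU" side: map the received fd and render directly into it.
        gpu_fd = fds[0]
        gpu_view = mmap.mmap(gpu_fd, BUF_SIZE)
        gpu_view[0:4] = b"\xff\x00\x00\xff"   # one opaque "pixel"

        # The "display" side sees the pixel without any copy being made.
        result = bytes(display_view[0:4])

        gpu_view.close()
        os.close(gpu_fd)
    finally:
        display_sock.close()
        gpu_sock.close()
        display_view.close()
        backing.close()
    return result


if __name__ == "__main__":
    print(share_buffer_demo())
```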