Hi,
I have been using both the X11 and fbdev drivers (r4p0) on an Odroid-XU3 with a Mali-T628, with the kernel driver integrated by the vendor (Hardkernel) and the binary blob from the ARM developer website.
The performance was really bad, especially on fbdev, where framerates were less than half of those on X11. Even on X11, framerates were worse than on an Odroid-U3 with a Mali-400.
Now that the r5p0 drivers have been released, early tests unfortunately show a similar situation.
With r4p0 on the Mali-T628 under X11, I get just under 200 fps in es2_gears and a glmark2 score of ~55. I've seen the Mali-T764 results for the RK3288 board at http://pastebin.com/Qzrh51Yv, and they are similarly bad.
Is this something you are aware of?
On the Mali-400, I get ~260 fps in es2_gears and a glmark2 score of ~60, and fbdev and X11 performance are similar.
But I would be happy even with these numbers on fbdev. es2_gears does not work on fbdev, but some of the apps I've been trying showed exceptionally poor framerates (15 fps on fbdev vs. 60 fps on X11) and were unusable.
On the forums, I remember seeing a similar post with a benchmark on an Arndale Octa or a Chromebook (I can't remember exactly, but it was a T6xx in any case), where framerates were also lower with the fbdev drivers for the same app.
Is there any comment on the performance of fbdev vs. X11? Or on the poor 3D performance seen with Txxx GPUs from various vendors?
Thanks.
We've run some benchmarks comparing the r4p1 and r5p0 drivers on Mali-T60x, T62x and T76x and found a performance increase in all cases, so the results reported here are surprising. On Firefly with r5p0, we've seen the triangle SDK sample running at over 100 fps in fbdev mode. Could you please describe the system you are running on Firefly that shows only 52 fps?
As for fbdev vs. X11 performance, I believe the main problem is that fbdev typically has no zero-copy support, so the user-side GPU driver has to call memcpy on every frame to copy the contents of the GPU output buffer into the display framebuffer. Some kernels and framebuffer drivers support DMA-BUF, but it's not that common. X11 typically uses the Direct Rendering Manager (DRM), which has better standard support for zero-copy (i.e. the GPU writes directly into the display framebuffer). This means that an off-screen benchmark should give the same results with fbdev and X11, but on-screen performance on fbdev will hit the memcpy bottleneck. It's worth noting that X11 is fairly heavy, so it will also lower the fps score compared to off-screen. Whenever it is available, fbdev with zero-copy is typically the fastest solution.
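To put a rough number on the memcpy bottleneck, here is a minimal Python sketch of the two presentation paths. It is a model, not driver code: the 1920x1080, 32 bpp display is an assumption, a bytearray stands in for the mmap()ed /dev/fb0 framebuffer, and the function names are hypothetical.

```python
# Rough model of the per-frame cost of fbdev presentation without zero-copy.
# Assumptions (not from the thread): a 1920x1080 display at 32 bits per pixel,
# and a bytearray standing in for the mmap()ed /dev/fb0 framebuffer.

WIDTH, HEIGHT, BYTES_PER_PIXEL = 1920, 1080, 4


def frame_bytes(width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Size of one full frame in bytes."""
    return width * height * bpp


def present_by_copy(gpu_output, framebuffer):
    """fbdev path: the CPU copies the finished frame into the framebuffer."""
    framebuffer[:] = gpu_output  # the copy that happens on every frame
    return len(gpu_output)       # bytes the CPU had to move


def present_zero_copy(gpu_output, framebuffer):
    """Zero-copy path: the GPU already rendered into the scanned-out buffer."""
    assert gpu_output is framebuffer, "zero-copy means one shared buffer"
    return 0                     # nothing for the CPU to move


def memcpy_traffic_mb_per_s(fps, nbytes=None):
    """Memory traffic of the copy path: each byte is read once and written once."""
    nbytes = frame_bytes() if nbytes is None else nbytes
    return 2 * nbytes * fps / 1e6


if __name__ == "__main__":
    fb = bytearray(frame_bytes())
    gpu_out = bytearray(b"\x7f" * frame_bytes())
    print("copied per frame:", present_by_copy(gpu_out, fb), "bytes")
    print("zero-copy per frame:", present_zero_copy(fb, fb), "bytes")
    print("copy traffic at 60 fps: %.1f MB/s" % memcpy_traffic_mb_per_s(60))
```

Under these assumptions each frame moves about 8.3 MB, and at 60 fps the copy alone generates roughly 995 MB/s of memory traffic (one read and one write per byte), which the zero-copy path avoids entirely.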
Regarding Pete's earlier comment, for pure GPU driver validation purposes, the display integration is a non-issue.
We're looking into how to fix fbdev, but it will always depend on the platform, as each platform's framebuffer driver will need to implement a DMA-BUF exporter mechanism.
Back to the original question: could you please describe your Firefly set-up (OS, kernel version...) and maybe try running some off-screen benchmarks?
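One way to run the on-screen vs. off-screen comparison is to script glmark2, which has an off-screen mode. This is a hedged sketch: the binary name glmark2-es2, the --off-screen option and the "glmark2 Score: N" summary line are assumptions about your glmark2 build, and the helper names are hypothetical.

```python
# Hypothetical helper for comparing on-screen and off-screen glmark2 scores.
# Assumptions: a glmark2 build named "glmark2-es2" is on PATH, it accepts the
# "--off-screen" option, and it ends its report with a "glmark2 Score: N" line.
import re
import shutil
import subprocess

SCORE_RE = re.compile(r"glmark2 Score:\s*(\d+)")


def parse_score(output):
    """Return the final glmark2 score found in captured output, or None."""
    matches = SCORE_RE.findall(output)
    return int(matches[-1]) if matches else None


def run_glmark2(binary="glmark2-es2", off_screen=False):
    """Run one glmark2 pass and return its score (None if the binary is missing)."""
    if shutil.which(binary) is None:
        return None
    cmd = [binary] + (["--off-screen"] if off_screen else [])
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_score(result.stdout)


if __name__ == "__main__":
    on_screen = run_glmark2()
    off_screen = run_glmark2(off_screen=True)
    print("on-screen:", on_screen, "off-screen:", off_screen)
```

If the off-screen score is much higher than the on-screen one on fbdev but not on X11, that points to the display path (e.g. the copy into the framebuffer) rather than to the GPU driver itself.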
Best wishes,
Guillaume
Hi Guillaume,
Many thanks for the response. I'm currently testing against the Chromium (3.14) kernel on the Firefly. Given that you've seen a higher fps rate, I suspect the issue may lie with the KMS driver implementation, because I'm also seeing high CPU usage. The 3.10 kernel uses CONFIG_FB_ROCKCHIP, which is a simple framebuffer driver. I'll see if I can patch the 3.10 kernel and give that a go.
I'm surprised that DMA-BUF isn't used, but as you point out, it's probably going to be SoC-specific. I would assume the Android Mali drivers are quite similar to fbdev; do they use DMA-BUF?
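To compare what the 3.10 and 3.14 kernels were built with, a small Python sketch can read the running kernel's configuration. It assumes the kernel exposes /proc/config.gz (which requires CONFIG_IKCONFIG_PROC); the option list is illustrative, and CONFIG_DRM_ROCKCHIP in particular is an assumed symbol name that may differ in vendor trees.

```python
# Sketch: report which display/buffer-sharing options a kernel was built with.
# Assumptions: the kernel exposes its config at /proc/config.gz (needs
# CONFIG_IKCONFIG_PROC); the option list below is illustrative, not exhaustive.
import gzip
import os

OPTIONS = (
    "CONFIG_DMA_SHARED_BUFFER",  # DMA-BUF core
    "CONFIG_FB_ROCKCHIP",        # simple Rockchip framebuffer driver (3.10)
    "CONFIG_DRM_ROCKCHIP",       # Rockchip DRM/KMS driver (assumed symbol name)
    "CONFIG_ION",                # Android ION allocator
)


def parse_config(text, options=OPTIONS):
    """Map each option to its value ('y' or 'm'), or None if unset or absent."""
    state = {opt: None for opt in options}
    for line in text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are comments and leave the option None.
        if "=" in line and not line.startswith("#"):
            key, value = line.split("=", 1)
            if key in state:
                state[key] = value
    return state


def read_running_config(path="/proc/config.gz"):
    """Return the running kernel's config text, or None if it is not exposed."""
    if not os.path.exists(path):
        return None
    with gzip.open(path, "rt") as f:
        return f.read()


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("/proc/config.gz not available")
    else:
        for opt, val in parse_config(text).items():
            print(f"{opt}: {val or 'not set'}")
```

Note that CONFIG_DMA_SHARED_BUFFER being set only means the DMA-BUF core is available; whether the framebuffer or KMS driver actually exports its buffers through it is a separate question.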
thanks
Jasbir
Hi Jasbir,
Thanks. So at least we're using different kernels. We've run our benchmarks using this Firefly 3.10 kernel branch:
https://bitbucket.org/T-Firefly/firefly-rk3288-kernel.git
I don't think either of them is using DMA-BUF in fbdev, but the 3.14 Chromium kernel should definitely have it enabled in X11, as that's what Chrome OS uses, or at least has used until now. It would still be good to know what really makes this fps difference in fbdev. We might run our benchmarks again with 3.14 at some point, but please also let us know if you try 3.10 on your side and get different results. Also, as fbdev uses memcpy, which takes a fair amount of CPU time and memory bandwidth, if the CPU is busy doing anything else at the same time, this may indirectly impact graphics performance.
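The effect of the per-frame copy on the frame rate can be estimated with simple arithmetic. The sketch below assumes that rendering and the copy are serialized for each frame and that a Python bytearray copy roughly approximates the driver's memcpy; both are simplifications, and the 5 ms render time is a hypothetical figure.

```python
# Sketch: how a per-frame CPU copy caps the on-screen frame rate.
# Assumptions: rendering and the copy happen one after the other for each frame
# (no overlap), and a bytearray slice copy approximates the driver's memcpy.
import time


def fps_ceiling(render_ms, copy_ms):
    """Best-case frames per second when every frame pays render + copy time."""
    return 1000.0 / (render_ms + copy_ms)


def measure_copy_ms(nbytes=1920 * 1080 * 4, repeats=20):
    """Average time to copy one frame-sized buffer on this CPU, in ms."""
    src = bytearray(nbytes)
    dst = bytearray(nbytes)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src
    return (time.perf_counter() - start) * 1000.0 / repeats


if __name__ == "__main__":
    copy_ms = measure_copy_ms()
    print("copy per frame: %.2f ms" % copy_ms)
    # Hypothetical GPU that renders a frame in 5 ms (200 fps off-screen):
    print("on-screen ceiling: %.1f fps" % fps_ceiling(5.0, copy_ms))
```

For instance, with a hypothetical 5 ms render time and a 10 ms copy, the ceiling drops from 200 fps to about 67 fps; if other work competes for the CPU or the memory bus, copy_ms grows and the ceiling falls further.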
Android kernels provide the ION framework, which essentially does the same thing as DMA-BUF: it shares a buffer between the display and GPU drivers. In principle, any modern kernel used in a production Android device will have ION enabled. Some may use DMA-BUF, but none of them would want to use a software memcpy. The Mali user-side drivers are built for a specific windowing system, and the differences between them are mainly about how zero-copy is set up by sharing the display buffer with the GPU driver. For example, with DMA-BUF this is typically achieved by passing a file descriptor from the display driver to the GPU driver via user space.