bad performance on 3.8 kernel
Marian Mihailescu
over 12 years ago
Note: This was originally posted on 18th June 2013 at http://forums.arm.com
Mali-400 on an Exynos-based board:
With the 3.0 kernel, EGL works fine, with up to 600 fps in es2gears.
I ported the drivers to the 3.8 kernel and Mali acceleration works, but performance is only roughly half of what I get on 3.0.
I have traced the issue to the GP job start wrapper, _mali_ukk_gp_start_job, which is now called about 50% more often than on the 3.0 kernel.
Here is a comparison between the 2 kernels:
1) With SKIP_GP_JOBS and returning the job straight away from _mali_ukk_gp_start_job, both the 3.0 and 3.8 kernels make the same number of mali_ioctl calls and deliver the same performance: 650 fps in es2gears.
2) I modified es2gears to stop after 600 frames. Here are my results, listed from the bottom of the call chain to the top:
GP jobs actually done, i.e. calls to mali_gp_job_start: 299 on the 3.0 kernel, 302 on the 3.8 kernel
Calls to mali_group_start_gp_job (which calls mali_gp_job_start): 299 on 3.0, 302 on 3.8
Executions of mali_gp_scheduler_schedule (which calls mali_group_start_gp_job): 299 on 3.0, 302 on 3.8 (these appear as "mali_gp_scheduler_schedule() {" in ftrace)
Additional calls to mali_gp_scheduler_schedule that appear as a leaf call, "mali_gp_scheduler_schedule();", in ftrace: 0 on 3.0, 299 on 3.8
System calls served (mali_ioctl): 960 on 3.0, 1373 on 3.8
Results: ~600 fps on the 3.0 kernel, ~380 fps on the 3.8 kernel
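For anyone who wants to reproduce these counts, here is a minimal sketch of how the two ftrace forms could be told apart. It assumes function_graph-style output, where a function that makes no traced calls is printed as `func();` and one that does is printed as `func() {`. The trace excerpt is made up for illustration (the real trace is not in this post), and `count_calls` is a hypothetical helper name.

```python
import re

# Hypothetical excerpt in function_graph style; NOT the real trace from the board.
SAMPLE_TRACE = """\
 0)               |  mali_ioctl() {
 0)               |    _mali_ukk_gp_start_job() {
 0)               |      mali_gp_scheduler_schedule() {
 0)   1.250 us    |        mali_group_start_gp_job();
 0)   3.500 us    |      }
 0)   5.000 us    |    }
 0)   6.125 us    |  }
 0)               |  mali_ioctl() {
 0)               |    _mali_ukk_gp_start_job() {
 0)   0.375 us    |      mali_gp_scheduler_schedule();
 0)   1.000 us    |    }
 0)   1.500 us    |  }
"""


def count_calls(trace: str, func: str) -> dict:
    """Count leaf ('func();') and nested ('func() {') entries for func."""
    pattern = re.compile(r"\b" + re.escape(func) + r"\(\)\s*(;|\{)")
    counts = {"leaf": 0, "nested": 0}
    for line in trace.splitlines():
        match = pattern.search(line)
        if match:
            kind = "leaf" if match.group(1) == ";" else "nested"
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    for name in ("mali_ioctl", "mali_gp_scheduler_schedule"):
        print(name, count_calls(SAMPLE_TRACE, name))
```

On a real trace, the "leaf" count for mali_gp_scheduler_schedule would correspond to the 0 vs 299 line above, and the "nested" count to the 299 vs 302 line.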
So the conclusion is that the slowdown is due to a much larger number (almost double) of mali_ioctls for MALI_IOC_GP2_START_JOB.
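To put the totals in proportion, the arithmetic on the figures above can be sketched as follows. It uses only the numbers reported in this post. The fps values are the approximate ones quoted, so the ratios are rough, and the variable names are just for illustration.

```python
# Figures reported in this post for a 600-frame es2gears run.
FRAMES = 600
ioctls = {"3.0": 960, "3.8": 1373}   # mali_ioctl calls served
fps = {"3.0": 600, "3.8": 380}       # approximate frame rates

extra_ioctls = ioctls["3.8"] - ioctls["3.0"]
ioctl_increase = extra_ioctls / ioctls["3.0"]
fps_ratio = fps["3.8"] / fps["3.0"]
ioctls_per_frame = {kernel: count / FRAMES for kernel, count in ioctls.items()}

if __name__ == "__main__":
    print(f"extra mali_ioctl calls on 3.8: {extra_ioctls}")
    print(f"relative increase in mali_ioctl calls: {ioctl_increase:.1%}")
    print(f"3.8 frame rate relative to 3.0: {fps_ratio:.1%}")
    for kernel, value in ioctls_per_frame.items():
        print(f"mali_ioctl calls per frame on {kernel}: {value:.2f}")
```

This covers all mali_ioctl calls together. The per-command breakdown (how many of them are MALI_IOC_GP2_START_JOB) is not among the figures listed, so it is not computed here.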
Since I don't have the source code for libMali to debug why exactly it is making so many syscalls, I hope somebody here can give me an idea of where to look.
A strange thing is the job numbers assigned.
In the 3.0 kernel, they always increment by 4, like: Mali GP scheduler: Job 2405 (0xE6581B80) queued; 2409, 2413, 2417, 2421, 2425, ...
In the 3.8 kernel, they increment by 2, 4 or 6: 8825, 8829, 8833, 8835, 8841, 8843, 8849, 8853, ...
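The difference in step size is easy to check from the job numbers quoted above. This short sketch only uses the IDs listed in this post; `deltas` is an illustrative helper name.

```python
# Job numbers quoted in this post (the lists are truncated in the original).
job_ids_3_0 = [2405, 2409, 2413, 2417, 2421, 2425]
job_ids_3_8 = [8825, 8829, 8833, 8835, 8841, 8843, 8849, 8853]


def deltas(ids):
    """Return the differences between consecutive job numbers."""
    return [later - earlier for earlier, later in zip(ids, ids[1:])]


if __name__ == "__main__":
    print("3.0 steps:", deltas(job_ids_3_0))  # [4, 4, 4, 4, 4]
    print("3.8 steps:", deltas(job_ids_3_8))  # [4, 4, 2, 6, 2, 6, 4]
```

In the 3.8 sample, each step of 2 is followed by a step of 6, so those pairs still add up to 8, the same as two steps of 4.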
Marian Mihailescu
over 12 years ago
Note: This was originally posted on 29th June 2013 at http://forums.arm.com
Some more details about my traces are posted here:
http://forum.odroid.com/viewtopic.php?f=55&t=305&p=11748#p11748
On the 3.0 kernel, most ioctl calls for GP jobs read 2 sets of frame registers, create and execute 2 jobs, and return after the second job. On the 3.8 kernel, only 1 job is processed per ioctl call, i.e. only 1 set of frame registers is read. The frame register sets are used alternately: one ioctl submits a GP job from set 1, the next submits a GP job from set 2, then set 1 again, then set 2. Moreover, jobs from set 2 are not scheduled immediately: the scheduler exits because the slot is in use. Probably the job is scheduled once the previous job has finished.
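The behaviour seen on 3.8 can be pictured with a toy model of a single GP slot. This is only a sketch of the pattern described above, not the Mali driver code: the class and method names are invented, and it assumes that a queued job is started when the previous one completes, which is what the traces suggest but which I have not confirmed.

```python
from collections import deque


class ToyGpScheduler:
    """Toy model of one GP slot with a job queue (not the real driver)."""

    def __init__(self):
        self.queue = deque()
        self.running = None
        self.log = []

    def submit(self, job):
        """Model one start-job ioctl: queue the job, then try to schedule."""
        self.queue.append(job)
        self.log.append(f"queued {job}")
        self.schedule()

    def schedule(self):
        """Start the next queued job unless the slot is already busy."""
        if self.running is not None:
            self.log.append("schedule: slot in use")
            return False
        if not self.queue:
            return False
        self.running = self.queue.popleft()
        self.log.append(f"started {self.running}")
        return True

    def complete(self):
        """Model job completion: free the slot and schedule again."""
        self.log.append(f"finished {self.running}")
        self.running = None
        self.schedule()


if __name__ == "__main__":
    sched = ToyGpScheduler()
    sched.submit("job from set 1")  # first ioctl: job starts at once
    sched.submit("job from set 2")  # second ioctl: slot busy, job waits
    sched.complete()                # set 1 job done, set 2 job starts
    sched.complete()
    print("\n".join(sched.log))
```

In this model the second submit produces a schedule call that returns without starting anything, which would match the leaf "mali_gp_scheduler_schedule();" entries seen in ftrace on 3.8.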
I guess the question is: how is it that on 3.0 the ioctl reads both sets of frame registers and creates 2 jobs, while on 3.8 only 1 job is created per ioctl? Since the Mali drivers are the same, I can only think that the platform is somehow initialized differently, or that something is wrong with the allocated UMP memory.
Any ideas are welcome.