bad performance on 3.8 kernel
Marian Mihailescu
over 11 years ago
Note: This was originally posted on 18th June 2013 at
http://forums.arm.com
Mali-400 on an Exynos-based board:
With the 3.0 kernel, EGL works fine, with up to 600 fps in es2gears.
I ported the drivers to the 3.8 kernel and Mali acceleration works; however, performance is roughly 50% of what it was on 3.0.
I have traced the issue to the GP job start wrapper, _mali_ukk_gp_start_job, which is now called about 50% more often than on the 3.0 kernel...
Here is a comparison between the 2 kernels:
1) With SKIP_GP_JOBS and returning the job straight away from _mali_ukk_gp_start_job, both the 3.0 and 3.8 kernels result in the same number of mali_ioctl calls and the same performance: 650 fps in es2gears.
2) I modified es2gears to stop after 600 frames, and here are my results (from bottom to top):
GP jobs actually done - calls to "mali_gp_job_start": 299 on 3.0 kernel, 302 on 3.8 kernel
calls to mali_group_start_gp_job (which calls mali_gp_job_start): 299 on 3.0, 302 on 3.8 kernel
executions of mali_gp_scheduler_schedule (which calls mali_group_start_gp_job): 299 on 3.0, 302 on 3.8 kernel -- appears as "mali_gp_scheduler_schedule() {" in ftrace
calls to mali_gp_scheduler_schedule: 0 on 3.0, 299 on 3.8 kernel -- appears as "mali_gp_scheduler_schedule();" in ftrace
system calls served (mali_ioctl) : 960 on 3.0 kernel, 1373 on 3.8 kernel
results: ~600fps on 3.0 kernel, ~380fps on 3.8 kernel
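The two ftrace spellings above come from the function_graph tracer, which prints a call with traced children as `func() {` and a call without traced children as `func();`. A minimal sketch of how the two forms could be counted separately follows. The helper name `count_graph_entries` and the sample excerpt (including its timings) are hypothetical and only illustrate the line shapes; they are not taken from the original trace.

```python
import re


def count_graph_entries(trace_text, func):
    """Count 'func() {' (nested) and 'func();' (leaf) lines for one function."""
    nested = re.compile(r"\b" + re.escape(func) + r"\(\)\s*\{")
    leaf = re.compile(r"\b" + re.escape(func) + r"\(\);")
    counts = {"nested": 0, "leaf": 0}
    for line in trace_text.splitlines():
        if nested.search(line):
            counts["nested"] += 1
        elif leaf.search(line):
            counts["leaf"] += 1
    return counts


# Hypothetical function_graph excerpt; timings are made up.
SAMPLE = """\
 0)               |  mali_gp_scheduler_schedule() {
 0)   1.250 us    |    mali_group_start_gp_job();
 0)   3.500 us    |  }
 0)   0.417 us    |  mali_gp_scheduler_schedule();
"""

if __name__ == "__main__":
    print(count_graph_entries(SAMPLE, "mali_gp_scheduler_schedule"))
```

On a real capture, the "nested" count corresponds to scheduler runs that went on to start a job, and the "leaf" count to runs that returned without calling another traced function.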
So the conclusion is that the slowdown is due to a much larger number (almost double) of mali_ioctls for MALI_IOC_GP2_START_JOB.
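For reference, the ratios implied by the posted totals can be worked out directly. This is only arithmetic on the figures above: the fps values are the approximate ones reported, the ioctl counts are totals for all mali_ioctl calls (the post gives no per-command breakdown), and the names `RUNS` and `summarize` are illustrative.

```python
FRAMES = 600  # es2gears was modified to stop after 600 frames

# Totals as reported in the post; fps values are approximate.
RUNS = {
    "3.0": {"ioctls": 960, "gp_jobs": 299, "fps": 600},
    "3.8": {"ioctls": 1373, "gp_jobs": 302, "fps": 380},
}


def summarize(runs, frames):
    """Derive per-frame and relative figures from the two runs."""
    old, new = runs["3.0"], runs["3.8"]
    return {
        "ioctls_per_frame_3.0": old["ioctls"] / frames,
        "ioctls_per_frame_3.8": new["ioctls"] / frames,
        "extra_ioctls": new["ioctls"] - old["ioctls"],
        "ioctl_ratio": new["ioctls"] / old["ioctls"],
        "fps_ratio": new["fps"] / old["fps"],
    }


if __name__ == "__main__":
    for key, value in summarize(RUNS, FRAMES).items():
        print(f"{key}: {value:.2f}")
```

With these inputs the 3.8 run serves 413 more ioctls in total (about 1.43 times the 3.0 count) and reaches about 0.63 of the 3.0 frame rate, while the number of GP jobs actually started is almost unchanged.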
Since I don't have the code for libMali to debug why exactly it's making so many syscalls, I hope somebody here can help me and give me an idea where to look.
A strange thing is the job numbers assigned.
In the 3.0 kernel, they always increment by 4, like: Mali GP scheduler: Job 2405 (0xE6581B80) queued; 2409, 2413, 2417, 2421, 2425, ...
In the 3.8 kernel, they increment either by 2, 4 or 6: 8825, 8829, 8833, 8835, 8841, 8843, 8849, 8853, ...
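The step pattern is easy to check from the job numbers quoted above. A small sketch follows; the helper name `increments` is illustrative, and the lists contain only the IDs quoted in this post.

```python
def increments(job_ids):
    """Return the differences between consecutive job numbers."""
    return [b - a for a, b in zip(job_ids, job_ids[1:])]


# Job numbers as quoted in the post.
KERNEL_3_0 = [2405, 2409, 2413, 2417, 2421, 2425]
KERNEL_3_8 = [8825, 8829, 8833, 8835, 8841, 8843, 8849, 8853]

if __name__ == "__main__":
    print("3.0:", increments(KERNEL_3_0))
    print("3.8:", increments(KERNEL_3_8))
```

The 3.0 sequence steps by 4 every time, while the 3.8 sequence mixes steps of 2, 4 and 6.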
Marian Mihailescu
over 11 years ago
Note: This was originally posted on 29th June 2013 at
http://forums.arm.com
Some more details about my traces are posted here:
http://forum.odroid.com/viewtopic.php?f=55&t=305&p=11748#p11748
On the 3.0 kernel, in most ioctl calls for GP jobs, 2 sets of frame registers are read, 2 jobs are created and executed, and the ioctl ends after the second job. On the 3.8 kernel, only 1 job is processed per ioctl call, i.e. only 1 set of frame registers is read. The frame register sets are used alternately: ioctl GP job from frame register set 1, ioctl GP job from frame register set 2, ioctl set 1, ioctl set 2. Moreover, jobs from set 2 end up not being scheduled immediately: the scheduler exits because the slot is in use. Probably the job is scheduled once the previous job has finished.
I guess the question is: how is it that in 3.0 the ioctl reads both sets of frame registers and creates 2 jobs, while in 3.8 only 1 job is created per ioctl? Since the Mali drivers are the same, I can only think that the platform is somehow initialized differently, or maybe there is something wrong with the UMP memory allocation?
Any ideas are welcome.