How many atoms are executed in one time slice during multi-kernel processing?

Hi,

I'm experimenting with GPU context switching on a Firefly-RK3399 (Mali-T860) by running the OpenCL PolyBench 2MM program.

I also turned off lightdm to get more precise results.

During the experiment, I found something weird.

If I run only one 2MM bench, it finishes in about 12 sec,

and if I run two 2MM benches, they finish at about 12 sec and 24 sec respectively.

But if I run three 3MM benches, all of them finish at about 36 sec, at about the same time.

This is weird because it suggests that when I run an even number of programs, some problem occurs during context switching on the GPU.

So I checked the kernel debug messages using dmesg (with two 2MM PolyBench programs running).

[  283.676716] midgard_kbase:kbase_job_hw_submit:138: mali ff9a0000.gpu: JS: Submitting atom ffffff800b679688 from ctx ffffff800b679000 to js[1] with head=0x7fb1269080, affinity=0xf

 [  283.676799] midgard_kbase:kbase_job_hw_submit:138: mali ff9a0000.gpu: JS: Submitting atom ffffff800b6d7688 from ctx ffffff800b6d7000 to js[1] with head=0x7f9cc3f080, affinity=0xf

[  283.864274] Soft-stop

[  283.875955] midgard_kbase:kbase_job_done:375: mali ff9a0000.gpu: Job ended with status 0x00000003

[  283.876412] midgard_kbase:kbase_job_hw_submit:138: mali ff9a0000.gpu: JS: Submitting atom ffffff800b679688 from ctx ffffff800b679000 to js[1] with head=0x7fb1269080, affinity=0xf

 [  283.876526] midgard_kbase:kbase_job_hw_submit:138: mali ff9a0000.gpu: JS: Submitting atom ffffff800b6d7688 from ctx ffffff800b6d7000 to js[1] with head=0x7f9cc3f080, affinity=0xf

[  284.064630] Soft-stop

[  284.076354] midgard_kbase:kbase_job_done:375: mali ff9a0000.gpu: Job ended with status 0x00000003

As seen above, during one time slice, 2 atoms are submitted to job slot 1 (js[1]), but only one of the atoms is executed.

Is this working correctly?
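To double-check my reading of the log, here is a minimal Python sketch that groups the "Submitting atom" lines into windows closed by each "Soft-stop" line. Treating one such window as one time slice is my assumption, and the names `slices`, `SUBMIT` and `SOFT_STOP` are made up for this sketch; it only parses the dmesg text pasted above and does not touch the driver.

```python
import re

# The submit and soft-stop lines copied from the dmesg output above.
LOG = """\
[  283.676716] midgard_kbase:kbase_job_hw_submit:138: mali ff9a0000.gpu: JS: Submitting atom ffffff800b679688 from ctx ffffff800b679000 to js[1] with head=0x7fb1269080, affinity=0xf
[  283.676799] midgard_kbase:kbase_job_hw_submit:138: mali ff9a0000.gpu: JS: Submitting atom ffffff800b6d7688 from ctx ffffff800b6d7000 to js[1] with head=0x7f9cc3f080, affinity=0xf
[  283.864274] Soft-stop
[  283.875955] midgard_kbase:kbase_job_done:375: mali ff9a0000.gpu: Job ended with status 0x00000003
[  283.876412] midgard_kbase:kbase_job_hw_submit:138: mali ff9a0000.gpu: JS: Submitting atom ffffff800b679688 from ctx ffffff800b679000 to js[1] with head=0x7fb1269080, affinity=0xf
[  283.876526] midgard_kbase:kbase_job_hw_submit:138: mali ff9a0000.gpu: JS: Submitting atom ffffff800b6d7688 from ctx ffffff800b6d7000 to js[1] with head=0x7f9cc3f080, affinity=0xf
[  284.064630] Soft-stop
[  284.076354] midgard_kbase:kbase_job_done:375: mali ff9a0000.gpu: Job ended with status 0x00000003
"""

# Timestamp, atom address, context address and job slot of a submit line.
SUBMIT = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+.*Submitting atom (\w+) from ctx (\w+) to js\[(\d+)\]"
)
# Timestamp of a soft-stop line.
SOFT_STOP = re.compile(r"\[\s*(\d+\.\d+)\]\s+Soft-stop")


def slices(log):
    """Return a list of (soft_stop_time, submissions), one entry per Soft-stop.

    Each submission is a tuple (time, atom, ctx, slot). Assumption: every
    submission seen since the previous Soft-stop belongs to the same slice.
    """
    result, current = [], []
    for line in log.splitlines():
        line = line.strip()
        m = SUBMIT.match(line)
        if m:
            current.append((float(m.group(1)), m.group(2), m.group(3), int(m.group(4))))
            continue
        m = SOFT_STOP.match(line)
        if m:
            result.append((float(m.group(1)), current))
            current = []
    return result


previous_stop = None
for stop_time, subs in slices(LOG):
    contexts = sorted({ctx for _, _, ctx, _ in subs})
    print(f"soft-stop at {stop_time:.6f}: {len(subs)} atoms submitted from {len(contexts)} contexts")
    if previous_stop is not None:
        print(f"  time since previous soft-stop: {stop_time - previous_stop:.6f} s")
    previous_stop = stop_time
```

Running this over a longer capture should show whether every window really contains one submission from each context, and how far apart the soft-stops are.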
