Mali Linux Driver Error Message when running an OpenCL program

Hi all:

I am using an RK3288 with Linux 4.4 and running an OpenCL program on it. I see the following error messages printed on the console:


[ 6247.837157] mali ffa30000.gpu: JS: Job Hard-Stopped (took more than 50 ticks at 100 ms/tick)
[ 6247.845711] mali ffa30000.gpu: JS: Job Hard-Stopped (took more than 50 ticks at 100 ms/tick)
[ 6247.854615] mali ffa30000.gpu: error detected from slot 0, job status 0x00000004 (TERMINATED)
[ 6247.863451] mali ffa30000.gpu: t6xx: GPU fault 0x04 from job slot 0
[ 6252.976364] mali ffa30000.gpu: JS: Job Hard-Stopped (took more than 50 ticks at 100 ms/tick)

Does it mean the OpenCL programs occupy the GPU for too long?
While the OpenCL program is running, the X Window display hangs, but the Linux SSH console still works.
How can I avoid the X display hanging while OpenCL programs are running?

Thank you

-Jack

  • hiwu said:
    Does it mean the OpenCL programs occupy GPU too long?

    Yes. More specifically, it means the job didn't yield quickly enough when asked to suspend.

    Preempting work at an arbitrary point at the thread level is prohibitively expensive for a GPU (thousands of threads to save and restore), so Mali implements a cooperative yield scheduling scheme at task granularity. For compute workloads a task will be some multiple of the workgroup size (typically up to 256 threads in total, so a task may contain multiple workgroups if your workgroups are small).
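As a rough back-of-envelope for the task granularity described above, the sketch below estimates how many workgroups end up in one task. The function name `workgroups_per_task` is hypothetical, and the default of 256 threads is only the "typical" upper bound quoted above, not a value queried from the driver.

```python
def workgroups_per_task(workgroup_size, task_threads=256):
    """Estimate how many workgroups share one scheduling task.

    task_threads=256 is an assumption taken from the 'typically up to
    256 threads' figure above; real hardware and drivers may differ.
    """
    if workgroup_size <= 0:
        raise ValueError("workgroup_size must be positive")
    # A task always holds at least one whole workgroup.
    return max(1, task_threads // workgroup_size)


if __name__ == "__main__":
    for size in (16, 64, 256):
        print(size, workgroups_per_task(size))
```

Under this assumption, smaller workgroups mean more workgroups have to finish before a task can yield, so the time per workgroup matters even more.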

    When a workload is asked to yield (soft stop) by the kernel driver, it has a certain time limit to do so (5 seconds in your case: 50 ticks at 100 ms/tick). If it fails to yield within that limit, the kernel driver issues a kill (hard stop). The time taken to soft-stop is the time needed to run a small number of tasks / workgroups. The general principle for "playing nicely" is to write your compute kernels so that single workgroups complete relatively quickly (milliseconds is OK, so "quick" may still be a few million cycles). Whole compute kernels can iterate over very large ndrange problem sizes that run for much longer than the task timeout, provided that they preempt cleanly at the task level.

    This will avoid the window system hanging too - the compute workload yielding will give the UI rendering time to run.
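One way to keep each workgroup short, if a kernel currently runs a very long loop per work-item, is to split that loop across several kernel launches. The sketch below only plans the slices on the host and works out the hard-stop limit from the log; the names `hard_stop_timeout_s` and `plan_slices` are hypothetical, and passing each `(start, count)` pair to the kernel as arguments is an assumption about how your kernel is written.

```python
def hard_stop_timeout_s(ticks, ms_per_tick):
    """Hard-stop limit in seconds, from the values printed in the log."""
    return ticks * ms_per_tick / 1000.0


def plan_slices(total_iters, iters_per_launch):
    """Split a long per-work-item loop into (start, count) slices.

    Each slice would be one kernel launch, with start and count passed
    as kernel arguments (an assumption about the kernel's interface).
    """
    if total_iters < 0 or iters_per_launch <= 0:
        raise ValueError("need total_iters >= 0 and iters_per_launch > 0")
    return [
        (start, min(iters_per_launch, total_iters - start))
        for start in range(0, total_iters, iters_per_launch)
    ]


if __name__ == "__main__":
    # Values from the log above: 50 ticks at 100 ms/tick.
    print(hard_stop_timeout_s(50, 100), "seconds")
    for start, count in plan_slices(1_000_000, 250_000):
        print("launch kernel for iterations", start, "to", start + count - 1)
```

The slice size is something to tune by measurement: pick it so that one workgroup finishes in milliseconds, which leaves plenty of margin below the hard-stop limit.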

    HTH, 
    Pete