Hi,
My application uses Android Java with JNI to process camera video in real time. The C part uses OpenCL and multicore threading on a MediaTek 9200+ with a Mali G715.
Something strange happens with the application. After a few seconds of processing (70-80 frames), I go from 60-70 ms per frame to 140-160 ms per frame.
What I am doing:
1) I run some kernels to convert YUV and extract data.
2) Then I process the extracted data with CPU multicore threading (4 threads at a time), 7 times.
3) Then I use kernels again to extract data and send it back to Java.
If I remove all the CPU work, the GPU time remains fairly stable, going from 20 ms at the beginning (the first 70-80 frames) to 40-45 ms. But with the CPU work, the time increases dramatically after 70-80 frames while using the same amount of input data. The same problem happened with my old Mali G72 after 20 frames.
I tried to use Streamline, but as I run Windows 7 I cannot use the analyzer, which only runs from Windows 10 onwards. I can still get the graphs, though, and I noticed some strange activity on the GPU.
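To see which of the three stages actually grows after 70-80 frames, per-stage wall-clock timing could be logged for every frame. The following is only a minimal sketch: `FrameTimer`, `StageSample` and `first_slow_frame` are hypothetical names that do not come from the application, and the stage labels in the usage note are placeholders.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original application): one timing record
// per pipeline stage.
struct StageSample {
    std::string stage;  // placeholder label, e.g. "gpu_yuv" or "cpu_threads"
    double ms;          // wall-clock duration in milliseconds
};

class FrameTimer {
public:
    // Runs `work` and stores its wall-clock duration under `stage`.
    // For a GPU stage, `work` must block until the GPU has finished
    // (for example by ending with clFinish on the queue); otherwise only
    // the enqueue cost is measured.
    template <typename F>
    void time(const std::string& stage, F&& work) {
        const auto t0 = std::chrono::steady_clock::now();
        std::forward<F>(work)();
        const auto t1 = std::chrono::steady_clock::now();
        samples_.push_back(
            {stage, std::chrono::duration<double, std::milli>(t1 - t0).count()});
    }

    // Sum of all stage durations recorded since the last reset().
    double total_ms() const {
        double sum = 0.0;
        for (const auto& s : samples_) sum += s.ms;
        return sum;
    }

    const std::vector<StageSample>& samples() const { return samples_; }
    void reset() { samples_.clear(); }

private:
    std::vector<StageSample> samples_;
};

// Returns the index of the first frame after the warm-up whose time exceeds
// `factor` times the average of the first `warmup` frames, or -1 if none does.
inline int first_slow_frame(const std::vector<double>& frame_ms,
                            std::size_t warmup, double factor) {
    if (warmup == 0 || frame_ms.size() <= warmup) return -1;
    double baseline = 0.0;
    for (std::size_t i = 0; i < warmup; ++i) baseline += frame_ms[i];
    baseline /= static_cast<double>(warmup);
    for (std::size_t i = warmup; i < frame_ms.size(); ++i) {
        if (frame_ms[i] > factor * baseline) return static_cast<int>(i);
    }
    return -1;
}
```

In the JNI frame callback, each stage would be wrapped in a `timer.time("...", [&] { ... });` call, with `total_ms()` stored per frame and `reset()` called before the next frame. Comparing the per-stage numbers before and after the frame reported by `first_slow_frame` should show whether the extra time lands in the GPU stages or in the CPU stage.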
1) The Mali Memory Read Latency starts to turn red after the 70-80 frames. (It shows 25 mega beats?)
2) The Mali geometry culling rate starts to oscillate. (It shows 100%.)
3) The Mali geometry efficiency starts to oscillate. (It shows 3 threads.)
4) The Mali Early ZS rate is red.
Many other things start to oscillate as well. But Device Thermal State is 100% green.
In fact, after 70-80 frames a lot of things start to oscillate.
And Streamline is too complicated for me. There are too many things to know and understand, and I do not have the time.
So I am wondering if it would be possible for an expert to analyse it.
As I said, I had the same problem on my old Mali G78, so at some point something goes wrong when alternating GPU-CPU-GPU. In one of my posts about SVM, someone told me that using the GPU and the CPU together was not a good idea. But I cannot do with the GPU what I am doing with the CPU. Or rather, I do not know how to do with the GPU what I am doing with the CPU. At some point I need to process data with the CPU.
Thanks for the help.
Happy to take a look if you can share a debuggable APK or the Streamline capture. Feel free to email performancestudio@arm.com.
Thanks a lot. I will send you a debuggable APK later today.
Thanks again.
Did you receive my e-mail?
Hi again,
I did some more testing.
1) I removed the data transfer from GPU to CPU (enqueueMapBuffer), so the multicore processing handles zero data and takes 1 to 2 ms. I also removed the transfer from CPU to GPU (enqueueWriteBuffer, or a cl::Buffer created with CL_MEM_READ_ONLY|CL_MEM_USE_HOST_PTR, because I tested both ways of transferring), and I removed the final GPU to JNI buffer for display (enqueueReadBuffer). But the processing time still doubles after a few frames.
2) I also tried removing the whole JNI call, so no more OpenCL and no more multicore threading. In this case the speed is stable and the Streamline graphs no longer oscillate.
3) I also tried removing only the CPU processing and keeping only the OpenCL work, the YUV transform, and all the reads and writes to the CPU. In this case I noticed that the time doubles after 20 frames and that the Mali Memory Read Latency starts to get very red after 7 seconds, as if the CPU were processing data. So it goes from 7 ms for the first 20 frames to 20 ms after that. And if I remove all the GPU/CPU transfers, there is still a little bit of red, and the frame time goes from 0 ms to 4 ms after 7 seconds.
So it looks like there is something wrong when transferring data from GPU to GPU and from GPU to CPU. And of course the CPU processing time increases with the amount of data processed. The good indicator is the Mali Memory Read Latency, but it may be something else. I am not good enough to help more.
> Did you receive my e-mail?
No, sorry. Can you try sending to peter.harris@arm.com.
It is done. Let me know if you got it.
No, sorry =(
Surprising.
Sorry, but as you can see, I sent it 3 times. The last time was today, 14/09/2025 at 15:59, and the first time was on 03/09/2025, to performancestudio@arm.com.
And here is the confirmation for the last one. So if you have not received it, could it be that something on ARM's side is blocking it for some reason?
Strange ;))
So, by the way, here are the results from Streamline. As I said, the trouble starts at 7 seconds. I forgot to save the picture showing the time, but you can see it in the Mali Geometry Usage and Mali GPU Utilization charts.
I did some tests this morning, and this time I removed all the multicore threading. The big surprise is that the time no longer doubles. So I conclude that it is the use of the CPU, or the multicore threading, that causes the problem. All the OpenCL commands are fine: enqueueMapBuffer, enqueueWriteBuffer and enqueueReadBuffer.
It looks like the massive usage of the CPU is my problem. As soon as I start the massive CPU work, the problem happens. But I do not understand why it slows down after always the same number of frames processed.
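One way to check this conclusion would be to run only the CPU stage, with no OpenCL at all, and log the time per frame. The sketch below assumes the threading pattern described earlier (4 threads at a time, 7 passes per frame). `placeholder_work`, `run_cpu_frame_ms` and `run_cpu_frames` are hypothetical names, and the arithmetic inside is a stand-in, since the real per-thread processing is not shown in this thread.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Placeholder workload (an assumption): touches every element of its slice.
inline void placeholder_work(std::vector<float>& data,
                             std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        data[i] = data[i] * 1.0001f + 0.5f;
    }
}

// One "frame" of CPU work: `passes` rounds, each running `threads` threads
// on disjoint slices of `data`. Returns the wall-clock time in milliseconds.
inline double run_cpu_frame_ms(std::vector<float>& data,
                               unsigned threads = 4, unsigned passes = 7) {
    if (threads == 0) return 0.0;
    const auto t0 = std::chrono::steady_clock::now();
    const std::size_t chunk = (data.size() + threads - 1) / threads;
    for (unsigned p = 0; p < passes; ++p) {
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < threads; ++t) {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            workers.emplace_back(placeholder_work, std::ref(data), begin, end);
        }
        for (auto& w : workers) w.join();  // wait for this pass to finish
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Runs `frames` CPU-only frames on a buffer of `elements` floats and returns
// the time of each frame, so a step change after N frames can be spotted.
inline std::vector<double> run_cpu_frames(std::size_t frames,
                                          std::size_t elements) {
    std::vector<float> data(elements, 1.0f);
    std::vector<double> frame_ms;
    frame_ms.reserve(frames);
    for (std::size_t f = 0; f < frames; ++f) {
        frame_ms.push_back(run_cpu_frame_ms(data));
    }
    return frame_ms;
}
```

If the time per frame also steps up after a fixed number of frames in this CPU-only loop, the cause is probably on the CPU side rather than in the OpenCL transfers. If it stays flat, the slowdown more likely comes from the interaction between the CPU work and the GPU work. Either result is only a hint, since the placeholder workload may not behave like the real one.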