Processing time doubles after a few seconds

Hi,

My application uses Android Java with JNI to process camera video in real time. The C part uses OpenCL and multicore threading on a MediaTek 9200+ with a Mali-G715.

Something strange happens with the application. After a few seconds of processing (70-80 frames), I go from 60-70 ms per frame to 140-160 ms per frame.

What I am doing:

1) I run some kernels to convert YUV and extract data.

2) Then I process the extracted data on the CPU with multicore threading (4 threads at a time), 7 times.

3) Then I use kernels again to extract data and send it back to Java.

If I remove all the CPU work, the GPU time stays fairly stable, going from 20 ms at the beginning (the first 70-80 frames) to 40-45 ms. But with the CPU work, the time increases dramatically after 70-80 frames for the same amount of input data. The same problem happened with my old Mali-G72 after 20 frames.
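A minimal sketch of how the three steps could be timed separately, so that the GPU and CPU parts are not mixed in one number. Everything here is an assumption for illustration: the stage names, the `stage_stats` struct and the helper functions are hypothetical and not taken from the application. Note that OpenCL enqueue calls are asynchronous, so the end timestamp of a GPU stage only means something after a blocking call or a `clFinish` on the queue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stage ids matching the three steps described above. */
enum { STAGE_GPU_IN = 0, STAGE_CPU = 1, STAGE_GPU_OUT = 2, STAGE_COUNT = 3 };

typedef struct {
    int64_t total_ns[STAGE_COUNT]; /* accumulated time per stage */
    int64_t max_ns[STAGE_COUNT];   /* worst single measurement per stage */
    int     frames;                /* number of completed frames */
} stage_stats;

/* Monotonic clock in nanoseconds; not affected by wall-clock changes. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Record one measurement for a stage. */
static void stage_record(stage_stats *s, int stage,
                         int64_t start_ns, int64_t end_ns) {
    assert(stage >= 0 && stage < STAGE_COUNT);
    int64_t d = end_ns - start_ns;
    s->total_ns[stage] += d;
    if (d > s->max_ns[stage]) s->max_ns[stage] = d;
}

/* Call once per frame, after all three stages were recorded. */
static void stage_end_frame(stage_stats *s) { s->frames++; }

/* Mean duration of a stage in milliseconds over all completed frames. */
static double stage_mean_ms(const stage_stats *s, int stage) {
    if (s->frames == 0) return 0.0;
    return (double)s->total_ns[stage] / (double)s->frames / 1e6;
}
```

In use, each stage would be wrapped as `t0 = now_ns(); /* stage */ stage_record(&stats, STAGE_CPU, t0, now_ns());`, and the means printed only every N frames rather than on every frame.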

I tried to use Streamline, but as I run Windows 7 I cannot use the analyzer, which only runs from Windows 10. But I can get the graphs, and I noticed strange activity on the GPU.

1) The Mali Memory Read Latency starts to turn red after 70-80 frames (it shows 25 mega beats?).

2) The Mali geometry culling rate starts to oscillate (it shows 100%).

3) The Mali geometry efficiency starts to oscillate (it shows 3 threads).

4) The Mali Early ZS rate is red.

Many other things start to oscillate too. But the Device Thermal State is 100% green.

In fact, after 70-80 frames a lot of things start to oscillate.

And Streamline is too complicated for me. There are too many things to know and understand, and I do not have the time.

So I am wondering if it would be possible for an expert to analyse it.

As I said, I had the same problem on my old Mali-G78, so at some point something goes wrong when alternating GPU-CPU-GPU. In one of my posts about SVM, someone told me that using the GPU and the CPU together was not a good idea. But I cannot do with the GPU what I am doing with the CPU. Or rather, I do not know how to do it with the GPU. At some point I need to process the data with the CPU.

Thanks for the help.

  • Hi,

    I have removed the Streamline picture because it was too big. But this morning I woke up very early with a new idea for how to explain what happens in detail. That woke me up ;))

    I will show you 3 pictures that represent the whole processing, second by second, for each step.

    For all the pictures, the x-axis is the time in seconds and the y-axis is the duration of the step for every frame processed. It sometimes happens that, for some reason, the debug output skips one or two steps during the CPU processing.

    By the way, I can send you more detailed information, but for 21 seconds of analysis it is 500 lines of data for 218 frames. Quite a lot.

    First picture: the first kernel step. It does the YUV conversion, some other kernels, and an enqueueMapBuffer to make the data available for CPU processing.

     
    As you can see, the processing time is very stable over time.

    Now, the second picture. This is the step where I do all the multicore processing: 5 passes of 4 pthreads,
    then, from 1 to 12 times (on a small amount of data), 2 passes of 4 pthreads and 1 pass of 2 pthreads, plus some other
    single-threaded work to draw pixels for the output. I work with a struct of int (1024*1024). Could I use an array instead?

    The processing time depends on the amount of data to be processed. But in all the pictures I process
    nearly the same amount of data: I look at the same image with the camera at the same distance.
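For readers who want a concrete picture of one such multicore pass, here is a minimal sketch that splits a 1024*1024 int buffer into 4 row bands, one pthread per band. The workload (a simple threshold), the names `band_job`, `band_worker` and `threshold_parallel`, and the even split by rows are all assumptions for illustration; the real processing in the application is not shown in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define WIDTH 1024
#define HEIGHT 1024
#define NUM_THREADS 4

/* One job per thread: a contiguous band of rows in a shared buffer. */
typedef struct {
    int *data;       /* shared WIDTH*HEIGHT buffer */
    int  row_begin;  /* first row of the band (inclusive) */
    int  row_end;    /* last row of the band (exclusive) */
    int  threshold;  /* hypothetical parameter of the example workload */
    long count;      /* per-thread result, read by the caller after join */
} band_job;

/* Example workload: binarize the band and count pixels above threshold. */
static void *band_worker(void *arg) {
    band_job *job = arg;
    long count = 0;
    for (int y = job->row_begin; y < job->row_end; ++y) {
        int *row = job->data + (size_t)y * WIDTH;
        for (int x = 0; x < WIDTH; ++x) {
            if (row[x] > job->threshold) { row[x] = 255; ++count; }
            else                         { row[x] = 0; }
        }
    }
    job->count = count;  /* each thread writes only its own job struct */
    return NULL;
}

/* Run one pass with NUM_THREADS threads and return the total count. */
static long threshold_parallel(int *data, int threshold) {
    assert(data != NULL);
    pthread_t threads[NUM_THREADS];
    band_job  jobs[NUM_THREADS];
    const int rows_per_thread = HEIGHT / NUM_THREADS;

    for (int i = 0; i < NUM_THREADS; ++i) {
        jobs[i].data      = data;
        jobs[i].row_begin = i * rows_per_thread;
        jobs[i].row_end   = (i + 1) * rows_per_thread;
        jobs[i].threshold = threshold;
        jobs[i].count     = 0;
        if (pthread_create(&threads[i], NULL, band_worker, &jobs[i]) != 0)
            abort();
    }

    long total = 0;
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_join(threads[i], NULL);
        total += jobs[i].count;
    }
    return total;
}
```

Each thread touches only its own rows and its own `band_job`, so no locking is needed inside the loop.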


    As you can see, the multicore processing is completely chaotic. And it is in this step that
    the debug output sometimes starts missing frames.

    Here is an example of missing frames. Look at the Nb Frame values at second 32, so one second after the app started.

    2025-10-11 10:02:32.337 10482-10482/com.example.xiaomi E/JNIProcessor:  12 enqueueReadBuffer and last kernel finished ready to end JNI before display Nb Frame 18 in 63 ms
    2025-10-11 10:02:32.340 10482-10482/com.example.xiaomi E/JNIProcessor:  4 traitement enqueueNDRangeKernel finished in 2 ms
    2025-10-11 10:02:32.363 10482-10482/com.example.xiaomi E/JNIProcessor:  6 enqueueMapBuffer and all kernel start CPU multicore finished in 25 ms
    2025-10-11 10:02:32.392 10482-10482/com.example.xiaomi E/JNIProcessor:  10 traitement enqueueWriteBuffer and all CPU finished in 54 ms
    2025-10-11 10:02:32.395 10482-10482/com.example.xiaomi E/JNIProcessor:  12 enqueueReadBuffer and last kernel finished ready to end JNI before display Nb Frame 19 in 57 ms
    2025-10-11 10:02:32.406 10482-10482/com.example.xiaomi E/JNIProcessor:  4 traitement enqueueNDRangeKernel finished in 2 ms
    2025-10-11 10:02:32.438 10482-10482/com.example.xiaomi E/JNIProcessor:  6 enqueueMapBuffer and all kernel start CPU multicore finished in 34 ms
    2025-10-11 10:02:32.633 10482-10482/com.example.xiaomi E/JNIProcessor:  10 traitement enqueueWriteBuffer and all CPU finished in 53 ms
    2025-10-11 10:02:32.635 10482-10482/com.example.xiaomi E/JNIProcessor:  12 enqueueReadBuffer and last kernel finished ready to end JNI before display Nb Frame 23 in 54 ms
    2025-10-11 10:02:32.641 10482-10482/com.example.xiaomi E/JNIProcessor:  4 traitement enqueueNDRangeKernel finished in 2 ms
    2025-10-11 10:02:32.666 10482-10482/com.example.xiaomi E/JNIProcessor:  6 enqueueMapBuffer and all kernel start CPU multicore finished in 28 ms
    2025-10-11 10:02:32.871 10482-10482/com.example.xiaomi E/JNIProcessor:  6 enqueueMapBuffer and all kernel start CPU multicore finished in 25 ms
    2025-10-11 10:02:33.010 10482-10482/com.example.xiaomi E/JNIProcessor:  10 traitement enqueueWriteBuffer and all CPU finished in 51 ms
    2025-10-11 10:02:33.013 10482-10482/com.example.xiaomi E/JNIProcessor:  12 enqueueReadBuffer and last kernel finished ready to end JNI before display Nb Frame 28 in 55 ms
    2025-10-11 10:02:33.016 10482-10482/com.example.xiaomi E/JNIProcessor:  4 traitement enqueueNDRangeKernel finished in 2 ms
    2025-10-11 10:02:33.037 10482-10482/com.example.xiaomi E/JNIProcessor:  6 enqueueMapBuffer and all kernel start CPU multicore finished in 23 ms
    2025-10-11 10:02:33.366 10482-10482/com.example.xiaomi E/JNIProcessor:  6 enqueueMapBuffer and all kernel start CPU multicore finished in 31 ms
    2025-10-11 10:02:33.796 10482-10482/com.example.xiaomi E/JNIProcessor:  10 traitement enqueueWriteBuffer and all CPU finished in 58 ms
    2025-10-11 10:02:33.799 10482-10482/com.example.xiaomi E/JNIProcessor:  12 enqueueReadBuffer and last kernel finished ready to end JNI before display Nb Frame 37 in 61 ms
    2025-10-11 10:02:33.802 10482-10482/com.example.xiaomi E/JNIProcessor:  4 traitement enqueueNDRangeKernel finished in 2 ms
    2025-10-11 10:02:33.831 10482-10482/com.example.xiaomi E/JNIProcessor:  6 enqueueMapBuffer and all kernel start CPU multicore finished in 31 ms
    2025-10-11 10:02:33.926 10482-10482/com.example.xiaomi E/JNIProcessor:  10 traitement enqueueWriteBuffer and all CPU finished in 60 ms
    2025-10-11 10:02:33.930 10482-10482/com.example.xiaomi E/JNIProcessor:  12 enqueueReadBuffer and last kernel finished ready to end JNI before display Nb Frame 39 in 64 ms
    2025-10-11 10:02:33.934 10482-10482/com.example.xiaomi E/JNIProcessor:  4 traitement enqueueNDRangeKernel finished in 3 ms
    2025-10-11 10:02:33.963 10482-10482/com.example.xiaomi E/JNIProcessor:  6 enqueueMapBuffer and all kernel start CPU multicore finished in 33 ms
    2025-10-11 10:02:34.198 10482-10482/com.example.xiaomi E/JNIProcessor:  4 traitement enqueueNDRangeKernel finished in 2 ms
    2025-10-11 10:02:34.229 10482-10482/com.example.xiaomi E/JNIProcessor:  6 enqueueMapBuffer and all kernel start CPU multicore finished in 33 ms
    2025-10-11 10:02:34.254 10482-10482/com.example.xiaomi E/JNIProcessor:  10 traitement enqueueWriteBuffer and all CPU finished in 58 ms
    2025-10-11 10:02:34.256 10482-10482/com.example.xiaomi E/JNIProcessor:  12 enqueueReadBuffer and last kernel finished ready to end JNI before display Nb Frame 42 in 61 ms

    And the last picture, where I run 2 kernels and an enqueueReadBuffer before leaving JNI for drawing.



    As you can see, it is quite stable, between 2 and 5 ms. The zeros in every picture are just blank values in Excel.

    Maybe it is the debug ;))

    Here are the same 3 pictures, but this time I removed nearly all the debug output, especially the part inside
    the functions called by the pthreads.





    So I found the solution just because I wrote this post. At the end I asked myself: what if it was just the debug?
    And it was.

    So, very sorry. I feel very stupid right now. A bit embarrassing ;))

    Have a good day.
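One common way to keep timing information without logging from inside the worker threads is sketched below. It is only an illustration under assumptions: the names `timed_job`, `timed_worker`, `run_workers` and `format_summary` and the dummy workload are hypothetical. Each worker stores its elapsed time in its own job struct, and the calling thread formats a single summary after all threads have been joined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_WORKERS 4

typedef struct {
    int     id;
    int64_t elapsed_ns; /* written by the worker, read after join */
    long    result;     /* dummy workload result */
} timed_job;

static int64_t mono_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Worker: does its work and records the time, but never logs. */
static void *timed_worker(void *arg) {
    timed_job *job = arg;
    int64_t t0 = mono_ns();
    long acc = 0;
    for (long i = 0; i < 1000000; ++i) acc += i % (job->id + 2);
    job->result = acc;
    job->elapsed_ns = mono_ns() - t0;
    return NULL;
}

/* Start all workers, wait for them, leave timings in jobs[]. */
static void run_workers(timed_job jobs[NUM_WORKERS]) {
    pthread_t threads[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; ++i) {
        jobs[i].id = i;
        jobs[i].elapsed_ns = 0;
        jobs[i].result = 0;
        if (pthread_create(&threads[i], NULL, timed_worker, &jobs[i]) != 0)
            abort();
    }
    for (int i = 0; i < NUM_WORKERS; ++i) pthread_join(threads[i], NULL);
}

/* Format one summary on the calling thread; returns bytes written. */
static int format_summary(const timed_job jobs[NUM_WORKERS],
                          char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    size_t len = 0;
    out[0] = '\0';
    for (int i = 0; i < NUM_WORKERS; ++i) {
        int n = snprintf(out + len, out_size - len, "worker %d: %.3f ms\n",
                         jobs[i].id, (double)jobs[i].elapsed_ns / 1e6);
        if (n < 0 || (size_t)n >= out_size - len) break; /* buffer full */
        len += (size_t)n;
    }
    return (int)len;
}
```

On Android, the resulting string could then be passed to `__android_log_print` once per frame (or once every N frames) from the thread that called `run_workers`, instead of logging from each pthread.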