Processing time doubles after a few seconds

hi,

my application uses Android Java JNI to process camera video in real time. The C part uses OpenCL and multicore threading on a MediaTek 9200+ with a Mali-G715.

Something strange happens in the application: after a few seconds of processing (70-80 frames), I go from 60-70 ms per frame to 140-160 ms per frame.

What I am doing:

1) I run some kernels to convert YUV and extract data.

2) Then I process the extracted data with CPU multicore threading (4 threads at a time), 7 times.

3) Then I use kernels again to extract data and send it back to Java.

If I remove all the CPU work, the GPU time remains stable, from 20 ms at the beginning (70-80 frames) up to 40-45 ms. But with the CPU work, the time increases dramatically after 70-80 frames with the same amount of input data. The same problem happened with my old Mali-G72 after 20 frames.

I tried to use Streamline, but as I run Windows 7 I cannot get the analyzer, which only runs from Windows 10. But I can get the graphs, and I noticed some strange activity on the GPU:

1) The Mali Memory Read Latency starts to get red after the 70-80 frames (it shows 25 megabeats?).

2) The Mali geometry culling rate starts to oscillate (it shows 100%).

3) The Mali geometry efficiency starts to oscillate (it shows 3 threads).

4) The Mali Early ZS rate is red.

And many other things start to oscillate, but Device Thermal State is 100% green.

In fact, after 70-80 frames a lot of things start to oscillate.

And Streamline is too complicated for me; there are too many things to know and understand, and I do not have the time.

So I am wondering if it would be possible for an expert to analyse it.

As I said, I had the same problem on the old Mali-G78, so at some point something goes wrong when alternating GPU-CPU-GPU. In one of my posts about SVM, someone told me that using the GPU and CPU together was not a good idea. But I cannot do with the GPU what I am doing with the CPU, or I do not know how to. At some point I need to process the data with the CPU.

Thanks for the help.

Parents
  • hi,

    I made an error in the previous post. The problem is not solved by removing all the debug code; that just improves the performance. And the pictures are wrong because I forgot to modify the abscissa values to the correct time: both are starting at 30. I only saw it after posting. Little mistake.

    Here are the correct pictures, without debug. And after more than 50 tests it is always the same: depending on the amount of data, after a few seconds there is a drop in time.

    The comments are in the pictures, like before.

    My conclusion is that when step 1 (picture 1) slows down, the impact is a time increase in step 2 (picture 2).

    And this is completely strange, because if step 1 goes faster, step 2 should also go faster. But it is the inverse?

    I will look at step 1 in more detail in the next post.


Children
  • I suspect the problem is that your algorithm switches between the CPU and the GPU without pipelining them, so both the CPU and GPU are going idle while the other processor is busy. The idle time on a processor often causes frequency scaling control logic to decide that the processor is clocked too high, and so clock frequency gets reduced. 

    How frequency control works is decided by the OEM, so not really something Arm can help with.

  • hi,

    You are right. I just checked the CPU frequency while running the application, and of course after a few seconds the frequency drops by a factor of 4, from 2000 MHz to 400/600 MHz, then goes back to 2000 for a few ms, then back again to the low frequency.

    Thanks for the confirmation.

    I suppose that is for energy and heat purposes.

    By the way, does someone know which Arm mobile allows full CPU speed? It would be good to know.

  • hi,

    Sorry, I come back to the discussion. So if I understood your answer: some cores run the GPU driver and some other cores the multicore work, and when both are finished another core manages the display and the camera, so the cores that were driving the GPU and doing the CPU work drop to a low frequency because they are not used at that moment. That is what you call the frequency scaling control logic.

    And it is for that reason that every frame can have a different processing time, as I can see in the picture.

    Is it possible to do what you said, pipelining the CPU and GPU?

  • hi,

    Using only the GPU does not run into trouble with the CPU frequency scaling control logic, because everything is done to improve GPU performance.

    The main problem for my algorithm is that it uses massive CPU work, because the CPU frequency scaling control logic only acts on the CPU.

    So in the end it is only a problem of heat and battery. Why not add a cooling system on the chip? Especially if Arm wants to move from mobile to laptop or PC.

    And why increase the speed of the CPU if we can only use it at an average of 30%? I looked at the CPU usage with the APK "3C CPU manager", and the CPU is very rarely used at its top frequency.

    I do not understand why speed is always being improved if it cannot be used. It would be more useful to have 8 cores at 1800 MHz. On the MediaTek 9200+ the X3 never goes faster than 1400 MHz. Maybe for Vulkan the CPU can be useful, but that is only for GPU work, not really for CPU work.

    In the old days, CPU work was the main purpose; now it is only the GPU. But the GPU does not work like the CPU, and I need both to be fast: not only showing nice images, but also processing data. Nobody would make a server with only GPUs; it would not be useful.

    There is matrix work and row work. Both are necessary. Maybe not yet in phones, but it will come.

    In fact, mobile takes care of energy and heat. At the beginning (70 frames), frequency scaling gives all the power to the CPU. But in my case that is a lot of work, so the system prefers the GPU work over the CPU work. There is no frequency scaling for the GPU, or I am not aware of it. By the way, after 70 frames the GPU goes faster and the CPU slower, just to save energy and heat.

    It is normal: smartphones are not made for massive computing (CPU) but for massive frame display (GPU). This is why mobiles do not have any cooling system, and why Arm excels at using less energy: there is no massive CPU work.

    So the best is to try to reduce the CPU work on mobile.

    I am doing something about it. I will let you know.

    But massive CPU work on mobile is something to avoid after 6 seconds of work. No matter the amount of data to process, the CPU just slows down.