Code optimization for a Samsung S5P6818 Octa-Core Cortex-A53, 400 MHz - 1.4 GHz

Dear colleagues.

I'm working on an academic project, helping a colleague from another professional field (public administration), using an ARM Cortex-A53 board that FriendlyARM kindly sent me for testing.

For this I'm using an open-source program, used in our country, that counts cyclists on bike lanes and cycle paths.

The project still needs to mature, and it would benefit right away from a better build for the ARM architecture, because so far it has only been tested on Intel architecture.

I've done a first build on the NanoPi and everything worked, including the software itself.

But how can I get the maximum performance by improving the compilation?

Can anybody help me?

With reading suggestions and tips for changing the code? Always focusing on the Samsung S5P6818 Octa-Core Cortex-A53, 400 MHz - 1.4 GHz chip.

Anyone who wants to see the source can find it here; it is a basic design that is still maturing: GitHub - carlosdelfino/ContadorDeCiclistas

Thanks.

Parents
  • A generic way to analyse performance and optimise with GCC is to use its built-in profiling options. However, I don't know how well they work on ARM systems.

    The GCC options for generating profiling information are:

    -pg -fprofile-generate --coverage

    Here -pg instruments the program for gprof, -fprofile-generate collects data for profile-guided optimisation, and --coverage produces the data used by gcov. You also need to pass these options at link time (through the gcc/g++ driver).

    Once the software has been compiled and linked with these options, profiling data files will be generated in the source tree when you run the software on the same machine and it exits normally.

    So, first, you'll have to use the software as you normally would, on the build machine.

    Then you can use either:

    • gprof

    gprof bin/CycloTracker gmon.out > performances_report.txt

    This will tell you how much CPU time was consumed by each executed function, which should help you decide which part needs to be optimised first.

    See the complete gprof documentation for more information about how to analyse the performance of your binary with gprof.

    • gcov (Code coverage)

    gcov CycloTracker.cpp

    This will generate a lot of gcov files, approximately one per source file referenced by the one passed to gcov.

    Each gcov file is an annotated source file that tells you which lines and functions were executed and how many times.

    Several tools exist for parsing and presenting this information in a more meaningful way. Examples: LCOV, Gcovr, ...

    More information about gcov is provided in the GCC documentation.

    Once a profile has been generated, rebuilding the whole software with the following options passed to GCC, both when compiling and when linking, will generate a profile-guided optimised binary:

    -fprofile-correction -fprofile-use

    Similar flags and tools also exist for Clang.
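
To make the gprof step concrete, here is a minimal sketch of a workload with one obviously expensive function. It is a hypothetical demo, not code from ContadorDeCiclistas: the names `hot_sum`, `cold_sum` and `run_workload` and the file name `demo.cpp` are assumptions made for illustration. The `noinline` attribute is a GCC/Clang extension used only so that both functions show up as separate entries in the report.

```cpp
// Hypothetical demo: one expensive and one cheap function, so the gprof
// flat profile has an obvious hotspot.
//
// Assuming this is saved as demo.cpp together with a main() that calls
// run_workload(), a typical session would look like:
//   g++ -O2 -pg demo.cpp -o demo
//   ./demo                                # gmon.out is written on normal exit
//   gprof ./demo gmon.out > report.txt
#include <cstddef>
#include <cstdint>
#include <vector>

// Walks the vector `passes` times: this is the intended hotspot.
__attribute__((noinline))
std::uint64_t hot_sum(const std::vector<std::uint32_t>& v, int passes) {
    std::uint64_t total = 0;
    for (int p = 0; p < passes; ++p)
        for (std::uint32_t x : v)
            total += x;
    return total;
}

// Walks the vector once: should barely register in the profile.
__attribute__((noinline))
std::uint64_t cold_sum(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (std::uint32_t x : v)
        total += x;
    return total;
}

// Fills a vector with ones and runs both functions over it.
std::uint64_t run_workload(std::size_t n, int passes) {
    std::vector<std::uint32_t> v(n, 1);
    return hot_sum(v, passes) + cold_sum(v);
}
```

With a large `n` and many passes, `hot_sum` should dominate the flat profile, which is the kind of signal to look for in the real application before changing any code.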

Children
  • Thanks, @myy.

    I will examine the suggested material, read the documentation and do as suggested.

    I added three directives to the Makefile to compile the code specifically for Cortex-A53 and NEON.

    -mtune=cortex-a53 -mcpu=cortex-a53 -mfpu=neon

    In the case of NEON I know there are other options, but I still have not been able to identify the best one for this S5P6818 chip.
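
One way to check what those flags actually did is to ask the compiler itself. The sketch below is an illustration, not part of the project: the function name `target_summary` is hypothetical, while the macros it tests (`__aarch64__`, `__arm__`, `__ARM_NEON`, `__ARM_NEON__`) are the predefined ones GCC and Clang provide for ARM targets. As far as I know, `-mcpu=cortex-a53` alone should already imply the matching tuning, and on a 64-bit (AArch64) toolchain `-mfpu` is not accepted because Advanced SIMD is always available there; whether the NanoPi image is 32-bit or 64-bit is an assumption to verify.

```cpp
#include <string>

// Reports which ARM execution state and SIMD support the compiler was
// targeting when this translation unit was built.
std::string target_summary() {
    std::string s;
#if defined(__aarch64__)
    s += "AArch64";          // 64-bit ARM build
#elif defined(__arm__)
    s += "AArch32";          // 32-bit ARM build
#else
    s += "non-ARM";          // e.g. the Intel development machine
#endif
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    s += " +NEON";           // NEON / Advanced SIMD enabled for this build
#else
    s += " -NEON";           // NEON not enabled (or not an ARM target)
#endif
    return s;
}
```

Printing the result once at start-up, built with the same Makefile flags as the rest of the project, shows whether NEON code generation was really enabled.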

  • ARM specific options for GCC are specified in this chapter of the official documentation.

    If automatic NEON optimisations work like automatic SIMD optimisations on x86 architectures, you might need to use aligned memory allocators such as aligned_alloc or posix_memalign and, in C++, the aligned form of operator new, to assure the compiler that everything is correctly aligned and therefore let it use SIMD optimisations as much as it can.

    Analysing the assembly generated for the hotspots is still highly recommended, though. In my opinion, it might help to:

    • analyse a current hotspot in your application
    • examine the assembly code
    • try to produce better assembly code
    • if you can, try to find out how you can help the compiler produce that kind of assembly code

    This web service could help you with this task, as it provides a quick look at the assembly code produced by GCC from given C/C++ source code.
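
As an illustration of the alignment point above, here is a minimal sketch, assuming C++17 and GCC or Clang. The names `kAlign`, `FreeDeleter`, `alloc_floats` and `scale_add` are hypothetical; `std::aligned_alloc` is the standard C++17 allocator, and `__builtin_assume_aligned` and `__restrict` are compiler extensions. Whether the loop is really vectorised with NEON still has to be confirmed in the generated assembly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

constexpr std::size_t kAlign = 16;  // one 128-bit NEON register, in bytes

// Memory from std::aligned_alloc must be released with std::free.
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};
using AlignedFloats = std::unique_ptr<float[], FreeDeleter>;

// Allocates `count` floats on a kAlign-byte boundary.
AlignedFloats alloc_floats(std::size_t count) {
    std::size_t bytes = count * sizeof(float);
    // aligned_alloc expects the size to be a multiple of the alignment.
    bytes = (bytes + kAlign - 1) / kAlign * kAlign;
    return AlignedFloats(static_cast<float*>(std::aligned_alloc(kAlign, bytes)));
}

// dst[i] += k * src[i]; the hints tell the compiler that the buffers are
// aligned and do not overlap, which makes auto-vectorisation easier.
void scale_add(float* __restrict dst, const float* __restrict src,
               float k, std::size_t n) {
    float* d = static_cast<float*>(__builtin_assume_aligned(dst, kAlign));
    const float* s =
        static_cast<const float*>(__builtin_assume_aligned(src, kAlign));
    for (std::size_t i = 0; i < n; ++i)
        d[i] += k * s[i];
}
```

Pasting `scale_add` into the assembly viewer with and without the hints is a quick way to see what difference they make for a given compiler version.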

  • I still could not get results from the profiling. I tried several things and made a quick study of the manual, and everything seems to be correct. I ran the program with the new options added to the Makefile, but the gmon.out file was generated only once, and even then it did not contain any information. I tried several more times and nothing: the output file is not generated, only the files with the .gcno extension are.

    Now I am trying again, clearing the workspace and generating all the files again, but I have not been able to identify what I'm doing wrong.

    If I cannot get a positive result, I will move on to studying OpenCV and maybe recompile it specifically for the Cortex-A53 with NEON.

       

  • I found the cause of the problem.

    It was in the "cv::waitKey()" function. It was not properly capturing the [Esc] key that ends the program, so I was stopping the code with [Ctrl] + [C], and in that case the file is not generated.

    Now it is generating the file correctly.
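
For anyone hitting the same problem: the profiling files are written when the program terminates normally, so a process killed by the default [Ctrl] + [C] action leaves nothing behind. A minimal sketch of one workaround, assuming a plain polling loop, is shown below. The names `g_stop`, `on_sigint`, `install_sigint_handler` and `process_frames` are hypothetical, and the per-frame work is left as a placeholder rather than real OpenCV calls.

```cpp
#include <csignal>

// Set by the signal handler, polled by the processing loop.
volatile std::sig_atomic_t g_stop = 0;

// Only records the request; the loop decides when to stop.
extern "C" void on_sigint(int) { g_stop = 1; }

// Replaces the default SIGINT action (immediate termination).
void install_sigint_handler() { std::signal(SIGINT, on_sigint); }

// Runs until max_frames is reached or a stop was requested and returns the
// number of frames handled, so the caller can return from main() normally.
int process_frames(int max_frames) {
    int frames = 0;
    while (!g_stop && frames < max_frames) {
        // ... capture and process one frame here (placeholder) ...
        ++frames;
    }
    return frames;
}
```

With the handler installed at the start of `main`, [Ctrl] + [C] ends the loop instead of killing the process, and returning from `main` gives the runtime the chance to write gmon.out and the coverage data.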