Dear colleagues.
I'm working on a project of an academic nature, helping a colleague from another professional area (public administration), using an ARM Cortex-A53 board that FriendlyARM kindly sent me for testing.
In this case, I'm using an open-source program that is used in our country to count bicycles on bike lanes and cycle paths.
The project still needs to mature, and I would like to start by improving its compilation for the ARM architecture, because so far it has only been tested on Intel architecture.
I've done a first compilation on the NanoPi and everything worked, including the software itself.
But can I get more performance by improving the compilation?
Can anybody help me?
With reading suggestions and tips on changing the code? Always focusing on the Samsung S5P6818 octa-core Cortex-A53 chip, 400 MHz - 1.4 GHz.
Anyone who wants to look at the source can find it here; it is a basic design that is still maturing: GitHub - carlosdelfino/ContadorDeCiclistas
Thanks.
A generic way to analyse performance and optimise with GCC is to use the profiling options built into it. However, I don't know if they are efficient on ARM systems.
The GCC options for generating profiling information are :
-pg -fprofile-generate --coverage
You also need to pass these options when linking.
Once the software is compiled and linked with these options, profiling information files will be generated in the source tree when you run the software on the same machine.
So, first, you'll have to use the software like you're used to, on the build machine.
Then, you can either use :
gprof bin/CycloTracker gmon.out > performances_report.txt
This will tell you how much CPU time was consumed by each executed function. This should help you decide which part needs to be optimised first.
See the complete gprof documentation for more information about how to analyse the performance of your binary with gprof.
gcov CycloTracker.cpp
This will generate a ton of gcov files, approximately one per source file referenced by the one passed to gcov.
Each gcov file is an annotated source file that tells you which functions are executed and how many times each one is executed.
Several tools exist for parsing and presenting this information in a more meaningful way. Examples : LCOV, Gcovr, ...
More information about GCOV is provided in the GCC documentation.
Once a profile has been generated, rebuilding the whole software with the following options passed to GCC, at both compile and link time, will generate a Profile Guided Optimised binary :
-fprofile-correction -fprofile-use
Similar flags and tools also exist for CLANG.
Thanks, @myy.
I will examine the suggested material, read the documentation and do as suggested.
I added three directives to the Makefile to compile the code specifically for the Cortex-A53 and NEON.
-mtune=cortex-a53 -mcpu=cortex-a53 -mfpu=neon
In the case of NEON I know there are other options, but I still could not identify the best one for this S5P6818 chip.
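One way to check that the flags above actually reach the compiler is to test the ACLE feature macro __ARM_NEON (older 32-bit GCC toolchains also define __ARM_NEON__). The following is only a minimal sketch: the function names built_with_neon and neon_build_description are made up for illustration and are not part of the project.

```cpp
// Reports, at compile time, whether the translation unit was built with
// NEON enabled. The helper names are hypothetical, for illustration only.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
constexpr bool kBuiltWithNeon = true;
#else
constexpr bool kBuiltWithNeon = false;
#endif

// True when the compiler was told that NEON is available.
constexpr bool built_with_neon() { return kBuiltWithNeon; }

// Human-readable version of the same check, e.g. for a startup log line.
const char* neon_build_description() {
    return built_with_neon() ? "NEON enabled at compile time"
                             : "NEON not enabled at compile time";
}
```

Printing neon_build_description() once at startup should show whether the Makefile options were picked up by the build.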
ARM specific options for GCC are specified in this chapter of the official documentation.
If automatic NEON optimisations work like automatic SIMD optimisations on x86 architectures, you might need to use aligned memory allocators like aligned_alloc, aligned_malloc and, in C++, operator new to assure the compiler that everything is correctly aligned and, therefore, let it use SIMD optimisations as much as it can.
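As a rough sketch of the aligned-allocation idea, assuming C++17 and GCC or Clang: the helpers alloc_aligned_floats and scale_in_place below are hypothetical, and the 16-byte alignment is an assumption chosen to match 128-bit NEON registers. Whether the loop is actually vectorised still depends on the optimisation level and target flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed alignment: 16 bytes, the width of a 128-bit NEON register.
constexpr std::size_t kSimdAlign = 16;

// Hypothetical helper: allocate `count` floats on a 16-byte boundary.
// std::aligned_alloc expects the size to be a multiple of the alignment,
// so the byte count is rounded up. Release the buffer with std::free.
float* alloc_aligned_floats(std::size_t count) {
    std::size_t bytes = count * sizeof(float);
    bytes = (bytes + kSimdAlign - 1) / kSimdAlign * kSimdAlign;
    return static_cast<float*>(std::aligned_alloc(kSimdAlign, bytes));
}

// Hypothetical hot loop: __builtin_assume_aligned (a GCC/Clang builtin)
// promises the compiler that the pointer is 16-byte aligned, which can
// help the auto-vectoriser. The caller must really pass aligned memory.
void scale_in_place(float* data, std::size_t n, float factor) {
    float* p = static_cast<float*>(__builtin_assume_aligned(data, kSimdAlign));
    for (std::size_t i = 0; i < n; ++i) {
        p[i] *= factor;
    }
}
```

A buffer obtained from alloc_aligned_floats can be passed straight to scale_in_place; looking at the generated assembly is the way to confirm whether NEON instructions were actually emitted.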
Analysing the assembly generated for the hotspots is still highly recommended, though. In my opinion, it might help to :
This web service could help you in this task, as it provides a quick look at the assembly code produced by GCC from a specified C/C++ source code.
I still could not get results from the profiling. I tried several things and made a quick study of the manual, and everything seems to be correct. I ran the build with the new options added to the Makefile, but the gmon.out file was generated only once, and even then it did not contain any information. I tried several more times and got nothing: the output file is not generated, only the .gcno files are.
Now I am trying again, clearing the workspace and regenerating all the files, but I could not identify what I'm doing wrong.
If I get no positive result, I will move on to studying OpenCV and maybe recompile it specifically for the Cortex-A53 with NEON.
Glad you got it working.
I take note of this interesting gprof/gcov quirk.
I found the cause of the problem.
It was in the "cv::waitKey()" call. It was not properly capturing the [esc] key used to exit, so I had to stop the program with [ctrl] + [c], and stopping it that way does not generate the file.
Now it is generating the file correctly.
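For anyone hitting the same issue: according to the gprof documentation, gmon.out is only written when the program exits normally (returning from main or calling exit), so a process killed by [ctrl] + [c] leaves no profile behind. One possible workaround, sketched here under the assumption that the main loop can poll a flag, is to catch SIGINT and let the loop finish cleanly. The names on_sigint, install_sigint_handler, stop_requested and run_until_stopped are hypothetical and not taken from the project.

```cpp
#include <csignal>

// Set from the signal handler, polled by the main loop.
volatile std::sig_atomic_t g_stop_requested = 0;

// Handler only records the request; no other work is done in signal context.
extern "C" void on_sigint(int) { g_stop_requested = 1; }

// Route [ctrl] + [c] to the handler instead of killing the process.
void install_sigint_handler() { std::signal(SIGINT, on_sigint); }

bool stop_requested() { return g_stop_requested != 0; }

// Hypothetical main loop: calls `process_one_frame` until it returns false
// or a stop was requested, then returns so that main() can exit normally
// and the profiling data gets written. Returns the number of frames handled.
template <typename Step>
int run_until_stopped(Step process_one_frame) {
    int frames = 0;
    while (!stop_requested() && process_one_frame()) {
        ++frames;
    }
    return frames;
}
```

In the real program, the step passed to run_until_stopped would wrap the per-frame work, including the cv::waitKey() call, and main() would simply return once the loop ends.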