Hello ArmNN experts,
I'm currently facing a runtime issue when using a statically built ArmNN library.
I have been able to build a STATIC version of libarmnn with all its dependencies, and then build my own app for a Wandboard target (armv7). At runtime I get this error: "ERROR: None of the preferred backends [CpuRef ] are supported. Current platform provides "
When I build SHARED libraries (*.so), the exact same app runs fine and the inference is done as expected.
ArmNN version: 21.02
Model/Target: TFLite on armv7
Build options:
cmake .. -DCMAKE_LINKER=/usr/bin/arm-linux-gnueabihf-ld -DCMAKE_C_COMPILER=/usr/bin/arm-linux-gnueabihf-gcc -DCMAKE_CXX_COMPILER=/usr/bin/arm-linux-gnueabihf-g++ -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=14 -DCMAKE_CXX_FLAGS=-mfpu=neon -DARMCOMPUTE_ROOT=$BASEDIR/ComputeLibrary -DARMCOMPUTE_BUILD_DIR=$BASEDIR/ComputeLibrary/build -DBOOST_ROOT=$BASEDIR/boost_1_64_0 -DTF_GENERATED_SOURCES=$BASEDIR/tensorflow-protobuf -DBUILD_TF_PARSER=0 -DBUILD_ONNX_PARSER=0 -DONNX_GENERATED_SOURCES=$BASEDIR/onnx -DBUILD_TF_LITE_PARSER=1 -DTF_LITE_GENERATED_PATH=$BASEDIR/tflite -DFLATBUFFERS_ROOT=$BASEDIR/flatbuffers-arm32 -DFLATC_DIR=$BASEDIR/flatbuffers/build -DPROTOBUF_ROOT=$BASEDIR/protobuf-arm -DARMCOMPUTENEON=1 -DARMNNREF=1
Does anyone have an idea for a fix, or suggestions for additional investigation?
Thanks!
That's interesting, thanks for the update!
I guess that when you build ArmNN as a shared library, it is loaded at runtime and this global variable is created, which registers the backend: https://github.com/ARM-software/armnn/blob/branches/armnn_21_05/src/backends/reference/RefRegistryInitializer.cpp
When you build it statically, the linker probably strips this piece and keeps only the functions that your application needs (but I'm not sure about that).
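To illustrate the kind of self-registration I mean, here is a minimal sketch. It is not the ArmNN code: Registry, StaticRegistration and g_refRegistration are made-up names, and the real registry API is different. The point is only that the backend gets registered as a side effect of constructing a global object that nothing else refers to.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry, only loosely modelled on the idea of a backend registry.
class Registry {
public:
    using Factory = std::function<std::string()>;

    // Function-local static: safe to use from other static initializers.
    static Registry& Instance() {
        static Registry instance;
        return instance;
    }

    void Register(const std::string& id, Factory factory) {
        assert(factory && "factory must be callable");
        factories_[id] = std::move(factory);
    }

    bool IsRegistered(const std::string& id) const {
        return factories_.count(id) != 0;
    }

private:
    std::map<std::string, Factory> factories_;
};

// Helper whose constructor registers a backend as a side effect.
struct StaticRegistration {
    StaticRegistration(const std::string& id, Registry::Factory factory) {
        Registry::Instance().Register(id, std::move(factory));
    }
};

// In a library this global would sit in its own .cpp file. No other code
// refers to it, so registration depends entirely on this object file
// ending up in the final binary.
namespace {
StaticRegistration g_refRegistration("CpuRef", [] {
    return std::string("reference backend");
});
}  // namespace
```

In a single translation unit like this one, the global is always constructed before main(), so the "CpuRef" lookup succeeds. When the same object file sits inside a .a archive, a linker normally extracts archive members only to resolve undefined symbols; if nothing in the application references that file, it can be left out and the constructor never runs. That would match an empty backend list, but it is still a guess for this particular case.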
This may be the reason, but I didn't find any information about this topic in the ARM documentation.
I've also been able to compile an application using CpuAcc (aka Neon). The issues were coming from a missing lib in my build command.
But when I run the app I get a segmentation fault when I try to optimize the network. I'll keep digging...
For information, I'm trying this static approach instead of shared because I have been a little disappointed by the performance of ArmNN inference compared to TF-Lite, and I've read somewhere that using shared libs can have an impact on performance.
From what I've seen, the static version of CpuRef is a little faster than the shared version for my use case: 5~10% faster for inference.
By using "-whole-archive" on the compiler command line, the manual backend registration I mentioned earlier can be removed from the application, as in the SHARED tests.
This is cool, but it does not fix the segmentation fault when trying to use the CpuAcc backend, and it produces a huge binary, which defeats the purpose of using a STATIC lib :-)
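For anyone comparing the two approaches, the manual registration can be sketched roughly like this. It is only an illustration under assumptions: RegisteredBackends and EnsureRefBackendRegistered are hypothetical names, not ArmNN functions. The idea is that the application calls an ordinary function, so the linker has to pull in the object file that defines it, and only that file plus its dependencies, instead of the whole archive.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a backend registry (not the ArmNN API).
std::set<std::string>& RegisteredBackends() {
    static std::set<std::string> ids;
    return ids;
}

// Hypothetical function that a static library could export. Because the
// application references it directly, the linker must keep the object
// file that defines it.
void EnsureRefBackendRegistered() {
    // std::set ignores duplicate inserts, so repeated calls are harmless.
    RegisteredBackends().insert("CpuRef");
}
```

The application would call EnsureRefBackendRegistered() once at startup, before creating the runtime. The trade-off is one explicit call per backend in exchange for a smaller binary than with whole-archive linking.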