Hi,
I have installed ARM RAL version 23.04 on my Linux machine and successfully built static libraries in Release mode with Arm GNU Toolchain 12.2.Rel1 (arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz) for the NEON and SVE targets. I enabled the build options `-DBUILD_TESTING=On` and `-DBUILD_EXAMPLES=On`. The Python version installed on my machine is 3.8.10, and the Linux perf tool, which is required for running benchmarks, is perf version 5.15.87.
After the build, I installed the ARMRAL libraries for the two targets in two separate installation folders.
Afterwards, I tried running the benchmarks with the command `make bench` for each target separately, but in each case I get errors like the one pasted at the end of this post for all benchmarks.
If I redirect the output to a file, the JSON objects printed look like the ones below, indicating that the benchmarks did not run at all:
{"name": "demodulation_qam256_8", "error": true, "ts": 304990550, "dur": 192588}
{"name": "demodulation_qam256_31", "error": true, "ts": 305183343, "dur": 192250}
{"name": "demodulation_qam256_32", "error": true, "ts": 305375812, "dur": 194988}
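A short script can confirm this from the redirected output. The sketch below is not part of ARMRAL; the function names are made up for illustration, and it assumes only what the objects above show, namely that each case is a JSON object with a "name" field and an "error" flag. It accepts objects written back to back or one per line.

```python
import json


def parse_results(text):
    """Parse JSON objects that are concatenated or newline-separated."""
    decoder = json.JSONDecoder()
    results, pos = [], 0
    while pos < len(text):
        # Skip whitespace (including newlines) between objects.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        results.append(obj)
    return results


def failed_names(results):
    """Names of the benchmark cases whose "error" field is true."""
    return [r["name"] for r in results if r.get("error")]
```

For example, `failed_names(parse_results(open("bench_output.json").read()))` would list the failing cases, where `bench_output.json` is a hypothetical name for the redirected output file.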
Since I am evaluating ARMRAL for my work and am new to it, could you kindly look into the error and help me identify the root cause (e.g. whether I am missing something or making a mistake) and fix it soon?
It would help a lot.
Regards,
Pankaj K
---------------------------------------------------------------------------------------------------------------------------------------------------------
Failed to run command: ['/home/sitteam/PankajK/arm-ran-acceleration-library-23.04/bench/default_runner.py', '/home/sitteam/PankajK/arm_lib_2304_build_NEON_static_rel/bench_demodulation', '{"name": "demodulation_qam16_276", "args": "1 276", "reps": 300000}']
Traceback (most recent call last):
  File "/home/sitteam/PankajK/arm-ran-acceleration-library-23.04/bench/default_runner.py", line 65, in <module>
    main()
  File "/home/sitteam/PankajK/arm-ran-acceleration-library-23.04/bench/default_runner.py", line 61, in main
    run(args.exe_path, case_json)
  File "/home/sitteam/PankajK/arm-ran-acceleration-library-23.04/bench/default_runner.py", line 37, in run
    all_cycles = [run_perf(args) / float(reps) for _ in range(10)]
  File "/home/sitteam/PankajK/arm-ran-acceleration-library-23.04/bench/default_runner.py", line 37, in <listcomp>
    all_cycles = [run_perf(args) / float(reps) for _ in range(10)]
  File "/home/sitteam/PankajK/arm-ran-acceleration-library-23.04/bench/default_runner.py", line 20, in run_perf
    result = exec_and_check_cmd(cmd)
  File "/home/sitteam/PankajK/arm-ran-acceleration-library-23.04/bench/default_runner.py", line 14, in exec_and_check_cmd
    result.check_returncode()
  File "/usr/lib/python3.8/subprocess.py", line 448, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['perf', 'stat', '-x', ' ', '-e', 'cycles:u', '/home/sitteam/PankajK/arm_lib_2304_build_NEON_static_rel/bench_demodulation', '1', '276', '300000']' returned non-zero exit status 255.
Hi Nick,
Thanks a lot for providing pointers to debug the issue I am facing.
Below I am pasting the Linux console output from running the two commands you indicated for the demodulation benchmark, first without and then with the perf tool.
Could you please help me further to debug and resolve this issue?
Pankaj K.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1)
sitteam@sitteam-Lenovo-S510:~/PankajK/arm_lib_2304_build_NEON_static_rel$ /home/sitteam/PankajK/arm_lib_2304_build_NEON_static_rel/bench_demodulation 1 276 300000
bash: /home/sitteam/PankajK/arm_lib_2304_build_NEON_static_rel/bench_demodulation: cannot execute binary file: Exec format error
========================================================================================================
2)
sitteam@sitteam-Lenovo-S510:~/PankajK/arm_lib_2304_build_NEON_static_rel$ perf stat -x " " -e cycles:u /home/sitteam/PankajK/arm_lib_2304_build_NEON_static_rel/bench_demodulation 1 276 300000
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
www.kernel.org/.../perf-security.html
perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
Hi Pankaj,
I think it is the first of the issues that is the most important here. The error message suggests that the binary created is not suitable for the system you are running it on. Most likely this is because you are trying to run it on an x86 system rather than an Arm-based one.
ArmRAL is designed for Arm hardware, and the compiler toolchain you have used is intended to run on x86 but cross-compile code that is then run on an Arm system instead.
Perhaps the easiest way to get past this problem, if you don't have access to an Arm system yourself, is to use a cloud service to get access to an Arm-based Linux instance (AWS, Azure, Oracle, Google and others all have them). For instance, on AWS, choosing either their Graviton2 or Graviton3 hardware (like c6g or c7g instances) should get you on a system where the default compilers already produce AArch64 binaries and the code should run. Perf should also need minimal set-up to get working.
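If you want to confirm the mismatch behind the `Exec format error`, one option is to compare the architecture recorded in the binary's ELF header with that of the machine you are on. The following is a minimal sketch, not part of ArmRAL: the function names are made up for illustration, and it only recognises the two architectures relevant here. The `e_machine` values (62 for x86-64, 183 for AArch64) come from the ELF specification.

```python
import platform
import struct

# e_machine values defined by the ELF specification.
EM_X86_64 = 62
EM_AARCH64 = 183
MACHINE_NAMES = {EM_X86_64: "x86_64", EM_AARCH64: "aarch64"}


def elf_machine(path):
    """Return the architecture name stored in an ELF header, or None."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None  # not an ELF file
    # e_ident[EI_DATA] (index 5) gives the byte order: 1 = little-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, after e_ident and e_type.
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return MACHINE_NAMES.get(e_machine, "other (e_machine=%d)" % e_machine)


def can_run_natively(path):
    """True when the binary's architecture matches this machine's."""
    host = platform.machine().lower()
    # Normalise the spellings some platforms use for the same architecture.
    host = {"amd64": "x86_64", "arm64": "aarch64"}.get(host, host)
    return elf_machine(path) == host
```

Calling `can_run_natively` on your `bench_demodulation` binary from the x86_64 machine would be expected to return `False`. From the command line, `file bench_demodulation` reports the same header information.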
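If perf on the Arm machine reports the same access error as in your second output, the setting it mentions can be inspected with something like the sketch below. This is only an illustration with made-up function names. It assumes, based on the kernel documentation for `perf_event_paranoid`, that levels up to 2 still allow an unprivileged user to count user-space events such as `cycles:u` for their own processes.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid_level(path=PARANOID_PATH):
    """Return the current perf_event_paranoid level, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def user_cycles_allowed(level):
    """True if an unprivileged user can count user-space events at this level."""
    # Assumption: levels up to 2 permit user-space measurements of one's own
    # processes; higher values (your output shows 4) block unprivileged use.
    return level is not None and level <= 2


if __name__ == "__main__":
    level = read_paranoid_level()
    print("perf_event_paranoid =", level)
    if not user_cycles_allowed(level):
        print("Lower it, for example: sudo sysctl kernel.perf_event_paranoid=2")
```

As the perf message itself says, the change can be made permanent by adding `kernel.perf_event_paranoid = <setting>` to /etc/sysctl.conf.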
Do feel free to keep following up here.
Thanks.
Chris
Hi Chris,
Thanks again for your reply providing the likely root causes of the issues I am facing while running ARMRAL lib benchmarks at my end.
I think you have identified the issue: I cross-compiled with the toolchain arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz on my x86_64 machine running Ubuntu. I had assumed that ARMRAL could also run in a simulator mode, so that I could run the performance benchmarks with the perf tool on my Linux machine. That is why I tried running the benchmarks with the static libraries built in Release mode using the `make bench` command.
I will check at my end about how I can access a Cloud service to use an Arm-based Linux instance to run benchmarks as suggested by you and get back if I face any issues further.
Meanwhile, I have a suggestion for the ARM RAL development and maintenance team: add a feature for running perf benchmarks in some kind of simulator mode. This would help developers who are extending the ARMRAL library to benchmark their new code for the available architectures without needing an Arm system or a cloud-hosted Arm-based Linux instance.
Please let me know whether any such feature already exists for running tests, FEC simulations and benchmarks in a simulator mode, and how to access and use it.
Hi.
Unfortunately there's not great tooling around that use-case if you're not on an Arm-based system. Your best options are QEMU and DynamoRIO, assuming you're happy with instruction counting. However, perf won't give you anything useful there, and to work out anything meaningful about performance you'd also need a model of how the cores execute instructions, and such models are not open source.
Sorry.
Thanks for your reply. I am checking with my manager about getting access to an Arm system or to a cloud service with an Arm-based Linux instance, as you suggested.
One issue I am still facing is the following:
As indicated in the "Run tests" section of the README, since I am not developing on an AArch64 machine, I configured the library to prefix the tests with `qemu-aarch64` for NEON as follows, to run the tests on my x86_64 Ubuntu Linux machine:
cmake -DBUILD_TESTING=On -DARMRAL_TEST_RUNNER=qemu-aarch64 -DBUILD_EXAMPLES=On -DBUILD_SIMULATION=On -DCMAKE_INSTALL_PREFIX=/home/sitteam/PankajK/arm_lib_2304_install_neon_static_rel /home/sitteam/PankajK/arm-ran-acceleration-library-23.04 -DCMAKE_C_COMPILER=/home/sitteam/PankajK/arm_gnu_toolchain_12_2/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc -DCMAKE_CXX_COMPILER=/home/sitteam/PankajK/arm_gnu_toolchain_12_2/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-g++
make check
------------------------------------------------------------------------------
But I got the following error, and I am unsure why the qemu-aarch64 executable is not generated despite passing the `-DARMRAL_TEST_RUNNER=qemu-aarch64` CMake option above. Could you provide pointers to debug or resolve this issue? Also, can you provide more information about DynamoRIO?
      Start 56: tail_biting_convolutional_decoding
Could not find executable qemu-aarch64
Looked in the following places:
qemu-aarch64
qemu-aarch64
Release/qemu-aarch64
Release/qemu-aarch64
Debug/qemu-aarch64
Debug/qemu-aarch64
MinSizeRel/qemu-aarch64
MinSizeRel/qemu-aarch64
RelWithDebInfo/qemu-aarch64
RelWithDebInfo/qemu-aarch64
Deployment/qemu-aarch64
Deployment/qemu-aarch64
Development/qemu-aarch64
Development/qemu-aarch64
Unable to find executable: qemu-aarch64
56/56 Test #56: tail_biting_convolutional_decoding ...***Not Run   0.00 sec
0% tests passed, 56 tests failed out of 56
Total Test time (real) = 0.01 sec
------------------------------------------------------------------------------------------------
Hi there,
Unfortunately qemu-aarch64 is not distributed as part of ArmRAL. Instead please see https://www.qemu.org/ for instructions on how to download and build it for your machine.
Once you have QEMU installed you should provide the full path to the qemu-aarch64 executable in the ARMRAL_TEST_RUNNER CMake variable. Alternatively, if you place qemu-aarch64 on your executable path you can set -DARMRAL_TEST_RUNNER=qemu-aarch64 as you did before.
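To check which of the two cases applies on your machine, a small script can look for the executable the same way a shell would. This is a minimal sketch rather than anything shipped with ArmRAL; `runner_cmake_arg` is a made-up helper name, and the only ArmRAL-specific piece is the ARMRAL_TEST_RUNNER variable mentioned above.

```python
import shutil


def runner_cmake_arg(runner="qemu-aarch64", search_path=None):
    """Build the -DARMRAL_TEST_RUNNER argument, or return None if not found."""
    # shutil.which searches PATH (or search_path, if given) for an executable.
    full_path = shutil.which(runner, path=search_path)
    if full_path is None:
        return None
    return "-DARMRAL_TEST_RUNNER=" + full_path


if __name__ == "__main__":
    arg = runner_cmake_arg()
    if arg is None:
        print("qemu-aarch64 is not on PATH; install QEMU or pass its full path")
    else:
        print("Add this to the cmake command:", arg)
```

If the script reports that nothing was found, CTest will fail with the same "Could not find executable qemu-aarch64" message as in your output.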
Nick