Is there an easy way to measure the performance of the underlying OpenCL kernels in the Arm Compute Library? Ideally, I'd like to use the OpenCL clGetEventProfilingInfo call.
I'm running my experiments on Hikey-960, Hikey-970, and Odroid-C4 platforms using the Mali GPU.
Thanks!
Sorry for the slow response. For ACL questions you'll get the best response on the project's GitHub issues page: https://github.com/ARM-software/ComputeLibrary/issues
That said, I've consulted an ACL expert, and yes, it is possible to profile OpenCL kernel performance through ACL.
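Since the question mentions clGetEventProfilingInfo, here is a minimal sketch of the arithmetic behind it. In OpenCL, the four profiling counters of an event (CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END) are reported in nanoseconds, and they are only available when the command queue was created with CL_QUEUE_PROFILING_ENABLE. The function name and the counter values below are hypothetical, and it is an assumption that ACL's own instrument reports the same START-to-END interval.

```python
NS_PER_MS = 1_000_000  # OpenCL profiling counters are expressed in nanoseconds


def kernel_timings_ms(queued_ns, submit_ns, start_ns, end_ns):
    """Convert the four event profiling counters into millisecond intervals.

    The arguments stand for the values returned by clGetEventProfilingInfo for
    CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END respectively.
    """
    return {
        # Time the command spent in the host-side queue
        "queue_to_submit_ms": (submit_ns - queued_ns) / NS_PER_MS,
        # Time between submission to the device and start of execution
        "submit_to_start_ms": (start_ns - submit_ns) / NS_PER_MS,
        # Time the kernel actually spent executing on the device
        "execution_ms": (end_ns - start_ns) / NS_PER_MS,
    }


# Hypothetical counter values, for illustration only (not measured data)
timings = kernel_timings_ms(1_000_000, 1_200_000, 1_500_000, 4_000_000)
print(timings["execution_ms"])
```

The START-to-END difference is usually the figure of interest for kernel performance; the other two intervals mostly reflect driver and queueing overhead.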
ACL has a build flag that builds the graph examples (the ones in the examples/ folder) as benchmarks. The flag is defined here: https://github.com/ARM-software/ComputeLibrary/blob/master/tests/SConscript#L31 (it is set to true by default).
If ACL has been built with this flag, you will find all the graph examples ready to be benchmarked under build/tests (e.g. benchmark_graph_mobilenet, benchmark_graph_alexnet).
Example usage:
OpenCL - F32 (Wall clock time): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f32 --iterations=100
OpenCL - F32 (OpenCL kernel profiling): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f32 --iterations=100 --instruments=opencl_timer_ms
OpenCL - F16 (Wall clock time): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f16 --iterations=100
OpenCL - F16 (OpenCL kernel profiling): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f16 --iterations=100 --instruments=opencl_timer_ms
OpenCL - F16+fast-math (Wall clock time): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f16,--fast-math --iterations=100
OpenCL - F16+fast-math (OpenCL kernel profiling): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f16,--fast-math --iterations=100 --instruments=opencl_timer_ms
NEON - F32 (Wall clock time): ./benchmark_graph_alexnet --example_args=--target=NEON,--type=f32 --iterations=100
NEON - F32 (NEON kernel profiling): ./benchmark_graph_alexnet --example_args=--target=NEON,--enable-tuner,--type=f32 --iterations=100 --instruments=scheduler_timer_ms
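The commands above differ only in target, data type, fast-math and the instrument, so they can be generated rather than typed out. Below is a minimal sketch of such a helper; `build_benchmark_cmd` and its parameters are hypothetical names, and it uses only the flags quoted in this thread. It adds --enable-tuner for the CL target only, following the NEON wall-clock example.

```python
import shlex


def build_benchmark_cmd(binary="./benchmark_graph_alexnet", target="CL",
                        dtype="f32", fast_math=False, profile=False,
                        iterations=100):
    """Assemble the argument list for one benchmark run.

    Only flags that appear in the examples above are used; nothing here
    queries the benchmark binary for the options it actually supports.
    """
    example_args = [f"--target={target}"]
    if target == "CL":
        # Tuner flag is added for OpenCL runs only (assumption, see lead-in)
        example_args.append("--enable-tuner")
    example_args.append(f"--type={dtype}")
    if fast_math:
        example_args.append("--fast-math")

    cmd = [binary,
           "--example_args=" + ",".join(example_args),
           f"--iterations={iterations}"]
    if profile:
        # Per-kernel timing: opencl_timer_ms for CL, scheduler_timer_ms for NEON
        instrument = "opencl_timer_ms" if target == "CL" else "scheduler_timer_ms"
        cmd.append(f"--instruments={instrument}")
    return cmd


if __name__ == "__main__":
    # Print the OpenCL variants listed above as copy-pasteable shell commands
    for dtype, fast_math in [("f32", False), ("f16", False), ("f16", True)]:
        for profile in (False, True):
            print(shlex.join(build_benchmark_cmd(dtype=dtype,
                                                 fast_math=fast_math,
                                                 profile=profile)))
```

To run a configuration directly on the board instead of printing it, the returned list can be passed to `subprocess.run(cmd, check=True)` from within build/tests.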