Measuring performance of underlying OpenCL kernels in Arm Compute Library?

Is there an easy way to measure performance of the underlying OpenCL kernels in the Arm Compute Library? Ideally, I'd like to use the OpenCL clGetEventProfilingInfo call. 

I'm running my experiments on Hikey-960, Hikey-970, and Odroid-C4 platforms using the Mali GPU.

Thanks!

  • Sorry for the slow response. For ACL questions you'll get the best response on the project's GitHub issues page: https://github.com/ARM-software/ComputeLibrary/issues

    But I've consulted an ACL expert, and yes, it is possible to profile OpenCL kernel performance through ACL.

    ACL has a build flag that builds the graph examples (the ones in the examples/ folder) for benchmarking. The flag I am referring to is defined at https://github.com/ARM-software/ComputeLibrary/blob/master/tests/SConscript#L31 and is set to true by default.

    If ACL has been built with this flag, you can find all the graph examples ready to be benchmarked under build/tests (e.g. benchmark_graph_mobilenet, benchmark_graph_alexnet).

    Example usages:

    OpenCL - F32 (Wall clock time): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f32 --iterations=100

    OpenCL - F32 (OpenCL kernel profiling): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f32 --iterations=100 --instruments=opencl_timer_ms

    OpenCL - F16 (Wall clock time): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f16 --iterations=100

    OpenCL - F16 (OpenCL kernel profiling): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f16 --iterations=100 --instruments=opencl_timer_ms 

    OpenCL - F16+fast-math (Wall clock time): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f16,--fast-math --iterations=100

    OpenCL - F16+fast-math (OpenCL kernel profiling): ./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=f16,--fast-math --iterations=100 --instruments=opencl_timer_ms 

    NEON - F32 (Wall clock time): ./benchmark_graph_alexnet --example_args=--target=NEON,--type=f32 --iterations=100

    NEON - F32 (NEON kernel profiling): ./benchmark_graph_alexnet --example_args=--target=NEON,--enable-tuner,--type=f32 --iterations=100 --instruments=scheduler_timer_ms
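If you want to compare wall clock time against OpenCL kernel profiling for several data types in one go, the OpenCL commands above can be generated in a loop. This is only a rough sketch: the helper name build_cl_cmd is made up for illustration, the flags are copied verbatim from the examples above, and it assumes you run it from the directory that contains benchmark_graph_alexnet. It only prints the commands (a dry run); pipe the output to sh to actually execute them.

```shell
#!/bin/sh
# Hypothetical helper (not part of ACL): print the benchmark command line
# for one OpenCL configuration, using only flags from the examples above.
# Usage: build_cl_cmd TYPE [INSTRUMENT]
build_cl_cmd() {
    dtype=$1
    instrument=${2:-}
    cmd="./benchmark_graph_alexnet --example_args=--target=CL,--enable-tuner,--type=${dtype} --iterations=100"
    # Without an instrument the examples above report wall clock time;
    # with opencl_timer_ms they report OpenCL kernel profiling.
    if [ -n "$instrument" ]; then
        cmd="${cmd} --instruments=${instrument}"
    fi
    printf '%s\n' "$cmd"
}

# Dry run: emit a wall clock and a kernel profiling command per data type.
for dtype in f32 f16; do
    build_cl_cmd "$dtype"
    build_cl_cmd "$dtype" opencl_timer_ms
done
```

Saving this as, say, gen_benchmarks.sh and running `sh gen_benchmarks.sh | sh` on the board would run the four configurations back to back.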
