Hi there :),
I have a question. I am performing a study on Mali GPUs, using OpenCL to execute programs on the GPU side, and I am trying to use the Streamline tool for profiling.
I have installed Gator version 7.8, and my Streamline version is 7.8 as well, but I am not able to get any OpenCL annotations like those depicted in the following figure. I tried to reconfigure the DDK, but I didn't find any options related to OpenCL. Is there a guide to follow in order to get output like that in the image below?
My hardware platform is an Odroid N2+ running Linux, with a Mali-G52 GPU.
The provided image clearly shows everything: OpenCL queues, thread execution, etc.
Any support would be much appreciated. Below are all the Mali flags taken from my Linux kernel config file.
# CONFIG_DRM_MALI_DISPLAY is not set
# CONFIG_MALI_MIDGARD_DVFS is not set
# CONFIG_MALI_MIDGARD_ENABLE_TRACE is not set
# CONFIG_MALI_DEVFREQ is not set
# CONFIG_MALI_DMA_FENCE is not set
# CONFIG_MALI_CORESTACK is not set
# CONFIG_MALI_PRFCNT_SET_SECONDARY is not set
# CONFIG_MALI_DEBUG is not set
# CONFIG_MALI_FENCE_DEBUG is not set
# CONFIG_MALI_NO_MALI is not set
# CONFIG_MALI_SYSTEM_TRACE is not set
# CONFIG_MALI_JOB_DUMP is not set
# CONFIG_MALI_2MB_ALLOC is not set
# CONFIG_MALI_PWRSOFT_765 is not set
# CONFIG_GATOR_MALI_4XXMP is not set
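For anyone who wants to double-check a list like the one above against a full kernel config, here is a minimal Python sketch that parses the standard `CONFIG_X=value` and `# CONFIG_X is not set` line formats and keeps only the Mali-related options. The function names are hypothetical, and the `CONFIG_MALI_MIDGARD=m` line in the sample is an illustrative assumption, not taken from my config.

```python
import re

# "CONFIG_FOO=y" style lines (option is set to some value).
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
# "# CONFIG_FOO is not set" style lines (option is explicitly disabled).
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Return {option: value}; value is None for 'is not set' options."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
    return options


def mali_options(options):
    """Keep only the options whose name contains 'MALI'."""
    return {name: value for name, value in options.items() if "MALI" in name}


if __name__ == "__main__":
    # Three lines copied from the list above, plus one hypothetical
    # "set" line (CONFIG_MALI_MIDGARD=m) added purely for illustration.
    sample = """\
# CONFIG_MALI_MIDGARD_ENABLE_TRACE is not set
# CONFIG_MALI_SYSTEM_TRACE is not set
# CONFIG_GATOR_MALI_4XXMP is not set
CONFIG_MALI_MIDGARD=m
"""
    for name, value in sorted(mali_options(parse_kconfig(sample)).items()):
        print(f"{name}: {'not set' if value is None else value}")
```

In practice the text would come from the kernel's `.config` file rather than an inline string.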
An additional hint: the Streamline FAQ page (https://developer.arm.com/tools-and-software/embedded/legacy-tools/ds-5-development-studio/streamline/streamline-faqs)
mentions that one should build the Mali DDK with scons using the following flags: 'cl=1; streamline_annotate=1; instr=1; timeline=cl_timeline; gator=2'. However, I ran into the following problems:
1. The DDK does not have an SConstruct file, so the scons command fails.
2. I didn't find any flags related to OpenCL.
3. The <process_name>.instr_config file is never generated.
Looking forward to your support.
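To make clear how I read the FAQ's instruction, here is a small Python sketch that turns the semicolon-separated flag string quoted above into a scons command line. It assumes the flags are meant to be passed as `key=value` arguments to `scons` (which scons supports in general); the helper name `scons_command` is hypothetical, and whether a given DDK build accepts these particular flags is exactly what I could not verify.

```python
def scons_command(flag_string):
    """Split 'a=1; b=2' into ['scons', 'a=1', 'b=2'].

    Assumption: each flag is a key=value pair handed to scons as a
    command-line argument. Anything that is not key=value is rejected.
    """
    flags = [part.strip() for part in flag_string.split(";") if part.strip()]
    for flag in flags:
        key, sep, value = flag.partition("=")
        if not (key and sep and value):
            raise ValueError(f"not a key=value flag: {flag!r}")
    return ["scons", *flags]


if __name__ == "__main__":
    # Flag string as quoted from the Streamline FAQ.
    faq_flags = "cl=1; streamline_annotate=1; instr=1; timeline=cl_timeline; gator=2"
    print(" ".join(scons_command(faq_flags)))
```

The resulting list could be passed to `subprocess.run` from the DDK source directory, but only if that directory actually contains an SConstruct file.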
OpenCL has been proposed as a means of accelerating functional computation using FPGA and GPU accelerators. Although it provides ease of programmability and code portability, questions remain about the performance portability and underlying vendor’s compiler capabilities to generate efficient implementations without user-defined, platform specific optimizations. In this work, we systematically evaluate this by formalizing a design space exploration strategy using platform-independent micro-architectural and application-specific optimizations only. The optimizations are then applied across Altera FPGA, NVIDIA GPU and ARM Mali GPU platforms for three computing examples, namely matrix-matrix multiplication, binomial-tree option pricing and 3-dimensional finite difference time domain.