Hi there :),
I have a question. I am performing a study on Mali GPUs, using OpenCL to execute programs on the GPU, and I am trying to use the Streamline tool for profiling.
I have installed Gator version 7.8, and my Streamline version is 7.8 as well. However, I am not able to get any OpenCL annotations like the ones depicted in the following figure. I tried to reconfigure the DDK, but I didn't find any options related to OpenCL. Is there a guide to follow in order to get output like that in the image below?
My hardware platform is an Odroid N2+, Linux-based, with a Mali-G52 GPU.
In the provided image everything is clearly shown: OpenCL queues, thread execution, etc.
Any support would be much appreciated. Provided below are all the Mali flags taken from my Linux kernel config file.
# CONFIG_DRM_MALI_DISPLAY is not set
# CONFIG_MALI_MIDGARD_DVFS is not set
# CONFIG_MALI_MIDGARD_ENABLE_TRACE is not set
# CONFIG_MALI_DEVFREQ is not set
# CONFIG_MALI_DMA_FENCE is not set
# CONFIG_MALI_CORESTACK is not set
# CONFIG_MALI_PRFCNT_SET_SECONDARY is not set
# CONFIG_MALI_DEBUG is not set
# CONFIG_MALI_FENCE_DEBUG is not set
# CONFIG_MALI_NO_MALI is not set
# CONFIG_MALI_SYSTEM_TRACE is not set
# CONFIG_MALI_JOB_DUMP is not set
# CONFIG_MALI_2MB_ALLOC is not set
# CONFIG_MALI_PWRSOFT_765 is not set
# CONFIG_GATOR_MALI_4XXMP is not set
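For anyone who wants to produce a comparable listing from their own board, here is a minimal sketch in Python that pulls every Mali-related option out of a kernel config. The function name `mali_options` is hypothetical, and the location of the config file is an assumption that varies between distributions; the only thing taken from the listing above is the usual `# CONFIG_... is not set` / `CONFIG_...=value` line format.

```python
import re

# Matches enabled options such as "CONFIG_SOMETHING_MALI_X=y".
_SET_RE = re.compile(r"^(CONFIG_\w*MALI\w*)=(.*)$")
# Matches disabled options such as "# CONFIG_MALI_DEBUG is not set".
_UNSET_RE = re.compile(r"^# (CONFIG_\w*MALI\w*) is not set$")


def mali_options(config_text):
    """Return {option_name: value} for every option whose name contains MALI.

    Options that are reported as "is not set" are recorded with the value "n".
    """
    options = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        match = _SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET_RE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options
```

To use it, read the kernel config file into a string (wherever it lives on your system), pass it to `mali_options`, and print the sorted items. Note that this only reports kernel-side options; it says nothing about how the userspace driver was built.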
One additional hint: the Streamline FAQ page (https://developer.arm.com/tools-and-software/embedded/legacy-tools/ds-5-development-studio/streamline/streamline-faqs)
mentions that one should build the Mali DDK with scons using the following flags: 'cl=1; streamline_annotate=1; instr=1; timeline=cl_timeline; gator=2'. However, I ran into the following problems:
1. The DDK does not have the SConstruct file, so the scons command fails.
2. I didn't find any flags related to OpenCL.
3. The <process_name>.instr_config is never generated.
Looking forward to your support.
Where's the screen capture from? We have a couple of useful blogs taking you through Streamline for ML (Linux here, Android here, but with notes on annotations), but they don't seem to answer your question, and I'm not sure myself, so I'll raise your question with the internal experts...
Thanks a lot for your reply.
This link (developer.arm.com/.../OpenCL-mode) describes how to run Streamline in OpenCL mode. It mentions that "OpenCL mode is only available on platforms with an OpenCL timeline compiled into the Mali driver."
However, I don't see any configuration options related to OpenCL in my Linux kernel. I only see options related to Gator.
As mentioned, I am also not able to build the DDK, since the scons command fails due to the absence of the SConstruct file.
I am developing an application that requires some computations, and I am using OpenCL for them. That's why I would like to see how it behaves in Streamline.
Hi Ahmed,
You should not need to recompile the kernel module, and in any case, those instructions refer to the closed-source userspace driver.
Provided you have a sufficiently recent Mali driver on your board, OpenCL timeline support is built in. If your driver version is too old, you will need to seek an update from the vendor.
Assuming your driver is sufficiently recent, you can enable the OpenCL timeline by creating a file called '.mali_config' (or more recently 'mali_debug.config') with the following lines:
Thanks for your reply.
I have created the two files and placed them in the same directory as my executable. I ran the Gator daemon and connected it to Streamline on my PC. However, I still don't see any OpenCL options. Am I looking in the wrong place? Regarding Gator, I am running it as a daemon service.
I tried saving the captured data locally and searching it for anything related to OpenCL, and here is what I found.
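For reference, a search like the one described above can be sketched in a few lines of Python. This assumes the capture was saved locally as a directory of files; the function name `find_mentions` is hypothetical, and the search is a plain case-insensitive byte match, so it only shows whether the string appears anywhere, not whether an OpenCL timeline was actually recorded.

```python
import os


def find_mentions(root, needle=b"opencl"):
    """Return a sorted list of (path, count) for files under root that
    contain needle, compared case-insensitively on raw bytes."""
    needle = needle.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Read as bytes so binary capture files do not raise decode errors.
                with open(path, "rb") as handle:
                    data = handle.read().lower()
            except OSError:
                continue  # Skip files that cannot be read.
            count = data.count(needle)
            if count:
                hits.append((path, count))
    return sorted(hits)
```

Each file is read into memory in full, which is fine for a quick check but may be slow on very large captures.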
Thank you. Regards,
Should I change anything in the Mali DDK flags I provided? (They are listed in the main post.)
Thank you for your support.
OpenCL has been proposed as a means of accelerating functional computation using FPGA and GPU accelerators. Although it provides ease of programmability and code portability, questions remain about the performance portability and underlying vendor’s compiler capabilities to generate efficient implementations without user-defined, platform-specific optimizations. In this work, we systematically evaluate this by formalizing a design space exploration strategy using platform-independent micro-architectural and application-specific optimizations only. The optimizations are then applied across Altera FPGA, NVIDIA GPU and ARM Mali GPU platforms for three computing examples, namely matrix-matrix multiplication, binomial-tree option pricing and 3-dimensional finite difference time domain.