I want to implement an optimised SGEMM for Mali Midgard GPUs, which as of now only support OpenCL 1.2. As far as I know, OpenCL 1.2 doesn't support subgroup extensions, and Mali GPUs don't benefit from local memory tiling. So what would be the best way to perform SGEMM on Mali, without any memory reshaping, such that it performs better than, or at least on par with, the CPU implementation? Kindly give me some pointers other than Arm Compute ML. I'd really appreciate it.
Old issue. Closing.