Graphics, Gaming, and VR forum: Optimised GPU convolution for low-memory integrated devices such as Arm processors/GPUs?

abhi.verma over 4 years ago

I wish to implement convolution on Arm Mali GPUs and want it to be optimised for both speed and memory. What's the best way to do this? GEMM-based MCMK (multi-channel, multi-kernel) convolutions are not suitable because they use a lot of memory. Also, a direct implementation on the GPU is much slower than the corresponding CPU version. Any time spent reshaping memory should be included in the timing measurements.

Kévin Petit over 4 years ago (accepted answer)

Hi,

Here are a few leads:

1. Have you considered using the Arm Compute Library [1]? It supports a number of convolution kernels optimised for Mali GPUs. We'd love to hear if your use-case isn't covered or if the library isn't convenient to use for some reason.

2. You could try using subgroup operations to exchange data between work-items in a direct implementation.

Hope this helps.

Regards,

Kévin

[1] github.com/.../ComputeLibrary

abhi.verma over 4 years ago in reply to Kévin Petit

Hi, I am primarily working with OpenCL 1.2, and subgroups are not supported until OpenCL 2.0. I would also like to know the implementation details of the best way to do convolution in terms of both memory and performance. I am primarily concerned with single-kernel convolution. Kindly help.