Explicit vector-style compute kernels on Bifrost G7x/G31?

pixelio over 7 years ago

Will the compute kernel compilers (OpenCL 1.2, or Vulkan 1.1 in the future) still support explicit vector-style programming (e.g. float4, uint4, int4, half8) on the Bifrost GPUs with 4-wide execution engines?

I'm asking because I have some "embarrassingly parallel" algorithms that map well to SIMD-style vector programming but benefit from inter-lane communication.

On a scalar-per-thread design, this can be accomplished with shuffles.

But if shuffles aren't available I would prefer to use explicit vectors and permutations.

Any tips on whether this is possible in OpenCL?

Or, maybe, VK 1.1 will bring subgroup shuffles to G7x/G31?

Thanks,

-ASM

Peter Harris over 7 years ago in reply to pixelio

Pre-Bifrost Mali is a SIMD architecture.

Bifrost is 4-wide SIMT with a 32-bit data path. Narrower types use that 32-bit path as a small SIMD unit: for example, to get the efficiency benefits of fp16 computation, you need code that converts cleanly into SIMD vec2 operations (two fp16 values per 32-bit lane).

For vector operations on 32-bit types, such as your examples, the two architectures should behave similarly. Vector code compiles to efficient SIMD code on pre-Bifrost hardware, and the compiler can always scalarize the same code for a SIMT architecture. The inverse is not true: vectorizing scalar code can be difficult, so always try to write vector code where you can.