Hi,
I have a dataset that is used by other algorithms, so its layout cannot be modified. That is my problem.
What is left: the data is scattered in memory, but contiguous within each group, and all groups have the same length.
gr1: offset 0 : AA BB CC DD
gr2: offset 256 : EE FF GG HH
gr3: offset 512 : II JJ KK LL
gr4: offset 768 : MM NN OO PP
Keep in mind that the address of EE is the address of AA plus 256, the address of II is the address of EE plus 256, and so on.
And I need:
AA EE II MM
BB FF JJ NN
CC GG KK OO
DD HH LL PP
So it is basically a transposition. And we have the vtrn instruction that can do this just fine. BUT it needs to be applied three times (see the vector arrangement).
Is there a way to avoid those three instructions when arranging my vectors, or is there any other (faster) way to do it?
Currently I need to load the contiguous data and then permute the vectors. The VLDn instructions don't seem to help me, since they de-interleave elements within a contiguous block, not across 256-byte strides.
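Written out in scalar C, the load-and-permute step I need is roughly the following (a sketch only; the 32-bit element size and the 4x4 block shape are my reading of the layout above, and the function name is made up):

```c
#include <stdint.h>
#include <string.h>

#define GROUP_STRIDE 256   /* bytes between group starts (gr1 at 0, gr2 at 256, ...) */
#define GROUPS 4           /* gr1..gr4 */
#define WORDS 4            /* AA BB CC DD per group, assumed 32-bit each */

/* Gather one block: output row w holds word w of every group,
 * i.e. row 0 = AA EE II MM, row 1 = BB FF JJ NN, and so on. */
static void transpose_block(const uint8_t *src, uint32_t dst[WORDS][GROUPS])
{
    for (int g = 0; g < GROUPS; g++)
        for (int w = 0; w < WORDS; w++)
            /* memcpy avoids alignment/aliasing assumptions in the sketch */
            memcpy(&dst[w][g], src + g * GROUP_STRIDE + w * 4, 4);
}
```

The NEON version does the same thing with four vector loads followed by the vtrn sequence; this scalar form just pins down which word goes where.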
Do you see a possibility I missed somewhere?
Thanks
Okay, I understand now. I would have thought the main overhead was reading the words at a 256-byte displacement from each other, and that overhead is what needs to be reduced as far as possible. I don't think there is much scope for improvement, but using preload instructions within the loop might help. So I'd go for something like:

    preload adr
    preload adr+256
    preload adr+2*256
    preload adr+3*256
loop:
    load adr
    preload adr+4*256
    load adr+256
    preload adr+5*256
    load adr+2*256
    preload adr+6*256
    load adr+3*256
    preload adr+7*256
    do the transformation
    store dest
    add 1024 to adr
    add 16 to dest
    go to loop if not finished

I don't know how many preloads can be outstanding. I have four above, but I'd have thought a processor would cope with more. The code can easily be changed to do more, by having more preloads before the loop and preloading from further forward inside the loop. But what's there ought to cover the overheads of the transformation code.
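In C with GCC or Clang, that preload-ahead pattern might be sketched as below; __builtin_prefetch compiles to PLD on ARM targets that support it. The one-block-ahead prefetch distance mirrors the loop above and is a guess, not a tuned value, and the function name and scalar transformation body are stand-ins:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GROUP_STRIDE 256
#define GROUPS 4
#define WORDS 4
#define BLOCK_BYTES (GROUPS * GROUP_STRIDE)   /* 1024 source bytes per iteration */

/* Transpose n_blocks blocks, preloading the next block's four groups
 * while the current one is being transformed. */
static void transpose_stream(const uint8_t *src, uint32_t *dst, size_t n_blocks)
{
    /* Prime the caches for the first block (the preloads before the loop). */
    for (int g = 0; g < GROUPS; g++)
        __builtin_prefetch(src + g * GROUP_STRIDE);

    for (size_t b = 0; b < n_blocks; b++) {
        const uint8_t *adr = src + b * BLOCK_BYTES;

        /* Preload the four groups of the NEXT block (adr+4*256 .. adr+7*256);
         * guarded so the sketch never forms pointers past the buffer. */
        if (b + 1 < n_blocks)
            for (int g = 0; g < GROUPS; g++)
                __builtin_prefetch(adr + BLOCK_BYTES + g * GROUP_STRIDE);

        /* "Do the transformation": scalar stand-in for the vtrn sequence. */
        for (int g = 0; g < GROUPS; g++)
            for (int w = 0; w < WORDS; w++)
                memcpy(&dst[b * GROUPS * WORDS + w * GROUPS + g],
                       adr + g * GROUP_STRIDE + w * 4, 4);
    }
}
```

In the assembly loop the preloads are interleaved between the loads; a compiler will schedule the builtin versions itself, so the grouping above is only for readability.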