Vectors optimization

Hi,

 

I have a dataset that is also used by other algorithms, so its layout cannot be modified. That is my problem.

What I am left with is data scattered in memory, but each group is contiguous and all groups have the same length.

 

gr1: offset 0   : AA BB CC DD

gr2: offset 256 : EE FF GG HH

gr3: offset 512 : II JJ KK LL

gr4: offset 768 : MM NN OO PP

Keep in mind that, in terms of addresses, EE == (AA + 256), II == (EE + 256), and so on.

 

And I need:

AA EE II MM

BB FF JJ NN

CC GG KK OO

DD HH LL PP
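
To make the target layout concrete, here is a minimal scalar reference in C, not an optimized version. It assumes 16-bit elements and that the 256 offset is in bytes; neither is stated above, and the names (transpose_groups, GROUP_STRIDE_BYTES, and so on) are made up for illustration. It is only meant as something a NEON version could be checked against.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumptions (not from the post): 16-bit elements, offsets in bytes. */
enum { GROUPS = 4, PER_GROUP = 4, GROUP_STRIDE_BYTES = 256 };

/* Scalar reference: element j of group g ends up in out[j][g],
 * i.e. out[0] = AA EE II MM, out[1] = BB FF JJ NN, and so on. */
void transpose_groups(const uint8_t *base, uint16_t out[PER_GROUP][GROUPS])
{
    for (size_t g = 0; g < GROUPS; ++g) {
        uint16_t group[PER_GROUP];

        /* Each group is contiguous, so one copy fetches all of it.
         * memcpy avoids alignment and aliasing problems. */
        memcpy(group, base + g * GROUP_STRIDE_BYTES, sizeof group);

        for (size_t j = 0; j < PER_GROUP; ++j) {
            out[j][g] = group[j];
        }
    }
}
```

The source buffer is only read, so the original layout stays untouched for the other algorithms.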

 

So it is basically a transposition. The vtrn instruction can do this just fine, but it has to be issued three times (see vectors arrangement).
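
For reference, here is a portable C model of that three-step arrangement. It is a sketch under assumptions: 16-bit elements with the four groups held in D registers d0..d3, which is one case where exactly three VTRN instructions are enough (vtrn.16 d0, d1; vtrn.16 d2, d3; vtrn.32 q0, q1). The helper names are made up for illustration and the code only emulates the lane movement, it does not use NEON.

```c
#include <stdint.h>

/* Models VTRN.16 dA, dB on two 4-lane rows:
 * [a0 a1 a2 a3], [b0 b1 b2 b3] -> [a0 b0 a2 b2], [a1 b1 a3 b3]. */
static void trn16(uint16_t a[4], uint16_t b[4])
{
    for (int i = 0; i < 4; i += 2) {
        uint16_t t = a[i + 1];
        a[i + 1] = b[i];
        b[i] = t;
    }
}

/* Models VTRN.32 q0, q1 with q0 = rows 0..1 and q1 = rows 2..3:
 * the odd 32-bit lanes of q0 swap with the even 32-bit lanes of q1. */
static void trn32_q(uint16_t m[4][4])
{
    for (int r = 0; r < 2; ++r) {
        for (int k = 0; k < 2; ++k) {
            uint16_t t = m[r][2 + k];
            m[r][2 + k] = m[r + 2][k];
            m[r + 2][k] = t;
        }
    }
}

/* In-place 4x4 transpose built from the three modeled VTRN steps. */
void transpose4x4_trn(uint16_t m[4][4])
{
    trn16(m[0], m[1]); /* vtrn.16 d0, d1 */
    trn16(m[2], m[3]); /* vtrn.16 d2, d3 */
    trn32_q(m);        /* vtrn.32 q0, q1 */
}
```

With other element sizes or register widths the instruction mix would differ, so the count of three only applies to this assumed layout.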

Is there a way to avoid those three instructions when arranging my vectors, or is there another (faster) way to do it?

Currently I have to load the contiguous data and then permute the vectors. The VLDn instructions don't seem to help me.

 

Do you see a possibility I missed somewhere?

Thanks
