Hi all,
It is well known that performing an aligned vector load from an unaligned memory address should cause a segmentation fault.
However, when I run the code below, which does exactly that, I do not see any segmentation fault.
------------------------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "arm_neon.h"

void add (uint32x4_t *data_a, uint32x4_t *data_b) {
    /* Set each sixteen values of the vector to 3.
     *
     * Remark: a 'q' suffix to intrinsics indicates
     * the instruction run for 128 bits registers.
     */
    *data_a = vaddq_u32 (*data_a, *data_b);
}

int main (int argc, char** argv) {
    unsigned int n = atoi(argv[1]);

    /* Create custom arbitrary data. */
    uint32_t uint32_data_a[n];
    uint32_t uint32_data_b[n];
    uint32_t uint32_data_c[n];
    struct timespec start, end;

    for (uint32_t i = 1; i <= n; i += 1) {
        uint32_data_a[i-1] = i;
        uint32_data_b[i-1] = i;
        uint32_data_c[i-1] = i;
    }

    /* Create the vector with our data. */
    uint32x4_t data_a;
    uint32x4_t data_b;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int count = 0; count < 10; count++) {
        for (int i = 1; i < n; i += 4) {
            /* Load our custom data into the vector register. */
            data_a = vld1q_u32 (uint32_data_a + i);
            data_b = vld1q_u32 (uint32_data_b + i);

            /* Call of the add3 function. */
            add(&data_a, &data_b);
            vst1q_u32(uint32_data_c + i, data_a);
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double time_usec = (((double)end.tv_sec * 1000000 + (double)end.tv_nsec/1000)
                      - ((double)start.tv_sec * 1000000 + (double)start.tv_nsec/1000));
    printf("Time taken for aligned load is : %fus and count is %d \n", time_usec/10, n);

    for (uint32_t i = 0; i < n; i++)
        printf("%2d ", uint32_data_c[i]);
    printf("\n");

    return 0;
}
----------------------------------------------------------------------------------------------------
Clearly, almost every memory access in this case is unaligned. Is there any reason for this inconsistent behavior? Thanks in advance.
Aketh TM
I'm sure you've figured it out by now, but for anyone else perusing this topic (it was my top Google search result): unlike the aligned vector loads on x86, the NEON loads and stores that vld1q_u32 and vst1q_u32 compile to (VLD1 and VST1) do not require 16-byte-aligned addresses unless the instruction carries an explicit alignment qualifier. The link below describes what these instructions do per cycle to handle the "always unaligned" assumption they must make to allow this behavior.
developer.arm.com/.../neon-load-store-instructions
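For anyone who wants to see what "unaligned" means for the loop in the question, here is a minimal portable C sketch (no NEON required). The helper names `is_aligned` and `count_misaligned_loads` are my own, not part of any ARM header, and the second helper stops at `i + 4 <= n` so it never looks past the end of the array, which the original loop does not guarantee. It only classifies addresses; it does not perform any vector load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of align bytes.
 * align must be a non-zero power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: walks the array in steps of four uint32_t elements,
 * starting at index `start` (the question uses 1), and counts how many
 * 4-element loads would begin at an address that is not 16-byte aligned. */
static size_t count_misaligned_loads(const uint32_t *base, size_t n, size_t start)
{
    size_t count = 0;
    for (size_t i = start; i + 4 <= n; i += 4) {
        if (!is_aligned(base + i, 16))
            count++;
    }
    return count;
}
```

If the array itself happens to start on a 16-byte boundary, starting at index 1 puts every load 4 bytes past such a boundary, so each address is still 4-byte (element) aligned but never 16-byte aligned. Starting at index 0 would make every load 16-byte aligned.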