Hi all,
It is well known that performing an aligned vector load from an unaligned memory address should cause a segmentation fault.
However, when I run the code below, which does exactly that, I do not see any segmentation fault.
------------------------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "arm_neon.h"

void add (uint32x4_t *data_a, uint32x4_t *data_b) {
    /* Set each sixteen values of the vector to 3.
     *
     * Remark: a 'q' suffix to intrinsics indicates
     * the instruction run for 128 bits registers.
     */
    *data_a = vaddq_u32 (*data_a, *data_b);
}

int main (int argc, char** argv) {
    unsigned int n = atoi(argv[1]);

    /* Create custom arbitrary data. */
    uint32_t uint32_data_a[n];
    uint32_t uint32_data_b[n];
    uint32_t uint32_data_c[n];
    struct timespec start, end;

    for (uint32_t i = 1; i <= n; i += 1) {
        uint32_data_a[i-1] = i;
        uint32_data_b[i-1] = i;
        uint32_data_c[i-1] = i;
    }

    /* Create the vector with our data. */
    uint32x4_t data_a;
    uint32x4_t data_b;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int count = 0; count < 10; count++) {
        for (int i = 1; i < n; i += 4) {
            /* Load our custom data into the vector register. */
            data_a = vld1q_u32 (uint32_data_a + i);
            data_b = vld1q_u32 (uint32_data_b + i);

            /* Call of the add3 function. */
            add(&data_a, &data_b);
            vst1q_u32(uint32_data_c + i, data_a);
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double time_usec = (((double)end.tv_sec * 1000000 + (double)end.tv_nsec/1000)
                      - ((double)start.tv_sec * 1000000 + (double)start.tv_nsec/1000));
    printf("Time taken for aligned load is : %fus and count is %d \n", time_usec/10, n);

    for (uint32_t i = 0; i < n; i++)
        printf("%2d ", uint32_data_c[i]);
    printf("\n");

    return 0;
}
----------------------------------------------------------------------------------------------------
Clearly, almost every memory access in this case is unaligned. Is there any reason for this inconsistent behavior? Thanks in advance.
Aketh TM
I should have been more specific. I was referring to the ARMv7 case, since that is where the alignment requirements are loosened. The alignment-check bit (SCTLR.A) is off by default here as well, and all instructions apart from the LDM/STM variants will work on unaligned addresses, with a small added penalty.
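
To make the point concrete, here is a minimal sketch in portable C (no NEON required) for checking how the addresses used in the loop above are actually aligned. The helper names is_aligned and misalignment_16 are hypothetical, made up for this illustration and not part of any library. The idea is that starting at index 1 of a uint32_t array yields addresses that are still aligned to the element size (4 bytes) but in general not to the 16-byte size of the vector, which is the kind of access that is tolerated when alignment checking is off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of `alignment` bytes.
 * `alignment` must be a non-zero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: how many bytes &base[i] lies past the
 * previous 16-byte boundary (0 means 16-byte aligned). */
static size_t misalignment_16(const uint32_t *base, size_t i)
{
    return (size_t)((uintptr_t)(base + i) & (uintptr_t)15);
}
```

For example, if buf is a uint32_t array whose first element happens to sit on a 16-byte boundary, is_aligned(buf + 1, 16) is false while is_aligned(buf + 1, 4) is true, and misalignment_16(buf, 1) is 4. Printing these values for uint32_data_a + i inside the loop would show which loads really are off the 16-byte boundary on a given run.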