Hi all,
It is a well known fact that performing an aligned vector load with an unaligned memory address should lead to segmentation fault.
However, when I do try to run code segment below using the same, i do not see any segmentation fault.
------------------------------------------------------------------------------------------------
#include <stdio.h>#include <stdlib.h>#include <time.h>#include "arm_neon.h"void add (uint32x4_t *data_a,uint32x4_t *data_b) { /* Set each sixteen values of the vector to 3. * * Remark: a 'q' suffix to intrinsics indicates * the instruction run for 128 bits registers. */ *data_a = vaddq_u32 (*data_a, *data_b);}int main (int argc,char** argv) { unsigned int n = atoi(argv[1]); /* Create custom arbitrary data. */ uint32_t uint32_data_a[n]; uint32_t uint32_data_b[n]; uint32_t uint32_data_c[n]; struct timespec start,end; for(uint32_t i = 1; i <= n ; i+=1) { uint32_data_a[i-1] = i; uint32_data_b[i-1] = i; uint32_data_c[i-1] = i; } /* Create the vector with our data. */ uint32x4_t data_a; uint32x4_t data_b; clock_gettime(CLOCK_MONOTONIC,&start); for(int count = 0; count < 10; count++) { for(int i = 0; i < n ; i+=4) { /* Load our custom data into the vector register. */ data_a = vld1q_u32 (uint32_data_a + i); data_b = vld1q_u32 (uint32_data_b + i); /* Call of the add3 function. */ add(&data_a,&data_b); vst1q_u32(uint32_data_c + i,data_a); } } clock_gettime(CLOCK_MONOTONIC,&end); double time_usec=(((double)end.tv_sec * 1000000 + (double)end.tv_nsec/1000) - ((double)start.tv_sec *1000000 + (double)start.tv_nsec/1000)); printf("Time taken for aligned load is : %fus and count is %d \n", time_usec/10,n ); for(uint32_t i = 0; i < n ; i++) printf("%2d ",uint32_data_c[i]); printf("\n"); return 0;}
----------------------------------------------------------------------------------------------------
Clearly almost every access to memory in this case is unaligned? is there any reason for this inconsistent behavior? Thanks in advance.
Aketh TM
Also jason there are 2 keypoints you have missed here.
1) The kernel needs an argument to be passed to it, say ./a.out 322) I have accidentally have posted the wrong kernel. The loop with i must begin with i = 1. (I am try and edit the original post if possible)Here is the correct kernel for reference#include <stdio.h>#include <stdlib.h>#include <time.h>#include "arm_neon.h"void add (uint32x4_t *data_a,uint32x4_t *data_b) { /* Set each sixteen values of the vector to 3. * * Remark: a 'q' suffix to intrinsics indicates * the instruction run for 128 bits registers. */ *data_a = vaddq_u32 (*data_a, *data_b);}int main (int argc,char** argv) { unsigned int n = atoi(argv[1]); /* Create custom arbitrary data. */ uint32_t uint32_data_a[n]; uint32_t uint32_data_b[n]; uint32_t uint32_data_c[n]; struct timespec start,end; for(uint32_t i = 1; i <= n ; i+=1) { uint32_data_a[i-1] = i; uint32_data_b[i-1] = i; uint32_data_c[i-1] = i; } /* Create the vector with our data. */ uint32x4_t data_a; uint32x4_t data_b; clock_gettime(CLOCK_MONOTONIC,&start); for(int count = 0; count < 10; count++) { for(int i = 0; i < n ; i+=4) { /* Load our custom data into the vector register. */ data_a = vld1q_u32 (uint32_data_a + i); data_b = vld1q_u32 (uint32_data_b + i); /* Call of the add3 function. */ add(&data_a,&data_b); vst1q_u32(uint32_data_c + i,data_a); } } clock_gettime(CLOCK_MONOTONIC,&end); double time_usec=(((double)end.tv_sec * 1000000 + (double)end.tv_nsec/1000) - ((double)start.tv_sec *1000000 + (double)start.tv_nsec/1000)); printf("Time taken for aligned load is : %fus and count is %d \n", time_usec/10,n ); for(uint32_t i = 0; i < n ; i++) printf("%2d ",uint32_data_c[i]); printf("\n"); return 0;
Hi,
The program works for me know, thanks for the corrections.
I'm interested to find out why you think there are unaligned memory accesses and how you determined this? A little more info on what you think the problem is would be helpful.
Thanks,
Jason
Notice the for loop, it clearly begins with 1, hence1) if the array was indeed aligned to a 128 bit/16 byte boundary we would see a segmentation fault straight away, since a vector load at an unaligned boundary must cause a segmentation fault.
2) if the array wasn't aligned and by chance always the element 1 was processed okay, we would expect a segmentation fault atleast on any of other values of i (since every element of an array can never be aligned). Note :- The loop has stride 1, (i = 0 ; i < n ; i++). Hence, I expect a segmentation fault.
I recommend to narrow the focus to a specific instruction and load address you think should be causing segfault.
Use objdump -S to see the source code mixed with the disassembly, gdb to single step by source and assembly and print registers, or just printf() statements to print pointer values.
I didn't see any unaligned accesses. The stride of 1 is used for pointer arithmetic which moves to the next vector.