No segmentation fault when expected with aligned load and store

Hi all,

It is a well-known fact that performing an aligned vector load with an unaligned memory address should lead to a segmentation fault.

However, when I run the code below, which does exactly that, I do not see any segmentation fault.

------------------------------------------------------------------------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "arm_neon.h"

void add (uint32x4_t *data_a,uint32x4_t *data_b)
{
    /* Add the four 32-bit lanes of *data_b to *data_a.
     *
     * Remark: a 'q' suffix on an intrinsic indicates
     * that the instruction operates on 128-bit registers.
     */
    
    *data_a = vaddq_u32 (*data_a, *data_b);
}

int main (int argc,char** argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <element count>\n", argv[0]);
        return 1;
    }
    unsigned int n = atoi(argv[1]);

    /* Create custom arbitrary data. */
    uint32_t uint32_data_a[n];
    uint32_t uint32_data_b[n];
    uint32_t uint32_data_c[n];
    struct timespec start,end;

    for(uint32_t i = 1; i <= n ; i+=1)
    {
        uint32_data_a[i-1] = i;
        uint32_data_b[i-1] = i;
        uint32_data_c[i-1] = i;
    }

    /* Create the vector with our data. */
    uint32x4_t data_a;
    uint32x4_t data_b;

   
    clock_gettime(CLOCK_MONOTONIC,&start);

    for(int count = 0; count < 10; count++)
    {
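        /* Starting at index 1 offsets every access by one element from
         * the start of the array; the bound keeps each 4-element access
         * inside the arrays. */
        for(uint32_t i = 1; i + 4 <= n ; i+=4)
        {
            /* Load our custom data into the vector register. */
            data_a  = vld1q_u32 (uint32_data_a + i);
            data_b  = vld1q_u32 (uint32_data_b + i);

            /* Call of the add function. */
            add(&data_a,&data_b);

            vst1q_u32(uint32_data_c + i,data_a);
        }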
    }

    clock_gettime(CLOCK_MONOTONIC,&end);

    double time_usec=(((double)end.tv_sec * 1000000 + (double)end.tv_nsec/1000) - ((double)start.tv_sec *1000000 + (double)start.tv_nsec/1000));
    printf("Time taken for aligned load is : %fus and count is %u \n", time_usec/10,n );

    for(uint32_t i = 0; i < n ; i++) printf("%2u ",(unsigned int)uint32_data_c[i]);
    printf("\n");
    return 0;
}

----------------------------------------------------------------------------------------------------

Clearly, almost every memory access in this case is unaligned. Is there any reason for this inconsistent behavior? Thanks in advance.

Aketh TM

  • I'm sure you've figured it out by now, but for anyone else who finds this topic (it was my top Google search result): unlike their x86 counterparts, VLDR and VSTR do not require 16-byte-aligned addresses. The link below describes what these instructions do per cycle to handle the "always unaligned" assumption they must make to allow this convenient behavior.

    developer.arm.com/.../neon-load-store-instructions
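
For readers who want to check this on their own data, here is a minimal sketch in plain C (no NEON required) that reports how far an address sits past a given alignment boundary. The helper names misalignment and count_16_byte_aligned_chunks are hypothetical and not part of the original post. With a loop that starts at index 1 and steps by 4, as in the question, every uint32_t address stays 4-byte aligned but is never 16-byte aligned when the array itself starts on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of p past the previous multiple of `boundary`.
 * `boundary` must be a power of two; a result of 0 means p is aligned.
 * Hypothetical helper for illustration only. */
static size_t misalignment(const void *p, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (size_t)((uintptr_t)p & (boundary - 1));
}

/* Count how many 4-element chunks, starting at index `first` and stepping
 * by 4, begin on a 16-byte boundary. Only chunks that fit entirely inside
 * the n-element array are considered. Hypothetical helper. */
static size_t count_16_byte_aligned_chunks(const uint32_t *data, size_t n,
                                           size_t first)
{
    size_t aligned = 0;
    for (size_t i = first; i + 4 <= n; i += 4) {
        if (misalignment(data + i, 16) == 0)
            aligned++;
    }
    return aligned;
}
```

Calling count_16_byte_aligned_chunks(uint32_data_a, n, 1) before the timed loop would show how many of the loads actually start on a 16-byte boundary.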
