Is there an intrinsic to store 3 float values?

I have the following code in assembler:

    vst1.32            {d10}, [%[pOutVertex2]]
    fsts               s22, [%[pOutVertex2], #8]

This stores s20, s21 and s22 (d10 holds s20 and s21) into pOutVertex2, which is an array of 3 floats. Is there an intrinsic to do this? I can only find vst1q_f32, but that stores four floats and would overwrite the 4th value.
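For reference, the two instructions above appear to map fairly directly onto a pair of store intrinsics: vst1_f32() on the low half of the vector, followed by vst1q_lane_f32() for lane 2. The sketch below assumes the three values sit in lanes 0 to 2 of a single float32x4_t. The helper name store3_from_quad and its pointer-based signature are hypothetical, and the scalar branch is only there so the sketch still compiles on hosts without NEON.

```c
#include <string.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copies src[0..2] to dst[0..2] and never writes dst[3].
 * src must point to at least 4 readable floats (a full q register is loaded). */
void store3_from_quad(const float *src, float *dst)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t v = vld1q_f32(src);     /* stand-in for the register holding the result */
    vst1_f32(dst, vget_low_f32(v));     /* lanes 0 and 1, like vst1.32 {d10}, [dst]     */
    vst1q_lane_f32(dst + 2, v, 2);      /* lane 2 only, like fsts s22, [dst, #8]        */
#else
    memcpy(dst, src, 3 * sizeof(float)); /* scalar fallback for non-NEON hosts */
#endif
}
```

Whether the compiler emits exactly the original two instructions is not guaranteed; it depends on the toolchain and optimisation level.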
  • Lefty,

    You are likely looking for vst3_lane_f32(). However, this stores a single element from each of three registers, i.e. the values you wish to store would have to be in the same lane of each of the three vectors in a float32x2x3_t.

    As a bad example for illustration only:

    #include <arm_neon.h>
    
    void store_three_floats(float a, float b, float c, float *dst)
    {
      float32x2x3_t vec;             // Declare trio of vectors
    
      // Broadcast each value so no uninitialised vector is ever read;
      // only lane 0 of each vector ends up being stored.
      vec.val[0] = vdup_n_f32(a);    // Vector 0 holds a
      vec.val[1] = vdup_n_f32(b);    // Vector 1 holds b
      vec.val[2] = vdup_n_f32(c);    // Vector 2 holds c
    
      vst3_lane_f32(dst, vec, 0); // Store lowest element from each of the trio
    }
    

    In the general case, the three values would already be in independent vectors (e.g. R, G, B), so only the vst3_lane_f32() call would be required, without the lane insertions.
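    A minimal sketch of that general case, assuming the R, G and B values already live in three separate float32x2_t vectors. The function name store_rgb_lane0 and its array parameters are hypothetical (the loads merely stand in for vectors produced by earlier computation), and the scalar branch exists only so the sketch compiles on hosts without NEON.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: writes {r[0], g[0], b[0]} to dst[0..2].
 * r, g and b each point to 2 floats; dst needs room for 3 floats. */
void store_rgb_lane0(const float *r, const float *g, const float *b, float *dst)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x2x3_t rgb;
    rgb.val[0] = vld1_f32(r);      /* in real code these vectors would  */
    rgb.val[1] = vld1_f32(g);      /* already exist from prior NEON     */
    rgb.val[2] = vld1_f32(b);      /* arithmetic, not from fresh loads  */
    vst3_lane_f32(dst, rgb, 0);    /* one store: lane 0 of each vector  */
#else
    dst[0] = r[0];                 /* scalar fallback for non-NEON hosts */
    dst[1] = g[0];
    dst[2] = b[0];
#endif
}
```

    Passing 1 instead of 0 as the lane argument would store the second element of each vector in the same way.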

    hth

    Simon.
