
Understanding the link between VFP and NEON

Note: This was originally posted on 14th March 2011 at http://forums.arm.com

I'd like to understand exactly how NEON can be used instead of VFP.

I understand that
FADDS
is now replaced by
VADD.f32

But I have some questions about that.
- Do the two syntaxes have the same binary encoding (the same hexadecimal representation)? (I guess yes, but I'd like to be sure.)
- FADDS can be conditional! Can anybody give me an example?
- If FADDS can be conditional, then VADD.f32 should be too! What is the correct syntax for a conditional VADD.f32?
- FADDS is now VADD.f32 and is executed in the NEON pipeline. Does that mean that VADD.f32 (and FADDS) executes in 1 cycle instead of 9?
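
To make the first three questions concrete, here is a minimal C sketch that builds the 32-bit ARM-state encoding of the scalar single-precision add. It assumes the field layout as I read it in the ARMv7-A Architecture Reference Manual (cond in bits 31:28, then Vn, Vd, Vm with the low register bit split out into N, D, M); the function name `encode_vadd_f32_scalar` is made up for illustration, and the result is worth checking against an assembler listing. In that layout there is a single encoding for the scalar instruction, so `FADDS` and the scalar `VADD.F32 Sd, Sn, Sm` would be two spellings of the same bits. The condition is simply the top nibble, which I believe is written `FADDSEQ s0, s1, s2` in the old syntax and `VADDEQ.F32 s0, s1, s2` in the new one.

```c
#include <assert.h>
#include <stdint.h>

/* A few ARM condition codes (bits 31:28 of an ARM-state instruction). */
enum { COND_EQ = 0x0, COND_NE = 0x1, COND_AL = 0xE };

/*
 * Hypothetical helper: encode the VFP scalar form
 *     VADD<cond>.F32 Sd, Sn, Sm      (old syntax: FADDS<cond> Sd, Sn, Sm)
 * Assumed layout: cond 1110 0 D 11 Vn Vd 101 0 N 0 M 0 Vm
 * For single precision the register number is split as Sx = Vx:bit,
 * i.e. the low bit of the register number goes into D, N or M.
 */
uint32_t encode_vadd_f32_scalar(uint32_t cond, uint32_t sd,
                                uint32_t sn, uint32_t sm)
{
    assert(cond <= 0xE && sd < 32 && sn < 32 && sm < 32);

    uint32_t insn = 0x0E300A00u;   /* fixed opcode bits, sz = 0 (F32) */
    insn |= cond << 28;            /* condition field */
    insn |= (sd & 1u) << 22;       /* D  = low bit of Sd */
    insn |= (sn >> 1) << 16;       /* Vn = upper bits of Sn */
    insn |= (sd >> 1) << 12;       /* Vd = upper bits of Sd */
    insn |= (sn & 1u) << 7;        /* N  = low bit of Sn */
    insn |= (sm & 1u) << 5;        /* M  = low bit of Sm */
    insn |= (sm >> 1);             /* Vm = upper bits of Sm */
    return insn;
}
```

Under these assumptions, `encode_vadd_f32_scalar(COND_AL, 0, 1, 2)` gives `0xEE300A81` and the `EQ` variant differs only in the top nibble (`0x0E300A81`). This covers the scalar S-register form only; the vector `VADD.F32` on D or Q registers is a separate Advanced SIMD encoding.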

If I'm right,
FADDD is replaced by VADD.f64 but is not executed in the NEON pipeline, so the VFP cycle table must be used!
FADDD does not seem to be a conditional instruction! Do we agree on that?

Thanks,
Etienne
  • Note: This was originally posted on 14th March 2011 at http://forums.arm.com

    OK, I've run some tests...

    In this code


    fmsr    s14, r0
    ...
    fmuls   s0, s14, s14
    ...
    fmrs    r0, s0
    mov     pc, lr



    fmuls takes 12 cycles. This is the same time as the scalar vmul.f32 (because, as you said, it is the same instruction).
    I don't know whether it uses the NEON pipeline or not. I suppose not, because it should take only 7 cycles if that were the case.

    And in this code



    fmsr      s14, r0
    ...
    vmul.f32  d0, d7, d7
    ...
    fmrs      r0, s0
    mov       pc, lr


    The vmul.f32 takes only 1 cycle!

    What could be faster?
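
One thing worth keeping in mind when comparing the two snippets: they are not the same operation. The S and D registers share storage (d7 overlays s14 and s15, d0 overlays s0 and s1), so `vmul.f32 d0, d7, d7` is the vector form and multiplies two lanes, while `fmuls s0, s14, s14` touches only one. The C model below is just a sketch of that aliasing; the type and function names are invented for illustration, and it says nothing about cycle counts.

```c
#include <assert.h>

/* Model of the 32 single-precision registers s0..s31.
 * d[n] (n = 0..15) is assumed to overlay s[2n] (low lane) and s[2n+1] (high lane). */
typedef struct { float s[32]; } regfile_t;

/* Scalar form: fmuls sd, sn, sm  (new syntax: vmul.f32 sd, sn, sm). */
void scalar_vmul_f32(regfile_t *rf, int sd, int sn, int sm)
{
    assert(sd >= 0 && sd < 32 && sn >= 0 && sn < 32 && sm >= 0 && sm < 32);
    rf->s[sd] = rf->s[sn] * rf->s[sm];
}

/* Vector form: vmul.f32 dd, dn, dm multiplies both 32-bit lanes. */
void vector_vmul_f32(regfile_t *rf, int dd, int dn, int dm)
{
    assert(dd >= 0 && dd < 16 && dn >= 0 && dn < 16 && dm >= 0 && dm < 16);
    /* Read both lanes before writing, in case dd overlaps a source. */
    float lo = rf->s[2 * dn]     * rf->s[2 * dm];
    float hi = rf->s[2 * dn + 1] * rf->s[2 * dm + 1];
    rf->s[2 * dd]     = lo;
    rf->s[2 * dd + 1] = hi;
}
```

In this model, after loading s14 from r0, `vector_vmul_f32(&rf, 0, 7, 7)` writes s0 = s14 * s14 and also s1 = s15 * s15, so the `fmrs r0, s0` at the end reads the same value in both snippets even though the instructions differ.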