Understanding the link between VFP and NEON

Note: This was originally posted on 14th March 2011 at http://forums.arm.com

I'd like to understand exactly how NEON can be used instead of VFP.

I understand that
FADDS
is now replaced by
VADD.f32

But I have some questions about that.
- Do the two syntaxes have the same encoding (the same hexadecimal representation)? I guess yes, but I'd like to be sure.
- FADDS can be conditional. Can anybody give me an example?
- If FADDS can be conditional, then VADD.f32 should be too. What is the correct syntax for a conditional VADD.f32?
- FADDS is now VADD.f32 and is executed in the NEON pipeline. Does that mean that VADD.f32 (and FADDS) executes in 1 cycle instead of 9?

If I'm right,
FADDD is replaced by VADD.f64 but is not executed in the NEON pipeline, so the VFP cycle table must be used.
FADDD does not seem to be a conditional instruction. Is that correct?

Thanks,
Etienne
  • Note: This was originally posted on 14th March 2011 at http://forums.arm.com

    The ARM Architecture defines what the instructions are and what they do (ditto for VFP and NEON).  What it doesn't define is how the hardware engineers should implement any of it.  So the designers could go for a 3-stage pipeline, an 8-stage pipeline, or a 5000000000000-stage pipeline.  As long as it functions in the way the architecture docs define, it doesn't technically matter.  Obviously, some approaches will be better than others.  Parts of the instruction set are optional, such as NEON and VFP; the designers have the choice to include them or not.  But again, how they do so is up to them.  They could create one hardware block that implements both, or separate blocks for each.  As long as it functions correctly, it's up to the designers.

    VADD.F32 d0, d1, d2 is a NEON instruction.  It will do the following:   d0[31:0] = d1[31:0] + d2[31:0],   d0[63:32] = d1[63:32] + d2[63:32]

    This is where the vectored nature of NEON comes in.  The .F32 says this is 32-bit (single precision) arithmetic.  The "d" registers show that you are using 64-bit (double) registers, which hold 2x single precision values.  So you get two parallel additions.
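
To make the two parallel additions concrete, here is a minimal sketch in plain C that models the lane-wise behaviour described above. It is only a software model, not NEON intrinsics or real hardware behaviour: the type `dreg_f32x2` and the functions `vadd_f32_model` and `vadd_f32_model_check` are hypothetical names made up for illustration, and lane 0 is assumed to stand for bits [31:0] and lane 1 for bits [63:32].

```c
#include <assert.h>

/* Hypothetical model of a 64-bit D register viewed as two 32-bit float lanes.
 * Assumption: lane[0] stands for bits [31:0], lane[1] for bits [63:32]. */
typedef struct {
    float lane[2];
} dreg_f32x2;

/* Models the effect described for "VADD.F32 d0, d1, d2":
 * each lane of the result is the sum of the matching lanes of the inputs. */
dreg_f32x2 vadd_f32_model(dreg_f32x2 a, dreg_f32x2 b) {
    dreg_f32x2 r;
    for (int i = 0; i < 2; ++i) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

/* Small self-check using values that are exactly representable as floats. */
void vadd_f32_model_check(void) {
    dreg_f32x2 d1 = {{1.0f, 2.0f}};
    dreg_f32x2 d2 = {{10.0f, 20.0f}};
    dreg_f32x2 d0 = vadd_f32_model(d1, d2);
    assert(d0.lane[0] == 11.0f); /* d0[31:0]  = d1[31:0]  + d2[31:0]  */
    assert(d0.lane[1] == 22.0f); /* d0[63:32] = d1[63:32] + d2[63:32] */
}
```

Calling `vadd_f32_model_check()` from a test program exercises both lanes. The loop is just a way of writing the two additions down; on real hardware they are specified as one instruction, and how they are scheduled is up to the implementation.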

    I'm afraid I've not played around enough with optimization to know which will be quicker.