Understanding the link between VFP and NEON

Note: This was originally posted on 14th March 2011 at http://forums.arm.com

I'd like to understand exactly how NEON can be used instead of VFP.

I understand that
FADDS
is now replaced by
VADD.f32

But I have some questions about that.
- Do the two syntaxes have the same encoding (the same hexadecimal representation)? I guess yes, but I'd like to be sure.
- FADDS can be conditional. Can anybody give me an example?
- If FADDS can be conditional, then VADD.f32 should be too. What is the correct syntax for a conditional VADD.f32?
- FADDS is now VADD.f32 and is executed in the NEON pipeline. Does that mean VADD.f32 (and FADDS) executes in 1 cycle instead of 9?
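
To make the first three questions concrete, here is a minimal sketch of how the scalar single-precision add is encoded in ARM (A32) state. It assumes the field layout of the VFP `VADD` encoding as I read it in the ARM Architecture Reference Manual; `encode_vadd_f32_scalar` and `COND` are hypothetical names made up for this illustration, not part of any ARM tool, and the result should be checked against a real assembler or disassembler. Under that assumption, the pre-UAL spelling `FADDSEQ s0, s1, s2` and the UAL spelling `VADDEQ.F32 s0, s1, s2` are two names for the same 32-bit word, and the condition lives in the top four bits.

```python
# Hypothetical helper for illustration only (not an ARM tool or library API).
# Builds the A32 encoding of the scalar single-precision add:
#   pre-UAL: FADDS{cond} Sd, Sn, Sm      UAL: VADD{cond}.F32 Sd, Sn, Sm
COND = {"EQ": 0x0, "NE": 0x1, "AL": 0xE}  # small subset of the condition codes


def encode_vadd_f32_scalar(sd, sn, sm, cond="AL"):
    """Return the 32-bit A32 encoding of VADD{cond}.F32 S<sd>, S<sn>, S<sm>."""
    for reg in (sd, sn, sm):
        if not 0 <= reg <= 31:
            raise ValueError("single-precision registers are s0..s31")
    # Each S register number is split into a 4-bit field plus one extra low bit.
    vd, d = sd >> 1, sd & 1
    vn, n = sn >> 1, sn & 1
    vm, m = sm >> 1, sm & 1
    return (
        (COND[cond] << 28)  # condition field: this is what makes it conditional
        | (0b11100 << 23)
        | (d << 22)
        | (0b11 << 20)
        | (vn << 16)
        | (vd << 12)
        | (0b101 << 9)      # bits 11:9 = 101, bit 8 (sz) = 0 for single precision
        | (n << 7)
        | (m << 5)
        | vm
    )


if __name__ == "__main__":
    # Same word whichever mnemonic spelling the assembler accepts.
    print(f"{encode_vadd_f32_scalar(0, 1, 2):08x}")        # unconditional (AL)
    print(f"{encode_vadd_f32_scalar(0, 1, 2, 'EQ'):08x}")  # only the top nibble changes
```

The NEON (Advanced SIMD) form that works on D or Q registers is a different encoding with no condition field in ARM state, so this sketch covers only the scalar S-register form.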

If I'm right,
FADDD is replaced by VADD.f64 but is not executed in the NEON pipeline, so the VFP cycle table must be used.
FADDD does not seem to be a conditional instruction. Is that correct?

Thanks,
Etienne
  • Note: This was originally posted on 14th March 2011 at http://forums.arm.com

    OK, thanks.

    I do not understand what you mean by
    "Some have separate (ish) blocks for each, others will have more tightly integrated blocks."

    Since my previous post I noticed this in the documentation:

    Each VFP instruction takes 7 cycles to execute in the NFP pipeline because of this restriction.

    So in the end it seems that the optimisation is not so interesting.

    FADDS takes 9-10 cycles (on VFP execution)
    VADD.f32 takes 7 cycles (on NEON execution)

    It seems that the fastest way to use 32-bit floating-point instructions is to use NEON with 64-bit registers:
    VADD.f32 d0, d1, d2 would take only 1 cycle.

    Let's suppose that the upper 32 bits are discarded.

    I will run some tests tonight!
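
As a rough illustration of what "the upper 32 bits are discarded" implies, the sketch below models the register aliasing in which each D register overlays a pair of S registers (d0 = s0:s1, d1 = s2:s3, d2 = s4:s5). The function `vadd_f32_d` and the list-based register file are assumptions made up for this example, and Python floats are doubles, so it shows only which registers are read and written, not single-precision rounding or timing.

```python
# Hypothetical model for illustration: a list of 32 values stands in for s0..s31,
# and D<n> is taken to overlay S<2n> (low lane) and S<2n+1> (high lane).
def vadd_f32_d(s, dd, dn, dm):
    """Model VADD.F32 D<dd>, D<dn>, D<dm>: a two-lane add, both lanes written."""
    for lane in (0, 1):
        s[2 * dd + lane] = s[2 * dn + lane] + s[2 * dm + lane]


if __name__ == "__main__":
    s = [0.0] * 32
    s[1] = 99.0              # a live value sitting in the upper half of d0
    s[2], s[3] = 1.5, 10.0   # d1 = (s2, s3)
    s[4], s[5] = 2.25, 20.0  # d2 = (s4, s5)
    vadd_f32_d(s, 0, 1, 2)   # VADD.F32 d0, d1, d2
    print(s[0], s[1])        # s0 holds the wanted sum; s1 has been overwritten
```

So using the D-register form for a single scalar means whatever was in the other half of the destination (s1 here) is overwritten, and the upper lanes of the sources must hold values that are safe to add.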