I have a couple of questions about the default FPU in ARMv8. In some places it is implied that the FPU and the SIMD engine are the same unit, meaning that the FPU and NEON share the same pipeline execution stage(s). In other places it is implied that NEON and VFP are different units: NEON being the AdvSIMD engine and VFP the FPU. For example, the ARM website for the Cortex-A53 implies they are separate units (judging by the image) and hints that VFPv4 is the floating-point unit.
Some people say that the AdvSIMD is the FPU for AArch64 and VFPv3 is the FPU for AArch32, but I have my doubts.
For instance, if I run a FP instruction, like FADD:
FADD S0, S1, S2
In which unit will it be executed? VFP or NEON/AdvSIMD?
I have searched for some days without much luck, so I would be honored and grateful if someone could help me.
pedrobotelho15 said: In some places it is implied that the FPU and the SIMD engine are the same unit, meaning that the FPU and NEON share the same pipeline execution stage(s). In other places it is implied that NEON and VFP are different units: NEON being the AdvSIMD engine and VFP the FPU.
Armv8-A is an architecture. An architecture describes the function of a processor (e.g. what instructions exist, what they do, how they are represented in memory). It does not describe the design of a processor (e.g. the pipeline); Arm refers to that as the micro-architecture.
Why that matters is that your question is about whether FPU and SIMD instructions are handled in one unit or in several. That's a micro-architecture question, not an architecture question. Any two Armv8-A compliant designs might make different design trade-offs. You could answer the question specifically for (say) the Cortex-A53, but the answer might be different for the Cortex-A57.
pedrobotelho15 said: For instance, if I run a FP instruction, like FADD: FADD S0, S1, S2 In which unit will it be executed? VFP or NEON/AdvSIMD?
Your example instruction is a scalar 32-bit floating-point addition. To be a SIMD instruction it would need to use "v" registers instead of "s" registers.
But that doesn't tell you exactly which pipe a given processor design will choose to execute it in.
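To make the scalar/vector distinction concrete, here is a minimal C sketch. The function names add_scalar and add_vector4 are made up for illustration, and the instructions named in the comments are what an AArch64 compiler typically emits for this kind of code, not something guaranteed. Either way, the source only fixes which instruction is used; it says nothing about which pipe a particular core runs it in.

```c
#if defined(__aarch64__)
#include <arm_neon.h>   /* AdvSIMD (NEON) intrinsics, AArch64 builds only */
#endif

/* Hypothetical helper: scalar single-precision add.
   On AArch64 this typically becomes the scalar form, e.g.
       fadd s0, s0, s1
   where s0 is the low 32 bits of the 128-bit register v0. */
float add_scalar(float a, float b) {
    return a + b;
}

/* Hypothetical helper: four-lane single-precision add.
   With the intrinsic below, the compiler typically emits the vector form,
   e.g.
       fadd v0.4s, v0.4s, v1.4s
   The plain loop is only a fallback so the sketch builds on other targets. */
void add_vector4(const float *a, const float *b, float *out) {
#if defined(__aarch64__)
    vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

If you build this for AArch64 and look at the disassembly, you should be able to compare the two FADD forms side by side.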
Can I ask, what problem are you trying to solve?
Most of the ARMv8 Cortex-A cores that I have seen have both VFP and NEON, so please, let's work with that scenario. I just want to know which is the main FPU. That's it. And I know this instruction is scalar; I want to know in which unit scalar FP operations take place. The problem I am trying to solve is that I am learning the ARMv8 architecture, just as I learned x86, x86_64 and ARMv7, and I want to know which is the main FPU in AArch64. Many people say it's NEON (i.e. it would run both FPU and SIMD instructions) and others say it's good ol' VFP. I am already confused, because in ARMv7 the VFP and NEON instructions are very similar, tbh. I've been led to think that AdvSIMD (NEON) is a single unit for FP and SIMD, but the images show the FPU and SIMD as separate. As you said, though, the importance of this can be toned down.