Can Arm Neon be used with OpenMP?

I've been studying Neon recently and I've had some questions about it:

1. Does each Arm CPU core have its own Neon registers?

2. Neon can theoretically give a 4x speedup. If I had six CPU cores, could I theoretically get a 24x speedup?

  • Hi.

    In short, yes.  Each CPU core will have its own NEON execution pipelines available to it, meaning that if you have parallelised your jobs using OpenMP, pthreads or whatever else, you will be able to fully use the capabilities of the chip you are on.

    It is perhaps worth noting that some cores have more than one NEON pipeline and can issue instructions to them in the same cycle. The Arm Neoverse N1 (found, for example, in the AWS Graviton2 nodes) and Marvell's ThunderX2 cores both have two pipelines. The core itself schedules available instructions onto these pipelines, so no extra work from the user is necessary. For reference, the AWS nodes provide 64 Neoverse N1 cores, so a good parallel implementation will result in very high peak performance across the full system.
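
    As a rough illustration of combining the two, here is a minimal sketch in C. OpenMP splits the loop across cores, and each thread uses its own core's NEON registers to handle four floats per iteration. The function name `saxpy_neon_omp` and the scalar fallback for non-Arm builds are assumptions made for this example, not something from the question. Measured speedup will usually fall short of the theoretical cores-times-lanes figure, since memory bandwidth is shared between cores.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example: y[i] += a * x[i] for i in [0, n).
 * OpenMP distributes blocks of four elements across threads (cores);
 * within each block, NEON processes four floats at once. */
void saxpy_neon_omp(float a, const float *x, float *y, size_t n)
{
    /* Largest multiple of four that fits in n. */
    const long n4 = (long)(n - (n % 4));

    #pragma omp parallel for
    for (long i = 0; i < n4; i += 4) {
#if defined(__ARM_NEON)
        float32x4_t vx = vld1q_f32(x + i);   /* load four floats of x */
        float32x4_t vy = vld1q_f32(y + i);   /* load four floats of y */
        vy = vmlaq_n_f32(vy, vx, a);         /* vy = vy + vx * a      */
        vst1q_f32(y + i, vy);                /* store the result      */
#else
        /* Scalar fallback so the sketch also builds without NEON. */
        for (int k = 0; k < 4; ++k)
            y[i + k] += a * x[i + k];
#endif
    }

    /* Remaining elements when n is not a multiple of four. */
    for (size_t i = (size_t)n4; i < n; ++i)
        y[i] += a * x[i];
}
```

    Building with something like `gcc -O2 -fopenmp` on an AArch64 machine should enable both parts; without `-fopenmp` the pragma is ignored and the loop simply runs on one core.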

    Hope that helps.

    Chris
