  • NE10-Library -> FIR-Filter cycle counts: C-version faster than NEON-version?

    Hi,

    I'm currently trying to measure cycle counts for FIR filtering with the NE10 library. My target is a Raspberry Pi 2 with an ARM Cortex-A7, running Raspbian.

    I activated the Cortex-A7 performance counter register to read out the cycles…

  • Embedded assembly function problem

    Hello all,

    I wrote an embedded assembly function for an ARM Cortex-A9 (the specific device is a Zynq, from Xilinx) as follows:

    float my_fun(float x)

    {

                    asm volatile ("vdup.f32 d0, r0                     \n\t");…

  • Cortex-A15 instruction set and optimization techniques on this platform?

    Dear,

    I am a novice developer on the Cortex-A15.

    I need the following information:

    Where can I get the instruction set of the Cortex-A15?

    Are there any documents about optimization techniques for the Cortex-A15 (image processing optimization)?

    Thanks a lot.

  • Hi, why can't the VFP vector mode be used in Cortex-A series processors?

    Hi, why can't the VFP vector mode be used in Cortex-A series processors?

  • ARMv8 instruction cycle timings

    Hi, can anyone suggest how to find the cycle timings of ARMv8 instructions? Does it take more cycles to move from NEON to basic ARM instructions in ARMv8?

    Please suggest how to calculate instruction cycles in ARMv8.

  • In NEON, do the three instructions (VCLS, VCLZ, VCNT) all count the sign bit?

    In the NEON spec:

    VCLS (Vector Count Leading Sign bits) counts the number of consecutive bits following the topmost bit, that are the same as the topmost bit, in each element in a vector, and places the results in a second vector.

    VCLZ (Vector Count Leading…
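The VCLS definition quoted above can be modelled for a single 8-bit lane in plain C. This is only a scalar sketch of the quoted wording, not NEON code: the name `cls_s8` is hypothetical and the 8-bit element size is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 8-bit lane of VCLS, following the
 * quoted definition: count the consecutive bits that follow the topmost
 * bit and are equal to it. The topmost bit itself is not counted. */
int cls_s8(int8_t x)
{
    uint8_t bits = (uint8_t)x;
    unsigned top = (bits >> 7) & 1u;   /* the topmost (sign) bit */
    int count = 0;

    for (int i = 6; i >= 0; --i) {     /* walk down from bit 6 to bit 0 */
        if (((bits >> i) & 1u) != top)
            break;                     /* first bit that differs ends the run */
        ++count;
    }
    return count;
}

/* A few hand-checked lanes. */
void cls_s8_examples(void)
{
    assert(cls_s8(0) == 7);     /* 00000000: all seven lower bits match */
    assert(cls_s8(-1) == 7);    /* 11111111: all seven lower bits match */
    assert(cls_s8(1) == 6);     /* 00000001: bit 0 differs */
    assert(cls_s8(0x40) == 0);  /* 01000000: bit 6 differs immediately */
}
```

Because the topmost bit is excluded, the result for an 8-bit lane is at most 7 under this reading.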

  • Question about ARM Cortex-A9 NEON optimization (4x4 matrix mul)

    =======================================

    For 4x4 matrix multiplication, hand-written NEON code is slower than plain C code built with

    the auto-vectorization option. (Xilinx Zynq 702 EVM board - Cortex-A9 with GCC compiler options

    -mfloat-abi=softfp -mfpu=neon-fp16 -ftree…

  • NEON: Cortex-A7 is 4 times slower than Cortex-A8?

    I'm looking at the Cortex-A7 cycle-timing table here:

    http://hardwarebug.org/2014/05/15/cortex-a7-instruction-cycle-timings/


    For example, 

    VADD.F32 Dd, Dn, Dm takes 2 cycles

    VADD.F32 Qd, Qn, Qm takes 4 cycles

    The same goes for VMUL.

    Is this really the case…
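Taking the two cycle counts quoted above at face value, the per-lane rate of the D and Q forms can be worked out with a small calculation. This is only arithmetic on the quoted figures, not a measurement; the helper name `f32_lanes_per_cycle` is hypothetical, and whether the quoted numbers mean issue cycles or latency is not stated in the excerpt.

```c
#include <assert.h>

/* Hypothetical helper: number of 32-bit float lanes processed per cycle
 * for one instruction, given the register width in bits (64 for a D
 * register, 128 for a Q register) and the quoted cycle count. */
double f32_lanes_per_cycle(int register_bits, int cycles)
{
    int lanes = register_bits / 32;    /* D holds 2 floats, Q holds 4 */
    return (double)lanes / (double)cycles;
}

void lanes_per_cycle_examples(void)
{
    /* VADD.F32 Dd, Dn, Dm: 2 lanes in 2 cycles */
    assert(f32_lanes_per_cycle(64, 2) == 1.0);
    /* VADD.F32 Qd, Qn, Qm: 4 lanes in 4 cycles */
    assert(f32_lanes_per_cycle(128, 4) == 1.0);
}
```

On these figures both forms come out at one float lane per cycle, so the Q form would bring no per-lane gain over the D form.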

  • The Cortex-A7's pipeline supports dual-issue, so I want to ask: what does dual-issue mean?

    The Cortex-A7's pipeline supports dual-issue, so I want to ask: what does dual-issue mean?

    I found some answers saying that dual-issue means the Cortex-A7 can issue two instructions per clock.

    But in the Cortex-A7's pipeline diagram, it has integer…

  • Question about accumulator word length in A8 core

    Hi,

    I have used some 32-bit microprocessor cores (non-ARM) that have a long word-length accumulator for some DSP operations, to avoid overflow etc. After checking the A8 core documentation, I was surprised not to find anything about this in the specification. It looks…

  • How to enable NEON on Cortex-A8?

    Hi,

    I am using a BeagleBone, which has the TI Sitara AM335X processor. I want to make use of the NEON coprocessor for my project. To enable NEON, I have to follow these commands, but I can't access these registers (especially FPEXC…

  • NEON SIMD Register Diagram

    Hello,

    I’m new to ARM architecture and was looking to get a better understanding of how it works. Most notably, the Cortex-A series and its DSP functionality.

    When looking through the NEON SIMD page on ARM's webpage (NEON - ARM), it mentions that…

  • Explain the 8-stage pipeline of the ARM Cortex-A7?

    Brief explanation of each stage of ARM pipelining.

    How many NEON pipeline stages are there?

    What is dual issue in ARM pipelining?

  • How can a 4-core ARM Cortex-A53 (CA53) include NEON on only 2 cores?

    Our project only wants 2 cores to support NEON for cost reasons. How can I do this?

    1. Can this be done in a single cluster?

    2. Split into 2 clusters, each with 2 cores. What is the difference between the performance of ARM HMP scheduling 4 cores and the performance…

  • Arm Neon not vectorising nested loop

    Hi,

    I am using an A9 processor on a Zynq board, running a test project with the NEON and SIMD options enabled. My code has nested loops that are not vectorised; below is the build log:

     not vectorized: multiple nested loops. 

    Can anyone help me on thi…

  • Containers Fundamental to Distributed Cloud Services

    As operators implement their next generation networks, containers can help accelerate application deployment cycles and increase network agility, enabling the same microservices that ran in the datacenter to run at the network edge. These containers are…

  • NEON-Advanced SIMD vs. SIMD

    Hello,

    I’m new to ARM architecture and was looking to get a better understanding of how it works. Most notably, the Cortex-A series and its DSP functionality.

    When reading through ARM’s webpage, it often refers to “NEON-Advanced SIMD”, “NEON”, and…

  • Data abort, external abort: how can I find the cause?

    Hi, experts

    I'm developing a Secure OS on an A57/A53 big.LITTLE SoC, but I'm really a beginner.

    I beg your wisdom...

    The current situation is:

    • For making a TA, bring the related data from the REE and map the TEE side's NON-SECURE memory. (Data…

  • Memory barriers (DSB, DMB): do they guarantee that cached data is written to memory?

    Hi Experts,

    I'm reading the white papers for ARMv7 and ARMv8.

    But while reading the parts on caches and memory reordering, I came up with some silly questions.

    Suppose there are the instructions below:

     

    Core A:

         STR R0, [Msg]

         STR R1, [Something…

  • Demerits of using the Cortex-A9 as a single-core processor

    Hi Experts,

    The A8 is meant for single-core designs and the A9 for multi-core designs.

    If one SoC is built with a single A9 core and another with a single A8 core, how can we compare the two in terms of metrics such as power and speed?

  • Understanding ARM NEON instruction

    Hi, I am trying to understand ARM NEON instructions and came across the vqrdmulh instruction.

    I am particularly interested in the saturation case, but I cannot find any input that saturates.

    Can anyone explain it with an example?

    For example:

    vqrdmulh…
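The saturation case asked about above can be reproduced with a scalar model of one 16-bit lane. This is a sketch under stated assumptions, not NEON code: the name `qrdmulh_s16` is hypothetical, the 16-bit element size is assumed because the excerpt is cut off before the operands, and the model follows the usual description of the operation (double the product, add a rounding constant, keep the high half, saturate).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of a saturating rounding
 * doubling multiply returning the high half. Assumes >> on a negative
 * int64_t is an arithmetic shift, as it is with GCC and Clang. */
int16_t qrdmulh_s16(int16_t a, int16_t b)
{
    int64_t doubled = 2 * (int64_t)a * (int64_t)b;  /* doubling multiply */
    int64_t rounded = doubled + (1 << 15);          /* rounding constant */
    int64_t high = rounded >> 16;                   /* keep the high half */

    if (high > INT16_MAX)
        return INT16_MAX;                           /* saturate upwards */
    if (high < INT16_MIN)
        return INT16_MIN;                           /* kept for symmetry */
    return (int16_t)high;
}

void qrdmulh_s16_examples(void)
{
    /* 0.5 * 0.5 in Q15 gives 0.25, no saturation */
    assert(qrdmulh_s16(16384, 16384) == 8192);
    /* -1.0 * -1.0 would be +1.0, which Q15 cannot hold: saturates */
    assert(qrdmulh_s16(INT16_MIN, INT16_MIN) == INT16_MAX);
    /* -1.0 * (almost +1.0) still fits */
    assert(qrdmulh_s16(INT16_MIN, INT16_MAX) == -32767);
}
```

In this 16-bit model the only input pair that saturates is both operands equal to -32768, since every other doubled product fits in the high half after rounding. That would explain why saturation is hard to hit with arbitrary test values.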

  • New version of the Cortex-A Series Programmer's Guide is Available

    The ARM® Cortex®-A Series Programmer’s Guide has proved to be a very popular addition to the ARM documentation set, and now also forms the reference textbook for the ARM Accredited Engineer (AAE) examinations.

    The updated ARM® Cortex®-A…

  • A Walk Through the Cortex-A Mobile Roadmap

    Chinese version

    Introduction

    The ARM Cortex-A mobile application processor product line spans several generations and three main product tiers. Developers and SoC designers experienced with one or more of the newer ARM processors benefit from an…

  • Coding Using NEON Technology

    Chinese version: 利用NEON技术编写代码 (Coding Using NEON Technology)

    ARM NEON™ technology is widely used for multimedia optimization. The SIMD architecture of NEON technology makes it very suitable for many compute intensive modules in multimedia codecs such as filtering, de-blocking etc. This blog explores…

  • Elba - How do we know it works?

    In part 1 of this blog, I outlined the thought process behind the Elba program. Here I'll look at the implementation decisions for the project.

    In ARM there are various stages of maturity of a new processor development, reaching silicon implementation…