How to measure program execution time on an ARM Cortex-A53 processor?

Hi,

I was using the following method to read the cycle counter on a Cortex-A15:

    static void readticks(unsigned int *result)
    {
        struct timeval t;
        unsigned int cc;

        if (!enabled) {
            // program the performance-counter control-register:
            asm volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(17));
            // enable all counters
            asm volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(0x8000000f));
            // clear overflow
            asm volatile("mcr p15, 0, %0, c9, c12, 3" :: "r"(0x8000000f));
            enabled = 1;
        }

        asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cc));
        gettimeofday(&t,(struct timezone *) 0);
        result[0] = cc;
        result[1] = t.tv_usec;
        result[2] = t.tv_sec;
    }

The final measurement then looks like this:

    before = readticks()
    foo()
    after = readticks()
    clock_cycles = after - before
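As a small illustration of the subtraction step, here is a sketch of how the difference could be computed in C. The helper name `cycles_elapsed` is made up for this example, and it assumes the cycle count is the 32-bit value that `readticks` stores in `result[0]`.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): elapsed cycles between
 * two 32-bit counter samples. Unsigned subtraction is modulo 2^32, so the
 * result is still correct if the counter wrapped once between the samples. */
static inline uint32_t cycles_elapsed(uint32_t before, uint32_t after)
{
    return after - before;
}
```

With two result arrays filled by `readticks`, this would be called as `cycles_elapsed(before[0], after[0])`. If the counter can wrap more than once during `foo()`, the difference is no longer meaningful.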

I want to use the same logic on a Cortex-A53 in AArch64 (not AArch32).

I tried the following, based on examples I found online:

    /* All counters, including PMCCNTR_EL0, are disabled/enabled */
    #define QUADD_ARMV8_PMCR_E (1 << 0)
    /* Reset all event counters, not including PMCCNTR_EL0, to 0 */
    #define QUADD_ARMV8_PMCR_P (1 << 1)
    /* Reset PMCCNTR_EL0 to 0 */
    #define QUADD_ARMV8_PMCR_C (1 << 2)
    /* Clock divider: PMCCNTR_EL0 counts every clock cycle/every 64 clock cycles */
    #define QUADD_ARMV8_PMCR_D (1 << 3)
    /* Export of events is disabled/enabled */
    #define QUADD_ARMV8_PMCR_X (1 << 4)
    /* Disable cycle counter, PMCCNTR_EL0 when event counting is prohibited */
    #define QUADD_ARMV8_PMCR_DP (1 << 5)
    /* Long cycle count enable */
    #define QUADD_ARMV8_PMCR_LC (1 << 6)

    static inline unsigned int armv8_pmu_pmcr_read(void)
    {
        unsigned int val;

        /* Read Performance Monitors Control Register */
        asm volatile("mrs %0, pmcr_el0" : "=r" (val));
        return val;
    }

    static inline void armv8_pmu_pmcr_write(unsigned int val)
    {
        asm volatile("msr pmcr_el0, %0" : :
            "r" (val & QUADD_ARMV8_PMCR_WR_MASK));
    }

    static void enable_all_counters(void)
    {
        unsigned int val;

        /* Enable all counters */
        val = armv8_pmu_pmcr_read();
        val |= QUADD_ARMV8_PMCR_E | QUADD_ARMV8_PMCR_X;
        armv8_pmu_pmcr_write(val);
    }

    static void reset_all_counters(void)
    {
        unsigned int val;

        val = armv8_pmu_pmcr_read();
        val |= QUADD_ARMV8_PMCR_P | QUADD_ARMV8_PMCR_C;
        armv8_pmu_pmcr_write(val);
    }

    static void readticks(unsigned int *result)
    {
        struct timeval t;
        unsigned int cc;
        unsigned int val;

        if (!enabled) {
            reset_all_counters();
            enable_all_counters();
            enabled = 1;
        }

        cc = armv8_pmu_pmcr_read();
        gettimeofday(&t,(struct timezone *) 0);
        result[0] = cc;
        result[1] = t.tv_usec;
        result[2] = t.tv_sec;
    }

But none of this works, and I am getting an "illegal instruction" error. Can anyone help me fix the code above?

Thanks,

RV

Parents
  • This is an EL0 (userspace) app? If so, whether the PMU can be accessed at EL0 is configurable; it is controlled by PMUSERENR_EL0. It's possible that you configured the equivalent (PMUSERENR) on the Cortex-A15 platform, but not on this one.

    Your code snippet doesn't include an ISB instruction. Changes to context (e.g. enabling the PMU) are not guaranteed to take effect until a context synchronizing event. If you have access to DS-5, it includes some example bare metal PMU code with the required ISBs.

    One other note: the PMU measures cycles rather than time. I'm guessing you already know that, but I thought it worth mentioning.
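The two points above (EL0 access via PMUSERENR_EL0, and an ISB after the configuration) could be sketched roughly as follows. This is only an illustration under assumptions: the macro and function names are made up, the bit positions are the ARMv8-A ones for PMUSERENR_EL0, PMCR_EL0 and PMCNTENSET_EL0, and `pmu_enable_el0_access()` has to run at EL1 or higher (for example from kernel code, on each core whose counter will be read), never from the EL0 app itself. On Linux the kernel's own PMU driver may also touch these registers, which this sketch does not account for.

```c
#include <stdint.h>

/* Bit positions per the ARMv8-A architecture; names are hypothetical. */
#define PMUSERENR_EN  (UINT64_C(1) << 0)   /* allow EL0 access to PMU registers */
#define PMUSERENR_CR  (UINT64_C(1) << 2)   /* allow EL0 reads of the cycle counter */
#define PMCR_E        (UINT64_C(1) << 0)   /* enable counters */
#define PMCR_C        (UINT64_C(1) << 2)   /* reset PMCCNTR_EL0 to 0 */
#define PMCR_LC       (UINT64_C(1) << 6)   /* long (64-bit) cycle count */
#define PMCNTENSET_C  (UINT64_C(1) << 31)  /* enable the cycle counter itself */

/* Value to write to PMUSERENR_EL0. */
static inline uint64_t pmuserenr_value(void)
{
    return PMUSERENR_EN | PMUSERENR_CR;
}

/* New PMCR_EL0 value, keeping whatever bits were already set. */
static inline uint64_t pmcr_value(uint64_t old)
{
    return old | PMCR_E | PMCR_C | PMCR_LC;
}

/* Must be called at EL1 or higher; compiles to nothing on non-AArch64 targets. */
static inline void pmu_enable_el0_access(void)
{
#if defined(__aarch64__)
    uint64_t pmcr;

    __asm__ volatile("mrs %0, pmcr_el0" : "=r"(pmcr));
    __asm__ volatile("msr pmcr_el0, %0" :: "r"(pmcr_value(pmcr)));
    __asm__ volatile("msr pmcntenset_el0, %0" :: "r"(PMCNTENSET_C));
    __asm__ volatile("msr pmuserenr_el0, %0" :: "r"(pmuserenr_value()));
    /* Context synchronization so the new settings are guaranteed to apply. */
    __asm__ volatile("isb" ::: "memory");
#endif
}
```

Keeping the register values in small pure functions makes them easy to check separately from the privileged instructions, which cannot be exercised from a normal user process.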

Children
  • @Chris Shore

    mweidmann

    @sujathalakshmi.k

    Please help.

    Yes, it's an EL0 app. I am trying to benchmark some algorithms on a Cortex-A53, and I need clock cycles rather than time as the metric. I don't have access to DS-5.

    From the manual, I understood that PMCR_EL0, which is a 32-bit register, can be used directly as the performance monitor in AArch64. I have simplified the readticks function like this:

    static void readticks(unsigned int *result)
    {
        struct timeval t;
        unsigned int cc;
        unsigned int val;

        if (!enabled) {
            //reset all counters
            asm volatile("msr pmcr_el0, %0" : : "r" (17));
            //enable all counters
            asm volatile("msr pmcr_el0, %0" : : "r" (0x8000000f));
            enabled = 1;
        }

        asm volatile("mrs %0, pmcr_el0" : "=r" (cc));
        gettimeofday(&t,(struct timezone *) 0);
        result[0] = cc;
        result[1] = t.tv_usec;
        result[2] = t.tv_sec;
    }
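For comparison, here is a minimal sketch that reads the cycle count from PMCCNTR_EL0, the register that the comments in the earlier defines name as the cycle counter, rather than from PMCR_EL0, which is the control register. The function name `read_cycle_counter` is made up, and the sketch assumes that EL0 access to the counter has already been enabled by privileged code; without that, the `mrs` is expected to trap.

```c
#include <stdint.h>

/* Hypothetical helper: read the 64-bit cycle counter PMCCNTR_EL0.
 * Assumes privileged code has already enabled the counter and EL0 access. */
static inline uint64_t read_cycle_counter(void)
{
#if defined(__aarch64__)
    uint64_t cc;

    /* ISB keeps the read from being executed ahead of earlier instructions. */
    __asm__ volatile("isb" ::: "memory");
    __asm__ volatile("mrs %0, pmccntr_el0" : "=r"(cc));
    return cc;
#else
    /* Placeholder so the sketch still builds on other targets: no counter here. */
    return 0;
#endif
}
```

PMCCNTR_EL0 is a 64-bit register, so storing the value in an `unsigned int` such as `result[0]` would drop the upper half.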

    I think it's good so far, but the actual problem is using Hyp mode.

    I was using this small driver on the Cortex-A15 (explained here: http://bench.cr.yp.to/cpucycles/netwalker.html):

      #include <linux/module.h>
      #include <linux/kernel.h>

      MODULE_LICENSE("Dual BSD/GPL");

      #define DEVICE_NAME "enableccnt"

      static int enableccnt_init(void)
      {
          printk(KERN_INFO DEVICE_NAME " starting\n");
          asm volatile("mcr p15, 0, %0, c9, c14, 0" :: "r"(1));    //  Write to HVBAR characteristic register.
          return 0;
      }

      static void enableccnt_exit(void)
      {
          asm volatile("mcr p15, 0, %0, c9, c14, 0" :: "r"(0));
          printk(KERN_INFO DEVICE_NAME " stopping\n");
      }

      module_init(enableccnt_init);
      module_exit(enableccnt_exit);

    From the Technical Reference Manual of the Cortex-A53, I understood that VBAR_EL2 is the AArch64 equivalent of HVBAR in AArch32, so I modified the driver as follows:

      #include <linux/module.h>
      #include <linux/kernel.h>

      MODULE_LICENSE("Dual BSD/GPL");

      #define DEVICE_NAME "enableccnt"

      static int enableccnt_init(void)
      {
          printk(KERN_INFO DEVICE_NAME " starting\n");
          asm volatile("MSR VBAR_EL2, %0" :: "r"(1));    //  Write to VBAR_EL2 characteristic register.
          return 0;
      }

      static void enableccnt_exit(void)
      {
          asm volatile("MSR VBAR_EL2, %0" :: "r"(0));
          printk(KERN_INFO DEVICE_NAME " stopping\n");
      }

      module_init(enableccnt_init);
      module_exit(enableccnt_exit);

    When I insert this module into the kernel, it gives a "segmentation fault". Attached is a picture of the dmesg output.

    11273696_10155587244140037_1676985089_n.jpg

    I am not sure whether I should set some values in the "Hyp Debug Control Register".

    I am also wondering whether I need ISB instructions, since the code worked as it is on the Cortex-A15.

    I appreciate any help.