I am currently trying to extract performance event information from the SABRE Lite i.MX6Q board (https://boundarydevices.com/product/sabre-lite-imx6-sbc/).
After compiling the kernel, I successfully integrated the gator daemon and driver so that data can be recorded in the DS-5 Streamline analyzer (Streamline Performance Analyzer – DS-5 Development Studio – ARM Developer).
The ARM Energy Probe is connected. In the DS-5 GUI, after a test recording, I can see the CPU activity percentage, the clock frequency/cycles and other information (please see the attached figure). The problem is that there is no information from any of the available core events or cache events: all of their values are zero, even though the cores are executing applications during the recording.
I would like to know what the problem is.
Hi mwsealey,
Thanks for your reply. I tried the instructions in the link above, but I get some errors. I'm not sure I can resolve them; it's too complicated.
In addition, this code already contains a list of 6 configured events, with specific settings for event delay and sampling, which conflict with the DS-5 tool. In DS-5 you can configure whichever events you want to monitor, and after recording the data you can choose other events.
Thanks for your help,
Mohamad
Hi Mohamad,
What exactly was too difficult? Really, the only thing that needs to change in the software is the setting of the SDER register so that SUIDEN and SUNIDEN are enabled. There's quite a lot of discussion in the aforementioned thread, but the real meat of it is the sequence in the "Correct Answer":
Your compiler may disagree about the encoding of the value; "val = 0x3" is the same thing. Note that you may need an "isb" instruction after the write for the authentication controls to synchronize correctly.
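For reference, here is a minimal C sketch of such an SDER write. It assumes a GCC-style toolchain targeting ARMv7-A and code running in a Secure privileged mode (for example in the kernel or in U-Boot). It is not the sequence from the "Correct Answer", and the helper names sder_value() and write_sder() are hypothetical. SDER is accessed as CP15 c1, c1, 1, with SUIDEN in bit 0 and SUNIDEN in bit 1, so enabling both gives 0x3.

```c
#include <stdint.h>

/* SDER (Secure Debug Enable Register) bits, ARMv7 Security Extensions. */
#define SDER_SUIDEN  (1u << 0) /* Secure User invasive debug enable */
#define SDER_SUNIDEN (1u << 1) /* Secure User non-invasive debug enable (PMU counting in Secure PL0) */

/* Hypothetical helper: build the SDER value from two flags. */
static inline uint32_t sder_value(int invasive, int non_invasive)
{
    uint32_t v = 0;
    if (invasive)
        v |= SDER_SUIDEN;
    if (non_invasive)
        v |= SDER_SUNIDEN;
    return v;
}

#if defined(__arm__) && defined(__ARM_ARCH_7A__)
/* Hypothetical helper: write SDER, then ISB so the new authentication
 * settings take effect. Only legal in a Secure privileged mode; from
 * user mode or Non-secure state the MCR is UNDEFINED and traps. */
static inline void write_sder(uint32_t v)
{
    __asm__ volatile("mcr p15, 0, %0, c1, c1, 1\n\t"
                     "isb"
                     : /* no outputs */
                     : "r"(v)
                     : "memory");
}
#endif
```

On the target, write_sder(sder_value(1, 1)) writes 0x3, the same value as "val = 0x3". Keeping the bit composition separate from the inline assembly lets the value be checked on any host.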
If your purpose in using Streamline is for the PMU counters to count while in Secure PL0 (i.e. while applications are running, not just the kernel), then SUNIDEN needs to be set by at least some part of the system software stack while in a Secure privileged mode. On i.MX6Q the entire stack runs in Secure state, so you have a choice of U-Boot or the Linux kernel, including modifying gator.ko to do the work for you. While in Secure state, those two bits are the relevant ones.
Ta,
Matt
Hi mwsealey,
I agree with you; I compiled this module and it works. But it has a negative impact on other counters, such as the clock cycles (I have attached a picture).
Also, the event counters related to the L2 cache don't work, and the strange thing is that after inserting this module the clock cycle information disappears.
Thanks,
I am not sure why the CPU cycle information is no longer valid. The NXP forum points to some software (a GitHub repository) that provides kernel module code. Please don't run this module; we don't actually need it (we just need the SDER configuration).
L2C-310 events are a different thing entirely; there shouldn't be any issue counting them as long as the software knows how to read them. In the gator.ko source code there's a file driver/gator_events_l2c-310.c containing a function called "gator_events_l2c310_probe", but that function isn't instrumented to report whether it successfully found the L2C-310, what its base address is, and whether it is available to count these events. Could you add some printk() or pr_info() statements to tell whether it succeeded? Note that some of the probe functionality relies on correct information in the platform Flattened Device Tree; if it does not find this information it will invariably fail, since i.MX6Q is not one of the supplied known variants.
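As a cross-check of what the probe should find, here is a minimal C sketch that decodes the L2C-310 Cache ID register (offset 0x000 from the controller base). The field layout follows the L2C-310 TRM, and the part-number masks match those in Linux's cache-l2x0.h. The helper names are hypothetical, and how you obtain the 32-bit value is left open. Printing this value together with the base address from gator_events_l2c310_probe would show directly whether the controller was found.

```c
#include <stdbool.h>
#include <stdint.h>

#define L2X0_CACHE_ID            0x000       /* register offset from the L2C base */
#define L2X0_CACHE_ID_IMPL_SHIFT 24          /* bits [31:24]: implementer */
#define L2X0_CACHE_ID_IMPL_ARM   0x41u       /* 'A' = ARM */
#define L2X0_CACHE_ID_PART_MASK  (0xfu << 6) /* bits [9:6]: part number */
#define L2X0_CACHE_ID_PART_L310  (3u << 6)   /* L2C-310 (PL310) */
#define L2X0_CACHE_ID_RTL_MASK   0x3fu       /* bits [5:0]: RTL release */

/* Hypothetical helper: true if the Cache ID identifies an ARM L2C-310. */
static inline bool l2c_is_l310(uint32_t cache_id)
{
    uint32_t impl = cache_id >> L2X0_CACHE_ID_IMPL_SHIFT;
    return impl == L2X0_CACHE_ID_IMPL_ARM &&
           (cache_id & L2X0_CACHE_ID_PART_MASK) == L2X0_CACHE_ID_PART_L310;
}

/* Hypothetical helper: RTL release field (revision encoding per the TRM). */
static inline uint32_t l2c_rtl_release(uint32_t cache_id)
{
    return cache_id & L2X0_CACHE_ID_RTL_MASK;
}

/* Inside the probe, a line such as
 *   pr_info("gator: L2C base %p, cache id 0x%08x, l310=%d\n",
 *           base, id, l2c_is_l310(id));
 * is the kind of instrumentation meant here ("base" and "id" stand for
 * whatever local names the probe actually uses). */
```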
Again, at the top left of your Streamline window there is a black-and-yellow bar with a warning symbol in it, and it should have an informative tooltip. Does it say anything about the issue?
Hi,
The first problem has been solved with some modifications to the downloaded kernel source. I added some functions to my kernel files (/arch/arm/include/asm/pmu.h, /arch/arm/kernel/perf_event.c and /arch/arm/kernel/perf_event_v7.c) to configure the SDER. After these modifications the kernel has to be recompiled and reinstalled. I have tested DS-5 with these modifications and it works. This may be a better solution than the modules mentioned in Getting zeros on i.MX6 PMU counters | NXP Community.
I am now working on the problem of the L2C-310 events, and I have a question about it. To configure the SDER we added an MCR (Move to Coprocessor from ARM Register) instruction. I found a similar instruction for memory, LDC (Load Coprocessor, which loads coprocessor registers from memory). Could the cache events be fixed by modifying the kernel in the same way as for the SDER?
Finally, regarding the warning symbol in the Streamline window, it is just a message about the power meter tool and is unrelated to this problem.
Hi elahmad,
L2C-310 shouldn't be affected at all by the contents of SDER, you're right. It must be some other problem.
If you're using "userspace perf" via Streamline, then that is definitely the problem you have: there is no perf driver for the L2C-310 performance counters.
For that you would need to use the gator.ko module, which implements the support via a slightly obscure method (remapping the L2C-310 registers that the kernel has already mapped to perform cache operations). Are you using the gator.ko module, or just the daemon alone?
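To make "remapping the L2C-310 registers" more concrete, here is a minimal C sketch of programming the two L2C-310 event counters through a pointer to the controller's register window. The register offsets are named as in Linux's cache-l2x0.h, and the event-source numbers and control bits follow the L2C-310 TRM. This is not gator's code, and the function names are hypothetical. In the kernel, base would come from ioremap() of the controller's physical address. In the sketch it is just a pointer, so the logic can be exercised against an ordinary buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Event-counter registers, offsets from the L2C-310 base. */
#define L2X0_EVENT_CNT_CTRL 0x200
#define L2X0_EVENT_CNT1_CFG 0x204
#define L2X0_EVENT_CNT0_CFG 0x208
#define L2X0_EVENT_CNT1_VAL 0x20C
#define L2X0_EVENT_CNT0_VAL 0x210

/* Event Counter Control bits. */
#define EVCTRL_ENABLE     (1u << 0) /* enable both counters */
#define EVCTRL_RESET_CNT0 (1u << 1) /* write 1: reset counter 0 */
#define EVCTRL_RESET_CNT1 (1u << 2) /* write 1: reset counter 1 */

/* A few event sources (counter config bits [5:2]). */
enum l2c310_event {
    L2C_EV_DRHIT = 0x2, /* data read hit */
    L2C_EV_DRREQ = 0x3, /* data read request */
    L2C_EV_DWHIT = 0x4, /* data write hit */
    L2C_EV_DWREQ = 0x5, /* data write request */
};

static inline void l2c_write(volatile uint8_t *base, size_t off, uint32_t v)
{
    *(volatile uint32_t *)(base + off) = v;
}

static inline uint32_t l2c_read(volatile uint8_t *base, size_t off)
{
    return *(volatile uint32_t *)(base + off);
}

/* Counter config word: event source in [5:2], bits [1:0] = 0 (no interrupt). */
static inline uint32_t l2c_event_cfg(enum l2c310_event ev)
{
    return ((uint32_t)ev & 0xfu) << 2;
}

/* Hypothetical helper: stop, configure, reset and restart both counters. */
void l2c310_start_counters(volatile uint8_t *base,
                           enum l2c310_event ev0, enum l2c310_event ev1)
{
    l2c_write(base, L2X0_EVENT_CNT_CTRL, 0); /* counting disabled while configuring */
    l2c_write(base, L2X0_EVENT_CNT0_CFG, l2c_event_cfg(ev0));
    l2c_write(base, L2X0_EVENT_CNT1_CFG, l2c_event_cfg(ev1));
    l2c_write(base, L2X0_EVENT_CNT_CTRL, EVCTRL_RESET_CNT0 | EVCTRL_RESET_CNT1);
    l2c_write(base, L2X0_EVENT_CNT_CTRL, EVCTRL_ENABLE);
}

/* Hypothetical helper: read counter 0 or 1. */
uint32_t l2c310_read_counter(volatile uint8_t *base, int idx)
{
    return l2c_read(base, idx == 0 ? L2X0_EVENT_CNT0_VAL : L2X0_EVENT_CNT1_VAL);
}
```

In real kernel code the raw volatile accesses would normally be writel()/readl() (or their _relaxed variants) on the ioremap()ed base.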
Hi mwsealey,
Yes, I am using the gator.ko module with the daemon; from my first steps in this work I have used gator.ko + daemon.
I don't know whether the version of gator.ko I'm using is the main cause, but it's normally recommended to use the gator.ko that matches the installed DS-5 version.
Regards,
Can you tell whether the driver/gator_events_l2c-310.c code in "gator_events_l2c310_probe" or "gator_events_l2c310_init" actually ran, and whether the event interface actually got installed? A few printk()s, just to see it initialize and load and whether it worked at all, might be prudent.
Once we confirm that it is installed and that the registers react, the only other possibility that comes to mind is that the debug authentication signal SPNIDEN is not asserted. It would be extremely strange for the CPU counters to increment without it, although SDER overrides it for non-privileged modes (you may see the counts stop when entering the kernel and start again when returning to the application). The L2C-310 implements a Debug Control register (at offset 0xF40), and bit 2 should report the value of the SPNIDEN input to the cache controller. If it's 0, that would be down to some kind of security infrastructure within the SoC. However, I do recall this note in the i.MX6SX Chip Errata documentation, related to a PMU erratum:
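A minimal user-space C sketch for checking that bit. It assumes the i.MX6Q L2C-310 sits at physical address 0x00A02000 (as in the mainline imx6q device tree; please verify against your own DT or reference manual), that it runs as root, and that the kernel allows /dev/mem access to that range. CONFIG_STRICT_DEVMEM may forbid this access, in which case the same read is easier from a small kernel module. The names read_phys_u32() and l2c310_spniden() are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose mmap()/sysconf() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define IMX6Q_L2C_PHYS     0x00A02000u /* assumption: taken from the imx6q device tree */
#define L2X0_DEBUG_CTRL    0xF40       /* Debug Control register offset */
#define L2X0_DEBUG_DCL     (1u << 0)   /* disable cache linefill */
#define L2X0_DEBUG_DWB     (1u << 1)   /* disable write-back (force write-through) */
#define L2X0_DEBUG_SPNIDEN (1u << 2)   /* read-only: value of the SPNIDEN input */

/* True if the Debug Control value reports SPNIDEN asserted. */
static inline bool l2c310_spniden(uint32_t debug_ctrl)
{
    return (debug_ctrl & L2X0_DEBUG_SPNIDEN) != 0;
}

/* Read one 32-bit register at phys + off via /dev/mem. Returns 0 on success. */
int read_phys_u32(uint32_t phys, size_t off, uint32_t *out)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t addr = (off_t)phys + (off_t)off;
    off_t aligned = addr & ~((off_t)page - 1); /* mmap offset must be page-aligned */
    size_t delta = (size_t)(addr - aligned);
    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0)
        return -1;
    void *map = mmap(NULL, delta + sizeof(uint32_t), PROT_READ, MAP_SHARED, fd, aligned);
    close(fd); /* the mapping stays valid after closing the descriptor */
    if (map == MAP_FAILED)
        return -1;
    *out = *(volatile uint32_t *)((volatile uint8_t *)map + delta);
    munmap(map, delta + sizeof(uint32_t));
    return 0;
}
```

On the board, calling read_phys_u32(IMX6Q_L2C_PHYS, L2X0_DEBUG_CTRL, &v) and then l2c310_spniden(v) tells you whether SPNIDEN reads as 0 or 1.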
i.MX6 does not support PMU (Performance Monitoring Unit) hence this ARM errata is not applicable.
(My emphasis.) I do not know whether that actually means anything, and in any case what "not support[ed]" means would be a question for NXP. As noted above, there is a somewhat slim chance that SPNIDEN is not integrated in a configurable way and is always disabled, or the note could just be a statement about what the standard BSP package supports.
We really think you may have to talk to your local NXP Support about this issue, as we've more or less exhausted the possibilities for changing this from the ARM side of the equation if SPNIDEN is 0. If it's 1, then something more esoteric is going on.
I had the same problem. To make sure that the PL310 counters were correctly programmed by the gator.ko module, I stepped through the gator.ko module with the DS-5 kernel debugger and a DSTREAM unit. That accidentally made the PL310 performance counters come to life.
This is actually described in i.MX6 erratum ERR006259, although not very clearly:
When JTAG_TCK is not toggling after power-on reset (POR), the ARM PMU, PTM, and ETB stay in their disabled states so various debug and trace functions are not available.
Provide at least 4 JTAG_TCK clock cycles following POR if the PMU, PTM and ETB functions will be used. A free-running JTAG_TCK can also be used.
It looks like the Cortex-A9 PMU counters are not affected by the presence or absence of a JTAG clock, but the PL310 counters are.