The manual says that I can use the ETM for both self-hosted debugging and external debugging. With DSTREAM and DS-5, the ETM works well. However, I find that I cannot modify the trace registers in DS-5: the command always fails with "verify error on memory operation".
Moreover, can I use the ETM without JTAG and DSTREAM? I want to enable tracing and read the ETB from EL3, and I assume this is feasible since the ETM supports self-hosted debugging. However, after I unplug the JTAG, I cannot read the CSS_DEVICE memory region from Trusted Firmware. According to the manual, all the debug-related components (CTI, ETM, trace) are mapped into this region (0x20000000-0x2e000000), but any read of this region from EL3 hangs. I am really confused about it. I have checked the MMU page tables and even tried disabling the MMU, but the read still hangs.
I'd really appreciate any help!
-------------------------------------------------
Edit 09/02/2016: It is not only the ETM. If I try to read anything in the CoreSight debug and trace memory region (0x20000000-0x23350000) from EL3 without the JTAG plugged in, the system just hangs. I don't think it is caused by the memory mapping, as I get the same result even with the MMU temporarily disabled. Did I miss something?
Hi meteor67,
It is certainly possible to use the ETM and TMC within Juno, from a self-hosted perspective. The issue you might be facing is that the debug logic for both the "big" and "LITTLE" cores is in the power domain for the big "cluster" (i.e. the L2 and coherency logic). When the "big" cluster powers down, you lose access to that memory region.
There are controls in the core and ETM logic which signal the power controller (SCP) to prevent or emulate power down of components, which is what the debugger does when it connects. If you don't reproduce this in software (and it's difficult if they're not powered up in the first place) then you may find you never get a successful access.
If you're in EL3, it's possible that you haven't run ARM Trusted Firmware yet, or that you exited it while only a single core (Cortex-A53) was powered. That makes the problem quite difficult to solve; you probably don't want to write a full power management driver for the SCP just to see trace.
Regarding "verify error on memory operation": some ETM registers may not be 'verifiable' even if they're accessible, because of status bits and sticky bits. In that case, access them with EL3<verify=0>:0x2000xxxx or APB_0<verify=0>:0x2000xxxx to prevent the errors.
Note also that when you access the debug logic from the core, rather than over the APB bus from the debugger, you are at the mercy of the CoreSight Lock and the OS Lock. Please look up the documentation on the LAR/LSR (0xFB0 and 0xFB4 offsets) and OSLAR/OSLSR (0x300 and 0x304 offsets) registers for each CoreSight component you need to access; otherwise the components may not react correctly even if they are powered. How they react is usually documented quite well in the component documentation, and some components (particularly the ETMv4 and the core debug logic) also need to be "powered up", not just by having power applied externally but programmatically: see the ETMv4 TRCPDCR and TRCPDSR, and the core EDPRSR and EDPRCR via the memory-mapped interface.
Ta,
Matt
Hi Matt,
Thanks so much for your reply. In fact, I am not sure whether it is caused by power-down. I am using Trusted Firmware and Linaro's Android release on Juno. After Android boots, I disable the idle states of each CPU by writing 1 to /sys/devices/system/cpu/cpu<m>/cpuidle/state<n>/disable. If I connect DS-5 and check the status of the CPUs, all of them show "running". Then I enter EL3 with an SMC instruction and try to access the ETM register region, and it hangs if the JTAG is not connected. So Trusted Firmware should already have run, and both clusters should be powered at that moment.
As you mentioned, the "EL3<verify=0>" and "APB_0<verify=0>" prefixes work for accessing the registers, and no error occurs anymore when I modify them. However, the modification does not actually take effect: the register values seem unchanged afterwards. I guess the debugger may be restoring the values right after my modification.
Also, thanks for the reminder about the locks. I am a beginner with the ETM, and I will try to get more information from the manual.
Regards,
Zhenyu
Hi Zhenyu,
Now you're at the mercy of Linux (Android), which, depending on the kernel version and device tree you use, has different definitions of the power domains in use and of whether it can power them on and off. It is certainly possible that, even with cpuidle disabled, something you need to stay on is still powered down.
One thing the ETM is programmed to do when a debugger attaches to it is to set some ETM power up enables, which signal the rest of the system to prevent (or emulate) powerdown of the domain containing the ETM. It obviously requires the domain to be up when the debugger attaches to do so. It also requires the system to be monitoring and respecting that signal. It's possible that the collusion of your kernel, DT, SCPI driver and SCP firmware are not entirely respectful of that process.
Lack of modification of the registers usually implies that the component is in reset (it'd be interesting to know whether the registers all read 0 or show the documented reset values), powered down (again, all 0?), or locked (the DS-5 debugger does handle this, but it also has a 'back door'). Note, I made a small mistake: APB_0<verify=0>:0x2000xxxx should actually have bit 31 set, otherwise it may not be bypassing the CoreSight lock (which is not the same as the OS lock!). You should be able to look at the Juno RVC or RCF file within DS-5, or at the Juno TRM for a description of the "ROM Tables", which define the APB addresses; these are not the same view as the system addresses.
For your software, as long as the ETM is powered and not in reset, then the CoreSight lock and OS lock are your most likely candidates for any prevention of modification of registers.
Thanks for your reply again. After your last reply, I tried to learn more about the power domains. The ETMv4 manual gives two examples, one with a trace unit core power domain and one with PE core power domains, and in both examples the memory-mapped programming interface is located in the debug power domain. Then I checked the Trusted Firmware code to find out how to power on the debug power domain, and found that it provides interfaces to configure the core power state and the cluster power state. As you mentioned, the debug logic is in the "big" cluster, so I guess that is the so-called "debug power domain" on Juno. However, when I read the power state of each processor, the power states of all 6 processors and their clusters are 0, which is defined as ARM_LOCAL_STATE_RUN. Does that mean all the domains are powered up? My test steps are the following:
1. Boot Android.
2. Launch an Android app.
3. Use an SMC instruction to enter EL3.
4. Use the psci_get_target_local_pwr_states function to get the core and cluster power states of the current processor.
5. Trigger a secure SGI to all other processors and use the same function as in step 4 to read their power states.
Moreover, you mentioned that
There are controls in the core and ETM logic which signal the power controller (SCP) to prevent or emulate power down of components,
Is the control a register or something else? I found that the DBGPRCR_EL1 register can emulate power-down of the core power domain, and DS-5 modifies this register after connecting to the board. However, after I configure the register manually, my accesses to the region still fail. Did I miss any other controls?
Thanks again for your help!
The ETM Architecture Specification goes into good detail on which parts of the ETM (and which registers) are in which power domains. The important ones, though, are TRCPDSR and TRCPDCR, which let you check the status of, and control, the ETM power domain from the ETM itself (in order to keep it up). Some of how they react depends on the implementation, but you really should set TRCPDCR.PU before trying to access any trace registers that aren't in the debug domain (i.e. some management registers and all trace registers).
Do you have a flow of which registers you're accessing, and where you get "stuck"? It would be easier to point out what you're doing wrong than to list all the combinations of accessing the ETM correctly and the behaviours that might result (it is an entire chapter in the ETM Architecture).
Thanks for the reply. I have already tried to read TRCPDSR (offset 0x314) and TRCPDCR (offset 0x310), and the system hangs just as it does when accessing other registers.
My code looks like this:
INFO("value of register TRCPDSR: 0x%x\n", *(volatile uint32_t *)(0x22040000 + 0x314));
As I am working on Cortex-A57 core 0, the base address is 0x22040000. I did not do anything related to debug or trace before this line. When the processor executes this line, it just hangs and no following instructions are executed. I guess some error occurs while accessing the address, but I cannot read ESR_EL3 for more information because the JTAG is unplugged.
Moreover, it is not only the trace registers: I tried many different addresses in the CoreSight debug and trace region (0x20000000-0x23350000), and none of the accesses succeeded. It looks just as if this region has not been mapped. I also tried disabling the MMU and accessing the physical address directly, but that did not help either. However, if I plug in the JTAG and connect to the board with DS-5, the code works perfectly and shows me the register values. So I am really confused about it.
Thanks for your help!
Can you tell us which Linaro release (for example) you're running, i.e. the one that gave you the Android filesystem and firmware currently installed on the board?
Ta
The firmware version installed on the board is v1.3.3 (printed on the console when the board reboots). The Android image and Linaro source code we use were released in Sep 2015 (Linaro Releases). That is a little old, as we started working on Juno last year. Would a firmware upgrade help with our issue?
Zhenyu,
I wouldn't doubt it at this point. The "doesn't work when I disconnect JTAG" issue is known: we used to hand out a CSAT script that would connect the DSTREAM and power up the DAP (without maintaining a session), which fixed a lot of things, but it still required the JTAG. Later versions of the SCP firmware (BL2) don't seem to require this at all, and we're certain there have been recent Linux patches that improve the specification of the SCPI power domains for self-hosted ETM usage.
If you have problems after that, then we're happy to help.
Thanks so much for your reply. I will try to port our work to the latest version of the firmware.
I have tried the 16.06 release, however, the situation does not change.
To make a simple experiment, I used the following steps:
1. Boot Android
2. Write a minimal kernel module that uses the "ioremap" function to remap the memory region, then load the module dynamically.
Then the system hangs again. Normally an exception in a kernel module only makes that module fail, but here the whole system hangs, which is weird.
Moreover, I tried using ioremap to map a large region, like
ioremap(0x22040000, 0x1000);
or just a small region, like
ioremap(0x22040314, 0x8);
but the result is similar.
As the core and cluster power domains are all powered up, is there anything else we might be missing?
Thanks so much for your help!
If you're seeing the whole system hang, then it's something to do with the SCP firmware.
Can you check the SCPI firmware version that gets printed while Linux boots?
Instead of trying to ioremap manually yourself, just enable the self-hosted ETM support in the kernel.
I am not sure about the 16.06 release; there are a few patches/fixes queued for the v4.9 kernel, so I would suggest giving that a try. They fix a lot of the crashes we have seen so far with the ETM.
Hope this helps.
Sudeep
Hi Sudeep,
Sorry for the late response. As the 16.06 release did not solve my issue, I didn't port my work to it, so I am again working on the 15.09 release. While Linux is booting, it shows something like:
scpi_protocol scpi: SCP Protocol 1.0 Firmware 1.9.0 version
Is this the version you need?
Regarding the "ioremap", I only used it for a simple experiment. In the previous experiments, I actually accessed the ETM from EL3 directly through the memory-mapped interface after disabling the MMU.
Regardless of the SCP firmware version, I couldn't find a Juno reference release on Linaro that actually has the power domains defined in the device tree. Without these it's obvious that there can be no instruction to the driver that a particular device could be powered on or not. You can find some documentation on the SCPI protocol and the Juno implementation on Infocenter in the usual place - it's alongside the Juno documentation.
What you seem to require is some ability for some driver to either directly or indirectly send a "Set Device Power State" request for domain 0 (DEBUGSYS) - once this takes effect, you should be able to use the ETM.
The patches that add that power domain information to the DT, we have already mentioned in this thread. Patches that implement the power domain support, and wire it to Linux 'runtime PM' infrastructure are easy to find from that point:
arm64: Kconfig: select PM{,_GENERIC_DOMAINS} for ARCH_VEXPRESS
firmware: arm_scpi: add support for device power state management
Documentation: add DT bindings for ARM SCPI power domains
firmware: scpi: add device power domain support using genpd
arm64: dts: juno: add coresight support
arm64: dts: juno: add SCPI power domains for device power management
If your kernel doesn't include one or more of these patches (or any dependencies) then you'll have to do some porting. I'd suggest you give a Linaro mailing list a nudge, since I would expect this should be in the Android tree by now.
After that point, I'm afraid I don't have any specific advice on the matter.
Thanks so much for your help! I manually checked the source code of the 15.09, 16.06 and 16.09 Linaro releases, but none of them contains the patches you mentioned.
But you reminded me of a mistake I made earlier in this discussion. I mentioned that I had checked the status of the cluster and CPU power domains. Now I know that was completely wrong, because those belong to the "CSS Power State", not the "Device Power State". So I sent the "Set Device Power State" command to the SCP as you suggested, and the memory region is finally accessible.
I really appreciate your help over such a long time!
Best Regards,
The SCP firmware is definitely old. Set Device Power State was supported, but lots of issues have been fixed since last December. Since I am not completely aware of the Linaro releases, I would suggest (just for the sake of getting the firmware and other setup right) using the latest mainline kernel (yet to be tagged v4.9-rc1, as it has a few CoreSight driver bug fixes) with the latest 16.09 firmware release.