About watchpoint debug exception on Cortex-A53

We are investigating the watchpoint function on the Cortex-A53. We wrote a simple driver that hooks the debug exception with our own handler, aml_watchpoint_handler, in place of the default watchpoint handler.

In our watchpoint handler, we first disable the watchpoint control, then handle the debug exception. When the handler finishes, we re-enable the watchpoint control and return from the exception.

In theory, a single watchpoint event should raise the watchpoint exception once. However, if we re-enable the watchpoint at the end of the handler, the A53 re-enters the exception again and again. This can last a long time, and sometimes it never ends. If we instead leave the watchpoint disabled in the exception handler and re-enable it from a work_struct after a short sleep (1 ms is enough), the exception is taken only once. But this approach can miss watchpoint events while the watchpoint is disabled.

We don't understand the detailed behavior of the watchpoint exception: why is it taken again and again, and how can we exit the watchpoint handler safely? Can anyone help us with this issue?

  • Watchpoints are synchronous and precise on ARMv8, which means they are taken in the same way as an MMU fault, *before* the access is visible to the memory system. If you just return from the watchpoint handler, it will return to re-execute the instruction that triggered the watchpoint, and naturally trigger it again.

    The normal way to deal with this is:

    * Disable the watchpoint.

    * Single step the instruction (using Software Step exception).

    * Re-enable the watchpoint.
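
    The three steps above can be illustrated with a toy model. This is only a sketch in plain C, not kernel code: `toy_cpu`, `toy_run`, `WATCHED_INSN` and `PROGRAM_LEN` are hypothetical names invented for the illustration, and the "program" is just a counter. The only thing it models is that the watchpoint exception is taken before the access completes, so the PC does not advance.

    ```c
    #include <stdbool.h>

    /* Toy model only: none of these names are real kernel or architecture APIs. */
    enum { WATCHED_INSN = 2, PROGRAM_LEN = 4 };

    struct toy_cpu {
        int  pc;          /* index of the next instruction to execute */
        bool wp_enabled;  /* models the watchpoint enable bit */
        bool stepping;    /* models an armed software step */
        int  wp_hits;     /* number of watchpoint exceptions taken */
    };

    /*
     * Run the toy program and return how many watchpoint exceptions were
     * taken. The run gives up after max_exceptions so that the re-trigger
     * loop terminates in this model.
     */
    int toy_run(bool use_single_step, int max_exceptions)
    {
        struct toy_cpu cpu = { .pc = 0, .wp_enabled = true };

        while (cpu.pc < PROGRAM_LEN && cpu.wp_hits < max_exceptions) {
            if (cpu.pc == WATCHED_INSN && cpu.wp_enabled) {
                /* Exception is taken before the access: pc stays put. */
                cpu.wp_hits++;
                if (use_single_step) {
                    cpu.wp_enabled = false;  /* 1. disable the watchpoint */
                    cpu.stepping = true;     /* 2. arm a single step */
                }
                /*
                 * Otherwise the handler disables the watchpoint, does its
                 * work and re-enables it before returning, so the state on
                 * return is unchanged.
                 */
                continue;                    /* return to the same pc */
            }

            cpu.pc++;                        /* the instruction executes */

            if (cpu.stepping) {
                /* Step exception after exactly one instruction. */
                cpu.stepping = false;
                cpu.wp_enabled = true;       /* 3. re-enable the watchpoint */
            }
        }
        return cpu.wp_hits;
    }
    ```

    In this model `toy_run(true, 100)` takes the watchpoint once, while `toy_run(false, 100)` keeps re-taking it until the limit is reached, which matches the behavior described in the question.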

  • I found the Software Step exception a bit difficult to use.

    I also wrote a handler and hooked it in place of the default software step handler in the Linux kernel. But if I re-enable the watchpoint in the software step exception handler, the watchpoint event still triggers again and again.

    Can you explain in more detail how to use the Software Step exception?

  • Are you able to confirm that, after stepping the instruction, the PC has advanced from the instruction that triggered the watchpoint?

  • "*before* the access is visible to the memory system"

    Michael:

    This is very helpful.
    However, is this discussed in the ARMv8 documentation? We encountered an issue that requires understanding whether the exception is generated before or after the access is visible to the memory system. I could not find it in the ARMv8 documentation. Maybe I missed it?

    Cheers,
  • In previous architecture versions there were different behaviors depending on version and implementation choice, meaning this was explicitly called out in the Architecture Reference Manual. For ARMv8 this was made consistent, and the description of the differences removed. If you know where to find it (and there's no reason you should!) you can find this still in section D1.13 -- Watchpoints are synchronous exceptions and synchronous exceptions have this property. You can also infer it if you follow through the pseudocode description. But I will raise a ticket to see how this might be made clearer.
  • Here is the log:

    [ 161.707466@0] ---- watch point 0 triggered, watch addr:ffffffc00256b500 ----
    [ 161.708870@0] [4500]sh, fault addr:ffffffc00256b500, esr:d6000062, mdscr:a000
    [ 161.715936@0] Call trace:
    [ 161.718532@0] [<ffffffc0018f225c>] dbg_en_store+0x50/0x68
    [ 161.723875@0] [<ffffffc001492424>] class_attr_store+0x3c/0x54
    [ 161.729567@0] [<ffffffc001238e50>] sysfs_kf_write+0x58/0x74
    [ 161.735087@0] [<ffffffc00123d15c>] kernfs_fop_write+0xf8/0x154
    [ 161.740865@0] [<ffffffc0011c1108>] vfs_write+0xac/0x1b4
    [ 161.746039@0] [<ffffffc0011c1b44>] SyS_write+0x50/0xb0
    [ 161.751126@0] aml_watchpoint_handler, pstate:20200145, mdscr:a001
    [ 161.757166@0] aml_single_step_handler, addr:ffffffc00256b500, esr:ce000022, awp:ffffffc05604f900
    [ 161.765875@0] aml_single_step_handler, pstate:200001c5, mdscr:a000
    [ 161.772121@0] ---- watch point 0 triggered, watch addr:ffffffc00256b500 ----
    [ 161.778989@0] [4500]sh, fault addr:ffffffc00256b500, esr:d6000062, mdscr:a000
    [ 161.786057@0] Call trace:
    [ 161.788655@0] [<ffffffc0018f225c>] dbg_en_store+0x50/0x68
    [ 161.793996@0] [<ffffffc001492424>] class_attr_store+0x3c/0x54
    [ 161.799688@0] [<ffffffc001238e50>] sysfs_kf_write+0x58/0x74
    [ 161.805207@0] [<ffffffc00123d15c>] kernfs_fop_write+0xf8/0x154
    [ 161.810989@0] [<ffffffc0011c1108>] vfs_write+0xac/0x1b4
    [ 161.816160@0] [<ffffffc0011c1b44>] SyS_write+0x50/0xb0
    [ 161.821247@0] aml_watchpoint_handler, pstate:20200145, mdscr:a001
    [ 161.827288@0] aml_single_step_handler, addr:ffffffc00256b500, esr:ce000022, awp:ffffffc05604f900
    [ 161.835997@0] aml_single_step_handler, pstate:200001c5, mdscr:a000
    [ 161.842238@0] ---- watch point 0 triggered, watch addr:ffffffc00256b500 ----
    [ 161.849112@0] [4500]sh, fault addr:ffffffc00256b500, esr:d6000062, mdscr:a000
    [ 161.856179@0] Call trace:
    [ 161.858774@0] [<ffffffc0018f225c>] dbg_en_store+0x50/0x68
    [ 161.864117@0] [<ffffffc001492424>] class_attr_store+0x3c/0x54
    [ 161.869809@0] [<ffffffc001238e50>] sysfs_kf_write+0x58/0x74
    [ 161.875330@0] [<ffffffc00123d15c>] kernfs_fop_write+0xf8/0x154
    [ 161.881110@0] [<ffffffc0011c1108>] vfs_write+0xac/0x1b4
    [ 161.886281@0] [<ffffffc0011c1b44>] SyS_write+0x50/0xb0
    [ 161.891368@0] aml_watchpoint_handler, pstate:20200145, mdscr:a001
    [ 161.897409@0] aml_single_step_handler, addr:ffffffc00256b500, esr:ce000022, awp:ffffffc05604f900
    [ 161.906118@0] aml_single_step_handler, pstate:200001c5, mdscr:a000

    As the pstate and mdscr values show, single step is enabled before leaving the watchpoint handler. In the single step handler, single step is disabled, but after leaving aml_single_step_handler the watchpoint is triggered again and again. Please have a look.
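
    For anyone reading the log, the esr, mdscr and pstate values can be decoded with a few masks. The sketch below is plain C rather than kernel code, and the macro and function names are my own. The bit positions are the ones I understand the ARMv8-A architecture to define (ESR_EL1.EC in bits [31:26], WnR in bit 6 for watchpoints, MDSCR_EL1.SS/KDE/MDE in bits 0/13/15, SPSR.SS in bit 21), so please check them against the Architecture Reference Manual.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Names are local to this sketch; bit positions per ARMv8-A as I read it. */
    #define ESR_EC_SHIFT        26
    #define ESR_EC_MASK         0x3fu
    #define ESR_EC_SOFTSTP_CUR  0x33u      /* software step, same EL */
    #define ESR_EC_WATCHPT_CUR  0x35u      /* watchpoint, same EL */
    #define ESR_WNR             (1u << 6)  /* 1 = write access */

    #define MDSCR_SS            (1u << 0)  /* software step enable */
    #define MDSCR_KDE           (1u << 13) /* kernel (local) debug enable */
    #define MDSCR_MDE           (1u << 15) /* monitor debug events enable */

    #define PSTATE_SS           (1u << 21) /* software step bit in SPSR */

    /* Extract the exception class from an ESR value. */
    uint32_t esr_ec(uint32_t esr)
    {
        return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
    }

    /* For a watchpoint ESR: true if the watched access was a write. */
    bool esr_is_write(uint32_t esr)
    {
        return (esr & ESR_WNR) != 0;
    }

    /* True if MDSCR_EL1.SS is set, i.e. software step is enabled. */
    bool mdscr_step_enabled(uint32_t mdscr)
    {
        return (mdscr & MDSCR_SS) != 0;
    }

    /* True if the saved PSTATE has the SS bit set. */
    bool pstate_ss(uint32_t pstate)
    {
        return (pstate & PSTATE_SS) != 0;
    }
    ```

    Applied to the values above, esr d6000062 decodes as a watchpoint taken from the current exception level on a write, and esr ce000022 as a software step from the current exception level. mdscr a001 with pstate 20200145 has both SS bits set, and mdscr a000 with pstate 200001c5 has both clear, which agrees with the description of the two handlers. The log as posted does not appear to print the PC after the step.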