
ARM Processors


Android software frequently sags under the sheer weight of all the different devices it’s required to support. This is because developers can’t fine-tune the performance of their apps and games with the same ease and speed that they can on iOS, where consumer choice over hardware is kept to a bare minimum. Indeed, it can be a major effort to make an Android game run crash-free on popular devices, let alone optimise its frame rate, RAM requirements, battery consumption or other aspects of its usability.

 

There's nothing earth-shattering in these observations, and nothing to make us appreciate Google's operating system any less. What's new, however, is that we're just starting to get a handle on the precise scale of Android's performance deficit relative to iOS, as measured from the perspective of real phone users. This is an important step towards fixing the issue and ultimately making Android experiences more responsive, less resource-hungry and more energy-efficient.

 

Our team at GameBench recently completed a unique comparison between the Galaxy S6 and the latest iPhones, based on how well each phone handles a sample of ten popular cross-platform games. The GS6 is the best-performing Android phone we've tested so far, but we found that it lagged behind the iPhone 6 Plus to the tune of around 5 percent, and behind the regular-sized iPhone 6 by around 15 percent. Other Android phones fared worse: the HTC One M9 and Google Nexus 6 both showed a shortfall of 19 percent, while the LG G4 lagged by 21 percent, compared to the iPhone 6.

[Figure: Smartphone performance ranking chart]

We think this information is interesting and others do too, judging from the way journalists and product reviewers have responded to it. GameBench's cross-platform comparisons also offer a way to speed up and scale up cooperative efforts between hardware and software engineers across the mobile industry, which is why OEMs, chip designers and game studios are starting to make use of our data and tools. However, the data will only be truly constructive if they're interpreted the right way: not as judgements of hardware or software, but as evidence of how pairings of devices and apps come together to produce good or bad user experiences. This distinction still leaves a lot of people stumped.

 

After we published our last report, we saw plenty of commentators using our work as ammunition to argue that "my phone is better than your phone." Some hardware-centric readers even suggested that our evidence proved certain technical superiorities in the iPhone's GPU, involving its texture compression formats, pixel data storage formats, and the precision of its arithmetic logic units. These notions all ignore the influence of game developers and the software optimisation process, so they are not logically supported by our data.

 

Texture compression is actually one area where the developer's decisions are crucial to the end result. The developer may choose to use an older (and worse) type of texture compression in their game for the sake of compatibility with older devices, or because they are not aware that better choices are available to them. If the game then looks bad or performs poorly on a very modern device, whose more up-to-date texture compression capabilities are left unexploited, this can't really be blamed on the hardware.

 

Our performance tests don’t apportion credit or blame to hardware factors for the simple reason that our methodology wasn't designed for this. Specifically, unlike traditional hardware benchmarks, we don’t fix the software load that is applied to different devices. We wouldn't even try to control this variable, because doing so would require synthetic workloads rather than the real workloads that we wish to measure (and that users actually care about).

 

To illustrate this point about measuring real workloads, and why this is useful even though it doesn't necessarily identify causal factors, let's look at the cross-platform sci-fi strategy game, XCOM: Enemy Within. From a pure engineering perspective, the iOS and Android editions of XCOM technically constitute different software loads and therefore can't underpin any sort of hardware comparison: they don't have the same code, they don't play at the same resolution and they probably don't exploit available hardware capabilities to the same degree. From a user's perspective however, XCOM is marketed as the same game on both platforms, with the same price tag and the same promise of letting you defend the earth against an alien invasion. So we absolutely can use it to compare user experiences -- and when we do, the results are pretty interesting.

[Figure: XCOM performance on the LG G4]

GameBench shows that XCOM plays smoothly on the iPhone 6, iPhone 6 Plus and the GS6, at a steady 30 frames per second (fps). On the other hand, the game stumbles along at just 22fps on the LG G4. The game also murders the G4's battery, draining it around 50 percent quicker than it does on the GS6, despite the fact that the GS6's battery has a smaller physical capacity.

 

We can't know from these top-level figures what's hurting the user experience on the G4, but we can be pretty sure it's not just hardware. If we tried to lay it at the feet of the GPU, for example, we would then have to explain why the Google Nexus 6 plays XCOM rather better, at 25fps, and with less battery drain, despite having very similar GPU specs and the same 1440p display resolution as the G4.

 

A whole range of different factors could be at play, but what matters most is that the G4's problem with XCOM is properly highlighted and not just dismissed as a hardware issue. Ideally, it would be investigated through the sharing of performance data between the OEM and the game developer, and then fixed for the benefit of LG customers who want to indulge in some smooth, stutter-free killing of extraterrestrials. If this same approach could be used to assess and optimise many popular device-app pairings on Android, preferably during pre-release testing, then the platform's performance deficit relative to iOS would very likely disappear.

Here is a review I made of a TV box based on the S802 SoC. According to benchmark tests, the S812 processor's performance is almost the same, except for a few features.


Review: http://www.androidpimp.com/2015/08/09/android-tv-boxes/amlogic-soc/droidbox-t8s-kodi-tv-box-review

I want to connect a webcam to an ARM Cortex-M3 and capture images from the camera for processing on the Cortex-M3.

How can I do this? Thank you.

Many thanks to Martin Weidmann and Chris Shore, who provided a lot of the content for this blog. They recently ran a webinar introducing the GICv3 architecture that you can watch by following this link.

 

A programmable interrupt controller is an IP block that collates many interrupt sources onto one or more CPU lines, and assigns priority levels to its interrupt outputs. It's fair to say that almost every SoC needs an interrupt controller to handle all of the interrupt sources. For example, ARM processors have only two interrupt signal inputs, whereas a controller can manage many more sources than that.

 

ARM's GIC (Generic Interrupt Controller) architecture provides an efficient and standardized approach for handling interrupts in multi-core ARM based systems. Like the ARM architecture, it is a functional specification, meaning it doesn't describe the implementation of the architecture, just the programmer's model and functional model. It defines what registers the interrupt controller has, what they're for, how software interacts with the interrupt controller, how to program it, and how to deal with interrupts when they arrive.



A generic programmable interrupt controller is a good idea because it simplifies the software involved. For OS development you don't want to have to write multiple drivers for multiple controllers, especially when the interrupt controller is unlikely to be the differentiating factor in a complex SoC. It makes sense for hardware people and software people to agree on a standard, and a couple of years ago ARM introduced the GIC architecture. This blog is an introduction to the latest version of the architecture: GICv3.

 

It’s designed for multi-core systems where we have a single interrupt controller shared across what could potentially be a large number of cores.

 

 

 

So what's changed in GICv3?

 

 

[Figure: A high-level view of the new features that GICv3 provides]

 

 

 

The headline change is that GICv3 can support far more than the 8 cores supported in GICv2. In mobile SoCs, support for 8 cores is enough for the vast majority of designs, but areas like servers and networking need to support a higher core count, and managing all of this via one interrupt controller increases system efficiency. I'll go through the other changes in some detail below. ARM offers an implementation of the GICv3 architecture, the CoreLink GIC-500 Generic Interrupt Controller, that includes all of the latest updates in the GICv3 architecture. The ARM CoreLink™ GIC-500 supports up to 128 cores and provides the ability to virtualize up to 480 shared peripheral interrupt signals.

 

 

 

System Register Interface

 

Traditionally, in GICv1 and v2, all the GIC registers were memory mapped: when programming the GIC or handling an interrupt, load and store instructions were used to communicate with the interrupt controller. That is still largely true in GICv3 for configuring interrupt sources, which mostly remains the job of memory-mapped registers. But the registers used while handling interrupts (i.e. the ones most commonly used) can now be system registers, which means that software accesses them with MSR and MRS instructions rather than loads and stores. For these to be available, support must be built into the core; ARM processors such as the Cortex®-A72, Cortex-A53 and Cortex-A57 all have the required support.
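To make the difference concrete, here is a minimal sketch of an EL1 Group 1 interrupt acknowledge/end-of-interrupt sequence using the system register interface, where GICv2 would have needed memory-mapped loads and stores. It assumes an AArch64 toolchain whose assembler accepts the ICC_* register names (older assemblers need the raw S3_* encodings), and dispatch() is a hypothetical per-interrupt handler:

```c
#include <stdint.h>

extern void dispatch(uint64_t intid);  /* hypothetical per-INTID handler */

void irq_handler(void)
{
    uint64_t intid;

    /* Acknowledge the interrupt: a single MRS replaces a memory-mapped
     * read of GICC_IAR. */
    __asm__ volatile("mrs %0, ICC_IAR1_EL1" : "=r"(intid));

    /* INTIDs 1020-1023 are special (1023 = spurious); 8192+ are LPIs. */
    if (intid < 1020 || intid >= 8192)
        dispatch(intid);

    /* Signal end of interrupt: a single MSR replaces a memory-mapped
     * write of GICC_EOIR. Spurious-interrupt handling is omitted. */
    __asm__ volatile("msr ICC_EOIR1_EL1, %0" :: "r"(intid));
}
```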

 

 

What's the advantage?

 

There is an advantage to moving the most commonly used registers from memory-mapped locations to a system register interface: it provides certainty about the registers' location (i.e. they won't be at different addresses in different systems), and in certain cases it gives a more reliable set of timings than going through the memory interface.

 

 

Security Groups

 

In GICv3 there are three security groups which are configured individually for each interrupt, an increase from the two that were in GICv2. This change provides a better match with the ARMv8-A architecture in the security and exception model. Just as before there are secure and non-secure interrupts based on different OSes, but there is a new state that lets you distinguish between interrupts for the Secure Monitor and interrupts for the Secure Kernel.

 

Now that the CPU interface is part of the processor, we can be a bit more intelligent about how we handle processor exceptions (FIQ and IRQ). In GICv3, FIQ becomes either an interrupt for EL3, or an interrupt for the non-current Security state.

 

 

| Group 0 | Secure Group 1 | Non-secure Group 1 |
| --- | --- | --- |
| Group 0 interrupts are always Secure | Signalled as FIQ if core is in Non-secure state | Signalled as FIQ if core is in Secure state |
| Signalled as FIQ, regardless of current Security state | Signalled as IRQ if core is in Secure state | Signalled as IRQ if core is in Non-secure state |
| Typically used for interrupts for the firmware running at EL3 | Typically used for interrupts for the trusted OS | Typically used for interrupts for the rich OS or Hypervisor |
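To make the three groups concrete, here is a hedged sketch of how Secure software might assign a shared peripheral interrupt to a group via the Distributor's GICD_IGROUPR and GICD_IGRPMODR registers (one bit per interrupt in each); the Distributor base address is a platform-specific assumption:

```c
#include <stdint.h>

#define GICD_BASE      0x2F000000UL  /* assumed platform address */
#define GICD_IGROUPR   0x0080        /* group bit, one per interrupt */
#define GICD_IGRPMODR  0x0D00        /* group modifier bit, one per interrupt */

enum irq_group { GROUP0, SECURE_GROUP1, NONSECURE_GROUP1 };

static void set_bit32(uintptr_t base, uint32_t intid, int val)
{
    volatile uint32_t *reg = (volatile uint32_t *)(base + 4u * (intid / 32));
    uint32_t mask = 1u << (intid % 32);

    *reg = val ? (*reg | mask) : (*reg & ~mask);
}

/* {IGROUPR, IGRPMODR} = {0,0} selects Group 0, {0,1} Secure Group 1,
 * {1,0} Non-secure Group 1 ({1,1} is reserved). */
void set_group(uint32_t intid, enum irq_group g)
{
    set_bit32(GICD_BASE + GICD_IGROUPR,  intid, g == NONSECURE_GROUP1);
    set_bit32(GICD_BASE + GICD_IGRPMODR, intid, g == SECURE_GROUP1);
}
```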

 

 

 

 

 

Below is a diagram that illustrates how this could all work together.

 

[Figure: GICv3 routing example]



**EDIT** This routing example diagram may not represent the typical routing model for some Trusted OS implementations, where Non-secure Group 1 interrupts can be routed to Secure EL1 (not EL3) when executing in the Secure state. This allows the Trusted OS to take action (e.g. move the pre-empted task to a different core) before allowing the normal world to handle the interrupt. An SMC interface is used by the Trusted OS to notify the normal world that it has been pre-empted, and by the normal world to tell the Trusted OS to resume its pre-empted activity.

 

 

 

Support for Larger Systems

 

GICv3 introduces Redistributors, which hold the settings for the private interrupts (PPIs and SGIs). These interrupts are private to a core, so each core can have its own settings for them; there is one Redistributor per connected core.

 

 

Why do we do this?

In short, it allows for distributed designs, with Redistributors kept close to each target core. In a large design with many cores, private interrupts will typically be generated close to the core they are private to. We don't want to transport them to the other side of the chip, where the interrupt controller is located, just to work out their settings; that would be inefficient. Redistributors allow for a distributed design in which the logic sits physically close to the CPU, holding the settings close to the connected core and reducing the travel distance for these private interrupts.

 

 

 

Affinity Levels and Routing

 

One of the consequences of this is that we need to rethink how we target interrupts at a particular core. The answer is affinity routing, a way of identifying a core within a hierarchical structure. Each core is identified by four 8-bit fields (similar to an IP address), and the scheme has been changed to increase the scalability of the architecture, allowing far more cores to be handled than GICv2 was capable of. Shared peripheral interrupts can be individually configured to be sent to any connected core, or to a specific core by using its affinity coordinates.
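As a simple illustration of the hierarchy, the sketch below packs the four 8-bit affinity fields (Aff3.Aff2.Aff1.Aff0) into a single value, much like the dotted fields of an IPv4 address. The packing is illustrative only: real registers such as MPIDR_EL1 and ICC_SGI1R_EL1 place these fields at architecture-defined offsets.

```c
#include <stdint.h>

/* Pack four 8-bit affinity fields into one 32-bit value. */
static inline uint32_t make_affinity(uint8_t aff3, uint8_t aff2,
                                     uint8_t aff1, uint8_t aff0)
{
    return ((uint32_t)aff3 << 24) | ((uint32_t)aff2 << 16) |
           ((uint32_t)aff1 << 8)  |  (uint32_t)aff0;
}

void route_example(void)
{
    /* e.g. affinity 0.0.1.2: core 2 of cluster 1 in a two-level system */
    uint32_t target = make_affinity(0, 0, 1, 2);
    (void)target;  /* would be used as an SPI's routing coordinates */
}
```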

 

 

 

Message-based Interrupts

 

The new architecture adds support for message-based interrupts, whereby instead of requiring a dedicated signal, a peripheral writes to a register in the GIC to signal an interrupt. This reduces the number of wires needed for all of the interrupt signals and eases routing congestion. It's a form of future-proofing for when systems become larger and the number of interrupt sources increases accordingly.
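For illustration, raising a message-based SPI can be as simple as the sketch below: the peripheral (or a driver acting for it) writes the interrupt ID to the Distributor's GICD_SETSPI_NSR register instead of pulsing a dedicated wire. The base address here is an assumed platform-specific value.

```c
#include <stdint.h>

#define GICD_BASE        0x2F000000UL  /* assumed platform address */
#define GICD_SETSPI_NSR  0x0040        /* set Non-secure SPI pending */

/* Make SPI number `intid` pending with a single register write,
 * replacing the dedicated wire a conventional SPI would need. */
void raise_message_spi(uint32_t intid)
{
    *(volatile uint32_t *)(uintptr_t)(GICD_BASE + GICD_SETSPI_NSR) = intid;
}
```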

 

The Interrupt Translation Service is another way of improving the efficiency of message-based interrupts, acting as a dispatcher and also mapping interrupts to INTIDs and Redistributors.

 

 

[Figure: GIC Interrupt Translation Service]

 

 

Why use an ITS?

 

Moving a block of interrupts from one Redistributor (RD) to another is inefficient if users have to update each individual peripheral with the new Redistributor's address, which could be time consuming if there are a lot of interrupt sources being moved.

 

Instead all peripherals can be configured to write to the ITS, which forwards them on to the correct place.  Moving interrupts from one RD to another then becomes just a matter of updating the ITS’s tables.  The ITS also handles the issuing of the commands to RDs which are required to move an interrupt. It’s controlled by a command queue in memory, and software can map or remap interrupts by adding commands to the queue.
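As a rough sketch of the command queue mechanism, under the assumption of a platform-specific ITS base address and a queue already programmed into GITS_CBASER: software appends a 32-byte command (such as MAPTI or MOVI) and advances GITS_CWRITER to hand it to the ITS. Wrap-around and full-queue handling are omitted for brevity.

```c
#include <stdint.h>
#include <string.h>

#define GITS_BASE     0x2F020000UL  /* assumed platform address of the ITS */
#define GITS_CWRITER  0x0088        /* command queue write pointer */

extern uint8_t *cmd_queue;          /* CPU mapping of the in-memory queue */
static uint64_t write_offset;       /* current byte offset into the queue */

/* Append one 32-byte ITS command and advance the write pointer. */
static void its_send(const uint8_t cmd[32])
{
    memcpy(cmd_queue + write_offset, cmd, 32);
    write_offset += 32;             /* wrap-around omitted */
    *(volatile uint64_t *)(uintptr_t)(GITS_BASE + GITS_CWRITER) = write_offset;
}
```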

 

 

GICv3 is a scalable architecture

 

There are a number of new innovations in the GICv3 architecture that offer a scalable infrastructure for the interrupt handling of larger systems. The changes will help to manage an increased core count and increased message congestion, and will provide a faster and more efficient way for interrupts to be handled across SoCs. The ARM CoreLink GIC-500 is an interrupt controller designed by ARM that harnesses all of the benefits of the GICv3 architecture to improve interrupt efficiency and allow for greater virtualization on-chip. For more information you can visit the ARM website, or you can still register to view a webinar delivered by ARM's Chris Shore and Martin Weidmann that goes into more detail and answers some viewers' questions.

In a lot of ways debug is similar to being a medical doctor. A patient comes in with some complaints and lists their symptoms, but you need to run tests in order to properly diagnose the issue before focusing the mind on how to fix it. A lot has been written and discussed in the past about debugging hardware, but most of the attention is dedicated to the pre-silicon stage, when issues can be identified close to the source and rectified before it is too late. Debugging these issues is similar to performing an autopsy on a body: sifting through all of the potential clues to narrow down what has gone wrong, and how it can be rectified. Bugs that are found in the silicon itself are typically much more difficult to identify, and can drain an enormous amount of time and resources to fix properly. Today I will speak about silicon debug, the challenges associated with it and what can be done to improve it.

 

 

[Figure: Silicon autopsies require care and preparation to find out the true bug diagnosis]

 

 

 

 

 

Silicon debug challenges

 

 

In the past, when CPUs, buses, GPUs, memory controllers, etc. were separate components, it was possible to use logic analyzers to gain visibility into the interfaces between them.

Tracing of interface signaling could be used to determine which component was not responding.

Data corruption could be traced and isolated to a component. With a logic analyzer you could, for example, capture a cycle trace on the bus to see why a circuit was hanging, and isolate the problem down to a single bus. Narrowing things down like this is critical to figuring out the real cause of a problem and how to fix it.

 

 

Contrast that to today’s silicon products which are highly integrated SOCs that have very little visibility. Two of the typical debug techniques are unsuitable here for the following reasons:

  • Using IO pins for additional visibility is cost prohibitive in terms of both die area and package costs.
  • External buses such as DDR and PCIe run at very high frequency and require very expensive bus analyzers, with probes that may be intrusive or need to be soldered to the board.

 

 

That doesn't leave you with many options; indeed, the only debug visibility that some designs provide is single chain scan. Single chain scan is accessed through the JTAG IEEE 1149 interface and provides full visibility of the signals in the chain as a snapshot of flip-flop state at a single clock, making it extremely useful for debugging lock-ups. However, there are issues with single chain scan: it is not always implemented, it does not always function when it is, and there can be signal-name-to-flip-flop mapping issues or signal polarity issues.

 


Another useful method is to use the ARM DS-5™ Development Studio. DS-5/DSTREAM is a powerful tool for debugging silicon failures, but some cases require more debug observability than code trace, breakpoints, watchpoints, single step, etc. can provide. In short, there is a growing need from SoC designers for more visibility into what's happening on-chip.

Two areas that are especially challenging to debug are lockups and data corruption.


Lock-ups are cases where the CPU(s) will not halt, making it impossible to determine the PC or register state, with no code trace available. Lock-ups can be caused by software (such as an access to a powered-down device) or by hardware problems.


Data corruption simply means incorrect or invalid data (not related to ECC failures). Normally you can spot the corruption with a print statement or a debugger, but the source of the corruption usually occurs much earlier in time than its detection. Often the use of breakpoints, single step and watchpoints is intrusive enough to disturb the failure. Some examples are a FIFO overrun, or a data path problem where we just don't get the right data. The fact that we pick up on the problem much later than the source of corruption makes things quite challenging.

 

To summarize, far more visibility is needed to address the problems associated with silicon debug.

 

 



What can be done to improve debug visibility?



A current way of thinking to address this issue is to place the logic analyzer capability on the silicon. For this method to work it needs to fulfil a number of pre-requisites:

  • Needs to operate with other ARM CoreSight debug components
  • Must be small enough to not cause IP area growth
  • Cannot measurably affect battery life
  • Must be able to operate over a large frequency range
  • Needs to be supported with DS-5 software
  • Debug signal connections need to be defined and supported

 

ARM’s solution to improve debug visibility is the CoreSight™ ELA-500 Embedded Logic Analyzer, which is a CoreSight component that can be connected to ARM IP and other IP blocks. It is programmable over debug APB via debugger or CPU for trigger condition setup and can:

 

  • Generate a trigger from one of up to twelve 128-bit signal groups via assertion-style conditions. Trigger conditions are built using trigger state transitions, event counting, comparators for criteria evaluation, and signal masking.
  • Capture trace of the selected signal group into embedded SRAM (configurable size) for later analysis and/or waveform capture over time. It also supports trace filtering.
  • Trigger amongst other ELAs and SoC components over the CoreSight cross trigger interface matrix.

 

You can find out more about the CoreSight ELA-500 in my colleague William Orme's blog Taking the fear out of silicon debug. In it he explains how the ELA-500 connects to the Cortex®-A72 processor to increase the amount of visibility on-chip.

  

 

 


Integration improvements



Integrating the CoreSight ELA-500 in the IP has several benefits:

  • Adding the debug signal ports to the IP RTL is fast (by contrast, port extractor and LEC scripts are more labor intensive)
  • The logical and physical locations of the ELA in the IP are the same

 

 

When it comes to implementation, the IP team reaps the benefit of answering all the placement questions early in order to understand timing paths and routing congestion, such as 'Can we connect more debug signals?' and 'Is the ELA too large?'. Integration also prompts valuable discussions regarding the specification and physical IP, and which tools and compilers best suit the requirements.

 

 

 

 

 

Recommendations for SoC Design Teams



SoC designers around the world and across many segments all want improved hardware debug capability. The amount of money and time spent on minimizing the risk of bugs in an SoC continues to grow.

It is a great benefit to be able to debug silicon issues. In rare cases silicon failures can be caused by IP bugs, so understanding the cause is important: anything that can move you closer to root cause is a huge benefit. Better understanding of debug, and new innovations, come one step at a time, so any new information can be most valuable.

 

 

 

 

 

Prior Planning and Preparation Prevents Poor Performance

 

 

While ARM products such as DS-5 and the CoreSight ELA-500 make it a lot easier to identify and remove bugs in silicon, it is becoming necessary to include sufficient hardware debug capability as part of the product plan/requirement. Adding debug support requires more effort if the project has already started. To use one example, the effort to create and document a port puncher script internally can be much greater than adding debug ports to the IP RTL (two months versus one week).

 

Finally, as well as setting aside resources for debug, designers should also plan a debug strategy for visibility that takes into account:

  • Isolation of failures
  • Partner usage
  • Visibility for complex, problematic logic that had several bugs found by verification
  • Review of past errata

 

 

 

The understanding of the complexities of the human body increased dramatically when pioneers such as Leonardo da Vinci started to perform autopsies. Thankfully in this day and age, silicon autopsies are legal and indeed encouraged by the chip design community. In the rare case that silicon failure does happen, having the capability to take a deep look at the root cause of the issue is invaluable to preventing that type of problem from happening again.



Further Information

CoreSight Debug and Trace - ARM

CoreSight ELA-500 - ARM

CoreSight on-chip debug and trace (Infocenter)

How to debug: CoreSight basics (Part 1) 

We recently welcomed Hardent in Montreal into the Approved Training Center (ATC) program as a provider of ARM training to the developer community in North America.

 

Developers in the area will want to check out an excellent opportunity to meet with them and local distributor Joral Technologies at the first meeting of the Toronto Keil/ARM User Group on July 22nd. More details here: KEIL/ARM TORONTO USER GROUP MEETING. With an update on the latest in ARM tools and an introduction to the Cortex-M7 microcontroller, it promises to be a very useful event. And it's free!

 

Chris

This is the third in a series of blogs that gives a technical introduction to the ARM CoreSight debug & trace technology and architecture. You can check out my previous blogs How to debug: CoreSight basics (Part 1) and How to debug: CoreSight basics (Part 2) to find out the full story.


 

Typical CoreSight systems

The systems shown here demonstrate the most basic configurations of a CoreSight system. More complex systems might involve clusters of processors, multiple clock domains, etc.

 

 

 

Single processor debug

 

Figure 1 shows CoreSight debug in a single processor system.

 

 

Figure 1 - Single processor with Debug APB access

 

 

 

This configuration provides no trace capabilities. The DAP shown here is configured with a combined Serial Wire and JTAG external interface, and APB internal debug access. The Debug APB connects using an APB-Interconnect to configure the CTI and access the processor. The CTI supports triggering of the processor from a designated resource, and enables connection to additional triggering resources if this example is integrated into a larger system.

 

 

 

Single source trace

 

 

Figure 3 shows a single processor trace using the CoreSight infrastructure.

 

 

 

Figure 3 - Single source trace with the TPIU

 

 

 

The CoreSight-compliant ETM trace unit outputs trace directly to a TPIU for direct output of trace off-chip. You can extend this system to add a CoreSight ETB and replicator to provide on-chip storage of trace data.

 

 

 

Multi source trace in a single processor system



Figure 4 shows full trace capabilities in a single processor system.

 

 

Figure 4 - Full CoreSight trace with single processor

 

 

 

The ETM trace unit provides processor instruction and data tracing, and the STM provides instrumentation trace. The trace funnel combines trace from all sources into a single trace stream. This is then either:

 

 

• Replicated to provide on-chip storage using the CoreSight ETB (limited capacity)

• Output off chip using the TPIU (limited bandwidth)

 

 

You can program components using the DAP and operate cross-triggering using the CTM and CTIs.

When multiple trace sources are active in the system, each source must be configured with a unique trace source ID, and every trace sink must have trace formatting enabled. One function of the trace formatter is to embed the trace IDs in the final data stream. When only one trace source is active, the trace sink can be used in bypass mode which can be more efficient in some scenarios.
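As an illustration, for an ETMv4 trace unit the source ID lives in the memory-mapped TRCTRACEIDR register, at offset 0x040 from the ETM's base. The sketch below assigns distinct IDs to two ETMs; the base addresses are assumed, and IDs 0x00 and 0x70-0x7F are reserved by the formatter protocol.

```c
#include <stdint.h>

#define TRCTRACEIDR 0x040  /* ETMv4 trace ID register offset */

/* Program a 7-bit ATB trace source ID into one ETM. */
static void set_trace_id(uintptr_t etm_base, uint32_t id)
{
    /* Avoid 0x00 and 0x70-0x7F, which the formatter protocol reserves. */
    *(volatile uint32_t *)(etm_base + TRCTRACEIDR) = id & 0x7F;
}

void assign_trace_ids(void)
{
    set_trace_id(0x22040000, 0x10);  /* core 0 ETM base (assumed) */
    set_trace_id(0x22140000, 0x11);  /* core 1 ETM base (assumed) */
}
```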

 

 

 

System topology restrictions

 

 

The CoreSight architecture includes some rules which restrict the system topology. These rules allow for system-agnostic debug tool design and topology detection. Violating the topology rules might also result in deadlock or livelock conditions.

Some rules relate to the debug memory map: any path from the external interface to a peripheral may cross at most three levels of protocol addressing (external interface, subset of the debug interconnect, address within the interconnect), and this addressing must not involve any replication or aliasing. Restrictions on the trace bus forbid duplication or re-use of any trace ID that reaches any other trace component, and forbid feeding any trace source back into itself in a feedback loop.

 

 

 

 

Trace Capture

 

 

The trace that CoreSight trace sources generate must be captured by one or more Trace Capture Devices (TCDs). The following common forms of TCD exist:

 

 

• On-chip trace buffer.

• Off-chip logic analyzer.

• Off-chip dedicated Trace Port Analyzer

 

 

Logic analyzers are expensive and are less well supported by development tools, but can often capture trace at higher speeds than is possible with a Trace Port Analyzer (TPA). Most developers capture trace using a TPA or on-chip trace buffer.

 

The CoreSight ETB and Embedded Trace Router (ETR) are ATB slaves and connect to the CoreSight system directly to enable capture of trace data on-chip. A TPA, or logic analyzer, must connect to the pins of a trace port that a TPIU drives.

 

Many systems implement either one ETB or one TPIU. However, it is possible to implement multiple trace sink components using a CoreSight Replicator.

 

Figure 5 shows a system that implements an ETB and a TPIU connected to a TPA.

 

 

 

Figure 5 - Example system with ETB and TPIU

Operation of a TCD

 

 

A TCD has a large circular buffer at its center. Trace is written into this buffer as it is generated. Trace capture does not stop when the buffer becomes full, but instead overwrites old trace.

A TCD is sensitive to two special signals that the ETB or TPIU generate:

 

 

• Trigger.

• Trace disabled.

 

 

A TPIU indicates these signals to a TCD as follows:

 

• Using the optional TRACECTL top level pin. This is the easiest way for a TCD to detect this information, but requires a dedicated pin when trace is in use.

• Using the CoreSight formatter protocol. This requires a TCD that can extract this information from the formatter protocol, and results in a trace port that is one pin smaller. There is a protocol overhead cost (at least 6%), but this is offset by freeing up one more pin. The formatting protocol also permits the use of more than one enabled trace source at a time.
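A quick check on that overhead figure, assuming the standard 16-byte formatter frame: at least one of every 16 bytes carries framing information rather than raw trace, giving the quoted minimum of 1/16, or about 6.25 percent.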

 

 

Trigger

 

 

The trigger is an input to the trace sink, and an output from a CTI. If there is more than one trace sink, each can receive a different condition as its trigger. Most trace sources, for example an ETM trace unit or AHB Trace Macrocell (HTM), can output a signal to use as a trigger. Usually, the CTIs are configured to send a trigger to all trace sinks when any trace source signals its trigger condition.

 

When a trigger is detected, the TCD counts a programmable number of trace records before it stops trace capture. After this point, it ignores any more trace. By setting the appropriate number of programmable trace records, you can select a window of trace to capture around the trigger condition. Figure 6 shows this context.

 

 

Figure 6 - Use of the trigger to set a trace window

 

 

 

You can configure the trigger to output when the system detects a bug. The window of trace indicates the behavior of the system before and after the bug occurred.

You can use the trigger count in the following ways:

 

 

• Set the trigger count to a small value. This gives a window of trace mostly before the trigger occurred, capturing the software bug under investigation.

• Set the trigger count to a value slightly smaller than the size of the buffer. This gives a window of trace mostly after the trigger occurred.

• Set the trigger count to roughly half the size of the buffer. This gives a window of trace before and after the trigger occurred.

 

 

When trace capture has stopped, the development tools download the trace from the TCD.

 

 

 

Trace disabled


Trace disabled indicates to the TCD that there is no trace to capture. It ensures that the values of the trace port pins are only captured when trace data is available. The formatting protocol can also indicate that there is no data to be captured by using a specific sequence, but again this relies on the TCD being able to perform some analysis of the stream before it is captured.

 

 

 

 

Streaming Trace Capture

 

 

Usually, the ETB, ETR, or TPIU wait until there is sufficient trace to use all the pins of the trace port before any trace is captured in the on-chip memory or output over the trace port. For example, if only one byte of trace is available in a system that implements a 16-bit trace port, no trace is output until a second byte of trace is available. In addition, when the formatting protocol is in use, a full block of 16 bytes must be captured before the data can be fully decompressed. This complicates the task of designing a trace capture system where data must be continuously streamed and analyzed in near real time. Different approaches to this problem can be used depending on the system requirements, and are unlikely to detract from the user experience when streaming trace is expected.

 

 

Trace Capture Capacity

 

 

A trace capture system is likely to be one of the limiting factors determining how much trace can be generated. The resources dedicated to trace capture are likely to be limited, and it is important to ensure that the typical use-cases can be supported with a low enough level of data loss. Although CoreSight is designed with graceful degradation in the case that more trace is generated than can be captured, this should not be relied on. Careful use of filtering will result in more useful trace being captured than relying too much on the overflow/recovery behavior.

The demands of a trace source can vary greatly: an ETM trace unit might produce around 1 bit per instruction for instruction-only trace, or over 30 bits per instruction when tracing instructions and data. Even if the data to be traced can be filtered, this might not help much with short-term bursts, so an on-chip trace FIFO can help. For more complex trace systems this becomes more cost-effective, as the added resource is shared between more of the trace logic. The user can select which trace source needs the most bandwidth, but still enable a smaller amount of trace from several other sources, or use the other sources as triggering resources.
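To put rough numbers on that range: a hypothetical core running at 1 GHz and averaging one instruction per cycle generates about 125 MB/s of instruction-only trace at 1 bit per instruction, but approaches 4 GB/s at 30+ bits per instruction with data trace, far more than a typical trace port or on-chip buffer can sustain for long.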

 

 

 

Trace Synchronization


Most trace sources use complex protocols which rely not only on identifying the correct packet boundaries in the protocol, but also initializing the various decompression schemes. When the trace capture formatter protocol is in use (as is necessary for simultaneous capture from more than one source), the formatter protocol requires synchronization too.

A TPA will typically capture trace into a circular buffer. This means that if capture is stopped once the buffer has wrapped round, some early trace will have been lost. In order to decompress the trace stream, the tools must search the buffer until a synchronization point can be detected. Any trace which was captured but is before the synchronization point must be discarded (usually the synchronization cannot be extended backwards). Since it is inefficient to synchronize each trace stream too frequently, most trace sources allow for software programming of the synchronization points.

Depending on the quantity of trace being captured, it might be necessary to change the synchronization period. When capturing into a small buffer, more frequent synchronization results in a higher proportion of the captured trace being usable (but more use of the buffer for non-useful trace).

In systems where several trace sources are active together, the synchronization of each source is independent. Some trace sources support the use of a distributed synchronization request to be generated from the TCD. This ensures that all trace sources initiate their synchronization sequences at the same time.

 

 

 

Timestamps

 

Many trace sources can embed global (SoC level) timestamps in their trace stream. These can be used to correlate activity between different traces sources, particularly when the trace data might be captured in different TPAs, or subject to delays as a result of protocol or buffering.

Timestamps are typically a 64 bit count, derived from an always on domain with a frequency of at least 10 MHz. The timestamp distribution mechanism uses a narrow bus to distribute this count value, and an interpolation mechanism to generate corresponding count values at higher resolutions where the count needs to be used. This provides a trade-off where the ordering between events in a well designed system can be determined, at least to the accuracy of any communication between the CPUs originating the events. Timestamps can also be used for performance measurement, as an alternative to the more precise but more bandwidth intensive cycle counts which some trace sources can insert.

 

Thank you for reading the above blog. You can find out more about CoreSight on the ARM website at CoreSight Debug and Trace - ARM, and if you have any questions or opinions then please leave a comment below.

FreeBSD is an advanced UNIX-based operating system used to power modern servers, desktops and embedded platforms; it has a long history in the networking and storage worlds, used by companies like Juniper Networks and NetApp as well as many others. Linux has been more popular in recent years and has seen broad adoption not only in servers and the datacentre but also in mobile and embedded platforms. However, BSD is built on solid foundations, offers a good alternative and has an active developer community.

 

The FreeBSD community has had support for 32-bit ARM for some time (specifically ARMv6 and ARMv7) across various platforms, as documented on their wiki, thanks to a valiant few within the community. ARM and Cavium started working with the FreeBSD Foundation in October 2014 to enable a port of the FreeBSD operating system to ARMv8, specifically AArch64, to help bootstrap this effort.

 

Andrew Turner has been a long-time FreeBSD developer and committer. He started working on porting to AArch64 in his spare time in the summer of 2014 and was an ideal choice for the FreeBSD Foundation to employ full time on the ARM port. In addition to Andrew, the Foundation worked with SemiHalf, who have a wealth of experience with ARM and our partners, in both Linux and FreeBSD. AArch64 upstream contributions to the main FreeBSD repository (HEAD) started in April 2014. The porting effort has been carried out on a variety of platforms: ARM Foundation Model, ARM Juno development board, QEMU emulator, Cavium Thunder Simulator and Cavium ThunderX Reference Board. As of April 2015, the University of Cambridge has been working to port DTrace and hardware performance counters, enabling these on AArch64.

 

[Figure: BSDCan 2015]

The FreeBSD community held their annual North American conference, BSDCan, in Ottawa in June 2015. This is the largest gathering of not only FreeBSD developers and users, but also members of the wider BSD community, including NetBSD and OpenBSD. With over 280 attendees, it was the perfect place for all involved to show the fruits of their labour. There was a working group round-table discussion around the porting effort, where SemiHalf demonstrated FreeBSD running on a 48-core Cavium ThunderX platform, Andrew Turner presented the status of the effort to port FreeBSD to AArch64, and Citrix's Julien Grall presented on running FreeBSD under the Xen hypervisor.

 

There were several platforms on show at the conference: Cavium brought an example of their single-socket 48-core ThunderX platform and a dual-socket 96-core platform, and Andrew Turner showed off his, by comparison, diminutive HiKey board from 96Boards. With platforms like those from 96Boards becoming more broadly accessible, the developer community will be able to more easily use and test FreeBSD and contribute to the upstream developments.

[Figure: BSD hardware platforms on show at BSDCan]

The FreeBSD community is looking at including AArch64 support as a Tier 1 architecture in the FreeBSD 11 release. More information on FreeBSD on ARMv8 can be found on the FreeBSD wiki; alternatively, if you have questions or wish to participate in the effort, please reach out to the developers either on IRC (#freebsd-arm64 on EFnet) or on the mailing list, where developers will be more than happy to respond.

I'm doing a series of blogs that give a technical introduction into ARM CoreSight debug and trace technology. If you missed the first part, you can find it here: How to debug: CoreSight basics (Part 1)

Please let me know if you have any comments or questions, I'll be happy to address them in the comments section below.

 

 

 

 

Processor Trace Architectures

 

 

The ETM and PTM trace units are trace sources that monitor ARM processors. Each ETM trace unit and PTM trace unit is associated with certain processor lines, and each ETM and PTM implementation conforms to a particular ETM or PTM architecture. The architecture consists of a generic programmer's model and a trace protocol.

 

 

ETMv1, ETMv2

The earliest ETM architectures, representing internal processor pipeline status on a cycle-by-cycle basis. No longer in common use.

 

 

ETMv3

A major revision of the earlier protocols, implementing a byte-based packet protocol; this was the first ETM protocol to support CoreSight. Supports instruction-by-instruction execution and data transfer trace, depending on the processor.

 

 

PFTv1

Derived from ETMv3, providing only trace of branch execution and exceptions. Supported by Cortex-A9, Cortex-A12 and Cortex-A15.

 

 

ETMv4

A major revision of the earlier protocols, supporting advanced processor architectures. Includes the instruction execution trace style of PFTv1, and optionally ETMv3 style data trace capabilities. Supported by Cortex-R7, Cortex-A53 and Cortex-A57.

 

 

Within a CoreSight system, any processor trace units supporting ETMv3, PFTv1 or ETMv4 architectures can operate in combination.

Most processor trace units provide a single ATB output bus (either 8 bit for the Cortex-M variants, or 32 bit). This carries both instruction trace, and data trace if supported. Some R-class processor trace units are unusual in providing a 32 bit ATB interface for instruction trace and a 64 bit ATB interface for data trace. This reflects the high cost of implementing data trace for a high performance processor, and also the need within some real-time application segments to support high-quality data trace capture.

 

 

 

 

 

Debug access and DAP topology

 

Traditional SoC debug used a JTAG interface to connect to a TAP controller in the processor. Where multiple processors are present, the JTAG scan chain would cascade the TAP controller of each processor, possibly through multiple clock and power domains.

Access to system memory would be achieved by halting the processor and downloading instructions while halted to cause the processor to perform the necessary memory accesses.

The DAP introduced by the CoreSight architecture moves the primary point of connection away from the individual processor, and implements a bridge between the external protocol and various different on-chip protocols. This provides a flexible and scalable solution where this bridge point can remain powered and responsive irrespective of the activity of individual processors.

Figure 1 shows a view of the components which are visible in the debug memory mapped space with their discovery registers. Registers provide identification and address offset details. Remember that  the DAP will be multiplexed with accesses from the main system interconnect too.

 



[Figure 1: Components visible in the debug memory-mapped space, with their discovery registers]

 


 


 

Debug Port

 

Every DAP requires a Debug Port (DP). This is the master device, and implements the external interface. Debug ports supporting both JTAG and optimized 2-pin Serial Wire interface can be licensed from ARM.

The debug port provides:

  • always-on connection for the debugger
  • debug fault and status reporting
  • power and reset request interface

 

 

Debug port accesses from the external debugger are performed as 32 bit (word) read or write transactions, targeting either DP registers, or Access Port (AP) registers. Multiple Debug ports (usually in multiple packages) can be addressed from a single external debug agent using:

  • daisy chained JTAG scan chain
  • star topology JTAG scan chain
  • multi-drop serial wire

 

 

 

Access Port

 

Each DAP contains between 1 and 256 Access Ports (APs). The APs are controlled by the DP in response to external commands. Most APs implement a master port which interfaces to an on-chip standard bus interface. Memory APs exist for memory-mapped interfaces such as APB, AHB and AXI interconnects. A JTAG-AP can be used to interface the DAP to a traditional JTAG TAP controller. Customized access ports can also provide a simple interface to dedicated chip-level debug logic.

Memory APs provide the following features:

  • Target address register
  • Read or write to target address
  • Bus error reporting
  • Transaction in progress status
  • Address incrementor (to accelerate block read/write operations)
  • Access control mechanisms
  • Information about connected debug components
  • Ability to perform accesses that appear to come from a system master or from the external debug agent

 

 

 

DAP Address Space

 

Accessing an individual memory mapped address in system memory might require several accesses to enable the correct path, and needs more than simply the target address in the on-chip memory map (a minimal code sketch follows this list):

  • DP Identifier: The debug agent might support concurrent access to more than one DAP.
  • AP Select: The target AP must be selected by writing to a register in the DP.
  • TAR Select: The target address must be set by writing to a register in the AP. Each AP can have a unique view of some or all of the memory mapped components in the target system.
  • Data Access: Once all the addresses necessary for a DAP access to the system are set, a request to the AP can initiate the on-chip access as either a read or a write.
  • Read Data retrieval: Although the on-chip access will now proceed, the debugger must perform another access to the DAP in order to retrieve the data value. This need not result in a second on-chip access.
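The sketch below walks those steps for a single word read. The dp_write()/dp_read()/ap_write()/ap_read_post() primitives are hypothetical stand-ins for a debug probe driver, the DP Identifier step is omitted, and the register offsets follow the ARM Debug Interface v5 (ADIv5) specification.

```c
#include <stdint.h>

#define DP_SELECT  0x8  /* DP register: active AP in APSEL, bits [31:24] */
#define AP_TAR     0x4  /* MEM-AP Transfer Address Register */
#define AP_DRW     0xC  /* MEM-AP Data Read/Write register */
#define DP_RDBUFF  0xC  /* DP register: returns the posted read result */

extern void     dp_write(uint8_t addr, uint32_t value);  /* hypothetical */
extern uint32_t dp_read(uint8_t addr);                   /* hypothetical */
extern void     ap_write(uint8_t addr, uint32_t value);  /* hypothetical */
extern void     ap_read_post(uint8_t addr);              /* hypothetical posted read */

uint32_t dap_mem_read(uint8_t ap_index, uint32_t address)
{
    dp_write(DP_SELECT, (uint32_t)ap_index << 24);  /* AP Select */
    ap_write(AP_TAR, address);                      /* TAR Select */
    ap_read_post(AP_DRW);                           /* Data Access (posted) */
    return dp_read(DP_RDBUFF);                      /* Read Data retrieval */
}
```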

 

 

When an access fails for some reason, the debugger is able to identify the failure. Usually the debugger can re-try the access and recover from simple errors on the interface.

 

 

 

Debug Memory Map Views

 

Both externally hosted debug agents and on-chip debug agents (for example a debug monitor) require access to debug components. Within CoreSight, these debug components are provided on a dedicated bus, the debug APB. This ensures a clear separation between system memory space and debug memory space. An exception is the Cortex-M processors where a shared AHB interconnect supports both system memory and debug access as an area-reduction trade-off.

An on-chip agent must first navigate the system memory bus before being multiplexed with the DAP initiated transactions on the Debug APB. This provides two memory mapped views, one from the external debugger and one from the on-chip agent. Both views share access to the debug  components using the same address offsets within the mapped regions. The system view of the debug APB will typically have a non-zero base address whilst the external debugger view uses a base address of zero.

The upper address bit (PADDRDBG31) is only accessible from the external debugger and serves as an access control mechanism.

 

 

 

 

Debug Memory Discovery and ROM Table Entries

 

Every CoreSight component with an APB memory map occupies one or more 4kB blocks of memory. Within this block, CoreSight defines the content of some discovery registers. You can see the CoreSight TRM on ARM Infocenter for each individual component for specific details. The discovery pointer structure is shown in Figure 1 above, some examples of the individual registers are shown in Table 2 below.

 

 

 

 

| Name / Offset | Example Values | Description |
| --- | --- | --- |
| DEVTYPE / 0xFCC | 0x00000016: Processor Performance Monitor; 0x00000013: Processor Trace Unit | Only used by CoreSight debug. Can classify unknown 'new' components |
| PID4 / 0xFD0 | 0x04: 4kB component, ARM | Size of address block, and part of the designer ID |
| PID3, PID2, PID1, PID0 / 0xFE0-0xFEC | 0x004BB906: ARM CTI rev 4; 0x003BB912: ARM TPIU rev 3 | Unique part identifier consisting of: designer (via JEP106 code), 3-digit part number allocated by the designer, part revision, part ECO identifier, part modified |
| CID3, CID2, CID1, CID0 / 0xFF0-0xFFC | 0xB105900D: CoreSight Debug; 0xB105100D: CoreSight ROM Table | Component identifier; indicates if the CoreSight layout is used. Other values might be used by ARM PrimeCells and other components |

Table 2 – Example CoreSight discovery registers

 

 

 

 

At least one ROM table component must be present as a slave to any AP which contains debug components. This will be the APB-AP, or AHB-AP in the case of a Cortex-M system. Each ROM table contains a list of address offsets which can be used to locate component base addresses. These components can themselves be ROM tables, but each physical component or ROM table must appear only once in the expanded list of pointers.

 

The AP contains a base address register which must point to the master ROM table for that bus. Typically, this will occupy the lowest 4k block of the address space. The ROM table is a CoreSight component, and contains standardized identification registers. It also contains an identifier for the SoC as a whole which can be used by debug agents to look-up against a database of known devices. This lookup can provide information about SoC specific features.

 

Typically the ROM table hierarchy will match the design hierarchy of modules containing debug APB. In this way, larger systems can be constructed from sub-systems and clusters. As a result, the debug APB is often sparsely populated.

Hi,

 

One of the topics of growing interest is the use of a hypervisor on an applications processor alongside a TrustZone based TEE. This new white paper from Mentor gives a great introduction to the topic.

http://s3.mentor.com/public_documents/whitepaper/resources/mentorpaper_87069.pdf

 

I would add that the growing popularity of ARM Trusted Firmware

ARM-software/arm-trusted-firmware · GitHub

makes the integration of these systems much easier than it used to be. On ARMv8-A (64/32-bit architecture) based platforms we have a new exception level (EL3), which is typically used for Trusted Boot and a small run-time that performs the world switch, PSCI, interrupt routing and so on. ARM Trusted Firmware provides a reference implementation of this EL3 code and has been ported to many platforms, including our own Juno development board.

 

Regards,

 

Rob

Let's be honest, debug can be a bit of a pain. At the best of times it's a nuisance, and in the worst-case scenario a complex web of wires that needs to be configured properly in order to diagnose and solve your SoC design problems. A study conducted by Cambridge University found that the global cost of debugging was $312bn in 2013, a figure that has undoubtedly risen in the past two years. With this much money and effort dedicated to this part of SoC design, it is necessary to be as efficient as possible when debugging. CoreSight technology from ARM provides solutions for debug and trace of complex SoC designs. It can take years to become an expert in the finer details of CoreSight, but in this series of blogs I intend to provide readers with a starting point for understanding the concepts, which will help you to work with CoreSight. Like any good technical introduction, let's start with some definitions.

 

 

Debug: This refers to features to observe or modify the state of parts of the design. Features used for debug include the ability to read and modify register values of processors and peripherals. Debug also includes the use of complex triggering and monitoring resources. Debug frequently involves halting execution once a failure has been observed, and collecting state information retrospectively to investigate the problem.

 

Trace: CoreSight provides features which allow for continuous collection of system information for later off-line analysis. Execution trace generation macrocells exist for use with processors, software can be instrumented with dedicated trace generation, and some peripherals can generate performance monitoring trace streams.

Trace and Debug are used together at all stages in the design flow from initial platform bring-up, through software development and optimization, and even to in-field debug or failure analysis.

Historically, the following methods of debugging an ARM processor based SoC exist:

 

 

Conventional JTAG debug (‘external’ debug)

This is invasive debug with the processor halted using:

• Breakpoints and watchpoints to halt the processor on specific activity.

• A debug connection to examine and modify registers and memory, and provide single step execution.

 

Conventional monitor debug (‘self-hosted’ debug)

This is invasive debug with the processor running using a debug monitor that resides in memory.

 

Trace

This is non-invasive debug with the processor running at full speed using:

• A collection of information on instruction execution and data transfers.

• Delivery off-chip in real-time, or capture in on-chip memory.

• Tools to merge data with source code on a development workstation for future analysis.

 

 

CoreSight technology addresses the requirement for a multi-processor debug and trace solution with high bandwidth for entire systems beyond the processor, despite ever increasing SoC complexity and clock speeds. Efficient use of pins made available for debug is crucial.

 

CoreSight provides:

  • A library of modular components and interconnects.
  • Architected discovery and identification methods to allow for flexible system design and easy inclusion of differentiated debug/trace functions.
  • A standard implementation of the ARM Debug Interface for debug tools to work with.

 

 

 


Elements of a CoreSight design

 

The CoreSight architecture introduces a number of key concepts which together enable complex systems to be designed. Standardized programming models and feature discovery registers allow debug tools to be largely generic with minimal dependence on the feature set of an individual SoC.

 

 

Debug Access Port

The Debug Access Port (DAP) is present on any SoC which presents a physical port to be connected to external debug tools. The DAP is an implementation of the standardized ARM Debug Interface, and provides a bridge between a reliable low pin count interface and on-chip memory mapped peripherals. Check out my next blog for more details on the DAP. Transactions generated by the DAP are referred to as External Debugger Accesses.

The DAP provides (amongst other things) architected top level control for debug domain power control, and fast code download direct to system memory.

CoreSight components implement memory mapped interfaces, but the DAP can also act as a bridge to an on-chip JTAG scan chain where necessary for legacy components. This gives increased flexibility and power savings when working with multiple clock and power domains on the SoC.

 

 

Self Hosted Debug

Most processors have direct access to their own debug resources by using dedicated instructions. In addition, it is common for most processors on a SoC to have access to some or all of the remaining debug components. Exact details vary, but there is typically a region in the system memory map which is multiplexed with external accesses to the debug components. Self hosted debug is typically managed by debug monitor software running on either the target processor or a second processor in the SoC. Access control mechanisms are provided to permit interworking between an external debugger and self-hosted debug such that the external debugger does not need to be aware of the actions of the debug monitor.

Save and Restore sequences can be used by on-chip software to maintain the debug state across power-down cycles, and provide the illusion to the external debugger that the SoC remains powered on. This is particularly important for debug of battery powered devices where infrequent events are being monitored.

 

 

Discovery using ROM Tables

All CoreSight systems include at least one ROM table. This serves the purpose of both uniquely identifying the SoC to an external debugger and allowing discovery of all of the debug components in a system. Discovery relies on identification registers at architected positions in the memory map of every debug component; all CoreSight components use this standard, which permits discovery sequences to identify at least a sub-set of the feature set without detailed knowledge of every component. For both external debug and self-hosted debug, there is a pointer to the address of the top-level ROM table from that debug agent. The ROM table provides a list of address offsets which can be used to locate the next level of component; each entry can point to another ROM table or to an individual component. Provided the system complies with the rule that each component is referenced only once in the ROM tables and there are no loops, it is possible to identify all the debug components which are accessible to each debug agent.
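
The discovery sequence itself is simple enough to sketch. The walker below is hypothetical illustration code: read32() stands in for an accessor into the debug address space (an external debugger would route it through the DAP), while the entry format and the CIDR1 class field follow the CoreSight architecture.

```c
/* Sketch: recursive CoreSight ROM table discovery.
 * read32() is a hypothetical debug-space accessor. */
#include <stdint.h>
#include <stdio.h>

extern uint32_t read32(uint64_t addr);

#define CIDR1_OFFSET  0xFF4u  /* component class in bits [7:4]  */
#define CLASS_ROM     0x1u    /* ROM table                      */

static void walk_rom_table(uint64_t base, int depth)
{
    for (uint32_t i = 0; i < 960; i++) {       /* entries end at 0xEFC */
        uint32_t entry = read32(base + 4u * i);
        if (entry == 0)
            break;                             /* end-of-table marker  */
        if ((entry & 0x1u) == 0)
            continue;                          /* entry not present    */

        /* Bits [31:12] hold a signed offset from this table's base. */
        uint64_t comp = base + (int32_t)(entry & 0xFFFFF000u);
        uint32_t cls  = (read32(comp + CIDR1_OFFSET) >> 4) & 0xFu;

        if (cls == CLASS_ROM)
            walk_rom_table(comp, depth + 1);   /* nested ROM table;
                                                  the no-loops rule
                                                  guarantees this ends */
        else
            printf("%*scomponent at 0x%llx, class 0x%x\n",
                   depth * 2, "", (unsigned long long)comp, cls);
    }
}
```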

 

 

 

Processor debug and monitoring features

 

The exact features vary between processor designs, and can also vary from one implementation of a processor to another. Processors typically provide a halting debug mode (where architectural state can be observed) and single step execution. Also common are breakpoint units and Performance Monitoring Units (PMU). CoreSight provides an Embedded Cross Trigger mechanism to synchronize or distribute debug requests and profiling information across the SoC.
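
As a small example of the monitoring side, here is a hedged AArch64 sketch that enables and reads the PMU cycle counter. It assumes the OS has granted access (for example via PMUSERENR_EL0); check the processor documentation for the exact PMU feature set of a given core.

```c
/* Sketch: enable and read the PMU cycle counter (AArch64). */
#include <stdint.h>

static inline void pmu_cycle_counter_enable(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, pmcr_el0" : "=r"(v));
    __asm__ volatile("msr pmcr_el0, %0" :: "r"(v | 1u));  /* PMCR.E */
    __asm__ volatile("msr pmcntenset_el0, %0"
                     :: "r"(1ull << 31));  /* enable cycle counter */
}

static inline uint64_t pmu_cycle_count(void)
{
    uint64_t v;
    __asm__ volatile("isb; mrs %0, pmccntr_el0" : "=r"(v));
    return v;
}
```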

 

 

Cross Triggering

CoreSight Embedded Cross Trigger (ECT) functionality provides modules for connecting and routing arbitrary signals for use by debug tools. Wherever there are signals to sample or drive, a Cross Trigger Interface (CTI) is used to control the selection of which signals are of interest. Most systems will implement a CTI per processor, and at least one CTI for system level components. The CTIs in the system are interconnected using a Cross Trigger Matrix (CTM) which distributes any selected input events across the SoC to every CTI. Each CTI is programmed to use these distributed events to drive local control signals.
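
The programming model is uniform enough to sketch. The hypothetical routine below routes one CTI input trigger onto cross trigger channel 2 and drives one output trigger from the same channel: for example, a trace-buffer-full event halting a processor. CTI_BASE and the trigger numbers are SoC-specific assumptions; the register offsets follow the CoreSight CTI model.

```c
/* Sketch: route one CTI input trigger to one output trigger
 * via cross trigger channel 2. CTI_BASE is hypothetical. */
#include <stdint.h>

#define CTI_BASE     0x80010000u          /* SoC-specific assumption     */
#define CTICONTROL   0x000u               /* global CTI enable           */
#define CTIINEN(n)   (0x020u + 4u*(n))    /* input trigger -> channels   */
#define CTIOUTEN(n)  (0x0A0u + 4u*(n))    /* channels -> output trigger  */
#define CTIGATE      0x140u               /* channel gate to the CTM     */
#define CTILAR       0xFB0u               /* lock access register        */

#define REG(off) (*(volatile uint32_t *)(CTI_BASE + (off)))

void cti_route_event(unsigned in_trig, unsigned out_trig)
{
    REG(CTILAR)     = 0xC5ACCE55u;        /* unlock the component        */
    REG(CTICONTROL) = 1u;                 /* enable the CTI              */

    REG(CTIINEN(in_trig))   = 1u << 2;    /* input event -> channel 2    */
    REG(CTIOUTEN(out_trig)) = 1u << 2;    /* channel 2 -> output trigger */
    REG(CTIGATE)           |= 1u << 2;    /* propagate channel 2 to CTM  */
}
```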

For processors and ETM trace units, the event connections to the CTI are standardized (although this does vary from processor to processor, as described in the processor documentation). Typical connections are listed below.

 

 

 

 

| Source | Destination | Example use case |
| --- | --- | --- |
| Trace logic External Outputs (4 bits) | CTI Trigger inputs | Trace logic resources to trigger trace capture or debug |
| Trace logic External Outputs (2 bits) | PMU inputs | PMU counters to extend trace logic counters |
| PMU Events (~30 bits) | Trace logic External inputs | Filter trace based on processor events such as cache miss |
| PMU overflow | CTI Trigger inputs | Forward PMU counter overflow to interrupt controller or other clusters |
| Processor Debug Restart | CTI Trigger input | Synchronized debug restart across clusters (supporting halt and restart) |
| Trace Buffer Full | CTI Trigger input | Halt processor on trace buffer full |
| CTI Trigger Output | Processor interrupt input | Cause interrupt based on input to this CTI or another CTI in the system |
| CTI Trigger Output | Processor Debug Halt Request | Enter debug state based on input to this CTI or another CTI in the system |
| CTI Trigger Output | Trace Port Trigger request | Indicate trace trigger to trace capture device |

Table 1 - Cross Trigger Connections

 

 

 

Trace Sources

CoreSight technology provides a standard infrastructure for the transmission and capture of trace data (presented as arbitrary streams of bytes). This allows for optimum sharing of common resources. Various trace sources are available:

 

 

Processor Trace Units

Processor trace is implemented by Embedded Trace Macrocells (ETM trace unit) or Program Trace Macrocells (PTM trace unit), depending on the target processor. Each ETM trace unit or PTM trace unit is specific to the processor it is designed for.

The feature set varies depending on the use cases anticipated for the different processors, but all CoreSight ETM and PTM trace units which use an AMBA Trace Bus (ATB) output can be combined in a system. Trace units might support the following:

 

 

  • Processor execution trace in varying degrees of detail
  • Resource logic, often useful as an extension to processor performance monitoring resources
  • Filtering logic to reduce the amount of uninteresting data which is captured

 

A common feature of trace units is efficient compression and encoding, which relies on the decompressor having a copy of the executed code. Where an image is not otherwise available, halting debug can be used to extract it from program memory.

 

 

 

Instrumentation Trace Units

 

The instrumentation trace and system trace units provide the ability for running software to be instrumented with messaging (either by the programmer, or through a tool flow). This is more intrusive than using processor trace, but provides information at a higher level. The instrumentation trace macrocells are typically mapped into system memory. Tightly coupled Instrumentation Trace Macrocells (ITM) exist for some processors, while the System Trace Macrocell (STM) is a more generic version which can be used in any system.
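
To show how lightweight instrumentation trace is from the software side, here is a minimal Cortex-M sketch that emits one byte through ITM stimulus port 0, using the ARMv7-M ITM register addresses. It assumes the debug tools have already enabled the ITM and the chosen port.

```c
/* Sketch: software instrumentation via ITM stimulus port 0
 * (ARMv7-M register map; tools must enable ITM/port first). */
#include <stdint.h>

#define ITM_STIM0  (*(volatile uint32_t *)0xE0000000u)  /* stimulus port 0 */
#define ITM_TER    (*(volatile uint32_t *)0xE0000E00u)  /* trace enable    */
#define ITM_TCR    (*(volatile uint32_t *)0xE0000E80u)  /* trace control   */

void itm_putc(char c)
{
    /* Only emit if the ITM and stimulus port 0 are enabled. */
    if ((ITM_TCR & 1u) && (ITM_TER & 1u)) {
        while (ITM_STIM0 == 0)
            ;                   /* port reads as 0 while FIFO is full */
        *(volatile uint8_t *)0xE0000000u = (uint8_t)c;  /* 8-bit write */
    }
}
```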

 

 

 

Trace (ATB) interconnect

 

One advantage of using a standard trace bus protocol is that a small set of modular components can be used to build sophisticated trace infrastructure. These components include bridges for timing closure and clock/power domain crossing, replicators and funnels which can be used to split and combine data streams, and buffer components. Upsizers and downsizers convert between buses of different data widths. A key feature of the AMBA Trace Bus (ATB) is that the trace source identification is passed with the data, permitting cycle-by-cycle interleaving of trace data from different sources. CoreSight trace interconnects provide the following features:

  • Backpressure to stall a trace source based on the ability of downstream infrastructure to collect data
  • Flushing of any data stored in intermediate buffer components through the interconnect
  • Transfer of byte-oriented data, agnostic to the underlying data protocol
  • Synchronisation request distribution

 

 

 

Trace Sinks

 

A trace sink is the final CoreSight component in a trace interconnect. A system can have more than one trace sink, configured to collect overlapping or distinct sets of trace data. Trace sinks can stream data off chip, provide a dedicated buffer, or route trace data into shared system memory. These different solutions cover a wide range of latency and bandwidth capabilities.
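
To show what collection looks like in the simplest case, here is a hedged sketch of draining an on-chip buffer (a CoreSight TMC in ETB mode) once capture has stopped. ETB_BASE is a hypothetical address, and a real driver would also run the formatter flush/stop sequence that this sketch omits.

```c
/* Sketch: drain a CoreSight TMC (ETB mode) after capture stops.
 * ETB_BASE is hypothetical; offsets follow the TMC register map. */
#include <stdint.h>

#define ETB_BASE  0x80020000u    /* SoC-specific assumption        */
#define TMC_STS   0x00Cu         /* status: TMCReady is bit 2      */
#define TMC_RRD   0x010u         /* RAM read data                  */
#define TMC_CTL   0x020u         /* bit 0: trace capture enable    */
#define TMC_LAR   0xFB0u         /* lock access register           */

#define REG(off) (*(volatile uint32_t *)(ETB_BASE + (off)))

int etb_drain(uint32_t *out, int max_words)
{
    int n = 0;

    REG(TMC_LAR) = 0xC5ACCE55u;          /* unlock the component    */
    REG(TMC_CTL) = 0u;                   /* disable trace capture   */
    while ((REG(TMC_STS) & (1u << 2)) == 0)
        ;                                /* wait until TMC is ready */

    /* RRD returns captured trace words; 0xFFFFFFFF marks empty.   */
    while (n < max_words) {
        uint32_t w = REG(TMC_RRD);
        if (w == 0xFFFFFFFFu)
            break;
        out[n++] = w;
    }
    return n;
}
```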

 

 

 

Stay tuned for my next blog which looks at processor trace architectures and debug access ports. If you have any questions then please leave a comment below, I'll get back to you ASAP!

‘I can hear it buzzing in the air tonight …’ Ok, so I took a little poetic license with the lyrics of Phil Collins’ classic hit ‘In the Air Tonight’, but it made for a more interesting opener than ‘Hi, my name is…’. What is this guy droning on about, you may be asking? Sit tight…

 

This week at Freescale FTF Americas 2015, our new Kinetis V series of ARM® Cortex®-M class MCUs set embedded motor control on a new heading. With thousands of discerning customers to impress, Freescale sets the bar high when it comes to FTF product demos, and ‘cool’ factor tops the requirements list. Freescale’s motor control demo vault is filled with a vast array of industrial and appliance type creations, as those have historically represented the biggest slice of the motor control pie. While such demos have performed admirably for many years and continue to do so, we thought that it was time to look further afield for an application befitting of our first ARM Cortex-M7 based MCU and FTF’s 10-year anniversary. Motor control is, after all, the largest consumer of electricity on the planet, so presumably there must be an unknown talent out there just waiting for its turn in the FTF spotlight.


Men-less machines

After many hours of coffee and doughnut fuelled deliberations, the demo team settled on a drone. Why drones? The short answer is “because washing machines don’t fly very well”. No, in all seriousness there are a number of reasons why the drone, UAV or quadcopter was deserving of its place. Firstly, it ticks the ‘cool’ box. Defying gravity is always a neat trick, but it now comes with added flair, with drones able to perform all types of aerial wizardry. Secondly, spinning multiple motors accurately is a task that our Kinetis V series MCUs take in their stride, so it showcases the MCU’s talents to good effect. Technical prowess box ticked. Thirdly, there is the potential business opportunity associated with it. What began as a hobbyist play-thing is now rapidly transitioning into a viable commercial market of sizeable proportions and increasingly diverse end applications. These magnificent “men-less” flying machines are finding new destinations on an almost daily basis, including aerial surveying of structures and farmland, cargo transportation to remote communities, and even shark spotting in California. The market is still at an embryonic stage with many regulatory hurdles to clear, but all the signs are that it won’t be long before drones will be delivering pizza to your house (hopefully not dumped on your roof ‘Breaking Bad’ style). In short, this application is….wait for it…..taking off.


4 x 8-bit MCU = 1 x 32-bit Kinetis V MCU

When it comes to motor control, Freescale’s expertise is par excellence. Naturally I’m biased, but decades of new product development, turn-key customer projects for the industry’s ‘big players’, and a vast library of sophisticated enablement software speak for themselves. With that in hand and Kinetis V series MCUs ‘straining at the leash’, the drone demo project was seized upon. Against an aggressive schedule – typical of every trade show demo request from marketing, I expect – the development team was set in motion. Motor control passions were ignited and soon propellers would (hopefully) begin turning.


The drone selected was the DJI Phantom 1 – a workhorse of the market and hence a suitable platform with which to test our V series MCU’s credentials. Propeller guards were also purchased to avoid any unfortunate finger incidents – 8K rpm blades can cause quite a nip.

 

The Electronic Speed Controller (ESC) modules were the target area – four per drone, each controlled by one 8-bit 8051 MCU. A new consolidated ESC design was manufactured using one KV5x ARM Cortex-M7 MCU where previously there had been four 8-bit MCUs. Leveraging the MCU’s agile performance and highly integrated motor control peripherals – a 240 MHz ARM Cortex-M7 core, high resolution PWMs, multiple high speed ADCs (5 Msps) and its inter-peripheral crossbar – a four-off 6-step BLDC control system (one per motor) was implemented. This required approximately 50% of the KV5x MCU’s CPU performance, leaving additional bandwidth to implement field oriented control (FOC) and flight stability control functions in future ESC designs.
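
For readers curious what 6-step control amounts to, the sketch below shows the kind of commutation table an ESC firmware uses: each rotor-position state energizes one phase pair while the third phase floats. The phase naming and the pwm_drive() function are purely illustrative, not the demo's actual code.

```c
/* Illustrative 6-step (trapezoidal) BLDC commutation table:
 * each step drives one phase high-side PWM and one low-side,
 * leaving the third phase floating. Names are hypothetical. */
#include <stdint.h>

enum { PHASE_A, PHASE_B, PHASE_C };

typedef struct { uint8_t high_phase; uint8_t low_phase; } step_t;

static const step_t commutation[6] = {
    { PHASE_A, PHASE_B },   /* step 0: A high, B low, C floats */
    { PHASE_A, PHASE_C },   /* step 1 */
    { PHASE_B, PHASE_C },   /* step 2 */
    { PHASE_B, PHASE_A },   /* step 3 */
    { PHASE_C, PHASE_A },   /* step 4 */
    { PHASE_C, PHASE_B },   /* step 5 */
};

/* Hypothetical PWM driver hook: energize one phase pair. */
extern void pwm_drive(uint8_t motor, uint8_t high, uint8_t low);

void commutate(uint8_t motor, uint8_t step /* 0..5, from back-EMF */)
{
    pwm_drive(motor, commutation[step].high_phase,
                     commutation[step].low_phase);
}
```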

 

An additional KV4x ARM Cortex-M4 version of the ESC was built to demonstrate the unique scalability that the Kinetis V series brings to this and countless other motor control applications. One MCU family – multiple end products, with scalable form, functionality and price.

By coincidence, the demo team discovered a parallel drone project using the Freescale Analogue Product Group’s new gate driver IC. A quick decision was made to join forces and include the GD3000, adding further power control capability while replacing several additional discrete components. More BOM cost reduction. With schematics drawn up, PCBs populated and software tested, the result was two new custom-made ESC modules, and thankfully no missing fingers.

 

Drone 2, Drone 1

 

Up, up and away

The Kinetis V series drone made its maiden flight in the TechLab of Freescale FTF Americas 2015. Unfortunately, for logistical/safety/legal reasons, drones can’t be flown freely in such built-up areas (hotel owners aren’t keen on drones ‘buzzing’ their elaborate ballroom chandeliers), hence it was temporarily caged within a safety cabinet. However, it won’t be long before it’s fully airborne and appearing on the Freescale Internet of Tomorrow Trucks, at Design with Freescale seminars and at multiple other locations around the globe. So look out, or should I say look up, for a Kinetis V series drone coming to a city near you soon.

It is often said that the best things in life are those that move us. While my pun-infused ramblings might not, I’m fairly confident that the Kinetis V series will. If not, try Phil.

 

Danny Basler is a product marketer in Freescale’s Microcontroller Product Group

William Orme of ARM, Nick Heaton of Cadence Design Systems and Brian Choi of Samsung Electronics all participate in a panel discussion on 'How to address the challenges of IP Configuration, SoC Integration and Performance Validation', hosted by Sean O'Kane at the Chip Estimate booth at 52DAC this month.

 

In a forthright and honest discussion, the three talk about some of the challenges related to SoC design, including the fact that SoCs are increasing in size and complexity, as well as the rise in modular subsystems and virtualisation.

 

Brian is a principal engineer at Samsung working on SoC integration, with a focus on improving the design cycle and making the validation of a SoC faster and better. He also has previous experience as an SoC architect. In the discussion he shares that for Samsung in the mobile market, customer requirements can change quite a lot over the course of a project, which naturally causes a change in the specs, even halfway through the chip being built. What is needed is a methodology that incorporates new IP during a project without having to start again from scratch – a challenge right now for silicon providers and OEMs.

 

Nick is a distinguished engineer with Cadence R&D, working on architecting and innovating products that help in the SoC space. He has a background in advanced verification and has been looking at applying technologies to SoC integration to help solve that problem. One of his key points from the discussion is that timescales for design projects have stayed the same over the last 5-10 years – 6-9 month projects – even though complexity is growing massively. Naturally that has an effect: the shortening cycle of IP releases puts enormous pressure on handling this complexity and getting the integration right the first time to avoid unnecessary and costly waste.

 

William has a long background with ARM, beginning life as a hardware design engineer before spending some time doing software design; he now has a strategic role within System IP marketing, looking after product lines such as interconnect, debug & trace and IP tooling. As a way of solving the problem, William notes that we have moved to an adoption of IP standardisation. Using intelligent system design tools we can now reflect the configuration state of the IP and share this with tools vendors, verification teams and other stakeholders to make the design process faster.

 

 

On IP configuration - "we want to abstract away a lot of the details as things get more complex, use algorithms that embody the design intelligence to attack the complexity issue."


On IP integration - "algorithms can help shape what Brian spoke about in terms of changing specs and design reuse. You can create the algorithm and design rules to check that the configurations match across different views, for properties and interfaces etc, and if they fit the requirements, and show users what has been set up in a way that can actually be understood through visualisation like schematics, micro-architectures and so forth."

 

Find out more in the full video of the discussion below.

 


I hope you enjoy, and leave any comments or questions in the section below.

Without motor control, our homes and lives would be far less convenient than they are today – we’d still be washing our clothes by hand, cooking over open fires and desperately searching for the nearest ice cave in which to chill our beers. Outside of the kitchen the effect would be equally troublesome – HVACs replaced by hand fans, garage door/gate opening would require manual labor (shriek with horror!), and filtering the pool/jacuzzi would take months with only that lukewarm beer in hand to dull the pain. Jokes aside, motors are a BIG deal and represent a huge area of opportunity for electronic control using microcontrollers (MCUs), which bring increased automation and energy efficiency benefits to the appliance.


Within such applications, the MCU performs several functions. Its timers generate up to six channels of PWM which drive, via an inverter stage, the AC motor’s three phases and essentially make the motor spin. Analogue-to-Digital Converter (ADC) modules are used to measure the various phase currents to track the speed and/or position of the motor as it rotates, known as sensor-less feedback control.


Several household applications also use two motors: washing machine (big drum and pump), dishwasher (sprays the water and drains), fridge/freezer (compressors, air-flow to stop frost), and HVAC/air conditioner (compressors and air flow). Many MCUs contain two sets of six-channel PWMs allowing them to drive two inverter stages and in turn spin two motors. Generally, the sensor-less monitoring of speed and position is completed using one or two ADC modules. Sensor-less speed algorithms work with fewer errors if they can simultaneously acquire two of the phase currents at specific times of the PWM period, but error adjustments can be made if only one ADC module exists and the two phase currents are measured back-to-back. For driving a dual motor control application with two ADC modules, the application can assign one ADC per motor and include some error correction in the speed calculation. Alternatively, the dual motor drives can be synchronized by having one set of PWMs 180 degrees out of phase from the other, and making use of both ADCs for both motors by assigning different input channels. An MCU with four ADC modules allows true, independent dual 3-phase motor control, which helps simplify application code and minimizes acquisition errors. The trade-off that often arises in such integrated solutions is the cost of having four ADC modules, versus the level of power efficiency savings that will impact the end consumer.
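
To make the sampling step concrete, here is a small illustrative sketch (not vendor code): two phase currents arrive from the ADCs each PWM period, the third is reconstructed from Kirchhoff's current law, and a Clarke transform produces the two-axis currents that a sensor-less estimator consumes.

```c
/* Illustrative only: names and scaling are not from any vendor API.
 * ia + ib + ic = 0 lets us reconstruct the unmeasured phase, and the
 * amplitude-invariant Clarke transform yields stationary-frame currents. */
#include <math.h>

typedef struct { float alpha, beta; } clarke_t;

clarke_t phase_currents_to_alpha_beta(float ia, float ib)
{
    float ic = -(ia + ib);                  /* reconstructed phase C */

    clarke_t i;
    i.alpha = ia;                           /* Clarke transform      */
    i.beta  = (ib - ic) / sqrtf(3.0f);
    return i;
}
```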

MotorImage_KV-Series.jpg

The latest member of Freescale’s fast emerging Kinetis V series of ARM® Cortex®-M class MCUs – the Kinetis KV5x MCU family – is well equipped to handle the demands of multi-motor applications. With multiple timers, four high-speed ADCs (sampling at up to 5 Msps), and a 240 MHz capable Cortex-M7 core, fully independent sensor-less control of two 3-phase motors can be accomplished with ease. With CPU MIPS to spare, the KV5x MCU can also perform other functions, including adding secure internet connectivity via its on-chip Ethernet, multiple CAN and UART interfaces, and encryption modules. With the embedded market currently ablaze with IoT (Internet of Things) concepts, the opportunity to remotely monitor and manage countless motorized appliances in the home and beyond can now be realized from the comfort of our armchairs, workplaces or further afield. So you can rest easy: thanks to the humble MCU it should be some time yet before we need to search for that elusive ice cave.


To learn more about Freescale’s motor control solutions and Kinetis V series MCUs, visit: freescale.com/Kinetis/Vseries

#KinetisConnects


Dugald Campbell is an MCU systems architecture engineer for Kinetis MCUs

The Freescale Technology Forum (FTF) has always been one of my favorite events and this year was certainly one to remember. FTF 2015 was held in the music capital of the world, and Freescale's own backyard: Austin, Texas. There was a great mix of technical tracks with over 350 hours of technical content, a fantastic keynote speaker in Steve Wozniak, live music, and of course lots of fantastic ARM technology!

 

Freescale has one of the broadest ARM portfolios, ranging from some of the smallest 32-bit microcontrollers (KL02) all the way up to the latest 64-bit ARMv8 Cortex-A57 based devices (LS2080), and everything in between, all showcased in the technology lab. Here are a few of my favorite demos:

 

Freescale Kinetis V MCU family

Freescale have just recently launched the newest member of the Kinetis V family, the KV5x MCU. The KV5x implements the Cortex-M7 processor and targets motor control and power conversion applications. There were several demos showcasing the KV5x, but my favorite had to be the quadcopter where a single KV5x was being used to control all four motors. Unfortunately I wasn't allowed to fly it, probably to the benefit of everyone in the tech lab! There was also another KV5x demo where Freescale was showcasing the Kinetis Motor Suite, which can be used to simplify the development of advanced motor control functions. Using the Kinetis Motor Suite I was able to successfully use a KV5x to drive a washing machine motor simply by following the on-screen prompts!

 

kv5x_banner.jpgkv5x_copter.jpg

 

Networking with QorIQ Layerscape

There were a lot of networking demos on display in the tech lab, including one showing two of Freescale's new ARMv8-based Layerscape parts: the LS2080 and LS1043. The demo allowed you to run benchmarks on one or more CPUs and displayed the benchmark result as well as the power consumed during the run. The best part was that you could set the number of cores used to run the benchmark, which in turn allowed you to observe the power consumption of the CPUs. It was great to see that enabling a second Cortex-A53 had less of an effect on power consumption than the variation between successive demo runs. Seeing the raw performance of 8 Cortex-A57s was equally impressive.

 

LSbench.jpg

 

Internet of Things - Devices and Gateway

I also found the LS1021A-IoT Gateway demo interesting, as it showcased how an ARM-powered IoT edge device, such as a Freescale Kinetis based device, can talk to an ARM-powered LS1021 gateway. This demo really drove home the fact that the IoT is being enabled by ARM technology.

 

gateway.jpg

Automotive Vision Processor - Heads Up Display (HUD)

Freescale are one of the largest providers of automotive semiconductors on the planet, and ARM is well represented in Freescale's automotive product portfolio. The newly announced Cortex-A53 based S32V vision microprocessor was on display in the tech lab, as well as the Cortex-A5, Cortex-M4, and Cortex-M0+ based MAC57Dxx, which was driving an instrument cluster with a secondary display for a windshield HUD.

 

20150623_140100.jpg

 

And let's not forget the ARM booth, where we displayed the latest in ARM development tools working with lots of new Freescale silicon. My favorite demo was DS-5 displaying heterogeneous debug support for both the i.MX 6SoloX as well as the i.MX 7.

 

Ronan.jpg

 

 

And let's not forget the no-hands race car simulation inspired by Arrow's Sam Car, which really stole the show!

 

SamCar.jpg

 

There were so many more great demos, and I haven't even gotten to the ones in the IoTT truck that was parked around back! With this much going on, it is easy to get overwhelmed. Luckily for us, Freescale make their keynote and technical session content available after the show via their FTF site, linked below.

 

FTF 2015 | Freescale Technology Forum - FTF 2015
