
SoC Implementation



Join industry experts and a handful of ARM Partners at the Winchester Mystery House in San Jose, California (great venue for the subject) on Tuesday, October 14th as they unravel the strange and wonderful secrets of semiconductor intellectual property at Unlock the Mystery of IP. This free one-day conference will address cutting-edge semiconductor technology, market trends and projections, and challenges facing players in the IP industry.

Jim Feldhan, President of Semico Research Corporation, will ground the day's programming with two data-rich keynote presentations. Throughout the day, speakers will share 30-minute “deep-tech” presentations on today’s cutting-edge IP products to equip attendees with the knowledge they need to make informed decisions for their next design projects. Finally, two panel discussions will address today’s hottest topics: the Internet of Things (IoT) and IP subsystems.


  • "IP Subsystems: Build or Buy?"
    • Moderator:
      • Gabe Moretti, Extension Media
    • Panelists:

As if that weren't enough, a hosted bar and hors d'oeuvres networking reception will conclude the action-packed day. To register and to view the complete agenda, visit the IPextreme website.

The ARM Cortex-M7 processor is out, developed to address digital signal control markets that demand a blend of control and signal processing capabilities. The ARM Cortex-M7 has been designed with a variety of highly efficient signal-processing features to address the design demands of market applications such as next-generation vehicles, connected devices, and smart homes and factories.


In many of these end markets, engineering teams demand:

  • Maximum performance within power budgets
  • Maximum power savings targeting a given frequency


These are significant challenges to address, so how do we deal with them?

(Cadence recently published a white paper that details the challenges and some solutions. It describes ways in which Cadence and ARM worked to optimize power and timing closure in the ARM Cortex-M7.)


We start by identifying and confronting the issues. Let’s take dynamic power, for example. Dynamic power is the largest component of total chip power consumption (the other components are short-circuit power and leakage power). It can quickly become a challenge in leading-edge designs.
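To make this driver concrete, the classic first-order model puts dynamic power at P = α·C·V²·f (switching activity times switched capacitance times voltage squared times frequency), which is why voltage scaling is such a powerful lever. Here is a minimal sketch; all values are illustrative and not taken from the Cortex-M7 work:

```python
# Simplified dynamic power estimate: P_dyn = alpha * C * V^2 * f
# (alpha = switching activity factor, C = switched capacitance,
#  V = supply voltage, f = clock frequency). Values are illustrative.

def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """Return dynamic power in watts for the classic CMOS model."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz

# Halving the voltage cuts dynamic power 4x at the same frequency.
p_nominal = dynamic_power(0.2, 1e-9, 1.0, 400e6)   # ~0.08 W
p_lowvolt = dynamic_power(0.2, 1e-9, 0.5, 400e6)   # ~0.02 W
```

The quadratic voltage term is what makes the optimization techniques discussed below worth the effort.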


Then there are timing-closure challenges. One fundamental timing closure issue is the modeling of physical overcrowding. Among other things, this problem can be addressed by deftly managing layout issues (such as placement congestion, overlapping of arbitrary-shaped components, and routing congestion).


K.T. Moore, Group Director in Cadence’s Digital and Signoff Group, said:

“Closure requires a different way of thinking. You have to consider multiple constraints in the closure process with a unified objective function in mind. This is easier said than done because many constraints conflict with each other if you simply address their effects only on the surface.”         


In the past, teams relied solely on post-route optimization to salvage setup/hold timing in tough-to-close situations. Now, in-route optimization can begin closing timing earlier, during the routing step itself, using track assignment.


In addition, opportunities exist to reduce area and gate capacitance in other ways.

The Approaches     

Among several methods, the team explored placement optimization using the GigaPlace engine, available in Cadence Encounter® Digital Implementation System 14.1. GigaPlace places the cells in a timing-driven mode by building up the slack profile of the paths and performing the placement adjustments based on these timing slacks.
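As a rough illustration of the idea (the GigaPlace engine's actual algorithms are proprietary, so this is a toy sketch only), timing-driven placement starts from per-path slack, i.e. required arrival time minus actual arrival time, and prioritizes the paths with the most negative slack for placement adjustment:

```python
# Toy illustration of timing-driven placement: compute per-path slack
# and flag the paths that need placement adjustment, worst first.
# This sketches the general concept only, not GigaPlace itself.

def path_slack(required_ns, arrival_ns):
    """Positive slack = timing met; negative = path is failing."""
    return required_ns - arrival_ns

def critical_paths(paths):
    """paths: list of (name, required_ns, arrival_ns).
    Return the names of negative-slack paths, worst slack first."""
    timed = [(path_slack(r, a), name) for name, r, a in paths]
    return [name for slack, name in sorted(timed) if slack < 0]

paths = [("p0", 2.0, 2.3), ("p1", 2.0, 1.8), ("p2", 2.0, 2.1)]
print(critical_paths(paths))  # ['p0', 'p2']
```

In a real placer these slack profiles then drive cell moves that shorten wires on the failing paths at the expense of paths with slack to spare.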


The team also trained its sights on using in-route optimization for timing optimization to help hit the final frequency target.


Lastly, the team introduced the “dynamic power optimization engine” along with the “area reclaim” feature in the post-route stage. These options saved time and cut nearly in half the gap between the actual and desired power targets.


By the end of this exercise, the team achieved power savings greater than 35% on the logic (excluding constants such as macros).


For complete details, check out the white paper here.


Brian Fuller

Related stories:

-- Whitepaper: Pushing the Performance Boundaries of ARM Cortex-M Processors for Future Embedded Design

--Cortex-M7 Launches: Embedded, IoT and Wearables

--New Cortex-M7 Processor Balances Performance, Power

--The new ARM® Cortex®-M7 »

Registration: Xilinx University Program : Workshops Schedule

Venue: Cidade De Goa, Goa, India


Date & Time: 8 AM - 4 PM, Tuesday, December 16th 2014


The ARM University Program and Xilinx University Program (XUP) will be conducting a faculty workshop around the ARM SoC Design Lab-in-a-Box (LiB). The Lab-in-a-Box is about designing an SoC around the ARM Cortex®-M0 DesignStart™ Processor Core with peripheral interfaces using the AHB-Lite bus. This LiB targets courses such as Design of Digital Systems or Embedded System Design using FPGA and is part of the ARM University Program's ongoing commitment to share ARM technology with academia globally. The LiB has been designed with academics in mind and will allow participants to gain first-hand experience not only in how to teach the material in their own courses, but also in the essentials of hands-on SoC design. The workshop will cover topics such as designing AHB-Lite-compliant hardware peripherals (memory, UART, and GPIO, to name a few), integrating these peripherals around the ARM Cortex-M0 core, and implementing the SoC on an FPGA.


Workshop Agenda

  • Introduction to ARM Cortex-M0 DesignStart Processor Core
  • Overview of AHB-Lite (AMBA 3) Bus Protocol
  • Xilinx Artix-7 FPGA Architecture
  • Xilinx Vivado Design Flow
  • Simple AHB-Lite Peripheral Design and Integration
  • Introduction to UART Peripheral
  • Introduction to Interrupts and CMSIS
  • Integrating UART Peripheral and Interrupt
  • Snake Game Application Demo



  • Faculty attendees must come with their own laptops running the Windows OS with Keil MDK-ARM already installed. The software can be downloaded at Keil MDK-ARM.
  • Each Faculty attendee must individually register at the DesignStart Portal using an Official University Email ID for the "ARM Cortex-M0 DesignStart Processor" IP download instructions and then download the IP to her/his laptop before coming to the workshop.
  • Knowledge of embedded system design and experience in programming microcontrollers will be helpful.
  • Attendees are expected to make their own travel and stay arrangements.


If you require any further information please write to university@arm.com. We look forward to seeing you on the 16th of December, 2014!

This week is the 10th year for ARM Techcon, which has evolved into the best place for all things related to ARM technology. I will be attending this year, and giving a presentation on Friday at 3:30 titled “Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models”.


Based on the agenda this year, ARMv8 will be one of the primary topics. For the past few years there have been presentations about ARMv8, but it’s clear many people now have hands-on experience and are ready to share it at the conference. To get warmed up for ARM Techcon, I will share a couple of fun facts about 64-bit Linux on ARMv8.

Swap & Play Technology


One of the differentiating technologies of Carbon SoC Designer is Swap & Play, which enables a system to be simulated with ARM Fast Models to a breakpoint and saved as a checkpoint. The simulation checkpoint can be restored into a cycle-accurate simulator constructed with models built directly from the RTL of the IP. The most common use case for this technology is running benchmarks on top of the Linux operating system. Swap & Play is attractive for this application because the Linux boot is only a means for setting up the benchmark, and the accuracy is critical to the benchmark results and the system performance analysis. It may seem strange to simulate Linux using cycle accurate models because it requires billions of instructions, but there are times when being able to run Linux benchmarks on accurate hardware models is invaluable. In fact, this is probably required before a chip can complete functional verification.




One of the useful features of the ARM® Cortex®-A50 series processors is backward compatibility with ARMv7 software. I have had good results running software binaries from A15 designs directly on the A53 with no changes. Mobile devices have even started appearing with A53 processors running Android in 32-bit mode, with the possibility of upgrading to 64-bit Android in the future.


One of the reasons we always focus on the system at Carbon is because today's IP is complex and configurable, and this can lead to integration pitfalls which were not anticipated. Take for instance 64-bit Linux on ARMv8. It would be a reasonable assumption that if a design has A53 cores and is successfully running 32-bit Linux, it should be able to run 64-bit Linux just by changing the software.


Below are a couple of fun facts related to migrating from 32-bit Linux to 64-bit Linux on ARMv8 to get warmed up for ARM Techcon 2014.

Generic Timer Usage


The A15 and A53 offer a similar set of four Generic Timers. Many multi-cluster (and even some single-cluster) A15 designs have used the GIC-400 as an interrupt controller instead of the internal A15 interrupt controller, so the update to the A53 seems straightforward: change the CPU and run the same A15 software on the A53 in AArch32 state.


It turns out that 32-bit Linux uses the Virtual Generic Timer and 64-bit Linux uses the Non-Secure Physical Timer as the primary Linux timer. From a hardware design view, this probably doesn’t matter much as long as all of the nCNT* signals are connected from the CPU to the GIC, but understanding the difference is helpful when doing system debugging or building minimal Linux systems for System Performance Analysis. As I wrote in previous articles, architects doing System Performance Analysis are typically not interested in the same level of netlist detail as the RTL integration engineer, so knowing the minimal set of connections between key components in the CPU subsystem needed to run a system benchmark is helpful.


Below is a comparison of the Generic Timers in 32-bit and 64-bit Linux. The CNTV registers are used in 32-bit Linux and the CNTP registers are used in 64-bit Linux. The CTL register shows which timer is enabled, and a non-zero CVAL register value indicates the active timer.


[Figure: Generic Timer register comparison, 32-bit vs. 64-bit Linux]
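As a quick aid when inspecting timer state in a simulator, the architectural CNT*_CTL bit layout (ENABLE is bit 0, IMASK bit 1, ISTATUS bit 2) can be decoded in a few lines. The `active_timer` helper below is a hypothetical name that simply applies the rule of thumb above:

```python
# Decode the architectural CNT*_CTL register bits shared by the
# Generic Timers: ENABLE = bit 0, IMASK = bit 1, ISTATUS = bit 2.

def decode_timer_ctl(ctl):
    return {
        "enable":  bool(ctl & 0x1),
        "imask":   bool(ctl & 0x2),
        "istatus": bool(ctl & 0x4),
    }

def active_timer(cntv_ctl, cntp_ctl):
    """Apply the article's rule of thumb: the enabled timer is the
    one the running kernel is using as its primary timer."""
    if decode_timer_ctl(cntv_ctl)["enable"]:
        return "virtual (32-bit Linux)"
    if decode_timer_ctl(cntp_ctl)["enable"]:
        return "non-secure physical (64-bit Linux)"
    return "none"

print(active_timer(0x1, 0x0))  # virtual (32-bit Linux)
```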

Processor Mode


The reason for the different timers is likely that 32-bit Linux runs in supervisor mode in the secure state, while 64-bit Linux runs in normal (non-secure) mode. I first learned about these processor modes when I was experimenting with running kvm on my Samsung Chromebook, which contains the dual-core Exynos 5 A15. I found that to run a hypervisor like kvm I had to start Linux in hypervisor mode, while the default configuration runs in supervisor mode. After some changes to the bootloader setup, I was able to get Linux running in hypervisor mode and run kvm.


It may seem like the differences between the various modes are minor and unlikely to make any difference to the system design beyond the processors, but consider the following scenario.


Running 32-bit Linux on the A53 in AArch32 state works fine using the CCI-400, NIC-400, and GIC-400 combined with some additional peripherals. You would expect the exact same system to run 64-bit Linux without any changes. What if, however, the NIC-400 slave port that receives data from the CCI-400 were configured in AMBA Designer for secure access? This is one of the three possible configuration choices for slave ports. Here are the descriptions of the AMBA Designer choices for the slave port:

[Figure: AMBA Designer security options for the NIC-400 slave port]


If secure were selected, the system would run fine with 32-bit Linux but would fail when running 64-bit Linux: the non-secure transactions from the A53 would be presented as secure transactions to the GIC (because of the NIC-400 configuration), resulting in wrong values being read from GIC registers such as the Interrupt Acknowledge Register (IAR) when trying to determine which peripheral is signaling an interrupt. The result would be a difficult-to-debug looping behavior in which the kernel is unable to service the proper interrupt. All of this because of a single NIC-400 configuration parameter. For more information on the NIC-400 design flow, a recording of the recent Carbon webinar is available.
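A toy model of the failure mode shows why the kernel loops. This is a deliberately simplified sketch, not the real GIC-400 programming model: the idea is that the IAR only yields a usable interrupt ID when the access's security attribute matches the interrupt's, and otherwise returns a spurious ID the kernel cannot service.

```python
# Simplified model: the GIC's Interrupt Acknowledge Register returns
# the pending interrupt ID only when the access security matches the
# interrupt's security group; otherwise it returns a spurious ID.
# If the interconnect forces every transaction to "secure", a
# non-secure 64-bit kernel never sees its interrupt and spins.

SPURIOUS_ID = 1023

def read_iar(pending_id, pending_is_secure, access_is_secure):
    if pending_is_secure == access_is_secure:
        return pending_id
    return SPURIOUS_ID

# 32-bit Linux (secure accesses) works even if the NIC forces secure:
print(read_iar(34, True, True))    # 34

# 64-bit Linux issues non-secure reads, but the NIC presents secure:
print(read_iar(34, False, True))   # 1023 -- kernel can't service it
```

The point of the model is that nothing is "broken" in either IP block; the mismatch lives entirely in one interconnect configuration parameter.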



As you can see, seemingly minor differences in the processor operating mode between 32-bit and 64-bit Linux can impact IP configuration as well as connections between IP. These are just two small examples of why ARM Techcon 2014 should be an exciting conference as the ARM community shares experiences with ARMv8.


Make sure to stop by the Carbon booth for the latest product updates and information about Carbon System Exchange, the new portal for pre-built virtual prototypes.


Jason Andrews

That's right, designers! If you'd like to see some stuff for FREE and even get a few free lunches at ARM TechCon 2014, you need to register in advance and use code ARMEXP100 to get a free Expo Pass. With that pass you can attend:

See you there at ARM TechCon 2014.


Time flies: 10 years ago, ARM rolled the first of what's become the storied line of Cortex-M processors (chart right). The launch of that family couldn't have been timed better, starting as it did just as the world of system design moved toward mobile solutions that were both power and size constrained. Today mobile, wearables and IoT are huge and show no sign of slowing. Consider that during those 10 years, more than 8 billion Cortex-M cores have shipped, more than half of them in the past 18 months, according to Thomas Ensergueix, ARM senior product marketing manager.


The latest in the series was announced this week, when ARM rolled the Cortex-M7 with an eye toward that always tricky balance between a design team's need for performance and its power constraints.

The 32-bit device doubles the compute and digital signal processing (DSP) capability of existing (and powerful) ARM-based MCUs while keeping power under control.

Using a Cadence implementation flow, design teams can wring even more power optimization from the M7 and tackle tough parasitic extraction issues along the way. We'll post about that in detail next week during ARM TechCon.

For now, here's the latest on the M7 launch, announced Sept. 24:



Market applications for the ARM Cortex-M7 include next-generation vehicles, connected devices, and smart homes and factories. Companies including Atmel, Freescale, and STMicroelectronics are counted among early licensees.

More information about the new processor is available at ARM.

Next week at ARM TechCon, Cadence's Paddy Mamtora, Product Engineering Group Director, Digital and Signoff Business Unit, will join with ARM Principal Engineer Aditya Bedi Wednesday (Oct. 1) at 4 p.m. to talk about pushing the boundaries of embedded design with Cortex-M.


Related Stories:

-- Getting a Glimpse at the Future Early – Cadence & ARM at ARM TechCon 2014!

Ok, with one week to go, I'm getting excited about this year's ARM TechCon!


Synopsys and ARM at ARM TechCon: our mutual customers (e.g., AMD, HiSilicon, Samsung, and STMicroelectronics) will be sharing technical content and successful examples of solutions for ARM-based design spanning SoC implementation, verification, prototyping, and virtual prototyping. All but one of these sessions require only a free EXPO badge (not a full conference badge), so you can just drop by for one or more as your time permits. We even have two free lunch sessions with excellent technical presentations.


NOTE: Please be sure to register first and use code ARMEXP100 to get a FREE expo pass.


Wednesday, October 1 – Mission City Ballroom 1

11:00 – 11:50am: Performance Analysis and Optimization of ARM® CoreLink NIC-400 based Systems Using Synopsys Platform Architect

12:00 – 12:50pm: Turbocharge Verification of your ARM-based Systems with Synopsys Hybrid Emulation

1:00-1:50pm: A Processor-Based Approach to Acceleration in Modern SoCs
Lunch will be provided for all attendees

2:00-2:50pm: Integrate Pre-Verified Synopsys IP Subsystems into an ARM-Based SoC in Minutes

3:00-3:50pm: Accelerating Development of Fujitsu Embedded Platform SoC using Synopsys Virtual Prototyping and Galaxy Implementation Solutions

4:00-4:50pm: Efficient hardening of ARM® Cortex®-A57/Cortex-A53 Processor Subsystems in FD-SOI Process Technology with Synopsys Galaxy Platform (presentation by STMicroelectronics)

Thursday, October 2 – Grand Ballroom A

10:30-11:20am: AMD Tapeout of a High-Performance ARM® Cortex®-A57 Processor-Based Server SoC using Synopsys Galaxy Design Platform

11:30am-12:20pm: Addressing 16nm FinFET Challenges to Tapeout HiSilicon’s 50M+ Gate ARM® Cortex®-A57 Processor-based SoC using Synopsys IC Compiler (learn about the first 16nm FinFET networking processor running up to 2.6 GHz)

12:30-1:20pm: Q&A Panel with AMD, HiSilicon, and STMicroelectronics: Achieving Optimum Results on the Latest ARM® Cortex®-A Processor Family with Galaxy Platform
Lunch will be provided for all attendees

1:30 – 2:20pm: Innovation in Debug for ARM-based SoCs: Driving Innovation in Verification and HW-SW Bring-up

2:30 – 4:20pm: ARM-Samsung-Synopsys: A Simple Formula for Success, from Next-Generation Wearables to High-Performance SoCs

Friday, October 3 – Grand Ballroom H

1:30-2:20 pm: Performance Analysis and Verification of an ARM® based SoC Interconnect
**NOTE: This session requires a conference badge to attend

- See more info at: Synopsys at ARM Technology Conference 2014


As always, please check out our microsite www.synopsys.com/ARM for more information about Synopsys' optimized solutions for ARM-based design.


Oh, yeah, and please root for me in the ARM Step Challenge as I use my ARM-powered Fitbit to go up against my arch rivals, John Heinlein and Brian Fuller. After a hard-fought competition at DAC 2014, I'm ready for the rematch!

Interesting article on Semiconductor Engineering where options other than increasing clock frequency are considered for improving performance and data throughput of advanced node devices:

Semiconductor Engineering .:. Making Chips Run Faster

Our approach is to provide the 'different IP' suggested in the article. On the advanced nodes, performance optimisation schemes require conditions to be accurately monitored on-chip and within the core. We believe that PVT conditions should be monitored and sensed by small analog sensors such as accurate temperature sensors, voltage monitors (core and IO), and process monitors. Quite simply, the more accurately you sense conditions, the more watts can be saved in both the idle-leakage and active states of a device. For example, our embedded temperature sensors have been developed to monitor to a high accuracy for this reason. Once you have the 'gauges' in place you can then play with the 'levers,' by implementing Dynamic Voltage and Frequency Scaling (DVFS) schemes or Energy Optimisation Controllers (EOCs), which are able to vary system clock frequencies, supplies, and block powering schemes.
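As a sketch of the kind of DVFS/EOC policy described here (the operating points, thresholds, and function names are invented for illustration, not a real controller), a controller can pick the lowest voltage/frequency operating point that still meets the workload's required frequency, capping the frequency when the temperature sensor reports a hot die:

```python
# Hypothetical DVFS policy: choose the cheapest operating point that
# meets the required frequency, throttled by a temperature reading.
# All operating points and thresholds are illustrative.

OPPS = [  # (freq_mhz, vdd_volts), lowest first
    (200, 0.80),
    (400, 0.90),
    (800, 1.00),
    (1200, 1.10),
]

def select_opp(required_mhz, temp_c, throttle_c=95, throttle_mhz=400):
    cap = throttle_mhz if temp_c >= throttle_c else float("inf")
    for freq, vdd in OPPS:
        if required_mhz <= freq <= cap:
            return freq, vdd
    return OPPS[0]  # fall back to the lowest point under throttling

print(select_opp(700, 60))   # (800, 1.0) -- normal operation
print(select_opp(700, 98))   # (200, 0.8) -- thermally throttled
```

The accuracy argument in the text maps directly onto `temp_c`: the less sensor error you must guard-band for, the closer `throttle_c` can sit to the real limit, and the fewer watts are wasted.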

Again, we believe that these peripheral monitors are nowadays less of a 'nice to have' and more of a critical requirement. With that, these monitors must be reliable and testable in situ, as failing sensors could have a dramatic effect on the system.

Another point is that we're seeing device architectures that cannot cope with each and every block being enabled. With the increased gate densities of 28nm bulk and FinFET processes come greater power densities and hence greater thermal dissipation, so we're seeing devices that cannot be fully powered up while operating within reasonable power consumption limits.

All these problems of coping with on-chip PVT conditions and the increasing process variability on advanced nodes mean that the challenges, and the opportunities for innovation, in implementing more accurate, better distributed embedded sensors and effective Energy Optimisation (EO) schemes are here to stay.

What did you think of Apple’s latest blockbuster product announcements this week, the iPhone 6 and the smart watch? Can’t wait to buy them?


I thought the announcement was fascinating but not because I’m running out to buy or pre-order the devices (heck, I've only had my iPhone 5 for six months). It was fascinating because it illuminates a fundamental shift in electronics system design. And, at its heart, the story is about the difference between mammals and insects (more on this in a moment).


Two paths

Smart phones and wearables represent two distinctly different ways to design systems. The smart phone architecture, generally speaking, descends from computer systems design: big, powerful, OS-centric, able to manage a multiplicity of tasks. An industry ecosystem has coalesced around these high-volume devices with a standard array of products and services: various processors, RF basebands, memory subsystems, sensor technologies, and so forth.


The world of wearables design is completely different. It's a subset of a broadly defined Internet of Things or Internet of Everything sector. The applications in this area are arguably almost infinite in number and wildly diverse. And as such, their technology requirements—their power, performance and area considerations—are just as varied. One size generally fits only one; it does not fit all.


Mammals and Insects

Cadence IP Group CTO Chris Rowen likens it to mammals versus insects (see comparison chart nearby, from Current Results). In a conversation we had recently, he noted that the smart phone/tablet/PC/server world can be viewed as mammals: A relatively small number of species in the ecosystem functioning as generalists, he says. IoT applications, on the other hand, are more like insects: Too numerous to count and having key, highly specialized roles within a larger ecosystem.



This ecosystem requires a holistic approach to system design enablement from IP implementation and verification all the way to tape out. It requires an awareness of what Rowen calls cognitive layering. Oversimplified, this means matching the right processing, power, memory, and software attributes with the right tasks at the right time. We'll be writing more about this in the coming months.


Shift left

This system-design perspective is one of the drivers behind the "shift left" trend (chart, left) that my colleague Frank Schirrmeister often writes about. Our industry has talked for many years about the need for hardware-software codesign to speed time to market, but today's increasing system complexity and diversity require it as well. Not understanding how your hardware design affects your application software (and vice versa) at the earliest stages of your design can be perilous.


I know nothing about how Apple or Samsung apportions its design teams and how those teams are traveling along these two distinct paths. But I suspect the teams are different, as are their design approaches to systems and SoCs.


Just consider the Apple watch. Apple's promotional materials laid out the challenge: "Massive constraints have a way of inspiring interesting, creative solutions.... No traditional computer architecture could fit within such a confined space." Apple engineers responded with the S1, a system-in-package device that includes processing and sensing.

You could consider the S1 (what little we know about it) to bridge the mammals-insect worlds, but just imagine the thinking that goes into far more specialized "insect" applications. It's going to be a fascinating future indeed.


Related Stories

- Sealing the Seams in System Design

- Q&A with Nimish Modi: Going Beyond Traditional EDA

The processors from ARM® get all of the attention.  After all, ARM partners have shipped over 50 billion ARM processors so far.  10 billion of those in 2013 alone.  With so many processors shipping, you would think that this would be reflected on Carbon's IP Exchange web portal and ARM's processors would be the most popular IP models created and downloaded.


In truth, it's pretty rare that any of ARM's processors top the list of the most popular models generated on Carbon's IP portal. That title consistently goes to one of ARM's CoreLink interconnect offerings. Month in and month out, one of the NIC-400, NIC-301, and PL301 interconnect models tops the list. The comparison is a bit unfair, since the typical architect will try out only a handful of different processor configurations, while it's not at all uncommon for a single user to create dozens of configurations for the system interconnect. It does, though, reflect the importance that users place on having accurate models for the components in their system that have the greatest impact on overall performance. (Not surprisingly, the next most commonly created type of component on our portal is a memory controller.)


Carbon has blogged a few times about the importance of accurate models of the NIC-400 and NIC-301 for system tasks ranging from IP selection to accurate firmware bring-up and debug. We've discussed how accurate interconnect models enable you to avoid arbitration problems, detect system bottlenecks, and meet your price, performance, and area targets. On September 11th at 1pm EDT (17:00 GMT) we'll be holding a webinar together with ARM to talk about the impact of NIC-400 configuration choices on the performance of the system. We'll see how this performance can be analyzed and optimized not just using a few traffic generators and bare-metal software, but also when running commercial benchmarks on a complete Linux operating system. Far too often, this type of performance optimization waits until emulation or FPGA prototyping, when it's too late to have much impact. We'll demonstrate how you can use Swap & Play to get your system booted quickly and then switch to 100% accurate models to run those important system benchmarks that drive performance decisions. We'll also show how you can use our CPAKs to get up and running within minutes of download. The demos will be done using a multi-cluster Cortex-A53 system but will apply no matter what processor you're using.


The webinar will feature sections by William Orme, ARM's product manager for the NIC-400 as well as multiple demonstrations by Eric Sondhi, a corporate applications engineer here at Carbon.  Although the webinar will be available as a recording afterwards, I'd urge you to attend live if possible to ensure that your questions are answered.  You can of course, always get answers to any questions you have by clicking on the button below.


Sign up for the Pre-silicon Optimization of System Designs using the ARM® CoreLink™ NIC-400 Interconnect webinar.


Request More Information    Optimization of ARM Cortex-A15 and AMBA4 Designs using a Virtual Prototype    AXI Interconnect Optimization using a Virtual Prototype

An interesting article by Daniel Payne on SemiWiki that approaches the predicted finFET issues through simulation analysis:

SemiWiki - FinFET Design for Power, Noise and Reliability


Many of the points raised are of interest to us, as an advanced node development team, and to our customers. Gate density (as is the intention of finFET!) is a significant contributor to thermal and IR drop issues. We believe that Moortec Semiconductor's approach complements the analysis from simulation. We provide embedded temperature sensors, voltage and process monitors, essentially 'lifting the lid' on on-chip PVT conditions for advanced node SoCs (Analog IP and Custom Mixed Signal ASIC IC Chip Design Services).


In a thermal context, gate density equates to power density and, in turn, localised thermal issues. Only accurate core temperature sensors placed near potential hot spots provide the system with sufficient feedback to implement a dynamic control scheme for clock speed or supply. Schemes such as DVFS are becoming the big application area for on-chip PVT monitors, as you can then optimise performance on a per-chip basis (we prefer the term 'Energy Optimisation', which leads to 'Energy Optimisation Controller' schemes being implemented within a system).


In terms of IR drop, as gate density increases and the impedance of the metal tracking for supplies increases, together with reduced headroom due to supply reduction, we're seeing a greater problem on advanced nodes. Using on-chip core voltage supply monitors allows chip developers to see what the supply conditions are really like and how they compare to simulation results. In addition, when data from these monitors is included at the architectural level of an SoC, the power supplies can be optimised for better performance, or power saving, as required. We can only see demand for such monitors increase as we move down the technology curve.
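As a simple sketch of how such monitor data might be consumed (the 5% margin and sample values are illustrative, not Moortec specifications), supply-monitor samples can be checked against the nominal voltage to flag IR-drop excursions for comparison with simulation:

```python
# Hypothetical check: flag IR-drop violations by comparing on-chip
# voltage-monitor samples against the nominal core supply.
# The margin and sample values are illustrative only.

def ir_drop_violations(nominal_v, samples, margin=0.05):
    """Return (index, drop_fraction) for samples sagging past the margin."""
    out = []
    for i, v in enumerate(samples):
        drop = (nominal_v - v) / nominal_v
        if drop > margin:
            out.append((i, round(drop, 3)))
    return out

samples = [0.79, 0.81, 0.74, 0.80]        # measured core supply, volts
print(ir_drop_violations(0.80, samples))  # [(2, 0.075)]
```

Feeding flags like these back into an architectural power model is what lets the supplies be tuned for performance or power saving as described above.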



The design decisions made early in the design cycle have a dramatic effect on all of the downstream processes. The ability to make correct decisions early and leverage these throughout the design process is why virtual prototyping plays an increasingly important role in leading SoCs. ARM and Carbon Design Systems  are jointly presenting a webinar on how virtual prototypes can be used to make critical design decisions on SoC designs containing ARM’s leading CoreLink NIC-400 interconnect.


This webinar will demonstrate multiple real-world cases where Carbon Performance Analysis Kits, using 100% accurate models of the CoreLink NIC-400 and Cortex-A53 together with pre-built software and advanced analysis capabilities, deliver useful results within minutes of download. Case studies will include initial bring-up and optimization of NIC-400 systems at the bare-metal level and will also cover how the same approaches can be used to boot an OS in under a minute, followed by 100% accurate system-level performance analysis.



William Orme - Strategic Marketing Manager for System IP, Processor Division at ARM. William is responsible for the CoreLink NIC-400 and the next-generation on-chip interconnect. At ARM since 1996, he has led the introduction of many new products, including the ETM and subsequent CoreSight multi-core debug and trace products. Prior to joining ARM, William spent 12 years designing embedded systems, from financial dealing rooms through industrial automation to smartcard systems. William holds degrees in electronics and computer science as well as an MBA.


Bill Neifert - Chief technical officer and founder at Carbon Design Systems. Prior to founding Carbon, he worked in applications engineering management positions at Quickturn Systems and C Level Design. Bill started his career as a design engineer at Bull HN. He has a BS and MS in Computer Engineering from Boston University.


Eric Sondhi  - Corporate applications engineer at Carbon Design Systems. Before Carbon, Eric worked as a software engineer developing core software and device drivers for next generation storage systems. He has a BS in Computer Systems Engineering from the University of Massachusetts Amherst and an M.B.A. from Boston University.


>> Click here to register

The Internet of Things represents a vast opportunity for electronics designers if we can figure out one not-so-small problem.


We are rightly having myriad conversations here in the ARM Connected Community on the potential:

Will IoT Break M2M Silos for Start-up Apps?

IoT Success Depends upon Decoupling

Diversity drives IoT deployment

Balancing IoT's promise with privacy concerns

Freescale's Five S’s of IoT


(And this is just a fraction of what's been published here!)

It reflects our innate understanding that IoT's application potential is big, broad, and pretty much unbounded. And yet we need to simplify it, as Steve Nelson of Freescale has blogged about (Embedded Beat: The Five S’s of IoT | Freescale Community). (Thanks for the heads up, Lori Kate Smith!)


We need, as part of that simplification exercise, to tackle that not-so-small problem I alluded to: We need a natural user interface (Dennis Laudick has some great insight into interface design possibilities: From Keyboards to Touchscreens To....? The ‘Futuristic’ HMIs That Will Soon Be A Reality In Your Pocket.).


In earlier technology eras, we adapted to specific interfaces (computing keyboards, mobile touchscreens). But IoT will be so diverse that we need it to adapt to us. And that's where natural user interface design comes in.

My colleague Seow Yin Lim explains the implications, and she'll be writing a series about it. Natural user interface design should be a great industry conversation.

What are your thoughts?

As the use of advanced-node technologies (FinFETs specifically) ramps, more designers are confronting challenges in technology, productivity, and time to market. To get a sense of what engineers need to know about advanced nodes, FinFETs, and parasitic extraction, Brian Fuller, editor-in-chief at Cadence, sat down with Hitendra Divecha, senior product marketing manager at Cadence, to understand more about where we are today.


Q: Hitendra, let's start at a really high level: As we move down into leading-edge nodes—16/14nm FinFETs and beyond—what are the main challenges that designers face today?


A: Well, we can bucket these challenges into two main categories: increasing complexity and modeling challenges. It’s not just tighter geometries and new design rules, which come with every new process node. We talked about the introduction of FinFETs, but there is 3D-IC as well, the number of process corners is exploding, and, specifically for FinFET devices, there is an explosion in parasitic coupling capacitances and resistances. All of this increases design complexity and size: the netlist is getting bigger and bigger, and as a result extraction runtimes for SoC designs are growing, as are post-layout simulation and characterization runtimes for custom/analog designs.

Q: You mentioned modeling challenges and accuracy. What’s happening there?


A: Yes, design complexity is one challenge, but there are various modeling challenges as well. For FinFET devices, for example, local interconnects have been introduced, and there are second- and third-order manufacturing effects that also need to be modeled. All of these new features have to be modeled with precise accuracy. Performance and turnaround times are one thing, but if you can’t deliver accuracy for these devices, especially relative to the foundry golden data, customers are forced to over-margin their designs and leave performance on the table.


Q: Talk a little more about that. We talk about the enormous percentage of design time taken up by verification in general. How much has extraction, as a subset, grown as we get into these advanced nodes? Can we quantify that?


A: Well, from our customers' perspective, while their extraction and time-to-signoff times are increasing, their time to market is shrinking. It can take anywhere from six to eight weeks for designers to close the signoff loop, and, as you know, extraction is a critical step in this loop. Our customers tell us that while extraction runtime varies with design size and type, full flat extraction at these advanced nodes can take up to three days with their current extraction tools. This puts enormous pressure on their ability to reach design closure in time to meet their market windows.


Q: OK, so extraction is a huge pain point for our customers…


A: Yes, so huge that we have to solve our customers' problems and help them accelerate signoff extraction turnaround time. There’s no way around it. The market has lacked tools that deliver the performance required to produce a significant speed-up in both digital and transistor signoff extraction flows.


Q: We’re going to get to that in a second, but before we do, engineers coming around the corner and confronting advanced nodes may not have a sense for what they’re in for.


A: Absolutely. Let me put a finer point on that. Signoff extraction has become challenging due to a number of reasons.


First, the number of interconnect corners on both the digital and custom/analog sides has exploded partly due to the introduction of double-patterning technology (DPT), first introduced at 20nm and carried over to 16/14nm FinFETs.


Second, design sizes are increasing. At 20nm and below, designs exceed 70 million nets. With more corners and larger designs, extraction goes from taking a day to taking several days.


Q: We've talked about complexity challenges...let's move on to modeling challenges.


A: Are you ready for this? FinFET devices have 155X more resistances than 28nm devices. This growth means bigger netlists, which slow post-layout simulation and demand faster simulation runtimes. Tools need to model three different resistance types: contact resistance, spreading resistance, and extension resistance. And consider this: the thickness of the 3D gate introduces new capacitances. Between FinFET and fringe capacitances, double patterning, and more, the modeling features have grown more complex, and that stretches out extraction runtime.


Q: Parasitic extraction has been a big issue for some time. So what’s wrong with existing flows and tools?


A: In some cases nothing, especially for certain designs and older nodes. But as I’ve said, advanced nodes are a different ballgame. In most cases, different extraction engines are used in implementation and signoff, resulting in poorly correlated results that have a negative impact on design closure. Using consistent extraction engines throughout the flow, meaning implementation and signoff, is the linchpin to reducing our customers' time to signoff, because it cuts the number of ECO loops they have to go through.


Q: We touched a bit on productivity. So, at 16/14nm and FinFET technology, older extraction technologies can’t necessarily keep up with all the additional complexity you’ve alluded to, correct?


A: Yes, parasitic extraction is a means to an end in both digital- and transistor-level extraction flows. However, it is a very BIG means to an end. We listened to our customers' time-to-market challenges, and we’ve brought a massively parallel architecture to bear on the problem. The Cadence Quantus™ QRC Extraction Solution, which we just announced, offers up to 5X better turnaround time for both single- and multi-corner extraction versus traditional extraction tools on the market today, scales to hundreds of CPUs and machines, and delivers best-in-class accuracy for FinFET designs measured against foundry golden data. Also, with the Quantus QRC solution, we continue to provide leading functionality for custom/analog designs, including features that address automotive application designs and our new random-walk field solver, Quantus FS.


For a customer design like the one I talked about earlier, we can reduce extraction runtime from three days to 10 hours or less without compromising accuracy. In summary, with the combination of performance, accuracy, and tight integration with our implementation tools, the Encounter® Design Implementation System and the Virtuoso® platform, the Quantus QRC solution delivers the fastest path to signoff.


Q: You’re a busy man, so thanks for your time, Hitendra!


A: No problem!

Brian Fuller


ARM has released DS-5 version 5.19 including the Ultimate Edition for ARMv8 to compile and debug 64-bit software. The Carbon Performance Analysis Kits (CPAKs) for the ARM Cortex-A57 and Cortex-A53 demonstrate 64-bit bare metal software examples which can be modified and compiled with DS-5. The software in the currently available CPAKs is compiled with ARM Compiler 5, better known as armcc, and not yet configured for ARM Compiler 6, also known as armclang. Fortunately, only a few changes are needed to move from armcc to armclang.


Today, I will provide some tips for using ARM Compiler 6 for those who would like to use the latest compiler from ARM with CPAK example software. In the future, all CPAKs will be updated for ARM Compiler 6, but now is a good time to give it a try and learn about the compiler.


ARM Compiler 6 is based on Clang and the LLVM Compiler Framework, and provides best-in-class code generation for the ARM architecture. There are various articles covering the details, but the key takeaway is that ARM Compiler 6 is based on open source with a flexible license, which allows commercial products to be created without making the source code available.

Migration Guide


A good place to understand the differences between armcc and armclang is the ARM Compiler Migration Guide. It explains the command line differences between the two compilers and how to map switches from the old compiler to the new compiler. The migration guide also covers two additional tools provided to aid in switching compilers:

  • Source Compatibility Checker
  • Command-line Translation Wrapper


The compatibility checker helps find issues in the source code that is being migrated, while the translation wrapper provides an automatic way to call armcc as before, but invisibly calls armclang with the equivalent options. I didn’t spend too much time with either tool, but they are worth checking out.

The key point is that migration will involve new compiler invocation and switches, but it may also involve source code changes for things such as pragmas and attributes that are different between the compilers.



Let’s look at the practical steps to use ARM Compiler 6 on a Cortex-A53 CPAK software example. For this exercise I selected the DecrDataMP_v8/ example in the Applications/ directory of the CPAK. The system is a dual-cluster Cortex-A53 in which each cluster has one core. It also includes the CCI-400 to demonstrate cache coherency between clusters and the NIC-400 for connecting peripherals. The block diagram is shown below.

Cortex-A53 System

Setting up DS-5 is very easy. I use Linux and bash, so I just add the bin/ directory of DS-5 to my PATH environment variable. Adjust the path to match your installation.


$ export PATH=$PATH:/o/tools/linux/ARM/DS5_5.19/64bit/bin


Only the 64-bit version of DS-5 includes ARM Compiler 6; it is not included in the 32-bit version, so make sure you install the 64-bit version and run it on a 64-bit Linux machine.


The first step to using ARM Compiler 6 is to edit the Makefile and replace armcc with armclang to compile the C files. Any assembly files can continue to be compiled by armasm and linking done with armlink remains mostly the same. It is possible to compile assembly files and link with armclang, but for this case I decided to leave the flow as is to learn the basics of making the compiler transition.


The Makefile specifies the compiler in the CC variable, so change it to CC=armclang
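The swap can be sketched as a one-line substitution. The demo Makefile below is hypothetical (the real CPAK Makefile has more variables), and the sed call is just a convenience; editing the file by hand works equally well:

```shell
# Hypothetical stand-in for the CPAK Makefile.
printf 'CC = armcc\nAS = armasm\nLD = armlink\n' > Makefile.demo

# Swap only the compiler; armasm and armlink stay as they are.
sed -i 's/^CC = armcc/CC = armclang/' Makefile.demo

grep '^CC' Makefile.demo
```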


The next important change is the specification of the target CPU. With armcc the --cpu option is used. You will see --cpu=8-A.64.no_neon in the Makefile. One tip is to use the command below to get a list of possible targets.


$ armcc --cpu list


With armclang the target CPU selection is done using the -target option. To select AArch64 use -target aarch64-arm-none-eabi in place of the --cpu option.


The invocation command and the target CPU selection are the main differences needed to switch from armcc to armclang.
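If you are scripting the migration across several Makefiles, the flag mapping above can be automated. The wrapper below is purely illustrative (the function name is made up); the flag strings are the ones discussed above:

```shell
# Map armcc's CPU selection to armclang's target triple (illustrative only).
to_armclang_flags() {
  printf '%s\n' "$1" | sed 's/--cpu=8-A\.64\.no_neon/-target aarch64-arm-none-eabi/'
}

# main.c is a placeholder file name; only the CPU-target flag is rewritten.
to_armclang_flags "--cpu=8-A.64.no_neon -c main.c -o main.o"
```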

Other Switches


This particular CPAK software uses --c90 to specify the version of the C standard. For armclang the equivalent option is -xc -std=c90, so make this change in the Makefile as well.


The next issue is the use of the --dwarf3 option. This is not supported by armclang; it seems DWARF4 is the only option with armclang.


The Makefile also uses -Ospace as an option to shrink the program size at the possible expense of runtime speed. For armclang this should be changed to -Os.


The last difference relates to armlink. The armlink commands need --force_scanlib to tell armlink to include the ARM libraries. From the documentation, this option is mandatory when running armlink directly. Add this flag to the armlink commands and the compilation will complete successfully and generate .axf files!
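Putting the pieces together, the migrated settings might be collected in a small fragment like the one generated below. The file name toolchain.mk is invented; the flag values are exactly the ones covered in the steps above:

```shell
# Write the migrated tool settings to a hypothetical include file.
cat > toolchain.mk <<'EOF'
CC      = armclang
CFLAGS  = -target aarch64-arm-none-eabi -xc -std=c90 -Os
LDFLAGS = --force_scanlib
EOF

cat toolchain.mk
```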


Here is a table summarizing the differences.

ARM Compiler 5            ARM Compiler 6
Invoke using armcc        Invoke using armclang
--cpu=8-A.64.no_neon      -target aarch64-arm-none-eabi
--c90                     -xc -std=c90
--dwarf3                  (not supported; DWARF4 only)
-Ospace                   -Os
armlink                   armlink --force_scanlib

I encountered one other quirk when migrating this example to ARM Compiler 6: a compilation error caused by a header include in the source file retarget.c


  #include <rt_misc.h>


For now I just commented out this line and the application compiled and ran fine. It’s probably something to look into on a rainy day.

Creating an Eclipse Project for DS-5


It wouldn’t be DS-5 if we didn’t use the Eclipse environment to compile the example. It’s very easy, so I’ll include a quick tutorial for those who haven’t used it before. Since a Makefile already exists for the software, I created a new Makefile project.


First, launch Eclipse using


$ eclipse &


Once Eclipse is started, use the menu File -> New -> Makefile Project with Existing Code


Pick a name for the project and fill it into the dialog box, browse to the location of the code, and select ARM Compiler 6 as the Toolchain for indexer settings.


[Screenshot: DS-5 project dialog with the ARM Compiler 6 toolchain selected]


There are many ways to start the build, but once the project is set up I use the Project menu item Build Project, and the code is compiled.


There is a lot more to explore with DS-5, but this is enough information to get going in the right direction.

ARM TechCon


Now is a great time to start making plans to attend ARM TechCon, October 1-3 at the Santa Clara Convention Center. The schedule has just been published and registration is open. I will present Performance Optimization for an ARM Cortex-A53 System using Software Workloads and Cycle Accurate Models on Friday afternoon.


Jason Andrews
