
ARM Processors


Over the past two decades, we have seen wide-ranging innovation in the devices we use to access the network, the applications and services we depend on to run our digital lives, and the computing and storage solutions we rely on to hold all that “big data” for us. However, the underlying network that connects all of these things has remained virtually unchanged. The reality is that the demands of the exploding number of people and devices using the network are stretching its limits. The networking world is undergoing a significant shift, moving from embedded systems to more open, virtualized systems based on common, standardized software stacks. Software-defined networking (SDN), network functions virtualization (NFV) and network virtualization (NV) offer new ways of designing, building and operating networks.

The introduction of SDN is also being promoted by the growing relevance of the Internet of Things (IoT), which will dramatically increase the number of network endpoints while adding a huge amount of data that needs to be transferred in a secure manner. NFV further simplifies the deployment and management of such a large number of endpoints, helping providers to cope with increased performance demands whilst reducing costs. NFV also brings more agile, serviceable networks to fruition by virtualizing entire parts of the network. It is a multi-year evolution that is truly reshaping the industry.

Earlier this month, at ARM TechCon 2014, the Linux Foundation announced the formation of a new project named Open Platform for NFV. OPNFV will be a carrier-grade, integrated, open source reference platform intended to accelerate the introduction of Network Functions Virtualization platforms and architecture. ARM is a founding member of the OPNFV group, and has also been at the center of NFV definition activity through our work with standards bodies and with partners. The OPNFV project is the next step in advancing NFV with a common software base: bringing together multiple open source software blocks, then integrating, testing, optimizing and filling in gaps. OPNFV is expected to increase performance and power efficiency; improve reliability, availability and serviceability; and deliver comprehensive platform instrumentation.


If we look now at the hardware platform, specifically an SoC, we can see there are four classes of compute required in infrastructure:

• Control plane processing, which requires high compute performance

• MAC scheduling, which requires low latency

• Data plane processing, which benefits from high-efficiency small cores

• Specialized processing, including accelerators and DSPs


These are just a few examples of the types of challenges we will discuss in more detail at the upcoming Linley Networking conference on the 22nd of October, 2014. Ian Forsyth, Director of Infrastructure Product Marketing at ARM will take to the stage to highlight the role that ARM CoreLink™ Cache Coherent Network products play, offering high bandwidth, low latency connectivity and highly integrated solutions for the new deployments of NFV and SDN. The key here is the scalability offered by these interconnect solutions that help address intelligence at the network edge. His talk will further detail how the power-efficient yet high performance ARM architecture delivers benefits for networking infrastructure and high core-count solutions in servers.


Look out for further ARM blogs and news around the event.




It was the philosopher George Santayana who first proclaimed “Those who cannot remember the past are condemned to repeat it” over a hundred years ago, and it remains highly relevant today to those tackling the challenge of developing faster, smaller, cheaper SoCs. I recently sat down with Norman Walsh to discuss IP integration. As well as having 21 years in the IC design industry, Norman is a keen historian, so I figured he would be a fountain of knowledge when it came to piecing together how we have gotten to this point. As it turned out, he gave a very interesting overview of the challenges facing IP integration and how it all got to this point. I’m shortly due to sit down with David Murray to talk about the future possibilities and probabilities in the industry, so be sure to check that out as well. If you have any questions you would like David to answer, please PM me or ask in the comments below.

 

 

Hi Norman, can you give me a brief overview of how IP integration has gotten to where it is right now?


Well, to give a proper explanation we must bring it back to Moore’s Law. I’m sure everybody knows this already, but Gordon Moore was an engineer working for Fairchild Semiconductor who stated back in 1965 that the number of transistors in an integrated circuit would double approximately every two years. It’s a cliché, but it’s testament to its accuracy that only in the last few years has there been a shift away from Moore’s Law. Now this was obviously going to grow exponentially: from the very small systems around back then, designs quickly grew into millions of transistors, and nowadays we’re looking at systems that can have anything from 7 to 10 billion transistors. The backstory to Moore’s Law was that he was highlighting this phenomenon as a challenge for the semiconductor industry to keep up with the growth. The way the IC design industry kept up was through periodic shifts in the level of abstraction in the designs. At that time, in the ’60s and ’70s, a chip was first sketched out on paper, and each transistor was designed individually. The people within the companies doing this soon realised that this was not a reasonable way to continue, so they developed CAD (computer-aided design) tools in-house to automate the process of putting these transistors together and to give them a schematic for doing so. In the early 1980s some of these design teams went off on their own and formed what is now a large and vibrant EDA industry. As well as developing the tooling, the level of abstraction had to improve too. In the late ’70s and early ’80s the industry went from designs focused on individual transistors to what was called standard cell-based design. This was a way of defining commonly used features on an IC, like an inverter or an AND gate, so that they could be called out instead of constantly having to describe these functions.

 


(Source) How chip designers felt with the introduction of AND and OR gates



So that was the first level of abstraction then? The introduction of AND gates?

 

Yeah. It doesn’t sound too complicated now but at the time it was pretty revolutionary stuff. So they created these libraries of logic gates that gave designers the ability to keep pace with the growth in transistors. This kept everything on an even keel up until the late 80’s early 90’s when it became really difficult again as we’re talking about hundreds of thousands of transistors on a chip, and even with the logic gates people were struggling to keep up. This is when a hardware description language came in to raise the abstraction another level to what we now know as RTL. This time it modelled the flow of digital signals between hardware registers and the logical operations performed on those signals. This was done in conjunction with the emergence of two hardware description languages, Verilog and VHDL (which was actually developed by the US Department of Defense back in the 1980’s). Using the RTL along with the synthesis tools developed by some of the larger EDA companies gave some breathing space back to system designers and allowed them to keep ahead of the Moore’s Law curve. Predictably, within 15 years the problem was back on the table and had reached critical levels by the time IP-XACT came along. IP-XACT was developed by the now defunct SPIRIT Consortium around 5 or 6 years ago, and raised the level of abstraction up to where designers were now describing interfaces between individual blocks of IP. Over the last number of years it has taken a while to be adopted, just as the other abstractions did, but in the last 2 years it has definitely become the standard across the industry. The adoption of IP-XACT contributed to the rise of commercial IP as more and more teams have been incorporating 3rd party IP or reusing old IP in their designs, because they now have a way to standardise the interfaces and make the integration quicker and easier. A number of people don’t see this growth of commercial IP as a shift in abstraction, but I believe it is. 
It essentially allows SoC designers to leave behind a lot of the repetitive tasks in chip design that do not make any overall difference, and allows them to focus on real design decisions that can differentiate their SoC. But if you really want to take advantage of all of this IP, there are still some nuts to turn and things that are necessary to make them work together really well. That’s the real key point, as the end user cares more about the performance of the SoC (and the new device) than the performance specs of each individual piece of IP.


(Source) Successful SoC design requires the integration of various components

 

 

It sounds like the industry has changed a lot over the last five to ten years. What does the landscape look like now?


Newer companies don’t have the collateral or library of IP to make it all themselves. If a company started out now, it would take decades of man-hours to come up with the IP necessary to build a chip in-house. So in that sense the growth of commercial IP has reduced the barriers to entry for this industry, and we are seeing new companies come into the market and become competitive by focusing on a particular niche. However, as with all of the levels of abstraction and solutions that I mentioned earlier, it has presented its own set of problems to be solved.

There is always a big question over the quality of 3rd party IP. In the past there were doubts about whether it had gone to silicon. It created a bit of a Catch-22 situation, as nobody would trust IP that hadn’t gone to silicon, and some of the newer designs struggled to get off the ground because of that. Once IP gets to a silicon-proven stage, people are more likely to use it commercially, as they know there is less risk involved.

The interoperability between IP blocks is always an issue. It’s becoming less so over the years, as a lot of people have moved to the AMBA protocols like APB or AXI, but it still takes a bit of time for companies to move away from something they’ve developed internally. It also took a while for standard interfaces to be recognized and adopted by the majority of the industry. Five years ago, a lot of IP was internal rather than commercially available. Interoperability was a problem: you had this thing that you grew internally that didn’t connect to anything very easily, and then you had this other IP with a standard interface and you couldn’t connect them. Nowadays people still build internal IP, but they build it in a way that it can be connected easily. The ARM AMBA specifications help because they are a standard, but they are more of a catalog of things you could do with the interface, so there is still a lot of tweaking. You can see a trend, with the chips becoming more complex and the levels of abstraction being raised. To be honest, we are probably overdue another level of abstraction at this stage.

 

(Source) A simplified version of the interoperability issue

 

 

And what do you think that level of abstraction will look like?


You have to think outside the box on this one. There’s no magic bullet here; ESL has been talked about for a while, but I’m not so sure, as it just describes the chip in a different way. The way I see it is through IP-XACT and standardisation: we need to standardise formats. A shift in abstraction is all about improving productivity, and making sure that formats are consistent across different teams or different companies would absolutely make a difference. One of the results of commercial IP and IP reuse is that we have seen a growth in subsystems as more and more parts become standardized. To a certain extent this makes the integration side of things easier to manage, as there are fewer custom interfaces to deal with. It is essentially delivering an entire system within an IP block: the processor clusters, the interconnects and everything. There needs to be a more standardised methodology. There’s been a lot of talk about back-end methodology and stitching things together in your block diagram. One of the challenges people are going through right now is that the design on the front end is taking weeks and weeks to get right, and then it is being rushed through the back end in a matter of days. If we could find a way to manage the front end of design better, that would cut down the design cycle considerably.



(Source) The next level of abstraction?

 

Sounds like you have plenty on your plate going forward. Finally, how important is it to have internal IP that can interact with ‘foreign’ IP?


The use of standard, intelligent formats is very important for the continued use of 3rd party IP. It’s one thing to have the IP-XACT language, but we also need a consistent way of describing IPs using IP-XACT, so that they come together a lot quicker. It’s something we’re working on in a big way at the moment: creating standardised formats for IPs to interact with each other. This needs to become standard across the industry, but it needs to happen internally first. The interfaces of all IPs need to be 100% compatible, making it simpler for people who are not experts in this to integrate ARM IP or any other 3rd party IP. I think that once this happens we will see far more people open to the idea of truly vendor-neutral IP, and design times could be reduced dramatically.

 

I certainly found his answers helpful and enlightening in uncovering some of the history of the IC design industry. If you have any questions for Norman, please enter them in the comments below and we'll get them answered as soon as possible.



The need for ever-connected devices is skyrocketing. As I fiddle with the myriad electronic devices that seem to power my life, I usually end up wishing that all of them could be interconnected and controlled through the internet. The truth is, only a handful of my devices can fulfill that wish, but the need is there, and developers are increasingly recognizing that we are moving to a connected life. The pressure to create such a connected universe is so immense that designers need a faster, more reliable way to fulfill our insatiable need.

 

One way to fulfill the need is for designers to adopt FPGA-based prototyping. This proven technique allows designers to explore their designs earlier and faster and thus proceed more quickly with hardware optimization and software refinement. In addition, recent capacity developments in prototyping have made it possible to realize the benefits of FPGA-based prototyping for even the largest designs. It has to be said that ARM and Xilinx have been at the forefront of enabling today’s embedded designs. It is critical that prototyping technology keep pace with the advancements from ARM and Xilinx.

 

S2C Inc. recently announced the availability of its AXI-4 Prototype Ready™ Quick Start Kit based on the Xilinx Zynq® device. The Quick Start Kit is the latest addition to S2C’s library of Prototype Ready IP and is uniquely suited to next-generation designs including the burgeoning Internet of Things (IoT).

 

The Quick Start Kit adapts a Xilinx Zynq ZC702 Evaluation Board to an S2C TAI Logic Module. The evaluation board supplies a Zynq device containing an ARM dual-core Cortex-A9 CPU and a programmable logic capacity of 1.3M gates. The Quick Start Kit expands this capacity by extending the AXI-4 bus onboard the Zynq chip to external FPGAs on the TAI Logic Module prototyping system. This allows designers to quickly leverage a production-proven, AXI-connected prototyping platform with a large, scalable logic capacity – all supported by a suite of prototyping tools.

 

Integrating Xilinx’s Zynq All Programmable SoC device with S2C’s Virtex-based prototyping system provides designers an instant avenue to large-gate count prototypes centered around ARM's Cortex-A9 processor.

 

To learn more about how S2C’s FPGA-based prototyping solutions are enabling the next generation of embedded devices, visit Rapid FPGA-based SoC & ASIC Prototyping - S2C.

Fuels – A Fragmented Regulation of the World

 

There are currently many different motor fuels used in different car markets worldwide [1]. This is either driven simply by drivers’ impressions of certain fuels’ quality or demanded by legislation for various reasons. For example, in the US, modern lightweight turbocharged diesel engines in light passenger cars are still associated with rattling, smoky trucks, their superb fuel efficiency notwithstanding. The result is low customer demand, and very often US gas stations do not provide diesel fuel at all. Brazil politically prohibits diesel engines in non-commercial cars to protect its large-scale sugar cane farming, which provides the source for the industrial ethanol (E85) production used in fuel mixes. Europe, by contrast, currently has a diesel population of about two thirds of all light passenger cars, mainly driven by the efficiency requirement of 95 grams of CO2 per kilometer for a manufacturer’s fleet portfolio in 2020 and enforced by harsh monetary penalties for non-compliance. Japan is a paradise for hybrid electric vehicles (HEVs) driven in high-density urban environments. Furthermore, the BRIC countries are still in a very strong car-market growth phase, led by Asia.

 

new_registrations_and_light_commercial_vehicles_English.jpg

(Source: R.L. Polk)

 

All in all, there are different motivations for using a certain fuel type in classical powertrain engines. The automotive industry has increasingly faced these issues in recent years, as its products ship more and more to different countries from a single production source. This results in a drastic increase in the number of different algorithms and data parameter sets inside a car’s powertrain ECU to cover each market’s local requirements. As a consequence, the amount of flash-memory space, for example, has increased four- to eight-fold in the past decade, ending up at requirements of 8–32 megabytes for future ECUs.

 

 

Integration is the Key to the Future

 

Hybrid electric vehicles (HEVs) and electric vehicles (EVs) add further powertrain requirements for ECUs on top of this. In an HEV, a powertrain (also called the “powertrain domain”) of two different physical engines has to be controlled at the same time, which leads either to a combination of many ECUs or to a highly integrated single-ECU processing unit with a very high performance level. Command and control (C&C) of an HEV’s electric motor is itself no sophisticated secret, as this technology has been used for the past century in almost all kinds of applications and flavors.

 

The main challenge is the integration of software coming from different sources or even different vendors, each a specialist in its powertrain sector and wanting to protect its intellectual property (IP). This has a huge impact on complexity, integration, validation, test cycles, project schedule and qualification for powertrain ECU software on the vendor side. Very often this leads to up to ~50% software overhead and a related waste of CPU performance. Questions arise such as: How do we ensure that the two C&C domains’ software does not interfere? What happens in error cases? What is the “safe state” of each one? How do we maintain data consistency between the two sectors without extensive software overhead? Is it still real-time?


ARM is currently developing CPU cores for its Cortex-R series product line based on ARMv8-R architecture [2][3] which addresses those questions at the CPU core hardware level for the future.

 

nucleus.jpg

 

(Source: ARM Ltd)

 

Cortex-R based products will provide support at the hardware level to reduce the software overhead arising from the questions mentioned previously. The key words are multicore, lockstep, virtualization, hypervisor support, a revised MPU/MMU structure and specific new CPU core instructions, such as the well-known CRC32 support [4] for the upcoming time-triggered Ethernet backbone in car networks.
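To make the CRC32 support concrete, here is a minimal software model of the checksum that the ARMv8 CRC32 instructions accelerate: the standard CRC-32 polynomial 0x04C11DB7, processed bit-reflected (0xEDB88320), with initial value and final XOR of 0xFFFFFFFF. On an ARMv8 core this byte-at-a-time loop collapses into single CRC32B/CRC32W instructions (or the __crc32b family of ACLE intrinsics); the function name here is just for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Software model of the CRC-32 computed by the ARMv8 CRC32B/W instructions:
 * polynomial 0x04C11DB7, bit-reflected form 0xEDB88320,
 * initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32_sw(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                      /* fold in the next byte */
        for (int bit = 0; bit < 8; bit++)    /* one shift per message bit */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return ~crc;                             /* final XOR */
}
```

Checking an Ethernet frame this way in software costs eight shifts per byte; the dedicated instructions do a byte, halfword or word per cycle, which is what makes a time-triggered Ethernet backbone practical on a small real-time core.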

 

 

The Big Bang Theory

 

Many of today’s proprietary CPU cores in the powertrain area lack a future outlook into the next decade, due to their more-or-less exotic use case compared to global market penetration, combined with declining tool support and ecosystem from a vendor perspective. Additionally, there is a missing link to globally used architectures such as the ARM architecture, which is used in almost all other car domains (e.g. Advanced Driver Assistance Systems (ADAS) and multimedia) and in consumer markets.


Finally, code-generation tools for algorithms will abstract away the use of specifically trimmed architectures, which opens the door for market-leading architectures in combination with integrated, data-consistent GPGPU processing such as ARM Mali for localized, high-performance tasks.

 

Soon, silicon wafer manufacturing companies will put proprietary cores aside: their automotive-specific requirements, combined with lower volumes and outdated silicon technology requirements, drive up cost. For example, today’s topmost 12 silicon wafer manufacturing companies for 300mm wafers own about 90% of the worldwide market [5]. Most of them are not exactly well known for their interest and engagement in automotive powertrain with proprietary CPUs.

 

sips.jpg

 

ARM is already well prepared for the future requirements and challenges of powertrain domains with its Cortex-R processors and its ARMv8-R architecture.

 

[1] http://www.bosch-presse.de/presseforum/details.htm?txtID=6355&tk_id=108&locale=en

[2] http://www.arm.com/files/pdf/Driving_with_ARMv8-R_architecture.pdf

[3] http://www.arm.com/files/pdf/ARMv8R__Architecture_Oct13.pdf

[4] http://en.wikipedia.org/wiki/Cyclic_redundancy_check

[5] http://www.icinsights.com/news/bulletins/Five-IC-Suppliers-To-Hold-OneThird-Of-300mm-Wafer-Capacity-In-2013/

Eoin McCann

What is IP anyway?

Posted by Eoin McCann Oct 9, 2014

Let me be the first one to put my hand up and say it: it took me a while to wrap my head around IP. I am a relative newcomer to this industry, having studied business in university instead of engineering, and trying to stay afloat amidst the constant rush of TLAs (Three Letter Abbreviations) and other engineering parlance. It seems like every day another abbreviation comes up and off I disappear down the rabbit hole of Wikipedia trying to nail down a definition of what I’m supposed to be understanding. An early example of these is that old chestnut, IP, which has inspired me to write this blog.

 

               It turns out IP is one of those things that you need to understand when working at ARM, as I discovered early on when people were talking about “connecting IP blocks together”. Being a diligent employee and wanting to learn, I went to find out what this IP could be. A quick search on www.acronymfinder.com revealed over 100 terms that could stand for IP, including ‘Inhalable Pump’ and ‘Irish Pennant’. I highly doubted that ARM was in the business of connecting Inhalable Pumps together, so I began to search elsewhere.

 

An exhaustive initial search (read: Googling “what is ip?”) showed slightly discouraging results: my IP address. Or to be more precise, 216.140.95.20. But this didn’t make any sense. Surely ARM isn’t in the business of connecting up blocks of numbers?! Not wanting to be felled at the first hurdle, I persevered, and a look at the 2nd Google search result revealed that the IP I was looking at was actually an Internet Protocol address. This sounded vaguely familiar, as I could remember back to my teenage years when I was denied access to the video streaming website Hulu due to my IP address coming from the wrong territory! I was fascinated to discover that every device in a network that uses the Internet to communicate (be that a computer, printer, TV or thermostat) has a numerical label designed both to identify it and display its address on the network. I was also interested to discover that the original creators of these IP addresses had, not surprisingly, underestimated the way the internet would expand, and that they were running out of them. That was until someone had the clever idea of adding an extra couple of numbers on the end to solve that problem forever (until the IoT becomes far more popular than we imagine). While all of this information was highly interesting, it still did not seem relevant to the IP being mentioned in the office, because to the best of my knowledge you can’t connect Internet Protocol addresses together. Thus it seemed that an IP address would leave me disappointed, just like it did in the past with Hulu. The search continued into uncharted territory: the 4th Google search result.

 

               I definitely seemed to be on the right track this time as I started reading about Intellectual Property being ‘creations of the mind, such as inventions; literary and artistic works; designs; and symbols, names and images used in commerce’. This made a lot more sense and would explain why there are so many framed patents on the walls in ARM. Going back to the original question of “connecting IP”, an image sprang into my mind of a man sticking some patents together with a glue stick. It did explain why people are so secretive about the distribution of information, as the implications of intellectual property being distributed to the wrong people could be enormous. All was going well until I suffered another setback when the term “intangible assets” popped up. Now I’m clearly no expert on this subject, but in any man’s dictionary intangible assets do not exist physically, and therefore can’t be connected together. So the search had to continue, but I felt that I was getting somewhere at least.

 

               In the pursuit of journalistic integrity and solid answers, I had to expand my search outside of the realm of reasonable expectations. This required steely grit, determination and (*gulp*) going past the first page of Google search results. This truly represented sailing into uncharted waters and by the time I found what I was looking for (page 7), I had seen things that some had thought to be lost forever.

 

second.png

(Source)

 

Here I found a link to the Xilinx webpage, saying: “Intellectual Property (IP) are key building blocks of Xilinx Targeted Design Platforms”. At last, I was onto something! A quick browse through the website and beyond led me to the definition of an IP block as “a reusable unit of logic, cell, or chip layout design that is the intellectual property of one party. IP cores may be licensed to another party or can be owned and used by a single party alone. The term is derived from the licensing of the patent that exists in the design”. So it seems that each individual IP block is physically manufactured based on intellectual property designs protected by patents. These blocks perform specific functions, such as CPU, interconnect, memory controller and so on, and must be stitched together in order to be worth more than the sum of their parts and work effectively as part of a larger SoC. The more I read about the different IP blocks out there and their specifications, the more technical and complex the descriptions became, until I eventually decided enough was enough. In my head, IP blocks are like Lego blocks. One on its own is pretty useless; however, when combined together by a good designer they can be transformed into something much more functional. Simple, really.

 

Blue-legos.jpg

How I imagine ARM's library of IP (Source)

Last week I attended ARM TechCon in Santa Clara and one of the topics covered sparked my interest for this blog – wearables.

 

Did you know that wearable technology represents the fastest-growing device segment, currently expected to grow by over 400% between now and 2017? What makes wearables interesting for me is the incredibly broad range of applications: from professional safety equipment, through sports, health and medical applications, to keeping track of your pets and kids. Last year Atmel wrote a blog about how wearable technology is nothing new, but that it has become a lot more fashionable than the pocket protectors of the 1980s!


The good news is that ARM is already in most wearables in operation today, and these represent a very broad range of hardware configurations: from RTOS to rich OS, from no display to full 3D graphics with video capture, playback and security. I believe that ARM has the products to suit most if not every requirement, and every potential to keep on doing so.


Why am I so sure? Many early high-end wearable designs are based on mobile phone designs with Cortex-A and Mali and are in active use today, albeit with a bit of re-shaping to fit into a tighter PPA budget. Recent research has shown that users reach for their mobile phone over 150 times a day, so it’s no wonder we’re seeing more and more watch-based wearables introduced to make checking e-mail and messages at a glance simply that: a glance.

 

One of the other major use cases for wearable technology is health and fitness tracking. In fact, a Forrester Research survey conducted early this year determined that 6% of US adults already wear a gadget to track performance in a sport. ARM continued its tradition of the Wearable Fitness Step Challenge, tracking the steps made by exhibitors over the course of the few days. As always it drew a lot of attention and some friendly competitive spirit between participants, with Alec Bath from STMicroelectronics walking away with the grand prize after some stiff competition from Qualcomm and Xilinx!

 

The System IP glue for this type of wearable features interconnects such as the highly configurable and scalable NIC-400, the GIC-400 for interrupt management, and the TZC-400 and DMC-400 (or similar) for secure management of external memory.

 

On the subject of PPA: as devices scale down, so do their power budgets and hence their required recharge intervals. We’re all used to charging our high-end mobile phones once a day, and there is an unwritten expectation that a watch form factor device using, say, 10x less energy should only need charging once a week. Health and fitness monitors at the simpler end of the scale, with, say, half the energy usage of our watch, are expected to need a charge only once a month.
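The recharge-interval reasoning above is just battery capacity divided by average power draw. Here is a back-of-envelope sketch; the function names and all the numbers in the comments are hypothetical illustrations, not measured figures for any real device.

```c
/* Back-of-envelope model of recharge intervals for wearables.
 * battery_wh: usable battery capacity in watt-hours.
 * avg_power_w: average power draw in watts. */
double recharge_interval_days(double battery_wh, double avg_power_w)
{
    return battery_wh / (avg_power_w * 24.0);
}

/* Invert the relationship: the average power a device may draw
 * if it must survive target_days between charges. */
double power_budget_w(double battery_wh, double target_days)
{
    return battery_wh / (target_days * 24.0);
}
```

With hypothetical numbers: a phone with a ~6 Wh battery averaging ~250 mW needs charging about once a day, while a watch with a ~1 Wh battery that must last a week has to keep its average draw under roughly 6 mW, which is why the System IP and power management matter so much at this scale.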

 

Compounded by time-to-market pressure and the broad market requirements, it goes without saying that ARM partners designing high-end wearable SoCs would greatly benefit from automated design environments to reduce their development times from months to weeks, if not days. Such an environment would bring together CPU, GPU and System IP, including debug and trace capabilities, in the most efficient way while still leveraging existing tools and software.


The second bit of good news is that ARM is also investing in such technology in the form of its Socrates™ Design Environment, which enables rapid, right-first-time IP integration of the front-end design, with more to be announced in the coming months. This complements existing ARM Fast Models for earlier software development, but that’s another blog for another day.

SocratesDemo.JPG

William Orme at TechCon this week putting Socrates DE through its paces, showing how rapidly a CoreSight debug and trace environment can be configured and integrated into a larger collection of IP including CPU and interconnect.


Okay, so I touched on high-end wearables, but what about the basic variety that also needs to play in the IoT space? There’s not only the TTM pressure here but also a need for a common foundation of provisioning support to network into the cloud. I’ll leave you with the ARM mbed IoT Device Platform, probably one of the busiest corners of the TechCon Expo floor this week, and not just because of the free beer after 16:00 or the coffee being dispensed by the now infamous IoT-enabled Nespresso machine:

 

BusyMbed.JPG

 

 

I for one am very excited about what the future has in store for wearables, devices that are capable of being made right now with existing ARM Cortex, Mali, CoreLink and CoreSight technology, and are only bounded by the imagination of the ARM partnership or the wider maker community.

 

Let me know what you think about the upcoming future for wearables in the comments section below.

 

 

Andy

Michael Williams

Critical interrupts

Posted by Michael Williams Oct 3, 2014

In software there are often cases where you need critical interrupts to be serviced, for example for:

  • Code profiling
  • Kernel debugging
  • Watchdog handling
  • Error handling.

 

With the ARMv7-M architecture this can be achieved using nested interrupt handlers, but it is harder on A-profile processors.
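The idea can be illustrated with a deliberately simplified Python model of priority masking (a toy sketch, not the behaviour of any real interrupt controller): interrupts whose priority beats the current mask are serviced immediately, while the rest stay pending. A critical source such as a watchdog, given a sufficiently urgent priority, still gets through while ordinary interrupts are held off.

```python
class GicModel:
    """Toy priority-mask model: lower number = higher priority."""
    def __init__(self, priority_mask):
        self.priority_mask = priority_mask  # an interrupt must beat this to fire
        self.pending = []                   # (priority, name) pairs held off
        self.serviced = []                  # names, in service order

    def raise_irq(self, name, priority):
        self.pending.append((priority, name))
        self._dispatch()

    def _dispatch(self):
        # Service any pending interrupt whose priority beats the mask.
        still_pending = []
        for prio, name in sorted(self.pending):
            if prio < self.priority_mask:
                self.serviced.append(name)
            else:
                still_pending.append((prio, name))
        self.pending = still_pending

gic = GicModel(priority_mask=0x80)
gic.raise_irq("uart", 0xA0)       # ordinary interrupt: held pending by the mask
gic.raise_irq("watchdog", 0x10)   # critical interrupt: serviced immediately
```

The names and priority values here are invented for illustration; the paper below discusses how to achieve this effect on real A-profile hardware.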

 

I have recently published a white paper that addresses some of the issues with handling critical interrupts on the ARMv8-A architecture, and proposes some ways for software to use the architecture to improve critical interrupt handling. If you are involved in writing operating systems or interrupt handling code, I'd appreciate any thoughts and feedback. Please do let me know if it is useful!

 

The paper can be found here: Critical Interrupt Prioritization

 

Thanks, Mike.

Since we launched Cortex-A17 and Cortex-A12 processors, we’ve seen wide adoption with customers planning to use these cores in a range of phones, tablets and other consumer devices. These ARMv7-A cores offer leading 32-bit performance and efficiency for mobile and consumer markets, and we are very excited about current and future products that will be shipping in the coming months.


Originally, the Cortex-A12 and Cortex-A17 processors were launched as two distinct parts, as they delivered different feature sets and performance points. Since then, consistent demand from the market for more performance and efficiency, alongside ARM’s focus on continuously improving its products, has led us to introduce the performance improvements added by the Cortex-A17 processor into the latest maintenance release of the Cortex-A12. Now both processors deliver equivalent performance levels.


The Cortex-A17 processor also added big.LITTLE support. Whilst many partners are using this to provide heterogeneous Cortex-A17 and Cortex-A7 processor solutions, some are also delivering standalone Cortex-A17 processor solutions, which are now almost indistinguishable from Cortex-A12-only solutions. For this reason, and to ease support for our customers and their partners, we have taken the decision to use the “Cortex-A17” name to refer to both processors in the future. We have therefore removed references to the Cortex-A12 processor from the ARM.com website, and in the future all engineering documentation and support available from the ARM Infocenter will be provided under the Cortex-A17 processor identity.


Faster processing and greater efficiency within the mobile power envelope are two key properties of the Cortex-A17 processor. It delivers a 60% single-threaded performance uplift over the popular Cortex-A9 processor at increased efficiency. Additional Floating Point Unit (FPU) and NEON performance allows workloads to be calculated 50% faster than on previous devices, resulting in a significantly improved user experience:

Blog Cortex-A17 performance.png

The Cortex-A17 processor provides the ideal platform for any demanding consumer SoC requiring great performance with best-in-class energy efficiency. For more information about the ARM Cortex-A17 processor and its companion IP, please visit our website.

Are you a designer who is too busy to attend ARM TechCon in Santa Clara later this week? Then think again: you might well save a lot more time than the day spent attending. You’ll get the chance to learn about and see a demo of the latest and truly greatest tools for automating IP design. We are previewing the new Socrates design environment, recently acquired with ARM’s purchase of Duolog Technologies, on the ARM booth (#300).

 

What I want to highlight in this blog is how ARM has used the versatile Socrates platform to create a tool that has the effect of combining years of engineering experience in an easy-to-use tool. That’s literally decades of experience encapsulated in hundreds of rules and algorithms of what to do and what not to do when creating either a CoreSight debug and trace system or a CoreLink interconnect system.

 

System IP configurability is a key aspect to designing the very best SoCs, but with it comes increased complexity of design choices, system integration and verification. System IP configurability has evolved from simple hardware parameterization to highly configurable architectures and IP boundaries. We will look at how configurability is modeled in design flows and try and understand where current design flows are limited.

 

A defining feature of any system interconnect or debug and trace solution is its configurability, which I touched upon in my last blog. This configurability is vital for its function and makes it versatile for specific user requirements. The simple fact is that no two SoCs are the same, and the system IP needs to adapt to match. However, it often raises a number of design decisions.

 

You begin to ask yourself:

  “Do I need this feature?

  What is the best value to set this parameter to?

  How do I pick the most appropriate option for my SoC?”


These questions can often mount up, along with the nagging doubt that "Maybe I haven’t configured all my components to fit together correctly. I really don’t want a deadlock situation." Much like tuning a Formula 1 car engine, there are so many variables that can be tweaked that it can be difficult to settle on a setup that maximises the performance for a specific use while making sure that there is balance across the system.

 

What ARM has done with the Socrates design environment is create a tool that instantiates all of these connections through the use of rules and algorithms, thus ensuring that your system is correct-by-construction and unleashing the full potential of your CoreLink or CoreSight IP. In essence, it combines the intelligence and experience of our tech leads, system architects and support engineers inside one toolbox that allows you to package interfaces, build micro-architectures and test connectivity in an easy-to-use design environment. This allows you to cut through the noise of all the connection options and choose the one that works best for you. At ARM TechCon we are previewing two tools with this built-in intelligence that take your design intent (the high-level spec of what you need the CoreLink or CoreSight system to do), then automate the configuration and connection of all the necessary IP blocks to create the required sub-system, verify its correctness, give estimates of its area and performance, and generate all the output you need to take the design forward into the implementation stage. The new design environment delivers productivity in two main ways:

 

  1. It automates the mechanical and repetitive tasks for you, cutting out the risk of errors
  2. It assists you in the real design choices only the designer can make

 

Some mechanical aspects are ripe for automation and save months of error-prone donkey work: identifying the exact interface definitions you need to connect to via IP-XACT descriptions and matching your system interfaces to them; generating precise and easily sharable documentation of your design; and generating testbenches and test code around your system, to name but three. Of course, there are some design tasks that are more subtle and require a combination of intelligent algorithms and the designer’s input in order to create a system that fits the user requirements. This mixture of necessary functions and new features is how architects can really differentiate their SoCs and add value for customers. Here, instant feedback on the area and performance of the design guides those trade-off decisions, leading to the most appropriate design. And anything you have modified manually still gets the automatic checking. This all adds up to a more optimised SoC that is designed faster and with less risk.
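To give a flavour of the mechanical checks a tool like this automates, here is a toy sketch of interface matching. The flat `{port: width}` dicts stand in for the far richer bus interface information a real IP-XACT description carries, and the AXI-style port names are used only as familiar labels:

```python
def check_connection(master_ports, slave_ports):
    """Report mismatches between two simplified bus interface descriptions.

    Interfaces are given as {port_name: bit_width} dicts, a stand-in for
    the information an IP-XACT description would provide.
    """
    issues = []
    for port, width in master_ports.items():
        if port not in slave_ports:
            issues.append(f"{port}: missing on slave side")
        elif slave_ports[port] != width:
            issues.append(f"{port}: width {width} vs {slave_ports[port]}")
    return issues

axi_master = {"AWADDR": 32, "WDATA": 128, "AWVALID": 1}
axi_slave  = {"AWADDR": 32, "WDATA": 64,  "AWVALID": 1}
print(check_connection(axi_master, axi_slave))  # flags the WDATA width mismatch
```

Doing this by hand across hundreds of ports and dozens of IP blocks is exactly the kind of repetitive, error-prone work worth delegating to a tool.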

 

So to all those overworked engineers: I hope after reading this you’ll make the time to learn how to save a lot of time. Come and visit us on booth #300 this Wednesday or Thursday, or catch Simon Rance's technical session on Friday at 3:30pm.

So you are excited about the release of a new ARM-powered smartphone or tablet device – and why shouldn’t you be! You’ve made your way to your preferred tech review website where you discover that the device is running big.LITTLE™ technology – sweet! Though hang-on, it’s also running the latest version of ARM big.LITTLE software “big.LITTLE MP”. So what additional benefits does big.LITTLE MP bring compared to its predecessor? In this blog, I attempt to answer this frequently asked question.

 

The mobile analytics firm, Flurry, carried out an analysis on smartphone users in the US and made some interesting findings. The study found that mobile users spend most of their time on the following mobile activities:

  1. Web browsing and Facebook;
  2. Gaming;
  3. Audio, video and utility.

 

Calculated on a daily basis, web browsing and ‘facebooking’ accounted for 38% of a mobile user’s smartphone interaction time, gaming accounted for 32%, and the use of audio, video and utility services was third in line at 16%. In total, the top three activities account for a staggering 86% of the time we spend on smartphones, which goes to show how far the mobile use case has come from the days when phones were used plainly for voice calls and text messaging.

 

But how do these use cases impact power consumption? By looking at the power profile (i.e. power vs. time) for each activity, three very distinct patterns begin to emerge.

 

Mobile web browsing

For the web browser analysis, we used the BBench browser benchmark from the University of Michigan. BBench simulates browsing popular websites of varying richness and complexity, and enables key parameters to be configured. To ensure reliable, reproducible results, we ran the workload in a clean environment and automated both the execution of the workloads and the related measurements. The following graph shows the power profile produced from a run on a Symmetric Multi-Processing (SMP) system consisting of a quad-core Cortex-A7 CPU subsystem.

                 Burst in Performance Graph.jpg

Graph 1: Power profile of web browsing use case

 

The first thing you will notice about the power profile (Graph 1) is the spikes in power. These typically occur when launching an application, loading content or scrolling through webpages. In other words, they occur when the system requires a short burst of performance to respond to a user interaction. Responsiveness is a type of user experience metric and therefore the better your mobile system is at handling such workloads, the better the overall mobile user experience.


Mobile gaming

For the mobile gaming workload, we ran the popular gaming application CastleMaster. Through workload analysis, we selected a period of gameplay that produced a high-intensity performance load, and automated it to ensure reproducibility. The following graph shows the power profile produced from a run of this workload on an SMP system consisting of a quad-core Cortex-A7 CPU subsystem.

                 Sustained Performance Graph.jpg

Graph 2: Power profile of mobile gaming use case


The power profile here shows a more constant level of power draw, which is common in intensive gaming applications, where the CPU cores are required to process a high volume of multi-threaded data for the GPU cores. In workloads like these, as you can imagine, power efficiency within the thermal budget of the system is vital.


MP3 audio playback

To demonstrate MP3 audio playback, we played a freely available MP3 audio sample on the default Android music player. The following graph shows the power profile produced from a run of this workload on an SMP system consisting of a quad-core Cortex-A7 CPU subsystem.

     Low Intensity Graph.jpg

Graph 3: Power profile of MP3 audio playback use case


Workloads such as audio and video playback are known as low-intensity workloads and tend to have long use periods. Power saving is therefore essential to achieving longer battery life.

 

Analysing the patterns in the power profile from each of the mobile applications above, we are able to identify three main building blocks, each present with a high degree of prominence across the workloads:

        1. Burst of high intensity workloads
        2. Sustained performance workloads
        3. Long-use low intensity workloads
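A crude way to see how a power trace falls into one of these three buckets is to compare its mean level against its peaks. The sketch below is a heuristic of my own for illustration; the thresholds and sample values are invented, not taken from the measurements in the graphs:

```python
def classify_profile(samples_mw):
    """Heuristic classification of a power trace (samples in mW)."""
    mean = sum(samples_mw) / len(samples_mw)
    peak = max(samples_mw)
    if mean < 50:                 # mostly idle: long-use, low-intensity work
        return "long-use low intensity"
    if peak > 3 * mean:           # occasional spikes well above the baseline
        return "burst of high intensity"
    return "sustained performance"

# Invented traces shaped like the three building blocks above:
browsing = [40, 45, 600, 50, 42, 550, 48, 44]   # spiky: page loads, scrolling
gaming   = [400, 420, 410, 430, 415, 425]       # flat and high
audio    = [20, 22, 21, 19, 23, 20]             # flat and low
```

Running `classify_profile` over the three traces sorts them into the burst, sustained and low-intensity categories respectively.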

 

Workloads 2.jpg

Graph 4: Power profiles of the building blocks in the top three mobile use cases


Graph 4 shows a conglomeration of each of these categories. We can observe a high degree of power and performance requirements in today’s mobile applications, particularly in the three classes of mobile activity on which we spend most of our time. In real life, a mobile user is often listening to MP3 audio while surfing the web, or watching an embedded video while using Facebook. In such instances, we would expect a combination of these three classes of workload. To handle such a mix of workloads efficiently, a mobile system needs high-performance and high-efficiency cores working seamlessly together.


This is where big.LITTLE Technology comes in. big.LITTLE Technology is a power optimization technology that, through the combination of high performance "big" cores and high efficiency "LITTLE" cores, along with big.LITTLE MP software, ensures the right task is run on the right core. This delivers increased levels of power efficiency, battery life and user experience. Graph 5 shows a comparison of the degree of improvement on average that big.LITTLE MP delivers when compared to its predecessor, Cluster Migration.
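The core idea of per-task placement can be sketched as a simple load-tracking rule with hysteresis. This is a toy illustration only; the thresholds are invented and the real big.LITTLE MP scheduling support in the Linux kernel is considerably more sophisticated:

```python
UP_THRESHOLD = 80     # illustrative load percentages, not real scheduler tunables
DOWN_THRESHOLD = 30

def place_task(current_cluster, load_pct):
    """Decide which cluster a task should run on for its next period."""
    if current_cluster == "LITTLE" and load_pct > UP_THRESHOLD:
        return "big"            # burst of work: migrate up for responsiveness
    if current_cluster == "big" and load_pct < DOWN_THRESHOLD:
        return "LITTLE"         # load has died down: migrate back to save power
    return current_cluster      # otherwise stay put

# A bursty browsing thread migrates up for the spike and back down afterwards:
place_task("LITTLE", 95)   # -> "big"
place_task("big", 10)      # -> "LITTLE"
```

The gap between the two thresholds provides hysteresis, so a task hovering around a single threshold does not ping-pong between clusters.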


Worklod bL benefit.jpg

Graph 5: big.LITTLE MP improvement over big.LITTLE Cluster Migration


If you are keen to find out more about how big.LITTLE MP is able to achieve these improvements, I will be delving into this topic in my "big.LITTLE Unleashed" presentation at this year's ARM TechCon event, held next week (October 1st-3rd). If you have not registered for it yet, be sure to register for TechCon now.

 

If you are unable to make it, however, then fear not! In my next blog, I will dive deeper into the details of how big.LITTLE MP is able to achieve these improvements and show how it enables you to enjoy a higher quality mobile experience.

By Bee Hayes-Thakore and Thomas Ensergueix

 

Pervasive connectivity, largely spurred by mobile and tablet use, is transforming the way we consume content and interact with each other through the cloud. The Internet of Things will expand this interaction further to a multitude of connected devices, influencing the connected city, home, healthcare and all aspects of embedded intelligence. This future demands embedded intelligence that is always-on, always-aware and always-connected, with more performance (particularly high Digital Signal Processing (DSP) performance) for local data pre-processing, voice and image processing, access to richer content, and increased system reliability and fault tolerance.

 

 

It is with this future of embedded intelligence in mind that we announced today the new ARM Cortex-M7 processor, bringing a host of new features and capabilities to the Cortex-M family of low-power, 32-bit processors. Cortex-M7 introduces a number of micro-architectural features which enable our partners to build chips that can reach much higher levels of performance than existing microcontroller cores in terms of general-purpose code, DSP code and floating point code.

Cortex-M7_Diagrams_V2(3)-03-03 (1).jpg

Three lead licensees (Atmel, Freescale and STMicroelectronics) have been working with ARM since the very early stage of development on the Cortex-M7 processor, and they will be bringing exciting new products to market over the coming months. The ARM Cortex-M7 processor is targeted at demanding embedded applications used in next-generation vehicles, connected devices, and smart homes and factories. Through these products, the benefits delivered by the Cortex-M7 processor will be apparent to users in our increasingly connected world.

Cortex-M7 summary.PNG

For example, domestic appliances (or white goods, as they are referred to) previously had simple user interfaces and were controlled by simple processors. But next-generation devices are getting smarter in order to operate more efficiently using minimal energy and resources. They are moving to more sophisticated displays, advanced touch screen panels, and advanced motor control, including field-oriented control algorithms in their motor drivers. Some also need to run communications software stacks to interface with other appliances and with the outside world, providing billing, power usage and maintenance information.

WhiteGoods cortex-M7.PNG

All of these requirements demand more performance from the microcontroller, which lies at the heart of the appliance, and Cortex-M7 based MCUs will deliver that performance. In addition to excellent performance, the Cortex-M7 processor not only extends the low-power DNA inherent in the Cortex-M family but also provides the same C-friendly programmer's model and is binary compatible with existing Cortex-M processors. Ecosystem and software compatibility enables simple migration from any existing Cortex-M core to the new Cortex-M7 core. System designers can therefore take advantage of extensive code reuse, which in turn offers lower development and maintenance costs. You can find more information on Cortex-M7 on arm.com.

 

ARM TechCon, the largest meeting of the ARM Partnership, is taking place in Santa Clara in just a few days. Dr Ian Johnson, Product Manager for the Cortex-M7, will talk in greater depth about the features of this new processor in “The Future Direction of the ARM Cortex-M Processor Family” session (2pm-3.50pm, October 1st), along with invited speakers from lead licensees and additional guests. Free ARM Expo passes are available with the ARMExp100 code.


But why wait? You can start discussing Cortex-M7 processors with embedded experts here today!

 

Related content and discussions also on:

Atmel

Freescale

STMicroelectronics

Cortex-M7 launches: you can read a detailed introduction from AnandTech.

AnandTech | Cortex-M7 Launches: Embedded, IoT and Wearables

You can also find more information on the official ARM website:

Cortex-M7 Processor - ARM

Yesterday we released version 3.10.0 of Valgrind, a GPL'd framework for building simulation-based debugging and profiling tools.  3.10.0 is the first official release to support 64-bit ARMv8.  The port is available from http://www.valgrind.org, and the release notes are available at http://www.valgrind.org/docs/manual/dist.news.html.

 

Porting the framework to the 64-bit ARM instruction set has been relatively straightforward.  The main challenge has been the large number of SIMD instructions, with some instructions involving significant arithmetical complexity: saturation, rounding, doubling and lane-width changes.  On the whole, the 64-bit instruction set is easier to simulate efficiently than the 32-bit ARMv7 instruction set, as it lacks dynamically conditionalised instructions (a la Thumb) and partial condition code updates, both of which hinder fast simulation.  As the port matures I expect it to attain performance comparable with other Valgrind-supported architectures.
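As an example of the arithmetic involved, a simulator must model saturation explicitly rather than letting results wrap. Here is a sketch of a signed saturating add at a configurable lane width, in the spirit of instructions such as SQADD; it illustrates the arithmetic only, not Valgrind's actual implementation:

```python
def sat_signed_add(a, b, bits):
    """Signed saturating add at a given lane width: clamp instead of wrapping."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))

sat_signed_add(100, 100, 8)    # 127: a plain 8-bit add would wrap to -56
sat_signed_add(-100, -100, 8)  # -128: clamped at the negative limit
```

In a real SIMD instruction this clamp is applied independently to every lane of the vector, and instructions that also round, double or narrow the result add further steps on top.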

 

Porting the tools based on the framework took almost no effort, because the framework is specifically designed to insulate tools from the details of the underlying instruction sets. Currently the following tools work well enough for serious use: Memcheck (memory checking), Helgrind and DRD (thread checking), and Cachegrind and Massif (time and space profiling).

 

Initial development was done using cross-compilation and running on the ARM Foundation model, which proved to be a reliable starting point.  Further development was done on an ARM Juno board running a Fedora snapshot.  The Juno board made a big difference, as it made it possible to build Valgrind natively and to build and run the regression tests in a reasonable time frame.

 

We look forward to feedback from developers using the port to debug/profile serious workloads, on the order of millions to tens of millions of lines of C++.

Embedded processors are frequently compared through the results of Power, Performance and Area (PPA) implementation analysis. Jatin Mistry and I have created a whitepaper that describes the specific details of the PPA analyses performed on the Cortex-R Series processors.

 

Often high-level figures are quoted for processors. For example, the "Performance" tab at http://www.arm.com/products/processors/cortex-r/cortex-r5.php shows top-level details of the Cortex-R5 in a mainstream low-power process technology (40nm LP) with high-density, standard-performance cell libraries and a 32KB instruction cache and 32KB data cache; this shows the total area as 0.45mm².


However, behind the top-level power, performance and area results there are many variables and details that can dramatically alter these figures. Different implementations target different configurations, for example the cache sizes or the inclusion of the Floating Point Unit (FPU), and different goals, for example aiming for the highest possible frequency or the lowest possible area. The process and libraries used have a dramatic effect. The attached whitepaper describes the process we use to perform a PPA analysis for the Cortex-R Series processors.

 

The goal of the whitepaper is to explain, for those without deep processor implementation knowledge, the many variables that must be understood to get real value from any PPA data presented. This enables you to estimate the real PPA of your own proposed processor implementation, and to make fair comparisons between processors, whether from a single IP partner or from different processor IP vendors.

 

Any PPA data is of very little value without an understanding of the details behind it. We hope that you find the whitepaper informative.

What is the connection between rugby football, interconnect and performance analysis kits?

 

There is a seemingly never-ending march towards smaller, cheaper and more efficient complex chip designs, and every component of the modern SoC is being squeezed for more with each new design. There is a case of diminishing returns when seeking improvements, and designers need to be creative in order to find new ways to eke out those extra bits of performance that ultimately make the difference across the entire chip. The World Cup-winning rugby coach Sir Clive Woodward famously stated that excellence was best achieved by improving 100 factors by 1%, and this theory certainly holds true for a lot of the SoCs being designed these days. Staying on the theme of rugby for a moment, the interconnect is like a scrum half (or a quarterback, for those of you who live west of the Atlantic!) as it acts as the go-between for each component and marshals them effectively to make the chip greater than the sum of its parts. A scrum half’s performance is measured by the speed and efficiency with which he passes the ball to his teammates, enabling them to do their jobs more effectively, much as you would want your system interconnect to function.

Scrum half.jpg

This role increases in importance as massive growth in system integration places on-chip communication at the centre of system performance. The ARM CoreLink NIC-400 is a very powerful and highly configurable interconnect with many high-end features to manage the traffic passing through it. It is in fact so configurable that it is regularly one of the most popular IP models created and downloaded on Carbon Design Systems’ IP Exchange portal for virtual prototyping (found here). This configurability allows a single user to create dozens of models of the system interconnect, and reflects the importance that users place on having accurate models of the components that most influence overall system performance. With so many parameters in play, the ability to test the interconnect within the system prior to tapeout is clearly of great value. Just setting all parameters to maximum performance is rarely a sensible option, as power and cost budgets demand that less silicon is used to achieve the same levels of performance; full system modelling allows refinement to save silicon area and reduce the number of wires without compromising performance goals.

 

While the configurability of the interconnect is an inherent and indeed crucial part of its effectiveness, the vast number of choices available also means that users often do not fully optimise the interconnect for their individual system. This is where virtual prototyping tools come into the equation, helping designers avoid arbitration problems, detect system bottlenecks and get a better picture of how to manage PPA requirements. This ability to foresee and avoid potential issues before they become a problem is invaluable in an age where the pressure to get designs right first time, and on time, is a concern of every system architect. Additionally, the depth of analysis that the Carbon tool can undertake provides fast and meaningful feedback that can help you measurably improve your design. Last year I co-wrote a white paper on this subject with Bill Neifert, titled “Getting the most out of the ARM CoreLink NIC-400”, which is available to download.

Carbon.png
In the example shown here, a simple NIC-400 is configured with two masters and two slaves. The masters are set up to mimic the data loads from a CPU and DMA controller and the dummy targets are an Ethernet MAC and a DDR3 memory controller. Of course, since the traffic generators are quite configurable, it’s possible to model any number of different sources or targets and we’ll get more into that in a bit. Note though that we’re analysing traffic on any of the connections. The graphs shown here track the latency on the CPU interface and the queues in the DDR controller. The exact metrics for the system in question will of course vary based upon design targets however. It’s also beneficial to correlate data across multiple analysis windows and indeed even across multiple runs.
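The latency tracking described above amounts to time-stamping each transaction at issue and at completion. A minimal sketch of such a monitor follows; the transaction IDs and cycle counts are invented purely for illustration:

```python
class LatencyMonitor:
    """Track per-transaction latency on one interconnect interface."""
    def __init__(self):
        self.issued = {}       # txn_id -> cycle the request was issued
        self.latencies = []    # completed-transaction latencies, in cycles

    def issue(self, txn_id, cycle):
        self.issued[txn_id] = cycle

    def complete(self, txn_id, cycle):
        self.latencies.append(cycle - self.issued.pop(txn_id))

    def average(self):
        return sum(self.latencies) / len(self.latencies)

# Two reads on the CPU interface, with invented cycle counts:
mon = LatencyMonitor()
mon.issue("rd0", cycle=0)
mon.complete("rd0", cycle=12)
mon.issue("rd1", cycle=4)
mon.complete("rd1", cycle=20)
mon.average()  # 14.0 cycles
```

Collecting this per interface, per run, is what makes it possible to compare hundreds of interconnect configurations quantitatively rather than by intuition.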

 

The important thing we’ve done here is establish a framework to begin gathering quantitative data on the performance of the NIC-400 so we can track how well it meets the requirements. The results can be analysed which will likely lead to reconfiguration, recompilation and re-simulation. It’s not unheard of to iterate through hundreds of various design possibilities with only slight changes in parameters. It’s important to vary the traffic parameters as well as the NIC parameters however since the true performance metric of the NIC-400 and really, all interconnect IP, is how it impacts the behavioural characteristics of the entire system.

 

I will be going into more detail on all of this on Thursday at 18:00 BST (1:00 pm EDT, 10:00 am PDT) in a webinar titled “Pre-silicon optimisation of system designs using the ARM CoreLink NIC-400 Interconnect” with Eric Sondhi, a corporate applications engineer at Carbon Design Systems. You can register for the webinar here, and make sure to attend live to ensure that your questions are answered immediately.
