
SoC Implementation


The Internet of Things represents a vast opportunity for electronics designers if we can figure out one not-so-small problem.

 

We are rightly having myriad conversations here in the ARM Connected Community on the potential:

Will IoT Break M2M Silos for Start-up Apps?

IoT Success Depends upon Decoupling

Diversity drives IoT deployment

Balancing IoT's promise with privacy concerns

Freescale's Five S’s of IoT

 

(And this is just a fraction of what's been published here!)

It reflects our innate understanding that IoT's application potential is big, broad and pretty much unbounded. And yet we need to simplify it, as Steve Nelson of Freescale has blogged about (Embedded Beat: The Five S’s of IoT | Freescale Community). (Thanks for the heads up, Lori Kate Smith!).

 

We need, as part of that simplification exercise, to tackle that not-so-small problem I alluded to: We need a natural user interface (Dennis Laudick has some great insight into interface design possibilities: From Keyboards to Touchscreens To....? The ‘Futuristic’ HMIs That Will Soon Be A Reality In Your Pocket.).

 

In earlier technology eras, we adapted to specific interfaces (computing keyboards, mobile touchscreens). But IoT will be so diverse that we need it to adapt to us. And that's where natural user interface design comes in.

My colleague Seow Yin Lim explains the implications, and she'll be writing a series about it. Natural user interface design should be a great industry conversation.


What are your thoughts?


As use of advanced node technologies (FinFET specifically) ramps, more designers are confronting challenges in technology, productivity, and time to market. To get a sense for what engineers need to know about advanced nodes, FinFETs, and parasitic extraction, Brian Fuller, editor-in-chief at Cadence, sat down with Hitendra Divecha, senior product marketing manager at Cadence, to understand more about where we are today.

 

Q: Hitendra, let's start at a really high level: As we move down into leading-edge nodes—16/14nm FinFETs and beyond—what are the main challenges that designers face today?

 

A: Well, we can bucket these challenges into two main categories: increasing complexity and modeling challenges. It’s not just tighter geometries and new design rules, which come with every new process node. We talked about the introduction of FinFET, but there is 3D-IC as well, the number of process corners is exploding, and, specifically for FinFET devices, there is an explosion in parasitics: coupling capacitances and resistances. This increases design complexity and size: the netlist is getting bigger and bigger, and as a result extraction runtimes increase for SoC designs, as do post-layout simulation and characterization runtimes for custom/analog designs.


Q: You mentioned modeling challenges and accuracy, what’s happening there?

 

A: Yes, so design complexity is one challenge, but there are various modeling challenges as well. For FinFET devices, for example, local interconnects have been introduced, and there are second- and third-order manufacturing effects that also need to be modeled. All of these new features have to be modeled with precise accuracy. Performance and turnaround times are one thing, but if you can’t provide accuracy for these devices, especially relative to the foundry golden data, you place a burden on customers: they have to over-margin their designs and leave performance on the table.

 

Q: Talk a little more about that. We talk about the enormous percentage of design time taken up by verification in general. How much has extraction, as a subset, grown as we get into these advanced nodes? Can we quantify that?

 

A: Well, from our customers' perspective, while their extraction and time-to-signoff times are increasing, their time to market is shrinking. It can take anywhere from six to eight weeks for designers to close the signoff loop and, as you know, extraction is a critical step in this loop. Our customers tell us that, while extraction runtime varies with design size and type, full flat extraction at these advanced nodes can take up to three days with their current extraction tools. This puts an enormous amount of pressure on our customers' ability to reach design closure in time to meet their time-to-market targets.

 

Q: OK, so extraction is a huge pain point for our customers…

 

A: Yes, so huge that we have to solve our customers' problems and help them accelerate signoff extraction turnaround time. There’s no way around it. The market has lacked tools that deliver the performance required to produce a significant speed-up in both digital and transistor signoff extraction flows.

 

Q: We’re going to get to that in a second, but before we do, engineers coming around the corner and confronting advanced nodes may not have a sense for what they’re in for.

 

A: Absolutely. Let me put a finer point on that. Signoff extraction has become challenging for a number of reasons.

 

First, the number of interconnect corners on both the digital and custom/analog sides has exploded, partly due to double-patterning technology (DPT), first introduced at 20nm and carried over to 16/14nm FinFETs.

 

Second, design sizes are increasing. At 20nm and below, designs run to more than 70 million nets. With more corners and larger design sizes, extraction goes from taking a day to taking a few days to complete.

 

Q: We've talked about complexity challenges...let's move on to modeling challenges.

 

A: Are you ready for this? FinFET devices have 155X more resistances than 28nm devices. This growth means bigger netlists, which hurt post-layout simulation performance and demand faster simulation runtimes. Tools need to model three different resistance types: contact resistance, spreading resistance, and extension resistance. And consider this: the thickness of the 3D gate introduces new capacitances. From FinFET structures to fringe capacitances, double patterning, and more, the modeling features have grown more complex, and that stretches out extraction runtime.

 

Q: Parasitic extraction has been a big issue for some time. So what’s wrong with existing flows and tools?

 

A: In some cases nothing, especially for certain designs and older nodes. But as I’ve said, advanced nodes are a different ballgame. In most cases, different extraction engines are used in implementation and signoff, resulting in poorly correlated results that have a negative impact on design closure. A consistent extraction engine throughout the flow, meaning implementation and signoff, is the linchpin of our customers' time to signoff because it reduces the number of ECO loops they have to go through.

 

Q: We touched a bit on productivity. So, at 16/14nm and FinFET technology, older extraction technologies can’t necessarily keep up with all the additional complexity you’ve alluded to, correct?

 

A: Yes, parasitic extraction is a means to an end in both digital- and transistor-level extraction flows. However, it is a very BIG means to an end. We listened to our customers' time-to-market challenges, and we’ve brought a massively parallel architecture to bear on the problem. The Cadence Quantus™ QRC Extraction Solution, which we just announced, offers up to 5X better turnaround time for both single- and multi-corner extraction versus traditional extraction tools in the market today, provides scalability to hundreds of CPUs and machines, and delivers best-in-class accuracy for FinFET designs measured against foundry golden data. Also, with the Quantus QRC solution, we continue to provide leading functionality for custom/analog designs, including very cool functionality to address automotive application designs and our new random-walk field solver, Quantus FS.

 

For a customer design like the one I talked about earlier, we can reduce extraction runtime to 10 hours or less, instead of three days, without compromising on accuracy. In summary, with the combination of performance, accuracy, and tight integration with our implementation tools, the Encounter® Design Implementation System and the Virtuoso® platform, the Quantus QRC solution delivers the fastest path to signoff.

 

Q: You’re a busy man, so thanks for your time, Hitendra!

 

A: No problem!

Brian Fuller


Related stories

ARM has released DS-5 version 5.19 including the Ultimate Edition for ARMv8 to compile and debug 64-bit software. The Carbon Performance Analysis Kits (CPAKs) for the ARM Cortex-A57 and Cortex-A53 demonstrate 64-bit bare metal software examples which can be modified and compiled with DS-5. The software in the currently available CPAKs is compiled with ARM Compiler 5, better known as armcc, and not yet configured for ARM Compiler 6, also known as armclang. Fortunately, only a few changes are needed to move from armcc to armclang.

 

Today, I will provide some tips for using ARM Compiler 6 for those who would like to use the latest compiler from ARM with CPAK example software. In the future, all CPAKs will be updated for ARM Compiler 6, but now is a good time to give it a try and learn about the compiler.

 

ARM Compiler 6 is based on Clang and the LLVM Compiler Framework, and provides best-in-class code generation for the ARM Architecture. There are various articles covering the details, but the key takeaway is that ARM Compiler 6 is based on open source software with a flexible license, which allows commercial products to be created without making the source code available.


Migration Guide

 

A good place to understand the differences between armcc and armclang is the ARM Compiler Migration Guide. It explains the command line differences between the two compilers and how to map switches from the old compiler to the new compiler. The migration guide also covers two additional tools provided to aid in switching compilers:

  • Source Compatibility Checker
  • Command-line Translation Wrapper

 

The compatibility checker helps find issues in the source code that is being migrated, while the translation wrapper provides an automatic way to call armcc as before, but invisibly calls armclang with the equivalent options. I didn’t spend too much time with either tool, but they are worth checking out.


The key point is that migration will involve new compiler invocation and switches, but it may also involve source code changes for things such as pragmas and attributes that are different between the compilers.


CPAK HOWTO

 

Let’s look at the practical steps to use ARM Compiler 6 on a Cortex-A53 CPAK software example. For this exercise I selected the DecrDataMP_v8/ example in the Applications/ directory of the CPAK. The system is a dual-cluster A53 where each cluster has 1 core. It also includes the CCI-400 to demonstrate cache coherency between clusters and the NIC-400 for connecting peripherals. The block diagram is shown below.

Cortex-A53 System

Setting up DS-5 is very easy. I use Linux and bash, so I just add the bin/ directory of DS-5 to my PATH environment variable. Adjust the path to match your installation.

 

$ export PATH=$PATH:/o/tools/linux/ARM/DS5_5.19/64bit/bin

 

Only the 64-bit version of DS-5 includes ARM Compiler 6, so make sure you install the 64-bit version and run it on a 64-bit Linux machine.

 

The first step in using ARM Compiler 6 is to edit the Makefile and replace armcc with armclang to compile the C files. Assembly files can continue to be compiled with armasm, and linking with armlink remains mostly the same. It is possible to compile assembly files and link with armclang, but in this case I decided to leave the flow as is to learn the basics of making the compiler transition.

 

The Makefile specifies the compiler via the CC variable, so set CC=armclang.
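
For example (a minimal sketch, assuming the Makefile previously set CC = armcc; the other variables stay as they are):

  # before: CC = armcc
  CC = armclang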

 

The next important change is the specification of the target CPU. With armcc the --cpu option is used. You will see --cpu=8-A.64.no_neon in the Makefile. One tip is to use the command below to get a list of possible targets.

 

$ armcc --cpu list

 

With armclang the target CPU selection is done using the -target option. To select AArch64 use -target aarch64-arm-none-eabi in place of the --cpu option.
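
For example, a representative compile line looks like this (the source and object file names are placeholders):

  $ armclang -target aarch64-arm-none-eabi -c main.c -o main.o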

 

The invocation command and the target CPU selection are the main differences needed to switch from armcc to armclang.


Other Switches

 

This particular CPAK software uses --c90 to specify the version of the C standard. For armclang the equivalent option is -xc -std=c90, so make this change in the Makefile also.

 

The next issue is the use of the --dwarf3 option. This is not supported by armclang; it seems DWARF4 is the only option with armclang.

 

The Makefile also uses -Ospace as an option to shrink the program size at the possible expense of runtime speed. For armclang this should be changed to -Os.

 

The last difference relates to armlink. The armlink commands need --force_scanlib to tell armlink to include the ARM libraries. From the documentation, this option is mandatory when running armlink directly. Add this flag to the armlink commands and the compilation will complete successfully and generate .axf files!
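
A representative link line then looks like this (object and output names are placeholders; any existing scatter-file or other options stay as they are):

  $ armlink --force_scanlib startup.o main.o -o app.axf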

 

Here is a table summarizing the differences.

 

ARM Compiler 5                 ARM Compiler 6

Invoke using armcc             Invoke using armclang
--cpu=8-A.64.no_neon           -target aarch64-arm-none-eabi
--dwarf3                       (not supported; DWARF4 is used)
-Ospace                        -Os
(not needed)                   --force_scanlib (on the armlink command)

 

I encountered one other quirk when migrating this example to ARM Compiler 6: a compilation error caused by the include of a .h file in the source file retarget.c

 

  #include <rt_misc.h>

 

For now I just commented out this line and the application compiled and ran fine. It’s probably something to look into on a rainy day.


Creating an Eclipse Project for DS-5

 

It wouldn’t be DS-5 if we didn’t use the Eclipse environment to compile the example. It’s very easy to do, so I’ll include a quick tutorial for those who haven’t used it before. Since a Makefile already exists for the software, I used a new Makefile project.

 

First, launch Eclipse using

 

$ eclipse &

 

Once Eclipse is started, use the menu File -> New -> Makefile Project with Existing Code

 

Pick a name for the project and fill it into the dialog box, browse to the location of the code, and select ARM Compiler 6 as the Toolchain for indexer settings.

 


 

There are many ways to start the build, but once the project is set up I use the Project menu item called Build Project and the code will be compiled.

 

There is a lot more to explore with DS-5, but this is enough information to get going in the right direction.


ARM TechCon

 

Now is a great time to start making plans to attend ARM TechCon, October 1-3 at the Santa Clara Convention Center. The schedule has just been published and registration is open. I will present Performance Optimization for an ARM Cortex-A53 System using Software Workloads and Cycle Accurate Models on Friday afternoon.

 

Jason Andrews

The annual Design Automation Conference (DAC) was held last month, and what a show it was! If you recall, back in February, I talked about TSMC and ARM’s first tape-out with the ARM Cortex-A57 and Cortex-A53 processors. At DAC, we presented more data, as well as projections showing that TSMC’s 16nm FinFET+ process can provide up to a 15% boost in performance over the 16nm FinFET process, or a 30% reduction in power at the same speed. While the “big” ARM Cortex-A57 could hit a frequency of over 2.5GHz, what was even more impressive was that its “LITTLE” counterpart, the ARM Cortex-A53, could operate at points that consume nearly 1/10th the peak power of the “big” core. For our customers designing premium mobile devices this is great news, since these results show that high frequency can still be achieved while taking advantage of the power savings that the big.LITTLE architecture provides.

As 64-bit designs in mobile and enterprise become more prevalent, the primary target is not just maximum frequency; there are always power considerations that need to be met. As the graph below shows, using big.LITTLE processors such as the ARM Cortex-A57 and Cortex-A53 gives the SoC designer flexibility to target various frequency and power points. Manufacturing at TSMC 16nm FinFET+ provides the highest performance SoCs for premium mobile or enterprise applications. Even though some next-generation mobile designs require voltage domains reduced down to 0.7V, you can see that the Cortex-A57 can hit over 2.1GHz (@TT / 85C) at those reduced voltage levels. This high frequency, combined with the Cortex-A57’s capacity to deliver more performance per MHz, results in top-of-the-line performance while staying well under the 750mW budget. It also shows the extensive dynamic range of computing performance that a big.LITTLE configuration can deliver within a mobile power budget.

[Graph: Cortex-A57 and Cortex-A53 frequency/power points on TSMC 16nm FinFET+]

The 16nm FinFET+ process is further evidence of TSMC’s commitment to improving the ecosystem to deliver best-in-class manufacturing along with best-in-class processors to both ARM and TSMC customers. Some of the design challenges that were addressed during the collaboration effort include:

  • Finding the right Vt/channel length mix and design flows
  • Building robust power distribution structures
  • Controlling local congestion and pin access
  • Driving up cell utilization
  • Understanding the effects of wire-dominated paths, which expose different critical paths


Timely collaboration is the key to understanding these challenges early, providing solutions in both the process technology and the physical IP, and enabling the manufacturing ecosystem. Starting with the first ARM Cortex-A57 tape-out, then building on that with the Cortex-A57/Cortex-A53 big.LITTLE tape-out, our collaboration with ARM has resulted in an optimized IP ecosystem that customers can take to market quickly. Our customers have planned multiple tape-outs this year in 16FF+, and more big.LITTLE based designs to drive next-generation mobile applications!


Registration is now open for HOT CHIPS 26 with early-bird registration ending this Friday, July 25.

 

The technical symposium is taking place at the Flint Center in Cupertino, California from Sunday, August 10 through Tuesday, August 12, bringing together designers and architects of high-performance chips, software, and systems. The tutorial and presentation sessions focus on up-to-the-minute developments in leading-edge industrial designs and research projects.

 

Attendees are in luck - we have some of our rockstars presenting throughout the conference.

 



Sunday, August 10 at 9:55AM

Emerging Trends in Hardware Support for Security Tutorial - "Mobile Hardware Security"

Rob Aitken is an ARM Fellow and heads ARM’s R&D efforts in advanced silicon technology. His areas of responsibility include low-power design, library architecture for advanced process nodes, and design for manufacturability. His research interests include design for variability, defect analysis, and fault diagnosis.


 

Sunday, August 10 at 4:50PM

Internet of Things Tutorial - "Standards for Constrained IoT Devices"

Bill Curtis is Lead Strategist for the Internet of Things at ARM. Prior to this role, Bill was Senior Fellow and Design Engineer at AMD, CTO of Dell’s Consumer and Displays groups, and Director of Computer Sciences for Landmark Graphics/Halliburton. Bill is a systems architect with experience ranging from tiny embedded devices to high-performance computing environments. He is currently focused on integrating those two worlds.


 

Monday, August 11 at 11:30AM

Keynote - "Power Constraints: From Sensors to Servers"

Mike Muller is one of the founding members of ARM and has held several executive positions including Marketing Director, Vice President of Marketing, Executive Vice President of Business Development, and his current position of Chief Technology Officer. In October 2001, he was appointed to the board. Before joining the company, he was responsible for hardware strategy and the development of portable products at Acorn Computers. He was previously at Orbis Computers. He holds an MA in Computer Science from Cambridge University.



Monday, August 11 at 5:30PM

ARM Servers Session - "ARM Next-Generation IP Supporting LSI’s High-End Networking"

Mike Filippo is an ARM Fellow and currently the Lead Architect of ARM’s line of high-end CPUs, including the Cortex-A57 and next-generation follow-ons. Prior to these roles, Mike was the Lead Architect of ARM’s first-generation enterprise interconnect family, including the CCN-504 and CCN-508, as well as a Co-Lead Architect of ARM’s next-generation coherent interconnect architecture, AMBA 5 CHI.

Register now via HOTCHIPS.org.

SAN FRANCISCO--Design Automation Conference 2014, more than ever, was about SoC design challenges and solutions, IoT, automotive, and embedded market opportunities, and how the electronics ecosystem is joining forces to push ahead into advanced nodes.


If you missed DAC 2014 (and even if you were there and couldn't shape-shift to follow multiple events in parallel), here are some highlights:

 

I was exhausted at the end of that week and now I know why! My editorial colleagues Frank Schirrmeister, Richard Goering, Sarah Adams, Lani Wong, Christine Young and Sean O'Kane executed a full-court press on DAC to make sure we covered all the bases. Kudos to them and to the team at the ARM Connected Community: Brad Nemire, Lori Kate Smith, Leah Schuth, Alban Rampon and all the other ARM CC rock stars!


Related stories:

Everything DAC

DAC 2014 – Five Mega Trends for System Design and Verification

So, that was DAC 2014

DAC Breakfast: 14nm is Real and Ready for Use

The Aftermath of the ARM Step Challenge at DAC

Here's a summary by IEEE's Spectrum Magazine regarding IBM's recent "7nm and beyond" research funding commitment:

http://spectrum.ieee.org/nanoclast/semiconductors/nanotechnology/ibm-pours-3-billion-into-future-of-nanoelectronics

 

This is potentially great news for our industry, because IBM has a rich pipeline of research heading toward the post-CMOS nanoelectronics world, and they fill a key niche between university tinkering and foundry manufacturing. There's a lot more going on at IBM than the couple of snippets provided in this article, so I'm excited to see the news.

 

And in case you weren't aware, our R&D group here at ARM has a history of working with IBM in the future technology arena:

 

http://www-03.ibm.com/press/us/en/pressrelease/33405.wss

 

http://www.cadence.com/cadence/newsroom/press_releases/pages/pr.aspx?xml=103012_14nm_test_chip

 

https://www.semiwiki.com/forum/content/1349-collaboration-28nm-20nm-14nm-ibm-cadence-arm-globalfoundries-samsung.html

When it comes to Internet of Things systems, the name of the game is low power and small footprint.



This is often easier said than done in an environment in which designers must take those two huge issues into consideration while figuring out how to implement and verify low-power mixed-signal blocks in cost-effective SoCs and get to market before their competitors.


ARM and Cadence are hosting a webinar July 22 to explore just that: how teams can address Internet of Things (IoT) SoC design and verification challenges in a timely and effective manner.


The webinar will focus on SoC implementation and verification using an ARM Cortex-M0 processor. The Cortex-M0 is ARM’s smallest 32-bit core, consuming as little as 16µW/MHz (90LP process, minimal configuration) in an area of under 12,000 gates.


This turns out to be especially useful for engineers migrating from 8- and 16-bit systems who are keeping their eye on efficient code use but want the performance enhancements that come with a 32-bit architecture.


Diya Soubra, CPU Product Manager for ARM Cortex-M3 processors at ARM, and Ian Dennison, Solutions Marketing Senior Group Director for the Custom IC and PCB Groups at Cadence, will guide listeners through ways to reduce time to market and realize power-performance-area design targets.

 

Click here to register.

 

Brian Fuller

 

Related stories:

--Webinar: Addressing MCU Mixed-Signal Design Challenges

Integration challenges for Connectivity IP in the IoT space – Sunrise Micro Devices is addressing each one of them

 

I am going to broadly divide IP into RF/analog IP and digital IP. Both have unique integration and delivery challenges. Increasingly, though, more of the IP is mixed-signal. Take us, Sunrise Micro Devices (SMD), for instance. SMD is in the Internet of Things (IoT) market, where we expect billions of devices to be connected to the internet in the next few years. Every semiconductor IC device, in addition to having a microprocessor and a sensor solution, will need a connectivity solution. IPs in this space are necessarily mixed-signal; consequently, they inherit the complexities and issues of both digital and RF IPs.


At a very high level, the main issue with IP is that the simulated environment differs from the final design environment. Analog and RF IP is dependent on process/node, foundry, layout, extraction, model fidelity, and placement. So you are either tied to dropping it in ‘as is’ and treating it like a black box (nobody knows how it works or whether it meets the required specifications) or completely changing it (with the caveat that you can no longer expect the same results). Digital IP needs to be resynthesized, then placed and routed, and it takes several iterations to make the IP you received work the way you want it to. In addition, this process is extremely tool-dependent.

 

Finally, there are system-level issues like interoperability, interfaces, and controls (how the IP talks to the rest of the SoC). A very important, often overlooked factor is the communication between IP providers and SoC implementation houses: there are documents outlining integration guidelines, but without an automated process that takes in all that information, a lot can be lost in translation.

 

IP is no longer just IP blocks; we now have IP sub-systems. This is very true in our space (the IoT market), where a radio solution needs a transceiver, a baseband, and a link-layer controller, including blocks in the RF, analog, purely digital, and mixed-signal domains. The only way to address this effectively is at the architectural level. We are seeing fragmented solutions from IP providers, some providing just the transceiver and some just baseband/controller solutions, making integration very difficult.


Very few companies have expertise in both the RF/analog and digital domains. SMD is an expert in low-power radio design, and our Technologist-in-Residence, David Flynn, is an ARM Fellow with extensive experience in the low-power digital domain. Together, we have architected a solution to the problem we see in this IP market. The resulting solution is a pre-qualified hard macro that integrates the transceiver, baseband, and link-layer controller. Our radio IP has a peripheral interconnect with a standard AHB interface that is compatible with most microprocessor architectures, greatly simplifying SoC integration.

 

Another aspect of system integration is software availability. We provide the firmware required for the radio in ROM and also provide for updates through patch RAM. This, along with the timing-independent interface to the host controller in all our IP offerings, enables easy implementation of the stack and application layers. We have also paid special attention to system-level reset, timing, and control.

 

Besides addressing the ‘ease of integration’ issue at the system and architectural level, we support it with integration and implementation manuals, reference schematics, data sheets, and application notes on antenna selection, PCB layout, Bluetooth qualification, regulatory certifications, and production test guidance, which enables silicon partners with little or no prior RF experience to bring BLE-enabled SoCs to market in a timely, risk-free, and cost-effective manner.

 

Companies and engineers can only keep up with the explosive growth in computing needs through efficient IP re-use, so I actually see this as a huge opportunity for third-party IP vendors. There are known issues, and IP companies recognize them. Every IP vendor can address the specific IP integration issues in its domain, differentiate its offerings, and offer better solutions. The cream will rise to the top. The vendors that offer effective solutions and ease integration pain points are the ones that will thrive.


Smart and Connected

中文版本 Chinese Version

 

 

On June 24, 2014, Huawei Technologies officially announced its new Honor 6 smartphone in Beijing, China. The device uses Hisilicon's new Kirin 920 mobile SoC.

 

For those who do not know Huawei's mobile business well, its handsets come in two ranges: Huawei Ascend and Huawei Honor. Huawei Ascend mainly targets the high-end market and is divided into four series: D (diamond), P (platinum), G (gold), and Y (young) [check out my blog from the 2013 Huawei Ascend P6 launch event in London: Showcasing Huawei's Ascend P6 in ARM Powered Rubik's Cube Solver]. The Huawei Honor series, by contrast, aims at the mass market: cost-effective, high-profile, expanding mainly within China, targeting a young and motivated audience, and positioned as an affordable everyday brand.

 

The newly released Huawei Honor 6 smartphone uses the latest Hisilicon eight-core Kirin 920 SoC, built on 28nm HPM, with an ARM big.LITTLE CPU configuration: a quad-core Cortex-A15 plus a quad-core Cortex-A7, with all eight cores able to run simultaneously. In other words, it pairs the high performance of the ARM Cortex-A15 with the low power of the ARM Cortex-A7. It also comes with an ARM Mali-T628 graphics processor (GPU) [check out the ARM CC blog Huawei Chooses ARM Mali GPUs for its Premium Smartphone Offering] and features 3GB of RAM. The Huawei Honor 6 also features a 5 megapixel front-facing camera and a 13 megapixel rear camera, and the handset ships with Android 4.4.2 KitKat plus the company’s EMUI 2.3 user interface. The device has an aluminum casing and comes with LTE Cat 6, which supports download speeds of up to 300 Mbps. Although there was plenty of price speculation before the launch, the final announced retail price is RMB1,999 (US$320) for the 16GB version and RMB2,299 (US$370) for the 32GB version. The new handset will officially be on the market from 1 July 2014. Here are some photos from the launch event.

 

 

Hisilicon CTO Mr. Wei Ai and President of the Huawei Honor Division Mr. Jiangfeng Liu introduced Hisilicon's R&D history and the Kirin 920's performance: eight heterogeneous cores (ARM big.LITTLE), a five-mode, full-band LTE Cat 6 300Mbps modem, and an ARM Mali-T628 MP4 GPU. The Hisilicon Kirin 920 is now officially launched!


Hisilicon Kirin 920 SoC architecture & performance


Huawei Honor 6 smartphone retail pricing and onsite product demo area

 

 

Here you can find out more about ARM big.LITTLE:

 

ARM Official Site on big.LITTLE (English, 中文)

ARM big.LITTLE micro site - register for relevant news updates, download the white paper

Ten Things to Know About big.LITTLE

big.LITTLE in 64-bit

What is the latest progress on big.LITTLE technology?

big.LITTLE technology moves towards fully heterogeneous Global Task Scheduling - Techcon Presentation

ARM's big.LITTLE architecture aims to satisfy the hunger for power - Q&A with John Goodacre, Director, Technology and Systems, ARM Processor Division

Power Management with big.LITTLE: A technical overview

big.LITTLE and AMBA 4 ACE keep your cache warm and avoid flushes

Extended System Coherency - Part 1 - Cache Coherency Fundamentals

Extended System Coherency - Part 2 - Implementation, big.LITTLE, GPU Compute and Enterprise

First tape-out with TSMC’s 16nm FinFET and ARM’s 64-bit big.LITTLE Processors

ARM CoreLink 500 System IP — Enabling 64-bit big.LITTLE

ARM & Linaro - big.LITTLE FAQ on Google+ Hangouts

 

Video on Youtube via ARMflix

ARM big.LITTLE Technology Explained , published on 8 April 2014

Programming for Multicore & ARM big.LITTLE technology (GDC 2014), published on 21 March 2014

ARM® big.LITTLE™ Processing with ARM® Mali GPUs Demonstrating GPU Compute, published on 11 September 2013

ARM® big.LITTLE™ Processing with QuickOffice, published on 10 September 2013

ARM® big.LITTLE™ Processing with Angry Birds game, published on 10 September 2013

ARM big.LITTLE Hangout with the Experts, published on 14 August 2013

ARM big.LITTLE Overview and demonstration, published on 26 October 2012

ARM® big.LITTLE™ Technology, published on 25 October 2012

ARM & Linaro - big.LITTLE FAQ, published on 18 October 2012

ARM Cortex-A7 launch -- big.LITTLE demonstration, Nandan Nayampally, Director, Product Marketing, published on 19 October 2011

Last week, I received the call for papers for the Embedded World Conference for 2015. The list of topics is a good reminder of how broad the world of embedded systems is. It also reminded me how overloaded the term “embedded" has become. The term may invoke thoughts of a system made for a specific purpose to perform a dedicated function, or visions of invisible processors and software hidden in a product like a car. When I think of embedded, I tend to think about the combination of hardware and software, learning how they work together, and the challenge of building and debugging a system running software that interacts with hardware. Some people call this hardware-dependent software, firmware, or device drivers. Whatever it is called, it’s always a challenge to construct and debug both hardware and software and find out where the problems are. One of the great things about working at Carbon is the variety of the latest ARM IP combined with a spectrum of different types of software. We commonly work with software ranging from small bare-metal C programs to Linux running on multiple ARM cores. We also work with a mix of cycle-accurate models and abstract models.

 

If you are interested in this area, I would encourage you to learn as much as possible about the topics below. Amazingly, the most popular programming language is still C, and being able to read assembly language also helps.


  • Cross Compilers and Debuggers
  • CPU Register Set
  • Instruction Pipeline
  • Cache
  • Interrupts and Interrupt Handlers
  • Timers
  • Co-Processors
  • Bus Protocols
  • Performance Monitors


I could write articles about how project X at company Y used Carbon products to optimize system performance or shrink time to market and lived happily ever after, but I prefer to write about what users can learn from virtual prototypes. Finding out new things via hands-on experience is the exciting part of embedded systems for me.


Today, I will provide two examples of what working with embedded systems is all about. The first demonstrates why embedded systems programming is different from general-purpose C programming: working with hardware requires paying attention to extra details. The second example relates to a question many people at Carbon are frequently asked: “Why are accurate models important?” Carbon has become the standard for simulation with accurate models of ARM IP, but it’s not always easy to see why or when the additional accuracy makes a difference, especially for software development. Since some software development tasks can be done with abstract models, I will share a situation where accuracy makes a difference. Both of the examples in this article looked perfectly fine on the surface, but didn’t actually work.


GIC-400 Programming Example


Recently, I was working with some software that had been used on an ARM Cortex-A9 system. I ported it to a Cortex-A15 system, and was working on running it on a new system that used the GIC-400 instead of the internal GIC of the A15.


People that have worked with me know I have two rules for system debugging:

  1. Nothing ever works the first time
  2. When things don’t work, guessing is not allowed

When I ran the new system with the external GIC-400, the software failed to start up correctly. One of the challenges in debugging such problems is that the software jumps off to bad places after things go wrong, and there is little or no trail of where the software went off the path. Normally, I try to use software breakpoints to close in on the problem. Another technique is to use the Carbon Analyzer to trace bus transactions and software execution to spot a wrong turn. In this particular case I was able to spot an abort, and I traced it to a normal-looking access to one of the GIC-400 registers.


I was able to find the instruction that was causing the abort. The challenge was that it looked perfectly fine. It was a read of the GIC Distributor Control Register to see if the GIC is enabled. It’s one of the easiest things that could be done, and would be expected to work fine as long as the GIC is present in the system. Here is the source code:

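(The original code image is not reproduced here, so below is a minimal reconstruction. The gic_enabled name and the GIC_DIST_BASE value are assumptions; GICD_CTLR sits at offset 0 of the GIC-400 distributor.)

  #define GIC_DIST_BASE 0x2C001000  /* hypothetical distributor base; board-specific */

  /* Return non-zero if the GIC distributor is enabled (GICD_CTLR bit 0) */
  int gic_enabled(void)
  {
      /* Byte-wide pointer: the compiler emits an LDRB for this access */
      volatile unsigned char *gicd_ctlr = (volatile unsigned char *)GIC_DIST_BASE;
      return *gicd_ctlr & 1;
  }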

The load instruction which was aborting was the second one in the function, the LDRB:

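(Again a reconstruction rather than the original disassembly; the register choice and the literal-pool load are assumptions, but it matches the description: the aborting LDRB is the second load in the function.)

  gic_enabled:
      LDR   r0, =GIC_DIST_BASE    ; first load: distributor base address
      LDRB  r0, [r0]              ; second load: byte read of GICD_CTLR (aborts on GIC-400)
      AND   r0, r0, #1
      BX    lr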

The puzzling thing was that the instruction looked fine and I was certain I ran this function on other systems containing the Cortex-A9 and Cortex-A15 internal GIC.

 

After some pondering, I recalled reading that the GIC-400 had some restrictions on access size for specific registers. Sure enough, the aborting instruction was a load byte. It’s not easy to find a clear statement specifying that a byte access to this register is bad, but I'm sure it's in the documentation somewhere. I decided it was easier to just re-code the function to create a word access and try again.

 

There are probably many ways to change the code to avoid the byte read, but I re-coded the function this way, since the enable bit is the only bit of the register that is used:

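(Reconstructed under the same assumptions as above; the only meaningful change is the pointer width.)

  /* Return non-zero if the GIC distributor is enabled, using a word access */
  int gic_enabled(void)
  {
      /* Word-wide pointer: the compiler now emits a 32-bit LDR */
      volatile unsigned int *gicd_ctlr = (volatile unsigned int *)GIC_DIST_BASE;
      return (int)(*gicd_ctlr & 1);
  }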

Sure enough, the compiler now generated a load word instruction and it worked as expected.

 

This example demonstrates a few principles of embedded systems. The first is that the ability to understand ARM assembly language is a big help in debugging, especially when tracing loads and stores to hardware such as the GIC-400. Another is that the code a C compiler generates sometimes matters. Most of the time when using C there is no need to look at the generated code, but in this case there is a connection between the C code and how the hardware responds to the generated instructions. Understanding how to modify the C code to generate different instructions was needed to solve the problem.

 

Mysterious Interrupt Handler

 

The next example demonstrates another situation where details matter. This was a bare-metal software program installing an interrupt handler for the Cortex-A15 processor for the nIRQ interrupt by putting a jump to the address of the handler at address 0x18. This occurs during program startup: an instruction is written into memory that jumps to the C function (irq_handler) that handles the interrupt. The important code looked like this (VECTOR_BASE is 0):

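(The original listing is not reproduced here; this is a minimal sketch of the idea. The install_irq_vector name is hypothetical, irq_handler and VECTOR_BASE come from the text, and 0xEA000000 is the ARM unconditional branch opcode, for which the PC reads as the branch address plus 8.)

  #define VECTOR_BASE 0x0
  #define IRQ_VECTOR  (VECTOR_BASE + 0x18)   /* IRQ exception vector address */

  extern void irq_handler(void);

  void install_irq_vector(void)
  {
      volatile unsigned int *vector = (volatile unsigned int *)IRQ_VECTOR;
      /* Encode "B irq_handler": a 24-bit signed word offset relative to PC+8 */
      unsigned int offset = ((unsigned int)irq_handler - IRQ_VECTOR - 8) >> 2;
      *vector = 0xEA000000 | (offset & 0x00FFFFFF);
  }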

The code looked perfectly fine and worked when simulated with abstract models, but it didn’t work as expected when run on a cycle-accurate simulation. Initially, it was very hard to tell why. The simulation would appear to just hang, and when it was stopped it was sitting in weird places that didn’t seem like code that should have been running. Using the instruction and transaction traces, it looked like an interrupt was occurring, but the program didn’t go to the interrupt handler as expected. To debug, I first placed a hardware breakpoint on a change of the interrupt signal, then I placed a software breakpoint at address 0x18 so the simulation would stop when the first interrupt occurred. The expected instruction was there, but when I single-stepped to the next instruction the PC just advanced one word to address 0x1c, with no jump. Subsequent step commands just incremented the PC. In this case there was no code at any address other than 0x18, so the CPU was executing instructions that were all 0.

 

This problem was pretty mysterious considering the debugger showed the proper instruction at the right place, but it was as if it wasn’t there at all. Finally, it hit me that the only possible explanation was that the instruction really wasn’t there.

 

What if the cache line containing address 0x18 was already in the instruction cache when the jump instruction was written by the above code? When the interrupt occurred, the PC would jump to 0x18 but would get the stale value from the instruction cache and never see the new instruction that had been written.

 

The solution was to invalidate the cache line after writing the instruction to memory using a system control register instruction with 0x18 in r0:

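(A sketch of the fix rather than the original listing: ICIMVAU is the ARMv7 operation that invalidates an instruction cache line by address, and the barriers ensure the maintenance completes before the next fetch.)

      LDR   r0, =0x18              ; address of the patched vector entry
      MCR   p15, 0, r0, c7, c5, 1  ; ICIMVAU: invalidate i-cache line by MVA to PoU
      DSB                          ; wait for the maintenance operation to complete
      ISB                          ; flush the pipeline so the new instruction is fetched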

Although cache details are mostly handled automatically by hardware, and cache modeling is not always required for software development, this example shows that sometimes more detailed models are required to fully test software. In hindsight, experienced engineers would recognize self-modifying code and the need to pay attention to caching, but it does demonstrate a situation where using detailed models matters.

 

Summary

 

Although you may never encounter the exact problems described here, they demonstrate typical challenges embedded systems engineers face, and they remind us to keep watch for hardware details. These examples also point out another key principle of embedded software: old code lives forever. This often means that while code may have worked on one system, it won’t automatically work on a new system, even if the two seem similar. If these examples sound familiar, it might be time to look into virtual prototypes for your embedded software development.

 

Jason Andrews

AppliedMicro is hosting a panel discussion at ISC 2014 this week, with keynotes from HP and Sandia on HPC workloads using AppliedMicro’s X-Gene SoC, which is based on the ARM 64-bit architecture. Following the keynotes, a partner panel will feature ARM, Boston, E4 Computer Engineering, Eurotech, Mellanox and NVIDIA. ARM’s Andrew N. Sloss, Senior Principal Engineer, will be speaking during the panel session on June 24, between 1:00 and 2:00 p.m. CEST, at Congress Center Leipzig, Germany, in Room M02. For more information on the panel, contact jbrendel@apm.com or tliew@apm.com. To see live demonstrations of X-Gene in HPC, visit the AppliedMicro booth (#506) at ISC 2014.

 

For more information on AppliedMicro’s X-Gene, read today’s announcement, AppliedMicro Announces Readiness of 64-bit ARM®-based Server SoC for High Performance Computing.

SAN FRANCISCO--We've written a lot about the really fun, really clever ARM Step Challenge at DAC 2014, from the initial The ARM Step Challenge at DAC: Throwing Down the Gauntlet post to The Aftermath of the ARM Step Challenge at DAC.

Sean O'Kane took me aside on the floor of Moscone Center to do a brief retrospective for ChipEstimate TV. John Heinlein and Phil Dworsky might want to take a look!

Now that DAC's in the rear view mirror, I know I'm joined by all competing partners in looking forward to doing it again at ARM TechCon (Oct. 1-3) in Santa Clara.

Keep on truckin'!

 

As a huge baseball fan, I was disappointed that the Giants were not in San Francisco this week.

Then I went to the 51st annual Design Automation Conference and found that ARM was hitting the home run I wanted to see.

The ARM Connected Community was out in force in the exhibition hall. We showed how complete the solution stack is for developers using ARM IP.

From Design to Verification to Software to Foundry there were great examples of how to pull it all together and get products done.

We had partners like Phil Dworsky from Synopsys showing their full range of solutions, from Galaxy Design Platform support to FPGA prototyping solutions and everything in between.


And we had Cadence showing how real IoT solutions get done.


 

Here is my old friend Valerie Rachko from Mentor Graphics Corporation showing real solutions for Embedded Debug and Questa, one of the industry standard products for functional verification.


What struck me, on a day when Intel made some announcements about the beginnings of its EDA partnerships, is how rich our solution truly is.

Companies of all sizes partner with ARM to innovate in software and hardware. I am personally interested in the role that memory technology plays in the future, and I found one demo in the Connected Community booth really interesting.

For embedded applications (IoT and the like), a company called Memoir Systems has a tool for automatic generation of a diverse set of memory architectures based on Artisan IP.


 

So I went to the Intel Booth and what did I find?

I found a fine demonstration of Wind River Simics doing software debug on ARM.  Another example of how complete our story is.

 

From the time a team thinks about a product to develop to the time it's done, there are strong tools for the design, development, and management of that project: tools like the ARM DS-5 Development Studio, or from our many EDA partners, or even Intel.

On the last day of the conference, the ARM VP for PIPD, Dipesh Patel, gave his visionary keynote. He said, "It has never been easier to develop a complex SoC!" Whoa. Had I not spent the week there, I would have been stunned by this claim. EASY?

But when he said it, I said, "Yep, about right."

At the beginning of the year, Cadence introduced a great new video series and blog called Whiteboard Wednesdays. Each Wednesday, our engineers bring you a chalk-talk-style series of insights on the semiconductor and systems IP design issues we all face today, whether it's Scott Jacobson talking about how to close the memory wall gap in our inaugural Whiteboard Wednesdays video or last week's edition, in which Arif Khan takes a closer look at PCI Express and its role in improving power optimization.

 

 

Here's a selection of popular Whiteboard Wednesdays to date:

Here's a link to our Whiteboard Wednesdays blog with the complete listings, updated each week, and the link to our Cadence YouTube playlist.
