As usual, it happened late on a Friday afternoon. A couple of weeks ago a message arrived in my inbox from one of our latest ARM® Cortex™-M0+ partners: "We're using 90LP and a similar configuration to your 'min' with just a couple of additional and relatively small options and we can't match your reported dynamic consumption (11.2µW/MHz). We can't figure out what's wrong, can you please help to find out what we may have missed?"
For a fraction of a second I wondered whether I should pretend I was already gone and come back to them on Monday. Well... no, I wanted to get to the bottom of it, and we started exchanging mails: How many tracks for the cell library? Target synthesis frequency? Which precise processor options? After each exchange I was even more confused. I was so focused on the details that I had missed the right question from the start: "How far are you over the 11.2µW/MHz?" The answer: "Well, in fact we are 7% below your figures, around 10.4µW/MHz, and we were not expecting to match your marketing values, even less to be lower. Could it be some parts of the processor are not synthesized or clocked?"
It was now my turn to make a late Friday afternoon call to our implementation manager: "Let me check and try some experiments, I should have something for you on Monday".
Back in the office first thing on Monday morning, I found a message from Jonathan: "Flycatcher new PPA @ 90LP"... hmm, someone was working late on Sunday evening. Reading quickly through the text: "new baseline flow 2013.03 ... relaxed max trans... tightened up the floor plan... better utilization.... additional area/power recovery step... ban use of high drive cells.... et voilà: 9.828µW/MHz!". Better still, it sounds like with some more work we can probably go lower again.
So indeed, using a more recent flow and spending a little more time on the routing gives a much better result than the trials we ran before launch just over a year ago. Here is the PPA comparison using TSMC's 90LP with a 7-Track RVt lib at 50MHz, fully routed, extracted and STA'ed:
The Cortex-M0+ CoreMark performance also improved a few weeks ago, rising from 1.77 CoreMark/MHz at launch in March 2012 to 2.15 CoreMark/MHz now using the latest ARMcc v5.03 (see footnote): a healthy 21.5% increase.
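For readers who like to check the arithmetic, here is a minimal Python sketch that recomputes the percentages from the figures quoted in this post. The helper name `pct_change` and the variable names are mine, chosen for illustration only; the numbers are the ones given in the text and in the footnote.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent (negative means a reduction)."""
    return (new - old) / old * 100.0


# Dynamic power in uW/MHz, as quoted in the post
launch_power = 11.2      # figure reported at launch
partner_power = 10.4     # figure the partner measured
new_flow_power = 9.828   # figure from the 2013.03 flow

# CoreMark figures, as quoted in the post and its footnote
launch_coremark_per_mhz = 1.77
coremark_score = 21.46   # CoreMark 1.0 score from the footnote
clock_mhz = 10.0         # memory and CPU clock from the footnote
new_coremark_per_mhz = coremark_score / clock_mhz

print(f"Partner vs launch power:  {pct_change(launch_power, partner_power):+.1f}%")
print(f"New flow vs launch power: {pct_change(launch_power, new_flow_power):+.1f}%")
print(f"CoreMark/MHz now:         {new_coremark_per_mhz:.2f}")
print(f"CoreMark/MHz improvement: {pct_change(launch_coremark_per_mhz, 2.15):+.1f}%")
```

The partner's figure works out to roughly 7% below the launch number, and the 21.5% CoreMark gain follows from the rounded 2.15 CoreMark/MHz value.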
Beyond the processor's intrinsic Power, Performance and Area (PPA), it is important to remember that Cortex-M0+ was designed from the beginning to significantly reduce the number of memory accesses and so optimize energy consumption at system level: in general, memories, both SRAM and flash, are even more energy hungry than the processor itself. Thanks to its 2-stage pipeline and additional smart optimizations, it has the lowest instruction fetch activity across the Cortex-M family.
So, even more than when we introduced Flycatcher, aka Cortex-M0+, over a year ago, the ARM Partnership can enjoy the most energy-efficient and size-optimized embedded processor. And if you can beat Jonathan's PPA, feel free to drop us a mail, even late on a Friday afternoon!
CoreMark 1.0 : 21.46 /ARM C Compiler 5.03 [Build 24] -O3 --loop_optimization_level=2 -Otime -DMICROLIB --library_type=microlib --cpu=cortex-m0 / FPGA platform, Code in SRAM - Data in SRAM, memory and CPU clocked @10MHz