In Part 1 of this blog we saw why right-sizing air conditioning is so important: an A/C unit performs three different functions simultaneously, namely cooling, dehumidification and ventilation. Efficiency could be increased by separating out these three functions and optimizing each one independently. As we saw last time, pushing air through ducts is inefficient because of the energy wasted on pumping it. You could instead use a hydronic system like the underfloor heating common in northern Europe, but with cooling in the ceiling as well as heating in the floor. Pumping water is a more efficient way to move energy than blowing air: even though a water pipe is physically smaller than an air duct, in terms of thermal energy transfer capacity it's a fatter pipe. But a hydronic system doesn't dehumidify or ventilate, so you'd still need a very small A/C system with air ducts to provide those functions. Duct size and pumping losses would be much lower than in a purely ducted central air system, though, as the volume of air needed for dehumidification and ventilation is far lower than that needed for cooling. Effectively, what you'd have is a very small A/C system for dehumidification and ventilation and a hydronic system for the large heating and cooling loads. The small system takes some of the strain off the big system too, because it provides a limited amount of heating and cooling that can cope with modest loads. Larger cooling loads would require the big hydronic system to kick in, though.
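To put a rough number on the "fatter pipe" claim, here's a back-of-the-envelope sketch in Python comparing how much heat a cubic metre of water and a cubic metre of air can carry per degree of temperature change. The property values are rounded textbook figures, and the 3.5 kW load and 10 K temperature difference are hypothetical, chosen only for illustration. Note that this compares heat-carrying capacity only; actual pumping energy also depends on duct and pipe sizes, flow velocities and fan or pump efficiency.

```python
# Rounded textbook property values near room temperature.
AIR_DENSITY = 1.2        # kg/m^3
AIR_CP = 1005.0          # specific heat, J/(kg*K)
WATER_DENSITY = 1000.0   # kg/m^3
WATER_CP = 4186.0        # specific heat, J/(kg*K)


def volumetric_heat_capacity(density, cp):
    """Heat carried per cubic metre per kelvin of temperature change, in J/(m^3*K)."""
    return density * cp


def flow_needed(load_w, delta_t_k, density, cp):
    """Volumetric flow (m^3/s) needed to carry load_w watts across delta_t_k kelvin."""
    return load_w / (volumetric_heat_capacity(density, cp) * delta_t_k)


air = volumetric_heat_capacity(AIR_DENSITY, AIR_CP)
water = volumetric_heat_capacity(WATER_DENSITY, WATER_CP)
print(f"water carries roughly {water / air:.0f}x more heat per unit volume than air")

# Hypothetical load: 3.5 kW (about one ton of refrigeration), same 10 K difference for both.
print(f"air flow:   {flow_needed(3500, 10, AIR_DENSITY, AIR_CP) * 1000:.1f} L/s")
print(f"water flow: {flow_needed(3500, 10, WATER_DENSITY, WATER_CP) * 1000:.3f} L/s")
```

With these figures water carries roughly 3,500 times as much heat per unit volume as air, which is why a small pipe can do the job of a large duct.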
These types of systems do actually exist for very large commercial projects (where energy costs add up to big numbers). In these systems the Dedicated Outdoor Air System (DOAS), which brings in ventilation air from outside, dehumidifies it and exchanges it with stale inside air, could be larger than the A/C system in most houses. But no one seems to have built a residential system yet (except in arid climates like Arizona, where dehumidification isn't an issue), and there don't appear to be any readily available units sized for moderate-sized residences.
I wonder why this type of A/C system hasn't caught on. Maybe it just needs a cool marketing name? 'Heterogeneous Big/Little Air Conditioning' is the best one I've thought of so far. OK, it's not actually that cool a name, but I'm not trendy and I haven't got square glasses, so what did you expect me to know about cool? Actually, its lack of popularity is probably more to do with the additional up-front cost of two complete systems and the relatively low energy costs we've all become accustomed to. If those two things change, and I'm willing to bet that at least one will (guess which?), then who knows, maybe it'll become common.
Incidentally, I searched high and low for suitable A/C technology. I even looked at aquarium chillers (who knew there was such a thing?), portable A/C units (turns out they're not that efficient) and taking beer fridges apart (sacrilege, we Brits don't really drink warm beer), trying to find the right-sized chiller for the little system. But the biggest problem of all was controlling the system accurately enough to keep the hydronic surfaces above the dew point of the room air; otherwise you could get condensation. Condensation on the ceiling panels equals indoor rain! When I explained the proposed system to the A/C expert at the Austin Green Energy Program (what he doesn't know about A/C isn't worth knowing), he said, "Your house is going to be a fantastic science experiment! But if the experiment fails, you'll have spent a lot of money on a house you can't live in." That focused my mind a little. So did the cost estimate.
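The dew point problem can be made concrete with the Magnus approximation, a widely used empirical formula that estimates dew point from air temperature and relative humidity. The sketch below uses one common set of Magnus coefficients; the function names and the 2 K safety margin are hypothetical, just to show the kind of check a controller would have to make continuously.

```python
import math

# One common set of Magnus-formula coefficients (over water, roughly -45 C to 60 C).
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degrees C


def dew_point_c(temp_c, rel_humidity_pct):
    """Approximate dew point (C) from air temperature (C) and relative humidity (%, > 0)."""
    gamma = math.log(rel_humidity_pct / 100.0) + MAGNUS_B * temp_c / (MAGNUS_C + temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def panel_is_safe(panel_temp_c, room_temp_c, rel_humidity_pct, margin_k=2.0):
    """True if a chilled ceiling panel stays at least margin_k above the room's dew point."""
    return panel_temp_c >= dew_point_c(room_temp_c, rel_humidity_pct) + margin_k


# A humid room: 25 C at 60% relative humidity.
print(round(dew_point_c(25.0, 60.0), 1))
print(panel_is_safe(18.0, 25.0, 60.0))  # inside the safety margin
print(panel_is_safe(19.0, 25.0, 60.0))
```

At 25 °C and 60% humidity the dew point comes out at about 16.7 °C, so an 18 °C panel is already inside the margin: the window between "useful cooling" and "indoor rain" is only a few degrees wide.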
It turns out that using right-sized equipment is important for SoC design as well. A year ago, ARM launched the Cortex-A15 MPCore processor. This is a very high-performance, three-issue superscalar, extensively out-of-order processor. It's really going to smash down some barriers, enabling ARM to move into markets we haven't previously been able to play in. But how much of the time do you really need that level of performance in a high-end smartphone? Recently I've been benchmarking smartphones to see how much of the time they spend at each Dynamic Voltage and Frequency Scaling (DVFS) performance point. It turns out that they tend to spend a lot of their time at the lower clock speeds, at least with high-performance ARMv7-A cores like Cortex-A8 and Cortex-A9 MPCore. Video, graphics and sometimes audio are hardware-accelerated (specialized hardware is generally more energy-efficient), so a lot of the time the CPU is just coordinating things. And then there are all those background services you see when you type 'ps -ax', which no one (except the kernel gurus) really knows the purpose of, but everyone is too afraid to kill -9 just in case they're something important. You could do a lot of these tasks with a much smaller processor. It may be overkill to use a sledgehammer to crack a nut, but sometimes a sledgehammer is what you really need. Trouble is, if all you've got is a sledgehammer, after a while everything starts to look like a wall. The main applications for flat-out maximum performance seem to be rendering content-rich web pages, high-end games and emerging computation for augmented reality. Here you need immediate responsiveness and all the performance you can get, as fast as possible. In fact, pretty much all tasks seem to fit into one of two categories.
Either they just need to process at a steady rate, often less than half of the maximum performance level, or they run as fast as possible to provide a great user experience and then get back to idle as quickly as possible while the user consumes the data or thinks of the next "move". Back in the early days of DVFS with the ARM1176JZ, I was one of the folks arguing that performance points between 50% and 100%, like 75%, 80% or 90%, would be important. From my recent work it looks like I was wrong, but in my defence I was talking about ARM11, not a modern multi-core Cortex MPCore processor: 80% of an ARM11 would be less than 50% of a dual-core Cortex-A9. So I was wrong (now), but I was right then (sic).
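On Linux-based phones with cpufreq statistics enabled, per-frequency residency is exposed in a time_in_state file, one "<frequency in kHz> <time>" pair per line (the kernel documents the time unit as 10 ms, which cancels out in percentages). The sketch below assumes that format and computes the share of time at each performance point, plus the share spent at or below half the maximum clock. The sample numbers are made up purely to exercise the code; they are not measurements from any phone.

```python
# Made-up sample in the cpufreq-stats time_in_state format, e.g. as read from
# /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state. Not real data.
SAMPLE = """\
300000 51200
600000 20480
800000 6400
1000000 3200
1200000 4800
"""


def parse_time_in_state(text):
    """Return {frequency_khz: time_units} from time_in_state-formatted text."""
    residency = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        freq, time = line.split()
        residency[int(freq)] = int(time)
    return residency


def residency_percent(residency):
    """Percentage of total time spent at each frequency, lowest frequency first."""
    total = sum(residency.values())
    return {f: 100.0 * t / total for f, t in sorted(residency.items())}


def share_at_or_below(residency, fraction_of_max):
    """Percentage of time spent at or below a given fraction of the highest frequency."""
    limit = max(residency) * fraction_of_max
    total = sum(residency.values())
    return 100.0 * sum(t for f, t in residency.items() if f <= limit) / total


stats = parse_time_in_state(SAMPLE)
for freq, pct in residency_percent(stats).items():
    print(f"{freq / 1000:6.0f} MHz  {pct:5.1f}%")
print(f"time at <= 50% of max clock: {share_at_or_below(stats, 0.5):.1f}%")
```

The "at or below half the maximum clock" figure is the one that bears on the steady-rate category of tasks described above.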
About four years ago ARM started to look seriously at the idea of Heterogeneous Big/Little multiprocessing, now known as big.LITTLE, in response to a trend over the last few years of continued improvement in high-end performance with only incremental battery savings. The market demanded cost-effective ways to deliver large improvements in both. So ARM's approach was to have a small, very energy-efficient processor for the lighter tasks, allowing you to completely power down the big processor (e.g. Cortex-A15) to save both dynamic and leakage power when you don't actually need high-end performance. Since the smaller processor has better energy efficiency (in terms of DMIPS/mW) and a smaller area (hence lower leakage), it is the more energy-efficient choice for the workloads it's capable of handling.
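The energy argument can be sketched as a toy model: the energy for a task is its dynamic energy (work divided by DMIPS/mW) plus leakage power multiplied by run time, and a core only counts if it finishes before the deadline. All the core figures below are hypothetical round numbers, not real Cortex-A15 or Cortex-A7 data, and the model assumes each core is power-gated as soon as it finishes and that the DMIPS/mW figure covers dynamic power only.

```python
def task_energy_mj(work_mi, dmips, dmips_per_mw, leakage_mw, deadline_s):
    """Energy in mJ to run a task on one core, or None if the core misses the deadline.

    work_mi:      task size in millions of (Dhrystone) instructions
    dmips:        core throughput at the chosen operating point
    dmips_per_mw: dynamic efficiency at that point (leakage counted separately)
    leakage_mw:   static power while the core is powered
    Assumes the core is power-gated as soon as the task completes.
    """
    run_time_s = work_mi / dmips
    if run_time_s > deadline_s:
        return None                      # too slow for this workload
    dynamic_mj = work_mi / dmips_per_mw  # MI / ((MI/s)/mW) = mW*s = mJ
    leakage_mj = leakage_mw * run_time_s
    return dynamic_mj + leakage_mj


def pick_core(work_mi, deadline_s, cores):
    """Name of the lowest-energy core that meets the deadline, or None."""
    energies = {name: task_energy_mj(work_mi, deadline_s=deadline_s, **spec)
                for name, spec in cores.items()}
    feasible = {name: e for name, e in energies.items() if e is not None}
    return min(feasible, key=feasible.get) if feasible else None


# Hypothetical round numbers, not real Cortex-A15/Cortex-A7 figures.
CORES = {
    "big":    {"dmips": 4000, "dmips_per_mw": 10, "leakage_mw": 100},
    "little": {"dmips": 1500, "dmips_per_mw": 20, "leakage_mw": 10},
}

print(pick_core(3000, deadline_s=2.5, cores=CORES))  # light task
print(pick_core(6000, deadline_s=2.5, cores=CORES))  # heavy task
```

With these made-up figures the light task goes to the little core (170 mJ versus 375 mJ), while the heavy task, which the little core can't finish within 2.5 s, goes to the big one.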
The really big change was to have two different processors with exactly the same instruction set and programmer's model, but completely different micro-architectures, i.e. different pipelines (one with a 'fat' pipe and one with a 'skinny' pipe), and to make them fully cache-coherent. When we told the operating system kernel gurus about the idea, their reaction was pretty similar to the A/C guru's: "Sounds like an interesting science experiment, but you'll never get proper software support in an OS." Fortunately ARM has deeper pockets than I do and wasn't going to be deterred by some nay-saying beardy sandal-wearers. We created our own open-sourceable software, which sits underneath the OS, using the new hardware virtualization capability introduced in the Cortex-A15 processor and also present in the architecturally compatible Cortex-A7. This can help hide the processor specifics from the OS (there are minimal software-visible differences anyway). Of course, this approach is simplest when there are equal numbers of big and little processors. But as we moved on with the development, guess what? The beardy sandal-wearers changed their stance from "It'll never work, you know" through "Actually, it's a pretty neat idea" to "When can we get our hands on some hardware?"
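To illustrate why matched cluster sizes keep things simple, here's a toy model of a switching layer below the OS: the OS always sees the same logical CPUs, and the layer only decides which cluster is live. This is not ARM's actual software; the class name, the thresholds and the utilization input are all hypothetical, and the gap between the two thresholds (hysteresis) is there to stop the system flip-flopping between clusters.

```python
class ClusterSwitcher:
    """Toy model of a switching layer that sits underneath the OS.

    The OS always sees num_cpus logical CPUs. Because the big and LITTLE clusters
    have the same number of cores, logical CPU i simply runs on core i of
    whichever cluster is currently active.
    """

    def __init__(self, num_cpus, up_threshold=0.85, down_threshold=0.30):
        if not down_threshold < up_threshold:
            raise ValueError("down_threshold must be below up_threshold (hysteresis)")
        self.num_cpus = num_cpus
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.active = "LITTLE"  # start on the energy-efficient cluster

    def update(self, utilization):
        """Feed in current load (0.0 to 1.0 of the active cluster); switch if needed."""
        if self.active == "LITTLE" and utilization > self.up_threshold:
            self.active = "big"
        elif self.active == "big" and utilization < self.down_threshold:
            self.active = "LITTLE"
        return self.active

    def physical_core(self, logical_cpu):
        """Map an OS-visible CPU to a (cluster, core index) pair."""
        if not 0 <= logical_cpu < self.num_cpus:
            raise IndexError("no such logical CPU")
        return (self.active, logical_cpu)


switcher = ClusterSwitcher(num_cpus=2)
for load in [0.2, 0.9, 0.5, 0.2]:
    print(load, "->", switcher.update(load))
```

A load of 0.5 keeps the big cluster running once it has been switched in; only a drop below the lower threshold moves work back to the little cluster.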
Once true OS support is there, various configurations will be possible. For example, whilst initial smartphones might use two big plus two little processors, in future one big plus four little could be optimal for lower-cost phones. There's usually only one main foreground task that needs completing as soon as possible, and lots of background tasks. As a bonus, the little processors take some of the strain off the big processor too, so it can be 100% dedicated to the foreground task and doesn't need to time-slice in all those other necessary but tedious background services. All this inter-cluster, system-wide coherency is handled by AMBA 4 ACE, the AXI Coherency Extensions to the AMBA 4 protocol. (Heterogeneous multiprocessing along these lines had previously been proposed within the academic community.) With a cache-coherent system consisting of both the 'big' Cortex-A15 and the new 'little' ARM Cortex-A7 MPCore, you've got the Cortex-A7 for cracking lots of nuts and the Cortex-A15 for smashing down the occasional wall when you need to. Actually, I think the Cortex-A7 is going to turn out to be such a good nutcracker that it's really going to give the competition a serious balls-ache.
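A one-big-plus-four-little configuration might be used along these lines. The placement rule and the task and core names below are hypothetical, just to show the idea of reserving the big core for the single foreground task while the background services share the little cores.

```python
from itertools import cycle


def place_tasks(foreground, background, big_cores, little_cores):
    """Hypothetical policy: the foreground task gets a big core to itself, and
    background tasks are spread round-robin across the little cores."""
    if not big_cores or not little_cores:
        raise ValueError("need at least one big and one little core")
    placement = {foreground: big_cores[0]}
    for task, core in zip(background, cycle(little_cores)):
        placement[task] = core
    return placement


layout = place_tasks(
    foreground="browser",
    background=["sync", "email", "gps", "logger", "media_scan"],
    big_cores=["A15-0"],
    little_cores=["A7-0", "A7-1", "A7-2", "A7-3"],
)
for task, core in layout.items():
    print(f"{task:>10} -> {core}")
```

The big core never has to time-slice with a background service; the fifth background task simply wraps around to the first little core.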
My A/C idea was derailed mainly by cost and availability. But what about the additional cost of big.LITTLE multiprocessing, I hear you ask? Well, we've not only launched the Cortex-A7 processor, we've already delivered it to lead partners, so clearly availability isn't an issue. Each core of a Cortex-A7 cluster is less than 0.5 mm² in a 28 nm process. I've heard from various sources that silicon goes for about 10 cents/mm², so the two cores of a dual-core Cortex-A7 come to less than 1 mm², or under 10 cents, and even with an L2 cache added you're looking at a small additional up-front silicon cost. For that you get way better battery life. Sounds like quite a good deal to me! Better than my A/C system would have been, anyway. Oh, and by the way, remember when we talked about 'Dark Silicon' in Part 1? All those femto-acres of wasted silicon we can't use for anything? Well, in that situation the additional silicon cost of heterogeneous big.LITTLE multiprocessing effectively becomes free. I can even imagine a far-off future where you have more than just two sizes of processor: lots of little processors, a few medium-sized ones and one or two really big ones. Just like efficient A/C, it's all about using the right-sized equipment for the task at hand. Maybe the future world of Dark Silicon won't be quite so dark after all?
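The silicon-cost arithmetic is simple enough to write down. This sketch plugs in the upper-bound core area and the rough price per mm² quoted above; the L2 cache area is left as a parameter because no figure is given for it here, and the 1 mm² value used below is purely a placeholder.

```python
def added_silicon_cost_usd(num_cores, core_area_mm2, cost_per_mm2_usd, l2_area_mm2=0.0):
    """Rough extra die cost: (core area + optional L2 area) times price per mm^2."""
    return (num_cores * core_area_mm2 + l2_area_mm2) * cost_per_mm2_usd


# Figures from the text: < 0.5 mm^2 per Cortex-A7 core at 28 nm, ~10 cents per mm^2.
cores_only = added_silicon_cost_usd(num_cores=2, core_area_mm2=0.5, cost_per_mm2_usd=0.10)
print(f"two cores, upper bound: ${cores_only:.2f}")

# The L2 area here is a made-up placeholder, not a real Cortex-A7 figure.
with_l2 = added_silicon_cost_usd(2, 0.5, 0.10, l2_area_mm2=1.0)
print(f"with a hypothetical 1 mm^2 L2: ${with_l2:.2f}")
```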
Unfortunately the future world of heterogeneous big/little air conditioning doesn't look quite so bright, and not just because of my uncool marketing name. What's needed is low-cost, very efficient 'little' energy-recovery heat pumps for the dehumidification and ventilation functions, and a much more intelligent control system to keep the hydronic system above the dew point. A self-organizing wireless intelligent sensor network ought to do the trick. Which sounds like yet another cool application for our little, energy-efficient ARM processors to me!
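As a sketch of what such a control system might compute, assume each room's sensor node reports its local dew point. A simple, conservative rule is to run the chilled water no colder than the worst-case dew point plus a safety margin. The function name, the 16 °C design setpoint and the 2 K margin are all hypothetical, and checking the water temperature is a deliberately cautious stand-in for the panel surface temperature, which will be somewhat warmer.

```python
def chilled_water_setpoint_c(room_dew_points_c, design_setpoint_c=16.0, margin_k=2.0):
    """Hypothetical rule: run the ceiling water as cold as the design allows, but
    never closer than margin_k to the highest dew point reported by any room."""
    if not room_dew_points_c:
        # No readings: refuse to pick a setpoint rather than risk condensation.
        raise ValueError("no sensor readings available")
    worst_case = max(room_dew_points_c.values())
    return max(design_setpoint_c, worst_case + margin_k)


readings = {"kitchen": 16.7, "bedroom": 13.9, "office": 14.5}  # made-up values
print(f"supply water setpoint: {chilled_water_setpoint_c(readings):.1f} C")
```

A single humid room (here the kitchen) drags the setpoint up for the whole house, which is exactly why the sensor network needs to cover every room.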