Originally this blog post was intended to be all-in-one, but it was suggested that I split it into smaller parts.
So I'll describe the features I'd like in my ARM processor one at a time, piece by piece.
The purpose is to throw new ideas (good and bad) at the ARM engineers:
features that may make a difference, especially features that would help software and hardware developers get to new places.
Now let's start...
Currently, the only processor I know of that supports 128-bit floating point calculation is the PowerPC (by combining two 64-bit registers).
If we had 128-bit floating point registers, we could do high-precision math very quickly.
I'd use such a feature to run billions of planetary gravity calculations per second.
These mainly involve multiply, add, subtract and square-root operations.
A high-precision vector unit would give a huge performance boost here.
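Until hardware like that exists, the pair-of-64-bit-values idea can be emulated in software. Below is only a minimal sketch of "double-double" addition and multiplication in C, not how PowerPC or any ARM core actually does it. The names dd_t, two_sum, two_prod, dd_add and dd_mul are made up for illustration, and the code assumes plain IEEE 754 double arithmetic (no -ffast-math, no extended-precision temporaries).

```c
#include <assert.h>

/* Hypothetical "double-double" type: the value is hi + lo,
   where lo holds the bits that did not fit into hi. */
typedef struct { double hi, lo; } dd_t;

/* Error-free sum: returns s = fl(a + b) and the exact rounding error. */
static dd_t two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double err = (a - (s - bb)) + (b - bb);
    return (dd_t){ s, err };
}

/* Error-free product using Dekker/Veltkamp splitting (no FMA needed). */
static dd_t two_prod(double a, double b) {
    const double split = 134217729.0;            /* 2^27 + 1 */
    double ca = split * a, cb = split * b;
    double ah = ca - (ca - a), al = a - ah;      /* a = ah + al */
    double bh = cb - (cb - b), bl = b - bh;      /* b = bh + bl */
    double p = a * b;
    double err = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
    return (dd_t){ p, err };
}

/* Double-double addition (simple variant, then renormalize). */
static dd_t dd_add(dd_t a, dd_t b) {
    dd_t s = two_sum(a.hi, b.hi);
    double lo = s.lo + a.lo + b.lo;
    return two_sum(s.hi, lo);
}

/* Double-double multiplication (drops the tiny lo*lo term). */
static dd_t dd_mul(dd_t a, dd_t b) {
    dd_t p = two_prod(a.hi, b.hi);
    double lo = p.lo + (a.hi * b.lo + a.lo * b.hi);
    return two_sum(p.hi, lo);
}

/* Small self-check: (1 + 1e-20) - 1 is lost in plain double
   arithmetic but survives in double-double. */
static void dd_self_check(void) {
    dd_t one = { 1.0, 0.0 }, tiny = { 1e-20, 0.0 }, minus_one = { -1.0, 0.0 };
    dd_t d = dd_add(dd_add(one, tiny), minus_one);
    assert((1.0 + 1e-20) - 1.0 == 0.0);   /* plain double loses it */
    assert(d.hi == 1e-20);                /* double-double keeps it */
}
```

Each dd_add here costs over a dozen ordinary floating point operations, which is exactly the kind of overhead a native 128-bit unit would remove.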
I know we will get there some day (after Cortex-A57), but the sooner we get it, the sooner we get the cool end results.
Perhaps the next Cortex-A will deliver impressive performance in precision math, opening up further possibilities.
If you had a 128-bit precision floating point unit, what would you use it for, or what kinds of things do you think it could be used for?
I agree with Jens; I think 128-bit registers (for floats and more) would be very useful,
especially with hardware packing/unpacking capability for conversions to other bases
(e.g. 3/4/5/6-bit video encodings).
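To make the packing/unpacking idea concrete, here is a rough software sketch of what such hardware would replace: storing and fetching n-bit fields in a 128-bit value held as two 64-bit words. The type u128_t and the functions pack_field and unpack_field are hypothetical names, and the layout (fields packed back to back starting at bit 0) is just an assumption for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit register image: w[0] holds bits 0..63, w[1] bits 64..127. */
typedef struct { uint64_t w[2]; } u128_t;

/* Store the low `bits` bits of `value` as field number `idx`.
   Preconditions (not checked): 1 <= bits <= 32 and bits * (idx + 1) <= 128. */
static void pack_field(u128_t *r, unsigned bits, unsigned idx, uint32_t value) {
    unsigned pos = bits * idx;                 /* bit offset of the field */
    unsigned word = pos / 64, off = pos % 64;
    uint64_t mask = (1ull << bits) - 1;
    uint64_t v = value & mask;

    r->w[word] = (r->w[word] & ~(mask << off)) | (v << off);
    if (off + bits > 64) {                     /* field straddles w[0] and w[1] */
        unsigned low_bits = 64 - off;          /* bits that stayed in w[0] */
        r->w[1] = (r->w[1] & ~(mask >> low_bits)) | (v >> low_bits);
    }
}

/* Fetch field number `idx`; same preconditions as pack_field. */
static uint32_t unpack_field(const u128_t *r, unsigned bits, unsigned idx) {
    unsigned pos = bits * idx;
    unsigned word = pos / 64, off = pos % 64;
    uint64_t mask = (1ull << bits) - 1;
    uint64_t v = r->w[word] >> off;

    if (off + bits > 64)                       /* pull the rest from w[1] */
        v |= r->w[1] << (64 - off);
    return (uint32_t)(v & mask);
}

/* Round-trip check with 25 five-bit fields (125 of the 128 bits). */
static void pack_self_check(void) {
    u128_t r = { { 0, 0 } };
    for (unsigned i = 0; i < 25; i++)
        pack_field(&r, 5, i, (i * 7u) & 31u);
    for (unsigned i = 0; i < 25; i++)
        assert(unpack_field(&r, 5, i) == ((i * 7u) & 31u));
}
```

With 5-bit fields, field 12 starts at bit 60 and spills one bit into the upper word, which is the awkward case a hardware pack/unpack instruction would handle in one step.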
Personally, I would like to see more analog capabilities in the small and medium-size ARM processors.
The only folks (that I'm aware of...) working on this at present are Cypress, with their PSoC4 (M0) and PSoC5 (M3).
Adding programmable analog greatly reduces the power budget compared to using high-speed DSP operations to
accomplish the same thing in the digital domain. To be more specific, I'd like to see:
1) Programmable-gain op-amps
2) DACs that can be fed by DMA without processor intervention
3) ADCs that can be sampled using DMA without processor intervention
4) Switched-capacitor filters
5) Internal routing multiplexers that don't require external pins to connect items 1-4 together
And I'd like to see those on a higher performance part, like an M4R/M4F type part.
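For comparison, this is roughly what "the same thing in the digital domain" looks like for the simplest case, a first-order low-pass filter on ADC samples. It is only an illustrative sketch in fixed-point C; lp_filter_t and lp_step are made-up names, and the unsigned 16-bit sample format and shift of at most 16 are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical first-order low-pass filter state.
   acc holds the filtered value scaled by 2^shift; a larger shift
   means a lower cutoff frequency. Assumes shift <= 16. */
typedef struct { uint32_t acc; unsigned shift; } lp_filter_t;

/* Process one unsigned ADC sample: y += (x - y) / 2^shift. */
static uint16_t lp_step(lp_filter_t *f, uint16_t x) {
    f->acc = f->acc - (f->acc >> f->shift) + x;
    return (uint16_t)(f->acc >> f->shift);
}

/* Self-check: a constant input of 1000 settles at exactly 1000. */
static void lp_self_check(void) {
    lp_filter_t f = { 0, 3 };
    uint16_t y = 0;
    for (int i = 0; i < 200; i++)
        y = lp_step(&f, 1000);
    assert(y == 1000);
}
```

The processor has to run lp_step for every single sample, whereas an on-chip analog filter stage would do the equivalent work while the core sleeps.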