The product pages for both the Cortex-M7 and the Cortex-M85 give figures for integer MAC throughput, but omit such figures for floating-point formats.
I'm talking here about simple long-vector dot products: repeated fused multiply-adds, reading from TCM, with appropriate unrolling of course.
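For concreteness, here is a minimal plain-C sketch of the kind of kernel meant. It is not the Arm library code. The function name `dot_f32` is hypothetical, and it assumes the data already sits in TCM. Four independent accumulators break the dependency chain on a single sum, so consecutive multiply-adds can overlap in the FPU pipeline. Whether `acc += a * b` is contracted into a single fused instruction (e.g. `VFMA.F32` on an FPv4/FPv5 FPU) depends on the compiler and flags such as `-ffp-contract`, so check the disassembly before trusting any cycle count.

```c
#include <stddef.h>

/* Hypothetical fp32 dot-product kernel, unrolled by 4.
 * Each accumulator forms its own dependency chain, which lets the
 * pipeline overlap multiply-adds instead of waiting on one sum.
 * Assumption: the compiler may contract "acc += x * y" into a fused
 * multiply-add (depends on -ffp-contract and the target FPU). */
float dot_f32(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    /* Main loop: four independent multiply-adds per iteration. */
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    /* Tail: the remaining 0..3 elements go into the first accumulator. */
    for (; i < n; ++i) {
        acc0 += a[i] * b[i];
    }

    /* Combine the partial sums pairwise. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Cycles per fused operation would then be the measured cycle count of one call divided by `n`, taken for large `n` so that the loop overhead and the tail are negligible.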
The best fp32 throughput I have seen on the Cortex-M4 comes from the Arm libraries, at about 5-6 clock cycles per fused operation, which is a little disappointing.
The Cortex-M85 architecture claims to be faster, and indeed makes concrete claims to that effect for integer types.
But what floating-point throughput (32- or 16-bit) does it actually achieve? I have been unable to find any data on the matter, but surely I'm not the only person interested in that figure?
Just out of curiosity, can you link the relevant docs?