fp64 on Mali T604

Note: This was originally posted on 21st August 2012 at http://forums.arm.com

First of all, congrats to ARM for submitting Mali T604 for OpenCL full profile conformance. I hope the tests are finished soon.
I was wondering about fp64 support on the T604. ARM has been quite vocal about T604 supporting fp64, but details (such as speed relative to fp32) have not been released. Any more details on how fp64 is implemented and what performance to expect?

fp64 would make the T604 really useful for my project, which involves GPGPU for scientific-computing workloads.
  • Note: This was originally posted on 14th November 2012 at http://forums.arm.com

    > 1/24 of fp32 rate

    I expect it will be significantly better than this ...

    > to 1/2 of fp32 rate

    ... but not quite as good as this.


    More specifically, the main issue is how you quantify the "fp32 rate". For graphics workloads, a lot of the fp32 (highp) and fp16 (mediump) calculations can be optimized in hardware. Common graphics operations like vector dot products are not usually done with "general purpose" floating-point units; they can be done more power-efficiently in fixed-function accelerators. Graphics doesn't need fp64 (or even fp32) in most cases, so this fixed-function hardware probably doesn't exist for fp64.
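To illustrate the point about quantifying the "fp32 rate", here is a minimal sketch of how the same fp64 throughput gives different fp64:fp32 ratios depending on which fp32 baseline is used. The function name and every number are hypothetical placeholders in arbitrary units, not Mali-T604 measurements or figures from ARM.

```python
def fp64_ratio(fp64_rate: float, fp32_rate: float) -> float:
    """Return fp64 throughput as a fraction of the chosen fp32 baseline."""
    if fp32_rate <= 0:
        raise ValueError("fp32_rate must be positive")
    return fp64_rate / fp32_rate


# Placeholder throughputs in arbitrary units (assumptions, NOT measured data).
FP64_RATE = 1.0
FP32_GENERAL_PURPOSE = 4.0       # fp32 on general-purpose FP units only
FP32_WITH_FIXED_FUNCTION = 12.0  # fp32 when e.g. dot products use dedicated hardware

ratio_vs_general = fp64_ratio(FP64_RATE, FP32_GENERAL_PURPOSE)
ratio_vs_fixed = fp64_ratio(FP64_RATE, FP32_WITH_FIXED_FUNCTION)

print(f"fp64 vs general-purpose fp32 baseline: {ratio_vs_general:.3f}")
print(f"fp64 vs fixed-function fp32 baseline:  {ratio_vs_fixed:.3f}")
```

The fp64 throughput is identical in both cases; only the denominator changes, which is why a single "1/N of fp32 rate" figure is ambiguous unless the baseline is stated.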