fp64 on Mali T604

Note: This was originally posted on 21st August 2012 at http://forums.arm.com

First of all, congrats to ARM for submitting Mali T604 for OpenCL full profile conformance. I hope the tests are finished soon.
I was wondering about fp64 support on the T604. ARM has been quite vocal about T604 supporting fp64, but details (such as speed relative to fp32) have not been released. Any more details on how fp64 is implemented and what performance to expect?

fp64 support would make the T604 really useful for my project, which involves GPGPU for scientific-computing workloads.
  • Note: This was originally posted on 21st September 2012 at http://forums.arm.com

    did you get answers?
  • Note: This was originally posted on 21st September 2012 at http://forums.arm.com


    > did you get answers?

    No :(
  • Note: This was originally posted on 24th September 2012 at http://forums.arm.com

    Mali-T604 hardware supports FP64 now; work is ongoing to implement the cl_khr_fp64 extension. The performance ratio between FP64 and FP32 maths is in line with other industry implementations.
  • Note: This was originally posted on 13th October 2012 at http://forums.arm.com


    > Mali-T604 hardware supports FP64 now; work is ongoing to implement the cl_khr_fp64 extension. The performance ratio between FP64 and FP32 maths is in line with other industry implementations.

    Well, industry implementations vary from 1/24 of the fp32 rate to 1/2 of the fp32 rate, so I'm not sure how to interpret your statement. I guess I will just wait for the release of the official docs. :)
  • Note: This was originally posted on 14th November 2012 at http://forums.arm.com

    > 1/24 of fp32 rate

    I expect it will be significantly better than this ...

    > to 1/2 of fp32 rate

    ... but not quite as good as this.


    More specifically, the main issue is how you quantify the "fp32 rate". For graphics workloads, many of the fp32 (highp) and fp16 (mediump) calculations can be optimized in hardware. Common graphics operations such as vector dot products are not usually done with "general purpose" floating-point units; they can be done more power-efficiently in fixed-function accelerators. Graphics doesn't need fp64 (or even fp32) in most cases, so this fixed-function hardware probably doesn't exist for the fp64 cases.