clCreateProgramWithSource & clCreateProgramWithBinary on Mali GPU

What is the difference between these?

When using clCreateProgramWithBinary on a Qualcomm GPU, the execution time is dozens of times shorter than with clCreateProgramWithSource.

But on a Mali GPU there is no big difference between the two methods. What could be the reason?

  • Hi arcsoft_ylj,

    The specification allows 2 possible formats for the binary version, either a 'Device specific executable' or an 'Implementation specific intermediate representation'.

    The latter is what we use, known as IR. It sounds like Qualcomm may use the former method for theirs.

    What this means is, we do our optimisations going from IR to the device specific executable, and thus these optimisations will take place on both binary and source versions of your program.

    We do expect a small increase in performance from the IR binary, but certainly not 'dozens of times better' as you are seeing with Qualcomm.

    Without more information and a reproducer, it is difficult to say exactly why, and I suspect it will vary depending on the program in question. Some things are much more easily optimised than others.

    Apologies that I could not give a more comprehensive answer; I hope this was useful for you.

    If you have further questions, please feel free to ask.

    Kind Regards,

    Michael McGeagh
