OpenCL support for Mali-T628 MP6 on Arndale Octa?

Summary

Is OpenCL support available for the Mali-T628 (for example, as found in the Exynos 5420 SoC on the Arndale Octa board)? If so, how do I set it up?

More details

According to the vendor, OpenCL should be supported, but the Arndale Octa Wiki does not state how this can be achieved.

I am using the latest Linaro developer build and installed Mali drivers that contain OpenCL libraries for the Mali-T604. According to this guide, the driver actually contains references to the Mali-T628. So I tried to create the udev rule as specified, which is supposed to solve a permission problem with /dev/mali0, but I found that there is no /dev/mali0 on my installation at all. My conclusion is that the driver indeed does not support the T628.
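A udev rule only adjusts permissions on a node the kernel driver has already created, so it helps to check first whether the node is missing or merely inaccessible. A minimal sketch, assuming Python 3 is available on the board; `check_device_node` is a hypothetical helper name, and `/dev/mali0` is the path named above.

```python
import os
import stat


def check_device_node(path="/dev/mali0"):
    """Classify a device node as missing, not a char device, no access, or ok."""
    if not os.path.exists(path):
        # Nothing for a udev rule to act on: the kernel never created the node.
        return "missing"
    mode = os.stat(path).st_mode
    if not stat.S_ISCHR(mode):
        # Something exists at this path, but it is not a character device.
        return "not-a-char-device"
    if not os.access(path, os.R_OK | os.W_OK):
        # The node exists, so this is the case a permission rule can address.
        return "no-access"
    return "ok"


if __name__ == "__main__":
    print(check_device_node())
```

A result of "missing" suggests looking at the kernel side (is a Mali kernel driver loaded at all?) before spending more time on udev.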

When I run the clinfo utility, clGetDeviceInfo returns CL_OUT_OF_HOST_MEMORY for some device properties. Why can I query some characteristics of the GPU while others fail? When running a normal application, the same error appears when it tries to create an OpenCL context.
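Each property is fetched by its own clGetDeviceInfo call, so one query can fail while the others succeed. The sketch below makes that visible per property. It assumes the pyopencl bindings are installed, which is an assumption on top of what is described here (only a clinfo utility is mentioned); `probe` and `probe_first_device` are hypothetical helper names.

```python
def probe(getters):
    """Call each zero-argument getter on its own and record value or error."""
    results = {}
    for name, getter in getters.items():
        try:
            results[name] = ("ok", getter())
        except Exception as exc:  # pyopencl reports CL errors as exceptions
            results[name] = ("error", f"{type(exc).__name__}: {exc}")
    return results


def probe_first_device():
    """Probe a few properties of the first device on the first platform."""
    import pyopencl as cl  # lazy import so probe() is usable without pyopencl

    device = cl.get_platforms()[0].get_devices()[0]
    params = {
        "name": cl.device_info.NAME,
        "max_compute_units": cl.device_info.MAX_COMPUTE_UNITS,
        "max_clock_frequency": cl.device_info.MAX_CLOCK_FREQUENCY,
        "global_mem_size": cl.device_info.GLOBAL_MEM_SIZE,
    }
    # Bind each parameter with a default argument so every lambda keeps its own.
    return probe({k: (lambda p=p: device.get_info(p)) for k, p in params.items()})
```

Calling `probe_first_device()` on the board and printing the returned dictionary shows which properties fail and with which error, which is more useful to report than a single aborted clinfo run.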

I was surprised to find this topic, where yoshi seems to have OpenCL working and can run benchmarks on his Arndale Octa board. How is this possible if there is no driver available? Or am I just missing something? I hope that you can help me to also establish a working OpenCL development environment.

  • Is there any reference document (by Mali?) that lists all important specs, like the number of FLOPS (including the exact number of vector, scalar and dot units per compute unit)?
    Here the number of flops is 16 instead of 17

    It must be a typo on their part; it's definitely 17 max.

    Is there anything I can do to verify which core group is actually used?

    I normally use http://graphics.stanford.edu/~yoel/notes/clInfo.c to quickly sanity check a platform. Will run this on the chromebook to confirm when I get it running again this afternoon.

    What benchmark are you using to measure those 33.27 GFLOPS?

    That's with clPeak. It gives me the numbers I would expect for the work that it's doing, i.e. just vector add and multiply, no scalar, no dot product. We have a kernel which exercises all functional units, but it is obviously synthetic and not representative of a real workload. I cannot stress enough that if you are interested in real-world performance, you should move away from these synthetic benchmarks and look at actual data for proper use cases. There are some real-world oriented benchmarks out there which already have Mali-powered devices in their results.

    directly compare the efficiency of this presumably energy-efficient GPU...

    Do you have a workload in mind? Does clPeak represent the sort of work you will be doing? If not, I'd recommend looking into existing benchmarks/applications which represent that work and using those as your basis for comparison, rather than synthetic benchmarks like this one. Will your application run on handheld smartphones/tablets, or is it intended to be run in a compute farm somewhere?

    I get 6 and 600 respectively. So I would assume that I am running on all six cores at 600 MHz. Is there anything I can do to verify which core group is actually used?

    On T628 MP[5-8] there will be 2 core groups, so it should not be possible to see one device with 6 cores; they will be exposed as 2 separate devices. It might be an issue with the benchmark? That does explain where your expectation came from, however.

    UPDATE: Have spoken to someone from the driver team. If you're only seeing one device then it must be an old driver, so it's worth asking Insignal what their roadmap is for providing updates. On current drivers you will see 2 devices, one with 4 cores and one with 2.
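To check which core groups a given driver exposes, listing every device with its compute-unit count and clock is usually enough; these are the two values reported as "6 and 600" above. A minimal sketch, assuming the pyopencl bindings are installed (not something used in this thread); `list_devices` and `describe_core_groups` are hypothetical names, and the 4 + 2 split is the expectation quoted from the driver team, not a measured result.

```python
def list_devices():
    """Return (name, compute_units, clock_mhz) for every OpenCL device found."""
    import pyopencl as cl  # lazy import: only needed on the target board

    found = []
    for platform in cl.get_platforms():
        for dev in platform.get_devices():
            found.append((dev.name, dev.max_compute_units, dev.max_clock_frequency))
    return found


def describe_core_groups(devices):
    """Summarise how the cores are split across the exposed devices."""
    cores = [units for _name, units, _mhz in devices]
    if not devices:
        verdict = "no devices found"
    elif len(devices) == 1:
        verdict = "single device: all cores reported together"
    else:
        split = " + ".join(str(c) for c in cores)
        verdict = f"{len(devices)} devices: cores split as {split}"
    return sum(cores), verdict
```

Running `describe_core_groups(list_devices())` on the board gives the total core count and the split. A single device reporting 6 compute units matches the old-driver behaviour described above, while two devices with 4 and 2 units match what current drivers are said to expose.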
