
Signal Processing with MALI 400 MP

Note: This was originally posted on 16th April 2012 at http://forums.arm.com

Hi, I would like to offload some heavy, brute-force signal processing from the main CPU and use the GPU for it.

For example, multiplying an array of floats by a scalar, or doing multiply-accumulates like a += coef[i] * data[i];
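To make the workload concrete, here is the kind of inner loop being described, sketched in plain Python for illustration only (on a GPU this would be shader code; the function names are made up):

```python
# Sketch of the DSP workloads described above: scaling an array of
# floats by a scalar, and a FIR-style multiply-accumulate.

def scale(data, k):
    """Multiply every sample by a scalar."""
    return [k * x for x in data]

def mac(coef, data):
    """Accumulate coef[i] * data[i] over all i."""
    acc = 0.0
    for c, d in zip(coef, data):
        acc += c * d
    return acc
```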


I don't know whether this is possible with the Mali-400, whether I would have to use OpenGL ES, write shaders, or something else.
It would be great if I could develop and test all my GPU algorithms on a PC instead of on the device.

I know more and more people are talking about this (usually with NVIDIA GPUs).
Any information is welcome, because I don't know where to start!
  • Note: This was originally posted on 16th April 2012 at http://forums.arm.com

    Hi,

    it sounds as though you are talking about General Purpose computing on the Graphical Processing Unit (GPGPU).

    Whilst the Mali-400 chip can do the maths you're talking about, the API it supports (OpenGL-ES) doesn't provide easy methods for getting the answers back to the CPU - the API is very much designed to assume the results are going to progress further down the graphics pipeline.

    For that reason, I don't think the Mali-400 will be a suitable platform for your investigations.

    However, the next generation of Mali is nearly here - the Mali-T600 series GPUs will additionally support another Khronos API, OpenCL. This API has been designed specifically for the kind of job you're discussing - doing calculations on the GPU.

    I'd recommend reading up on OpenCL - the Khronos website http://www.khronos.org/opencl/ will have some good info on the API.

    There are some blogs and other info here, too:

    GPU Computing in Android? With ARM Mali-T604 & RenderScript Compute You Can!

    Arm Developer: Mali

    HTH, Pete
  • Note: This was originally posted on 17th April 2012 at http://forums.arm.com

    Hi Pete,

    Thanks for your reply. The Mali-T600 series certainly looks very nice, and I would be more than happy to use OpenCL!
    But the chip I really need to use for my project is a Cortex-A8 with a Mali-400 (because of cost). You are right: my wish is to use the GPU for general-purpose computing, and my application is multi-channel audio processing (I have no real need for fancy 3D graphics ...).


    Whilst the Mali-400 chip can do the maths you're talking about, the API it supports (OpenGL-ES) doesn't provide easy methods for getting the answers back to the CPU - the API is very much designed to assume the results are going to progress further down the graphics pipeline.

    So the difficulty is not due to the Mali itself but to the APIs?
    If, even at low level in the Mali driver, the only GPU output accessible to the CPU is the framebuffer, I guess I can give up hope right away. But if there is a way (even a laborious one) to retrieve the computation results, then that's awesome.
  • Note: This was originally posted on 26th April 2012 at http://forums.arm.com

    Hi,

    the Mali *400* series was primarily designed as a 3D graphics processor, and essentially you can think of it as two main execution units, the vertex processor and the fragment processor. Each was designed to be efficient at its main purpose - the vertex processor is designed to (amongst other things) multiply vec4 vertex positions by 4x4 transformation matrices. It does this by running the same shader program on every vertex in a draw call. The transformed position and other vectors (e.g. a normal vector, a texture coordinate) are output ready to be interpolated across the surface of a triangle, defined by considering 3 of the vertices together (as specified by the draw call).
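    The vertex processor's bread-and-butter operation mentioned above can be sketched in plain Python (row-major matrix, illustrative only; a real vertex shader would express this as `matrix * position` in GLSL):

```python
# Multiply a vec4 position by a 4x4 transformation matrix:
# out[r] = sum over c of m[r][c] * v[c].

def mat4_mul_vec4(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]
```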

    Using the OpenGL-ES 2.0 API, you cannot interrupt at this stage and intercept the transformed vertex data.

    For every screen fragment contained within each triangle within the draw call, the fragment processor runs the current shader program whose main purpose is to calculate the color to write out. This may involve dot products between light vectors and interpolated normal vectors for instance, and/or looking up diffuse colors from a 2D texture map via interpolated coordinates. The main output available to a fragment processor is writing a vec4 color (highest precision likely to be 8 bits per color channel). It would then be possible to read this color data back to the CPU via something like glReadPixels() though this call is not intended to be fast - it can force a flush in the graphics pipeline, losing parallelisation benefits.

    If your platform supports pixmap EGL surfaces these may be better, as they are defined as a surface to which the GPU may render, but that the CPU can also access. This should avoid the necessity to use glReadPixels().

    As you can see, there are some hoops to jump through even to attempt this! My worry is that whatever precision you had at the vertex stage will be significantly degraded by the time you have to write it out in an 8-bit color channel. In addition, the effort of packing parts of numbers into color channels and reconstructing them on the CPU may outweigh any speed benefit from doing the processing on the GPU.
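    To illustrate the channel-packing idea in plain Python (function names are made up for this sketch; on the GPU the encode step would live in the fragment shader, and the decode step would run on the CPU after reading the pixels back):

```python
# Split a float in [0, 1) into four 8-bit values, one per RGBA8
# channel, and reconstruct it on the CPU. A single 8-bit channel
# gives only ~1/255 resolution; spreading the value across all
# four channels recovers roughly 2^-32 resolution.

def pack_float_rgba8(x):
    """Encode x in [0, 1) as four base-256 digits (one per channel)."""
    v = int(x * (2**32 - 1))
    return [(v >> shift) & 0xFF for shift in (24, 16, 8, 0)]

def unpack_rgba8_float(rgba):
    """Reassemble the float from its four 8-bit channels."""
    v = 0
    for byte in rgba:
        v = (v << 8) | byte
    return v / (2**32 - 1)
```

    Even in this best case, note the cost: every output value needs all four channels of a pixel, the shader has to spend instructions on the encoding, and the CPU has to decode every pixel it reads back.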

    Using the standard OpenGL-ES driver, I believe there is no other way to get results back out.

    HTH, Pete