Signal Processing with MALI 400 MP

Note: This was originally posted on 16th April 2012 at http://forums.arm.com

Hi, I would like to offload some heavy brute-force signal processing from the main CPU onto the GPU.

For example, multiplying an array of floats by a scalar, or doing multiply-accumulates like a += coef[i] * data[i];
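For reference, here are the two operations above written as plain C loops on the CPU. This is the baseline any GPU version would have to beat. The function names are illustrative and do not come from any library.

```c
#include <stddef.h>

/* Multiply every element of data[] by a scalar, in place. */
void scale_array(float *data, size_t n, float scalar)
{
    for (size_t i = 0; i < n; ++i)
        data[i] *= scalar;
}

/* Multiply-accumulate: returns the sum of coef[i] * data[i]
   (a dot product, e.g. one output sample of an FIR filter). */
float multiply_accumulate(const float *coef, const float *data, size_t n)
{
    float a = 0.0f;
    for (size_t i = 0; i < n; ++i)
        a += coef[i] * data[i];
    return a;
}
```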


I don't know whether this is possible with the Mali-400, or whether I would have to use OpenGL ES, shaders, or something else.
It would be great if I could develop and test all my GPU algorithms on a PC instead of on the device.

I know more and more people are talking about this (with NVIDIA GPUs).
Any information is welcome, because I don't know where to start!
  • Note: This was originally posted on 26th April 2012 at http://forums.arm.com

    Hi,

    The Mali *400* series was primarily designed as a 3D graphics processor, and you can think of it as two main execution units: the vertex processor and the fragment processor. Each was designed to be efficient at its main purpose. The vertex processor is designed to (amongst other things) multiply vec4 vertex positions by 4x4 transformation matrices, and it does this by running the same shader program on every vertex in a draw call. The transformed position and other vectors (e.g. a normal vector or a texture coordinate) are output ready to be interpolated across the surface of a triangle, each triangle being formed from three of the vertices as specified by the draw call.
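    To make the vertex processor's job concrete, the sketch below emulates on the CPU what a minimal vertex shader computes for each vertex (gl_Position = M * v), with the matrix stored column-major as OpenGL ES expects. Note that each output component is a four-term multiply-accumulate, the same shape as a += coef[i] * data[i]. The type and function names are hypothetical.

```c
#include <stddef.h>

/* A vec4 and a 4x4 matrix stored column-major: m[col * 4 + row]. */
typedef struct { float x, y, z, w; } vec4;
typedef struct { float m[16]; } mat4;

/* One vertex through a minimal vertex shader: result = M * v. */
static vec4 mat4_mul_vec4(const mat4 *M, vec4 v)
{
    const float *m = M->m;
    vec4 r;
    r.x = m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w;
    r.y = m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w;
    r.z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w;
    r.w = m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w;
    return r;
}

/* The GPU runs the same program once per vertex in the draw call;
   on the CPU that is simply a loop over the vertices. */
void transform_vertices(const mat4 *M, const vec4 *in, vec4 *out, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = mat4_mul_vec4(M, in[i]);
}
```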

    Using the OpenGL ES 2.0 API, you cannot interrupt at this stage and intercept the transformed vertex data (OpenGL ES 2.0 has no transform feedback; that was only added in OpenGL ES 3.0).

    For every screen fragment contained within each triangle of the draw call, the fragment processor runs the current shader program, whose main purpose is to calculate the color to write out. This may involve, for instance, dot products between light vectors and interpolated normal vectors, and/or looking up diffuse colors from a 2D texture map via interpolated coordinates. The main output available to the fragment processor is a vec4 color (the highest precision is likely to be 8 bits per color channel). It would then be possible to read this color data back to the CPU via something like glReadPixels(), though this call is not intended to be fast: it can force a flush of the graphics pipeline, losing the benefits of parallelism.
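    The sketch below shows what an 8-bit color channel does to a value on the way back to the CPU. It assumes the usual float-to-unsigned-normalized conversion (clamp to [0,1], round to the nearest of 256 levels) and a buffer read back as GL_RGBA / GL_UNSIGNED_BYTE; the helper names are hypothetical. The round trip can be off by up to half a step, i.e. about 0.002.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a value in an 8-bit normalized channel: clamp to [0,1],
   then round to the nearest of the 256 levels 0..255. */
uint8_t to_unorm8(float v)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return (uint8_t)(v * 255.0f + 0.5f);
}

/* The value the CPU recovers from one channel. */
float from_unorm8(uint8_t c)
{
    return (float)c / 255.0f;
}

/* Decode an RGBA8 buffer (e.g. one filled by glReadPixels with
   GL_RGBA / GL_UNSIGNED_BYTE) into floats, channel by channel. */
void decode_rgba8(const uint8_t *pixels, size_t pixel_count, float *out)
{
    for (size_t i = 0; i < pixel_count * 4; ++i)
        out[i] = from_unorm8(pixels[i]);
}
```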

    If your platform supports EGL pixmap surfaces, these may be a better option: they are defined as surfaces that the GPU can render to and that the CPU can also access, which should avoid the need for glReadPixels().

    As you can see, there are going to be some hoops to jump through even to attempt this! My worry is that whatever precision you had at the vertex stage will be significantly reduced by the time you have to write it out to an 8-bit color channel. In addition, the effort of packing parts of numbers into color channels and reconstructing them on the CPU may outweigh any speed benefit from doing the processing on the GPU.
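    As an illustration of that packing overhead, here is a sketch of one hypothetical layout: a value in [0,1) treated as a 32-bit fixed-point fraction and spread across the R, G, B and A bytes, most significant first. Only the data layout and the CPU-side reconstruction are shown; a real fragment shader would have to produce these bytes itself, with whatever precision the fragment processor actually offers, so fewer useful bits may survive in practice.

```c
#include <stdint.h>

/* Hypothetical layout: v in [0,1) as a 32-bit fixed-point fraction,
   split over four 8-bit channels (R = most significant byte). */
void pack_unit_float(double v, uint8_t rgba[4])
{
    if (v < 0.0) v = 0.0;
    if (v >= 1.0) v = 1.0 - 1.0 / 4294967296.0; /* largest representable value */
    uint32_t q = (uint32_t)(v * 4294967296.0);  /* truncates toward zero */
    rgba[0] = (uint8_t)(q >> 24);
    rgba[1] = (uint8_t)(q >> 16);
    rgba[2] = (uint8_t)(q >> 8);
    rgba[3] = (uint8_t)q;
}

/* CPU-side reconstruction, needed for every value read back. */
double unpack_unit_float(const uint8_t rgba[4])
{
    uint32_t q = ((uint32_t)rgba[0] << 24) | ((uint32_t)rgba[1] << 16) |
                 ((uint32_t)rgba[2] << 8) | (uint32_t)rgba[3];
    return (double)q / 4294967296.0;
}
```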

    Using the standard OpenGL ES driver, I believe there is no other way to get results back out.

    HTH, Pete