Signal Processing with MALI 400 MP
Laurent Ovaert
over 11 years ago
Note: This was originally posted on 16th April 2012 at
http://forums.arm.com
Hi, I would like to offload some heavy brute-force signal processing from the main CPU to the GPU.
For example, multiplying an array of floats by a scalar, or doing multiply-accumulates like a += coef[i] * data[i];
I don't know whether this is possible with the MALI 400, or whether I would have to use OpenGL ES, shaders, or something else.
It would be great if I could develop and test all my GPU algorithms on the PC instead of on the device.
I know more and more people are talking about this (with some NVIDIA GPUs).
Any information is welcome, because I don't know where to start!
Pete
over 11 years ago
Note: This was originally posted on 26th April 2012 at
http://forums.arm.com
Hi,
the Mali-400 series was primarily designed as a 3D graphics processor, and essentially you can think of it as two main execution units, the vertex processor and the fragment processor. Each was designed to be efficient at its main purpose: the vertex processor is designed to (amongst other things) multiply vec4 vertex positions by 4x4 transformation matrices. It does this by running the same shader program on every vertex in a draw call. The transformed position and other vectors (e.g. a normal vector, a texture coordinate) are output ready to be interpolated across the surface of a triangle, which is defined by taking 3 of the vertices together (as specified by the draw call).
Using the OpenGL-ES 2.0 API, you cannot step in at this stage and read back the transformed vertex data.
For every screen fragment contained within each triangle in the draw call, the fragment processor runs the current shader program, whose main purpose is to calculate the color to write out. This may involve dot products between light vectors and interpolated normal vectors, for instance, and/or looking up diffuse colors from a 2D texture map via interpolated coordinates. The main output available to a fragment processor is a vec4 color (highest precision likely to be 8 bits per color channel). It would then be possible to read this color data back to the CPU via something like glReadPixels(), though this call is not intended to be fast: it can force a flush of the graphics pipeline, losing parallelisation benefits.
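To get a feel for what an 8-bit color channel does to a result, here is a small CPU-side sketch in C. It does not touch the GPU at all: it only emulates storing a value in an 8-bit normalized channel and reading it back, applied to the "multiply an array by a scalar" case from the question. The function names (to_unorm8, from_unorm8, worst_roundtrip_error) are made up for this illustration, and round-to-nearest is an assumption; real hardware may round differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulate writing one shader output component to an 8-bit normalized
   channel: clamp to [0,1], scale to 0..255, round to nearest.
   (Rounding mode is an assumption for this sketch.) */
static uint8_t to_unorm8(float x) {
    if (x < 0.0f) x = 0.0f;
    if (x > 1.0f) x = 1.0f;
    return (uint8_t)(x * 255.0f + 0.5f);
}

/* What the CPU reconstructs from the byte it reads back. */
static float from_unorm8(uint8_t b) {
    return (float)b / 255.0f;
}

/* Scale each sample by k, push it through an 8-bit channel, and return
   the worst absolute error between the exact and the read-back value. */
static float worst_roundtrip_error(const float *data, size_t n, float k) {
    assert(data != NULL && n > 0);
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float exact = data[i] * k;
        float seen = from_unorm8(to_unorm8(exact));
        float err = exact > seen ? exact - seen : seen - exact;
        if (err > worst) worst = err;
    }
    return worst;
}
```

For results that stay inside [0,1] the error is bounded by about half a step, i.e. 0.5/255. Anything outside that range is simply clamped, so signal data would first have to be scaled and offset into [0,1].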
If your platform supports EGL pixmap surfaces, these may be a better option, as a pixmap surface is defined as one that the GPU can render to but that the CPU can also access. This should avoid the need to use glReadPixels().
As you may see, there are going to be some hoops to jump through even to attempt this! My worry is that whatever precision you had at the vertex stage will be significantly impacted by the time you have to write it out in an 8-bit color channel. In addition, efforts to unpack parts of numbers into color channels and reconstruct them on the CPU may outweigh any speed benefit from doing the processing on the GPU.
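As a rough illustration of the "unpack parts of numbers into color channels" idea, the sketch below splits a value in [0,1) into four base-256 digits (one per RGBA channel) and rebuilds it on the CPU. Both halves run on the CPU here, in double precision; in practice the packing half would have to be written in the fragment shader, where arithmetic precision may be much lower. The names pack_unit_float and unpack_unit_float are hypothetical, and this is only one of several possible packing schemes.

```c
#include <assert.h>
#include <stdint.h>

/* Split v in [0,1) into four base-256 digits, most significant first.
   A shader doing the same would write each digit as digit/255.0 to one
   RGBA channel so that it is stored as exactly that byte. */
static void pack_unit_float(double v, uint8_t rgba[4]) {
    assert(v >= 0.0 && v < 1.0);
    for (int i = 0; i < 4; ++i) {
        v *= 256.0;
        uint8_t digit = (uint8_t)v; /* truncation == floor since v >= 0 */
        rgba[i] = digit;
        v -= digit;                 /* keep only the fractional part */
    }
}

/* CPU side: rebuild the value from the four bytes read back. */
static double unpack_unit_float(const uint8_t rgba[4]) {
    double v = 0.0;
    for (int i = 3; i >= 0; --i) {
        v = (v + rgba[i]) / 256.0;
    }
    return v;
}
```

Note that the unpack loop has to run once per output sample on the CPU, which is exactly the kind of overhead that can eat the benefit of doing the processing on the GPU.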
Using the standard OpenGL-ES driver, I believe there is no other way to get results back out.
HTH, Pete