Why does the CPU perform better than the GPU when blending small areas?

Hello All,

This question is probably about the nature of GPUs in general rather than Mali specifically. I would like to understand the theoretical reason why a CPU can outperform a GPU.

Short test environment information:

  • GPU : Mali-G51 MP4 @ 700 MHz
  • CPU : Cortex-A53 Quad @ 1.4 GHz
  • Test : blend operation on 1x1, 4x4, 8x8, 16x16, 32x32, 64x64, 128x128 and 256x256 areas, RGBA32 format

Test Result:

*MP/s = (Number of operations during a single test run * pixmap width * pixmap height) / (1000 * 1000 * execution time in seconds) 

*Time is measured after glFinish() returns
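For readers who want to reproduce the figures, the MP/s formula above can be written as a small helper. This is only a sketch: the function name, the argument names and the example numbers are illustrative assumptions, not taken from the original test program.

```python
def megapixels_per_second(num_ops, width, height, seconds):
    """Throughput in MP/s, following the formula quoted in the post."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (num_ops * width * height) / (1000 * 1000 * seconds)


if __name__ == "__main__":
    # Hypothetical run: 1000 blends of a 256x256 pixmap taking 2.0 s in total.
    print(megapixels_per_second(1000, 256, 256, 2.0))
```

Because the pixel count sits in the numerator, any cost that does not depend on the area pulls the MP/s figure down sharply for small pixmaps.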

Area       CPU blending (MP/s)   GPU blending (MP/s)
1x1        0.02                  0.0005
4x4        0.39                  0.008
8x8        1.35                  0.033
16x16      3.62                  0.133
32x32      6.33                  0.530
64x64      8.23                  2.111
128x128    9.5                   8.315
256x256    10.48                 31.402

The GPU only outperforms the CPU at 256x256; at every smaller size the CPU is faster.

Is there something wrong with the way I measured this, or is there another problem? Thanks in advance for any answer.

  • GPUs are designed for solving large data-parallel problems, with millions of threads of execution. Small workloads have two issues which make them a bad fit for any GPU:

    • Small workloads lack enough parallelism to utilize the hardware effectively; with a 1x1 area the majority of the processing hardware in the GPU will be sitting idle. It's simply not designed for efficient processing of small workloads with a low thread count.
    • There is a lot of overhead in setting up a piece of work to run on the GPU. For a small screen area, the cost of doing the blend in software will be less than the CPU cost of the driver handling the glDrawElements() call, even if you ignore the GPU cost completely.
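The overhead point can be checked against the numbers in the question by inverting the MP/s formula to get the time spent on one blend operation. The sketch below does that for each row of the table; the function names are illustrative, and it assumes each reported MP/s value is an average over whole operations.

```python
# (side length, CPU MP/s, GPU MP/s) as reported in the question.
RESULTS = [
    (1, 0.02, 0.0005),
    (4, 0.39, 0.008),
    (8, 1.35, 0.033),
    (16, 3.62, 0.133),
    (32, 6.33, 0.530),
    (64, 8.23, 2.111),
    (128, 9.5, 8.315),
    (256, 10.48, 31.402),
]


def ms_per_operation(side, mp_per_s):
    """Invert the MP/s formula: milliseconds for one side x side blend."""
    pixels = side * side
    return pixels / (mp_per_s * 1_000_000) * 1000.0


def per_operation_times(results=RESULTS):
    """Return (side, cpu_ms, gpu_ms) for every row of the table."""
    return [
        (side, ms_per_operation(side, cpu), ms_per_operation(side, gpu))
        for side, cpu, gpu in results
    ]


if __name__ == "__main__":
    for side, cpu_ms, gpu_ms in per_operation_times():
        print(f"{side:>3}x{side:<3}  CPU {cpu_ms:7.3f} ms   GPU {gpu_ms:7.3f} ms")
```

With these inputs the GPU time comes out at roughly 2 ms per operation for every size from 1x1 to 256x256, while the CPU time grows from about 0.05 ms to about 6 ms. That pattern is consistent with a fixed per-submission cost dominating the GPU path, although the table alone cannot show how that cost splits between driver work, submission latency and waiting in glFinish().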