Why is Mali's texture performance worse than buffer performance?

Shaquille.Wu over 2 years ago

I tested the performance of Mali's texture path (cl_image) and found that it is slower than the buffer path (cl_mem).

My GPU is a Mali-G76.

I expected textures to be faster than buffers, for example for bilinear filtering.

However, my test shows that textures on the G76 are about 10%-20% slower than buffers. My test format is RGBA.

I don't know why. Can anyone explain what is going on?

Alternatively, is there a standard benchmark program?

Shaquille.Wu over 2 years ago in reply to Peter Harris
Hi Peter, thanks for your reply.

I don't think I fully understand your analysis:

1. Do you mean that the two performance counters you pointed out should be quite high, and that my reported values are very low, which is abnormal? Is that right?

If so, what are the normal values?

2. I don't understand the "access pattern". I don't think I can specify an access pattern in OpenCL. Could you explain it further?

3. I don't understand these two performance counters: Non-fragment tasks (unit: tasks) and Non-fragment jobs (unit: jobs).

buf_style: Non-fragment tasks (unit: tasks): 562500, Non-fragment jobs (unit: jobs): 300
texture_style: Non-fragment tasks (unit: tasks): 562500, Non-fragment jobs (unit: jobs): 400

1) How are they calculated if my program is OpenCL?

2) I run only one kernel in both the buf_style and the texture_style program, but the reports show 3 and 4 Non-fragment jobs per run instead of 1:

buf_style: Non-fragment jobs (unit: jobs): 300
texture_style: Non-fragment jobs (unit: jobs): 400

I ran both tests 100 times, so the reports show 300 and 400. Why not 100?
Peter Harris over 2 years ago in reply to Shaquille.Wu

Shaquille.Wu said:
1. you mean, the two performance counters which you prompt should be quite high, but, my report is very low,

Your bytes-per-access value is high, so it is in line with expectations for a downscale.

Shaquille.Wu said:
2. I cannot understand the "access pattern". I think I cannot specify "access pattern" in OpenCL, would you like to explain it furthermore?

Correct, you can't control it. But buffers and textures may have different memory layouts, and so have different access patterns.
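As a rough illustration of how layout alone changes the access pattern, here is a minimal sketch comparing a linear row-major layout with a hypothetical 4x4-pixel tiled layout. The tile size, the image width, and the 64-byte cache-line size are assumptions chosen for the example; this is not a description of the actual Mali texture layout.

```python
# Illustrative only: the tiled layout below is hypothetical, not Mali's real one.
CACHE_LINE = 64       # bytes; assumed cache-line size
BYTES_PER_PIXEL = 4   # RGBA with 8 bits per channel
WIDTH = 1024          # assumed image width in pixels
TILE = 4              # assumed tile edge in pixels (4x4 tiles)


def linear_offset(x, y):
    """Byte offset of pixel (x, y) in a row-major (buffer-like) layout."""
    return (y * WIDTH + x) * BYTES_PER_PIXEL


def tiled_offset(x, y):
    """Byte offset of pixel (x, y) when pixels are stored tile by tile."""
    tiles_per_row = WIDTH // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    within_tile = (y % TILE) * TILE + (x % TILE)
    return (tile_index * TILE * TILE + within_tile) * BYTES_PER_PIXEL


def lines_touched(offset_fn, pixels):
    """Number of distinct cache lines needed to read the given pixels."""
    return len({offset_fn(x, y) // CACHE_LINE for x, y in pixels})


row_scan = [(x, 0) for x in range(16)]                  # 16 pixels along one row
block = [(x, y) for y in range(4) for x in range(4)]    # 4x4 pixel footprint

for name, pixels in (("row scan", row_scan), ("4x4 block", block)):
    print(name,
          "linear:", lines_touched(linear_offset, pixels),
          "tiled:", lines_touched(tiled_offset, pixels))
```

Under these assumptions the same 16 pixels cost a different number of cache lines depending on both the layout and the direction of traversal, which is the sense in which identical kernel code can show different memory behaviour for a buffer and a texture.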

Shaquille.Wu said:
1). how to calculate them if my program is OpenCL?

You can't.

You will get at least one Job per compute dispatch, but may get more as the driver generates small jobs for some management activities.

Tasks are somewhat meaningless to an application developer. For compute workloads a task is some multiple of the workgroup size, but the exact scaling is chosen by the driver and depends on the system configuration.
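To relate this to the numbers reported earlier in the thread, the counter totals can be normalized per run. The sketch below uses only the values from the question (100 runs, 300 and 400 jobs, 562500 tasks); the split into "one dispatch plus extra jobs" assumes exactly one kernel enqueue per run, as the poster describes, and the helper name `per_run` is made up for this example.

```python
RUNS = 100  # number of test iterations stated in the question

# Counter totals as reported in the thread.
counters = {
    "buf_style": {"tasks": 562500, "jobs": 300},
    "texture_style": {"tasks": 562500, "jobs": 400},
}


def per_run(totals, runs=RUNS):
    """Divide each accumulated counter by the number of runs."""
    return {name: value / runs for name, value in totals.items()}


for style, totals in counters.items():
    normalized = per_run(totals)
    # Assumption: one compute dispatch per run, so the rest are extra jobs.
    extra_jobs = normalized["jobs"] - 1
    print(f"{style}: {normalized['jobs']:.0f} jobs/run "
          f"({extra_jobs:.0f} beyond the dispatch), "
          f"{normalized['tasks']:.0f} tasks/run")
```

Read this way, each run produces one job for the kernel plus two (buffer) or three (texture) additional jobs, which fits the explanation that the driver may generate small jobs of its own.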

Shaquille.Wu said:
2). I only run one kernel both in buf_style program and texture_style program, but, the Non-fragment jobs in reports are 3 and 4, instead of 1

As above, you will get at least one Job per fragment workload, but may get more.