Hello forum,
I am trying to understand the meaning of the warp divergence rate metric in Streamline for the G-715 GPU.
Using the test case below, I was expecting the divergence rate metric to show up at around 50%, assuming the warp size on the GPU is 16, the local size is set to 16, and only 8 threads execute either the if or the else block. But Streamline shows a Warp divergence rate of 98%, a Full warp rate of 100, and Number of fragment warps as 2 warps. Any insights on why the divergence is close to 100% would be appreciated. (I am launching the test with a global size of X=16, Y=1, Z=1.)
#version 320 es

layout(std430, binding = 0) buffer OutputBuffer {
    float data[];
};

layout(local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

void main() {
    uint threadId = gl_LocalInvocationID.x;
    if (threadId > 8u) {
        for (int i = 0; i < 64; i++) {
            data[threadId] = data[threadId] + float(threadId) * 2.0;
        }
    } else {
        for (int i = 0; i < 64; i++) {
            data[threadId] = data[threadId] + float(threadId) * 5.0;
        }
    }
}
The warp is 16 threads wide.
The if block runs on only part of the warp (threads 9 to 15, given the threadId > 8u test), so it is divergent.
The else block runs on the remaining threads (0 to 8), so it is divergent too.
Both the if and the else contain a lot of instructions because of the loop, so the divergent code dominates the initial 16-wide non-divergent code that tests the thread ID. 98% seems right to me.
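To make the "divergent code dominates" argument concrete, here is a rough back-of-the-envelope sketch. The instruction counts (10 issues for the non-divergent prologue, 4 issues per loop iteration) are made-up assumptions for illustration, not numbers taken from the actual compiled shader, and the compiler may unroll or reorder the loops differently.

```python
def divergence_rate(prologue_issues, body_issues_per_iter, iterations, branches=2):
    """Fraction of instruction issues that run with a partial warp.

    Assumes every issue inside either branch is divergent and every
    issue in the prologue is full-width. All counts are hypothetical.
    """
    divergent = branches * iterations * body_issues_per_iter
    total = prologue_issues + divergent
    return divergent / total


# Hypothetical counts: 10 full-width issues, then two 64-iteration loops
# with 4 issues per iteration, each executed with a partial lane mask.
rate = divergence_rate(prologue_issues=10, body_issues_per_iter=4, iterations=64)
print(f"{rate:.1%}")  # 512 / 522, about 98.1%
```

With any plausible counts of this shape the result stays close to 100%, because the two loops contribute hundreds of divergent issues against a handful of full-width ones.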
*EDIT* Note that the divergence counter simply counts the number of instruction issues that have any level of divergence. It does not count the amount of divergence in each instruction issue.
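The difference between the two ways of counting can be sketched as below. This is only an illustrative model: the trace is invented (10 full-width issues, then 256 issues on 7 lanes and 256 issues on 9 lanes), and lane_idle_fraction is a hypothetical lane-weighted measure for comparison, not a counter that Streamline exposes as far as this thread states.

```python
WARP_WIDTH = 16  # warp size assumed in the question


def issue_based_rate(active_lanes_per_issue):
    """Share of issues with any divergence (how the counter is described)."""
    divergent = sum(1 for lanes in active_lanes_per_issue if lanes < WARP_WIDTH)
    return divergent / len(active_lanes_per_issue)


def lane_idle_fraction(active_lanes_per_issue):
    """Hypothetical lane-weighted measure: share of lane slots left idle."""
    idle = sum(WARP_WIDTH - lanes for lanes in active_lanes_per_issue)
    return idle / (WARP_WIDTH * len(active_lanes_per_issue))


# Invented trace: active lane count for each instruction issue.
trace = [16] * 10 + [7] * 256 + [9] * 256

print(f"issue-based:   {issue_based_rate(trace):.1%}")
print(f"lane-weighted: {lane_idle_fraction(trace):.1%}")
```

Under these assumptions the issue-based figure lands near 98%, while the lane-weighted figure lands near 49%, which is the roughly 50% the question was expecting.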