Streamline counter - warp divergence rate

Hello forum,

I am trying to understand the meaning of the warp divergence rate metric in Streamline for the Mali-G715 GPU.

I am using the test case below and expected the divergence rate to come out at around 50%, assuming the warp size on this GPU is 16, the local size is set to 16, and only 8 threads execute each of the if and else blocks. Instead, Streamline shows a warp divergence rate of 98%, a full warp rate of 100, and 2 fragment warps. Any insight into why the divergence is close to 100% would be appreciated. (I am launching the test with a global size of X=16, Y=1, Z=1.)

#version 320 es
layout(std430, binding = 0) buffer OutputBuffer { float data[]; };
layout(local_size_x = 16, local_size_y = 1, local_size_z = 1) in;
void main() {
    uint threadId = gl_LocalInvocationID.x;
    if (threadId > 8u) {
        for (int i = 0; i < 64; i++) {
            data[threadId] = data[threadId] + float(threadId) * 2.0;
        }
    } else {
        for (int i = 0; i < 64; i++) {
            data[threadId] = data[threadId] + float(threadId) * 5.0;
        }
    }
}

  • The warp is 16 threads wide.

    The if block uses only 8 of them, so it is divergent.

    The else block uses the other 8, so it is divergent too.

    Both the if and the else blocks contain a lot of instructions because of the loop, so the divergent code dominates the short 16-wide non-divergent code at the start that tests the thread ID. 98% seems right to me.

    *EDIT* Note that the divergence counter simply counts the number of instruction issues that have any level of divergence. It does not measure how much divergence there is in each instruction issue.
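
To make the difference between the two readings concrete, here is a rough Python model of a single 16-wide warp running the shader above. It is only a sketch: `PROLOGUE_ISSUES` and `ISSUES_PER_ITERATION` are made-up instruction counts chosen for illustration, not values measured from the compiled shader, and the function names are hypothetical. It compares the "any divergence" count described in the edit with a lane-weighted utilization, which is closer to the 50% the question expected.

```python
WARP_SIZE = 16             # warp width stated in the thread
LOOP_ITERATIONS = 64       # loop trip count from the shader
PROLOGUE_ISSUES = 4        # ASSUMPTION: full-warp issues before the branch
ISSUES_PER_ITERATION = 3   # ASSUMPTION: instruction issues per loop iteration


def simulate(warp_size=WARP_SIZE):
    """Return the number of active lanes for each instruction issue."""
    # Same predicate as the shader: threadId > 8u selects the if block.
    if_lanes = sum(1 for tid in range(warp_size) if tid > 8)
    else_lanes = warp_size - if_lanes

    # The thread ID test runs with every lane active.
    issues = [warp_size] * PROLOGUE_ISSUES
    # The two sides of the branch run one after the other, each with a
    # partial lane mask.
    for active in (if_lanes, else_lanes):
        if active:
            issues += [active] * (LOOP_ITERATIONS * ISSUES_PER_ITERATION)
    return issues


def divergence_rate(issues, warp_size=WARP_SIZE):
    """Fraction of issues that have at least one inactive lane."""
    divergent = sum(1 for active in issues if active < warp_size)
    return divergent / len(issues)


def lane_utilization(issues, warp_size=WARP_SIZE):
    """Fraction of lane slots that did useful work across all issues."""
    return sum(issues) / (len(issues) * warp_size)


if __name__ == "__main__":
    issues = simulate()
    print(f"divergence rate:  {divergence_rate(issues):.1%}")
    print(f"lane utilization: {lane_utilization(issues):.1%}")
```

With these assumed counts, almost every issue is partially masked, so the divergence rate lands close to 100% even though only about half of the lane slots are idle. The exact figure depends on how many instructions the real shader issues before and inside the branch.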