CMSIS confusion

Note: This was originally posted on 27th October 2012 at http://forums.arm.com

Hello all
I have a few questions about CMSIS. I have reviewed some of the libraries for the basic functions (Cortex-M3/M4), and there is no loop for the remaining samples. For example, in the Q31 absolute value function:

  /* Loop unrolling */
  blkCnt = blockSize >> 2u;

  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.  
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = |A| */
    /* Calculate the absolute value of the input (-1.0 in Q31, i.e. 0x80000000, is saturated to 0x7fffffff) and store the result in the destination buffer. */
    in = *pSrc++;
    *pDst++ = (in > 0) ? in : ((in == 0x80000000) ? 0x7fffffff : -in);
    in = *pSrc++;
    *pDst++ = (in > 0) ? in : ((in == 0x80000000) ? 0x7fffffff : -in);
    in = *pSrc++;
    *pDst++ = (in > 0) ? in : ((in == 0x80000000) ? 0x7fffffff : -in);
    in = *pSrc++;
    *pDst++ = (in > 0) ? in : ((in == 0x80000000) ? 0x7fffffff : -in);

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.  
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;



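Each line of the unrolled body applies the same single-sample rule. The special case exists because 0x80000000 (INT32_MIN, i.e. -1.0 in Q31) has no positive counterpart in int32_t, so negating it would overflow; the code saturates it to 0x7fffffff instead. A minimal sketch of that per-sample step as a standalone C function follows; `abs_q31_sample` is a hypothetical name, not a CMSIS API.

```c
#include <stdint.h>

/* Hypothetical helper (not part of CMSIS): saturating absolute value of one Q31 sample. */
int32_t abs_q31_sample(int32_t in)
{
    if (in > 0) {
        return in;          /* already positive: pass through unchanged */
    }
    if (in == INT32_MIN) {
        return INT32_MAX;   /* -1.0 has no +1.0 in Q31: saturate to 0x7fffffff */
    }
    return -in;             /* safe: in lies in (INT32_MIN, 0], so -in fits */
}
```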
I think there should be a loop for the remaining samples. Can you explain why there are 4 operations in one loop iteration? Is it related to time optimization?
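The comment at the top of the quoted loop refers to a second loop that the excerpt cuts off right after `blkCnt = blockSize % 0x4u;`. Below is a minimal, self-contained sketch of the complete pattern: an unrolled main loop that handles four samples per iteration, followed by a plain loop for the 0 to 3 leftover samples. The function name `abs_q31_block` and its signature are hypothetical, modeled on the excerpt rather than copied from CMSIS. The usual motivation for unrolling by 4 is speed: the counter decrement, compare, and branch are paid once per four samples instead of once per sample. That is the common rationale for this kind of code, not something the excerpt itself states.

```c
#include <stdint.h>

/* Hypothetical sketch modeled on the quoted excerpt (not the CMSIS source):
 * pDst[i] = |pSrc[i]| with saturation, for blockSize samples. */
void abs_q31_block(const int32_t *pSrc, int32_t *pDst, uint32_t blockSize)
{
    uint32_t blkCnt;
    int32_t in;

    /* Main loop: 4 samples per iteration, so the loop overhead
     * (decrement, compare, branch) is paid blockSize / 4 times. */
    blkCnt = blockSize >> 2u;
    while (blkCnt > 0u)
    {
        in = *pSrc++;
        *pDst++ = (in > 0) ? in : ((in == INT32_MIN) ? INT32_MAX : -in);
        in = *pSrc++;
        *pDst++ = (in > 0) ? in : ((in == INT32_MIN) ? INT32_MAX : -in);
        in = *pSrc++;
        *pDst++ = (in > 0) ? in : ((in == INT32_MIN) ? INT32_MAX : -in);
        in = *pSrc++;
        *pDst++ = (in > 0) ? in : ((in == INT32_MIN) ? INT32_MAX : -in);
        blkCnt--;
    }

    /* Remainder loop: the 0 to 3 samples left over when blockSize
     * is not a multiple of 4, processed one at a time. */
    blkCnt = blockSize % 4u;
    while (blkCnt > 0u)
    {
        in = *pSrc++;
        *pDst++ = (in > 0) ? in : ((in == INT32_MIN) ? INT32_MAX : -in);
        blkCnt--;
    }
}
```

With blockSize = 7, for example, the main loop runs once (samples 0 to 3) and the remainder loop runs three times (samples 4 to 6).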

Best Regards