compiler: linaro-aarch64-2020.09-gcc10.2-linux5.4
optimization option: -O3
CPU: Arm Cortex-A53, 1 GHz
Hello, I am new to this.
Code 1 is about 3.1x slower than code 2:
- code1: 106 ms
- code2: 34 ms
I think the only(?) difference is that code 2 uses constants for the loop bounds and the shift amount.
I really wonder why there is such a big performance difference between the two versions.
<code 1: img_bitshift function>
void img_bitshift ( CAMERA_OPAQUE_t *pstDevInfo, int16_t img_width, int16_t img_height, int16_t bitshift )
{
    uint16_t *src_img = (uint16_t *) pstDevInfo->some_field.pVirt;
    uint8_t *dst_img = (uint8_t *) pstDevInfo->some_field.pVirt;
    for (int i = 0; i < img_height; i++)
    {
        for (int j = 0; j < img_width; j++)
        {
            uint16_t pixel = src_img[i*img_width + j];
            dst_img[i*img_width + j] = pixel >> bitshift;
        }
    }
}
// img_bitshift(_, 12800, 8000, _) took 106 ms
<code 2: copy and paste of img_bitshift function>
void dummy ( CAMERA_OPAQUE2_t *camerainfo, DummyType *dummy )
{
    int32_t channelIndex = 0;
    for( channelIndex = 0 ; channelIndex < 1 ; channelIndex++ )
    {
        // copy&paste of img_bitshift()
        CAMERA_OPAQUE_t *pstDevInfo = camerainfo->channelDevice;
        uint16_t *src_img = (uint16_t *) pstDevInfo->somefield.pVirt;
        uint8_t *dst_img = (uint8_t *) pstDevInfo->somefield.pVirt;

        // NOTE:-----------------------------------------
        // Here, we used constant instead of variable!
        // ----------------------------------------------
        uint16_t img_width = 12800;
        uint16_t img_height = 8000;
        uint16_t bitshift = 8;
        for (int i = 0; i < img_height; i++)
        {
            for (int j = 0; j < img_width; j++)
            {
                uint16_t pixel = src_img[i*img_width + j];
                dst_img[i*img_width + j] = pixel >> bitshift;
            }
        } /* end of loop */
    }
}
//line23 ~ line30 took 34 ms.
Thanks in advance.
Hello, to understand this, it is best to compare the disassembly that the compiler generates for each example.
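One possible way to set up that comparison is to put both loop variants into a small stand-alone file and compile only that. This is a rough sketch, not the original code: the names `shift_runtime`, `shift_constant`, `IMG_W`, `IMG_H` and `SHIFT` are made up for illustration, the dimensions are shrunk so the file runs anywhere (the post uses 12800 x 8000), and the functions take separate source and destination pointers instead of the in-place conversion in the post. The `aarch64-linux-gnu-` prefix in the comments is also an assumption; use whatever prefix your toolchain installs.

```c
/* shift_compare.c: hypothetical stand-alone file for comparing disassembly.
 *
 * Possible ways to inspect what GCC generated:
 *   aarch64-linux-gnu-gcc -O3 -S shift_compare.c -o shift_compare.s
 *   aarch64-linux-gnu-gcc -O3 -c shift_compare.c \
 *       -fopt-info-vec-optimized -fopt-info-vec-missed
 *   aarch64-linux-gnu-objdump -d shift_compare.o
 */
#include <assert.h>
#include <stdint.h>

/* Stand-in dimensions (assumption); the post uses 12800 x 8000. */
#define IMG_W 64
#define IMG_H 4
#define SHIFT 8

/* Variant 1: sizes and shift are only known at run time, as in code 1.
 * noinline keeps the call site from leaking the constants into the body. */
__attribute__((noinline))
void shift_runtime(const uint16_t *src, uint8_t *dst,
                   int16_t img_width, int16_t img_height, int16_t bitshift)
{
    assert(bitshift >= 0 && bitshift < 16); /* valid shifts for 16-bit pixels */
    for (int i = 0; i < img_height; i++) {
        for (int j = 0; j < img_width; j++) {
            uint16_t pixel = src[i * img_width + j];
            dst[i * img_width + j] = (uint8_t)(pixel >> bitshift);
        }
    }
}

/* Variant 2: sizes and shift are compile-time constants, as in code 2. */
__attribute__((noinline))
void shift_constant(const uint16_t *src, uint8_t *dst)
{
    for (int i = 0; i < IMG_H; i++) {
        for (int j = 0; j < IMG_W; j++) {
            uint16_t pixel = src[i * IMG_W + j];
            dst[i * IMG_W + j] = (uint8_t)(pixel >> SHIFT);
        }
    }
}
```

Because each variant is its own non-inlined function, the two loops appear under separate labels in the `.s` file or the `objdump` output, which makes it easier to see whether one of them uses Neon vector registers (`v0`, `v1`, ...) and the other does not.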
I suspect code 2 is able to make (better) use of Neon instructions to vectorize the algorithm: https://developer.arm.com/architectures/instruction-sets/simd-isas/neon
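If the disassembly does show that only one version is vectorized, one thing that may be worth trying is to give the compiler more information in the general-purpose function as well. The sketch below is only an illustration under assumptions, not a confirmed fix: `img_bitshift_flat` is a hypothetical helper, it walks the image as one flat array, and it uses `restrict`, which is only valid if the source and destination buffers do not overlap (the posted code converts in place, so it would need a separate output buffer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the original post.
 * Assumes src and dst do NOT overlap; `restrict` promises this to the
 * compiler, so it does not have to guard against aliasing. */
void img_bitshift_flat(const uint16_t *restrict src, uint8_t *restrict dst,
                       size_t pixel_count, unsigned bitshift)
{
    assert(bitshift < 16); /* a 16-bit pixel has no bits left beyond that */

    /* One flat loop over width * height pixels: no per-row index math. */
    for (size_t k = 0; k < pixel_count; k++) {
        dst[k] = (uint8_t)(src[k] >> bitshift);
    }
}
```

It would be called as `img_bitshift_flat(src, dst, (size_t)img_width * img_height, bitshift)`. Whether this closes the gap to the constant version has to be checked by measuring and by looking at the generated code again.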