Which kind of compiler optimization can be applied to this code?

soojin over 3 years ago

compiler: linaro-aarch64-2020.09-gcc10.2-linux5.4

optimization option: -O3

CPU: Arm Cortex-A53, 1 GHz

Hello, I'm a newbie.

code1 is 3.1x slower than code2

- code1: 106 ms

- code2: 34 ms

I think using constants in the for-loop is the only(?) difference.

I really wonder why there is such a big performance difference between the two versions.

void img_bitshift
(
    CAMERA_OPAQUE_t *pstDevInfo,
    int16_t img_width,
    int16_t img_height,
    int16_t bitshift
)
{
    uint16_t *src_img = (uint16_t *) pstDevInfo->some_field.pVirt;
    uint8_t *dst_img = (uint8_t *) pstDevInfo->some_field.pVirt;

    for (int i = 0; i < img_height; i++)
    {
        for (int j = 0; j < img_width; j++)
        {
            uint16_t pixel = src_img[i*img_width + j];
            dst_img[i*img_width + j] = pixel >> bitshift;
        }
    }
}

// img_bitshift(_, 12800, 8000, _) took 106 ms
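
The post does not say how the 106 ms was measured. As one possible approach on Linux, and purely as an assumption, a call can be bracketed with `clock_gettime(CLOCK_MONOTONIC, ...)`. The helper names `time_call_ms` and `demo_workload` below are hypothetical and do not come from the original code.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: run fn(arg) once and return the wall-clock time in ms. */
double time_call_ms(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;

    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}

/* Hypothetical stand-in workload so the helper can be tried on its own. */
void demo_workload(void *arg)
{
    volatile unsigned sum = 0;  /* volatile keeps the loop from being optimized away */
    (void)arg;
    for (unsigned k = 0; k < 100000u; k++)
        sum += k;
}
```

To time the function from the post this way, the `img_bitshift(...)` call would be wrapped in a small function with the `void (*)(void *)` signature and passed to `time_call_ms`. Repeating the measurement a few times is a reasonable precaution, since the first pass over a large buffer may also include page-fault and cache warm-up costs.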

void dummy
(
    CAMERA_OPAQUE2_t *camerainfo,
    DummyType *dummy
)
{
    int32_t channelIndex = 0;

    for( channelIndex = 0 ; channelIndex < 1 ; channelIndex++ )
    {
        // copy&paste of img_bitshift()
        CAMERA_OPAQUE_t *pstDevInfo = camerainfo->channelDevice;
        uint16_t *src_img = (uint16_t *) pstDevInfo->somefield.pVirt;
        uint8_t *dst_img = (uint8_t *) pstDevInfo->somefield.pVirt;
        
        // NOTE:-----------------------------------------
        // Here, we used constant instead of variable!
        // ----------------------------------------------
        uint16_t img_width = 12800;
        uint16_t img_height = 8000;
        uint16_t bitshift = 8;

        for (int i = 0; i < img_height; i++)
        {
            for (int j = 0; j < img_width; j++)
            {
                uint16_t pixel = src_img[i*img_width + j];
                dst_img[i*img_width + j] = pixel >> bitshift;
            }
        } /* end of loop */
    }
}

// Lines 23 to 30 (the nested loops, counting `void dummy` as line 1) took 34 ms.
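
Since `CAMERA_OPAQUE_t` and `CAMERA_OPAQUE2_t` are not shown, the two listings cannot be compiled as posted. The following is a minimal sketch of a self-contained reduction, assuming the structs only supply the buffer pointer. The names `shift_variable`, `shift_constant`, `variants_agree` and the `DEMO_*` constants are hypothetical, and the image size is scaled down from 12800 x 8000 so the check runs quickly; the original values would need to be restored for timing. It makes no claim about which optimization the compiler actually applies. It only gives both variants in a form whose generated code can be compared, for example with GCC's `-O3 -S` or `-fopt-info-vec-missed`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Scaled-down stand-ins for 12800 x 8000 and a shift of 8 (assumption for the demo). */
enum { DEMO_WIDTH = 128, DEMO_HEIGHT = 80, DEMO_SHIFT = 8 };

/* Variant 1: width, height and shift are run-time parameters, as in code1. */
void shift_variable(void *buf, int16_t img_width, int16_t img_height, int16_t bitshift)
{
    uint16_t *src_img = (uint16_t *)buf;
    uint8_t *dst_img = (uint8_t *)buf;  /* same buffer: in-place 16-bit to 8-bit */

    for (int i = 0; i < img_height; i++)
    {
        for (int j = 0; j < img_width; j++)
        {
            uint16_t pixel = src_img[i * img_width + j];
            dst_img[i * img_width + j] = pixel >> bitshift;
        }
    }
}

/* Variant 2: the same loop with compile-time constants, as in code2. */
void shift_constant(void *buf)
{
    uint16_t *src_img = (uint16_t *)buf;
    uint8_t *dst_img = (uint8_t *)buf;
    uint16_t img_width = DEMO_WIDTH;
    uint16_t img_height = DEMO_HEIGHT;
    uint16_t bitshift = DEMO_SHIFT;

    for (int i = 0; i < img_height; i++)
    {
        for (int j = 0; j < img_width; j++)
        {
            uint16_t pixel = src_img[i * img_width + j];
            dst_img[i * img_width + j] = pixel >> bitshift;
        }
    }
}

/* Returns 1 if both variants produce the same output bytes for the same input. */
int variants_agree(void)
{
    size_t n = (size_t)DEMO_WIDTH * DEMO_HEIGHT;
    uint16_t *a = malloc(n * sizeof *a);
    uint16_t *b = malloc(n * sizeof *b);
    assert(a != NULL && b != NULL);

    /* Fill both buffers with identical pseudo-random pixel values. */
    for (size_t k = 0; k < n; k++)
        a[k] = b[k] = (uint16_t)(k * 2654435761u);

    shift_variable(a, DEMO_WIDTH, DEMO_HEIGHT, DEMO_SHIFT);
    shift_constant(b);

    /* After the in-place conversion, the first n bytes hold the 8-bit image. */
    int same = (memcmp(a, b, n) == 0);
    free(a);
    free(b);
    return same;
}
```

Calling `variants_agree()` from a small `main` confirms that the two variants compute the same result, so any timing gap between them comes from the generated code rather than from different work being done.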

Thanks in advance.
