Byte vs half-word vs word comparison

Hi Experts,

unsigned int var1_32;
unsigned int var2_32;

unsigned short int var1_16;
unsigned short int var2_16;

unsigned char var1_8;
unsigned char var2_8;

Given the declarations above, which of the following comparisons is faster?

if(var1_32 == var2_32)
{

}

or

if(var1_16 == var2_16)
{

}

or

if(var1_8 == var2_8)
{

}
  • As a follow-on to Chris' comment about type conversions often coming for free, it's worth pointing out that compilers also know when type conversions on intermediate results are unnecessary.  For example, in the following code:

         extern unsigned char c[4];
    
         unsigned char sum(void)
         {
              return c[0] + c[1] + c[2] + c[3];
         }
    

    ... the compiler may generate code like this for the core of sum():

         ldrb    r1, [r3]        @ load c[0]
         ldrb    r0, [r3, #1]    @ load c[1]
         ldrb    r2, [r3, #2]    @ load c[2]
         add     r0, r0, r1      @ c[1] + c[0]
         ldrb    r3, [r3, #3]    @ load c[3] (base register no longer needed)
         add     r0, r0, r2      @ + c[2]
         add     r0, r0, r3      @ + c[3]
         uxtb    r0, r0          @ truncate to 8 bits once, at the end
    

    The upper bits of the intermediate result in r0 contain garbage in the form of overflowed bits, but the compiler knows that this doesn't affect the bits that matter for the result.  Only one truncation is needed, at the end - and that is only needed because the procedure call standard requires the spare bits to be zero when returning a value of type unsigned char.
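
    To see why this is safe: unsigned arithmetic is modular, so it makes no difference whether the value is truncated after every addition or only once at the end.  A small illustration in plain C:

         unsigned char x = 200, y = 100, z = 50;

         /* Truncate after each add... */
         unsigned char early = (unsigned char)((unsigned char)(x + y) + z);

         /* ...or only once at the end: the results agree. */
         unsigned char late  = (unsigned char)(x + y + z);

         /* early == late == (200 + 100 + 50) % 256 == 94 */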

    If the function is inlined, the compiler doesn't need to follow the procedure call standard for this value and the uxtb will likely disappear.
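
    For instance, if the inlined result is stored straight back into a byte-sized object, a byte store (strb) only writes bits 0-7 anyway, so no explicit truncation is needed at all.  A sketch (the total variable and accumulate() wrapper are just for illustration):

         extern unsigned char total;

         void accumulate(void)
         {
              /* The strb that stores to 'total' discards bits 8-31,
                 so an inlined sum() needs no uxtb here. */
              total = c[0] + c[1] + c[2] + c[3];
         }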

    On the whole, you should not worry about which types are "more efficient" - between them, the CPU architecture, its implementation, and the compiler will generally do a pretty good job.  A good choice of algorithms and data representation, or the use of appropriate pre-optimized libraries, has a much bigger impact on performance.  This is the part the compiler can't do for you.  Focusing on the code design also keeps your code more portable - important if you want it to perform well on both AArch32 and AArch64, for example.

    It's definitely worth getting into the habit of disassembling the code coming out of the compiler - the optimizations the compiler applies (or fails to apply) can be very surprising, especially at high optimization levels.
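
    With a GNU toolchain, for example, a plain disassembly is one command away (the object file name here is just a placeholder):

         arm-none-eabi-objdump -d sum.o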

  • Dave's comments are spot on here.  In general, the compiler will do a very good job with what you give it, but some thought about the most appropriate data types will give it a lot of help.

    One other reason which occurs to me for using "small" containers is the possibility of getting much more value out of SIMD instructions.  The NEON architecture (and, to a lesser extent, the v6 SIMD extensions) can handle a number of individual data items packed into wide vector registers.  The smaller the items are, the more of them you can fit into a vector.  This can pay huge dividends if used correctly.
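
    As a rough sketch of what that looks like with NEON intrinsics (assuming a toolchain that ships arm_neon.h; the function and pointer names are just illustrative):

         #include <arm_neon.h>
         #include <stdint.h>

         /* Add two 16-byte arrays in a single vector operation.
            With uint8_t elements, 16 values fit in one 128-bit
            q register; with uint32_t, only 4 would. */
         void add_bytes(const uint8_t *a, const uint8_t *b, uint8_t *out)
         {
              uint8x16_t va = vld1q_u8(a);          /* load 16 bytes */
              uint8x16_t vb = vld1q_u8(b);
              vst1q_u8(out, vaddq_u8(va, vb));      /* 16 adds at once */
         }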

    Chris