I use Keil sometimes and while using a timer in autoload mode, the TH0 is loaded with the calculated value but TL0 starts incrementing from 00 instead of the calculated value F0.Am I missing something while using or configuring the keil simulator?
"I guess the 8-bit devices incur a large penalty in processing wider data types."
A strict 8-bit processor shouldn't not need any penalty since a decent compiler should manage to detect that the operation is four 8-bit reads and four 8-bit writes. The cost should mainly be the ability to work with pointers.
With a bad compiler and no barrel shifter, the cost can be tremendous, but it is a disgrace if the compiler does not contain logic for detecting shifts resulting in 8-bit aligned reads and writes.
"but it is a disgrace if the compiler does not contain logic for detecting shifts resulting in 8-bit aligned reads and writes."
the pic24f family (probably the h and e families too) has a shifter block that can do up to 15 bit shifts in hardware.
as a result, the shift approach on those chips are only 4x longer than the union approach.
but the fastest really is to use a (double) pointer to point to the unsigned long / unsigned char array representation of a double: that takes about 1/4 of the time of the union approach.