
Help with writing assembly

Hi,
In my project I'm using an LPC2378.
I connected a display (DLC0283) to the IO ports.
This display uses an ILI9341 TFT driver.

It has an 8-bit data bus, which I connected to P1.24-31.
It also has WR and CS signals.
The display data is arranged as 16 bits per pixel.

I want to fill the full screen with one color (to clear the screen).
(The color data is 16 bits, and its lower and upper bytes are not the same.)

I created 2 integers that hold the upper and lower bytes of the color (col_M32 and col_L32).
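
Assuming the data bus really sits on P1.24-31 as described, each byte of the 16-bit color has to be shifted up to bits 24..31 before it is written to the port. A minimal sketch of how col_M32 and col_L32 could be derived from one color value (the helper names are hypothetical, not from the original code):

```c
#include <stdint.h>

/* Hypothetical helpers: move one byte of a 16-bit pixel color onto
   bits 24..31, where the display's 8-bit data bus (P1.24-31) is wired. */
static inline uint32_t color_high_on_bus(uint16_t color)
{
    return (uint32_t)(color >> 8) << 24;    /* upper byte, sent first */
}

static inline uint32_t color_low_on_bus(uint16_t color)
{
    return (uint32_t)(color & 0xFFu) << 24; /* lower byte, sent second */
}
```

Usage would be `col_M32 = color_high_on_bus(color); col_L32 = color_low_on_bus(color);`. Casting to `uint32_t` before the shift keeps the shift by 24 in unsigned arithmetic, so a byte with its top bit set does not overflow a signed `int`.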

1. Set the frame address (not shown here)
2. Set the CS bit to 0
3. Write the most significant 8 bits of the color
4. Set the WR bit to 0, then to 1
5. Write the least significant 8 bits of the color
6. Set the WR bit to 0, then to 1
7. Repeat steps 3-6 320*240 times.
8. Set the CS bit to 1

        ILI_CS_0;
        for (row=0;row<TFT_Y;row++){
                for (col=0;col<TFT_X;col++){

                        ILI_DAT_CLR=0xff000000;
                        ILI_DAT_SET=col_M32;
                        ILI_WR_0;
                        ILI_WR_1;

                        ILI_DAT_CLR=0xff000000;
                        ILI_DAT_SET=col_L32;
                        ILI_WR_0;
                        ILI_WR_1;

                }
        }
        ILI_CS_1;

The problem is that although it works perfectly, it takes about 1 second to finish, and the screen-clearing effect looks very bad.
In a perfect world the driver would have a command to do that - but that's life.

So I need this code to run as fast as it can.

When I look at the assembly code generated for the above, it seems to me that there are many assembly instructions that could be removed.
But I'm not good at assembly. I think the above code could be written with fewer assembly instructions that I could insert into my code.

Here is the disassembly of the above code:

   467:         ILI_CS_0;
0x0003C908  E3A00701  MOV       R0,#0x00040000
0x0003C90C  E3A0190A  MOV       R1,#0x00028000
0x0003C910  E281120E  ADD       R1,R1,#WDMOD(0xE0000000)
0x0003C914  E581001C  STR       R0,[R1,#0x001C]
   468:         for (row=0;row<TFT_Y;row++){
0x0003C918  E3A05000  MOV       R5,#last_energy_valsH(0x00000000)
0x0003C91C  EA000019  B         0x0003C988
   469:                 for (col=0;col<TFT_X;col++){
   470:
0x0003C920  E3A04000  MOV       R4,#last_energy_valsH(0x00000000)
0x0003C924  EA000013  B         0x0003C978
   471:                         ILI_DAT_CLR=0xff000000;
0x0003C928  E3A004FF  MOV       R0,#0xFF000000
0x0003C92C  E3A0190A  MOV       R1,#0x00028000
0x0003C930  E281120E  ADD       R1,R1,#WDMOD(0xE0000000)
0x0003C934  E581001C  STR       R0,[R1,#0x001C]
   472:                         ILI_DAT_SET=col_M32;
0x0003C938  E59F006C  LDR       R0,[PC,#0x006C]
0x0003C93C  E5900000  LDR       R0,[R0]
0x0003C940  E5810014  STR       R0,[R1,#0x0014]
   473:                         ILI_WR_0;
0x0003C944  E3A00702  MOV       R0,#0x00080000
0x0003C948  E581001C  STR       R0,[R1,#0x001C]
   474:                         ILI_WR_1;
   475:
0x0003C94C  E5810014  STR       R0,[R1,#0x0014]
   476:                         ILI_DAT_CLR=0xff000000;
0x0003C950  E3A004FF  MOV       R0,#0xFF000000
0x0003C954  E581001C  STR       R0,[R1,#0x001C]
   477:                         ILI_DAT_SET=col_L32;
0x0003C958  E59F0048  LDR       R0,[PC,#0x0048]
0x0003C95C  E5900000  LDR       R0,[R0]
0x0003C960  E5810014  STR       R0,[R1,#0x0014]
   478:                         ILI_WR_0;
0x0003C964  E3A00702  MOV       R0,#0x00080000
0x0003C968  E581001C  STR       R0,[R1,#0x001C]
   479:                         ILI_WR_1;
   480:
   481:                 }
   483:         }
0x0003C96C  E5810014  STR       R0,[R1,#0x0014]

0x0003C970  E2840001  ADD       R0,R4,#0x00000001
0x0003C974  E3C04801  BIC       R4,R0,#0x00010000
0x0003C978  E3540D05  CMP       R4,#0x00000140
0x0003C97C  BAFFFFE9  BLT       0x0003C928

0x0003C980  E2850001  ADD       R0,R5,#0x00000001
0x0003C984  E3C05801  BIC       R5,R0,#0x00010000
0x0003C988  E35500F0  CMP       R5,#0x000000F0
0x0003C98C  BAFFFFE3  BLT       0x0003C920

   484:         ILI_CS_1;
   485:
   486: #endif
0x0003C990  E3A00701  MOV       R0,#0x00040000
0x0003C994  E3A0190A  MOV       R1,#0x00028000
0x0003C998  E281120E  ADD       R1,R1,#WDMOD(0xE0000000)
0x0003C99C  E5810014  STR       R0,[R1,#0x0014]
   487: }

Any help?
Thanks,
Doron
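
The listing already answers part of the question of what the ILI_* macros are. Every store goes to 0xE0028000 + 0x14 or + 0x1C, which the LPC23xx user manual lists as IO1SET and IO1CLR, the legacy GPIO registers of port 1 (as opposed to the fast FIO registers). CS is written as 0x00040000 (P1.18) and WR as 0x00080000 (P1.19). The sketch below restates that as constants and models the set/clear behaviour on a PC, so the four stores per byte are easy to see. The function and variable names are hypothetical; on the target the two functions would be single stores to the addresses shown.

```c
#include <stdint.h>

/* Port 1 legacy GPIO registers, as addressed in the listing
   (base 0xE0028000, offsets 0x14 and 0x1C). */
#define IO1SET_ADDR   0xE0028014u
#define IO1CLR_ADDR   0xE002801Cu

/* Pin assignments read off the listing's immediate values. */
#define ILI_CS_BIT    0x00040000u  /* P1.18 */
#define ILI_WR_BIT    0x00080000u  /* P1.19 */
#define ILI_DAT_MASK  0xFF000000u  /* P1.24-31 */

/* Host-side model of the port 1 output latch: a 1 written to IOSET drives
   that pin high, a 1 written to IOCLR drives it low, 0 bits leave pins alone. */
static uint32_t p1_latch;
static void io1set(uint32_t bits) { p1_latch |= bits; }
static void io1clr(uint32_t bits) { p1_latch &= ~bits; }

/* One byte as the original loop sends it: four stores per byte. */
static void send_byte_legacy(uint32_t byte_on_bus)
{
    io1clr(ILI_DAT_MASK);   /* ILI_DAT_CLR = 0xff000000 */
    io1set(byte_on_bus);    /* ILI_DAT_SET = col_x32    */
    io1clr(ILI_WR_BIT);     /* ILI_WR_0                 */
    io1set(ILI_WR_BIT);     /* ILI_WR_1                 */
}
```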

  • Hi Per,
    Thanks for your reply,

    One - The time spent on the nested loop overhead can be ignored.

    Two - That can't be done on this hardware.

    Three - The LPC23xx forces me to first clear and then set pin values; there is no instruction that changes the bits directly. A mask doesn't help here.

    Four - All my constants are in local variables, and I turned on maximum optimization (-Ot).

    Five - Good advice - I will check.

    Six - No, it can't...

    Thanks again,
    Doron

  • 1) A nested loop consumes more registers, which just might matter.

    2) You have magic hardware that doesn't allow you to duplicate the inner loop block 2 or 4 times - even when, after a rewrite, the new code is way smaller than your current code? Sorry - not sure I believe you.

    3) You haven't read the processor user manual. Now would be a very good time to read chapter 10 of the manual, especially section 5.4 about the FIOxPIN registers.

    Important sentences to think about:
    "Writing to the IOPIN register stores the value in the port output register, bypassing the
    need to use both the IOSET and IOCLR registers to obtain the entire written value."

    "Access to a port pin via the FIOPIN register is conditioned by the corresponding bit of the FIOMASK register (see Section 10–5.5 “Fast GPIO port Mask register FIOMASK(FIO[0/1/2/3/4]MASK - 0x3FFF C0[1/3/5/7/9]0)”)."

    4) You haven't shown us exactly what your constants look like. It is always (!) a good idea to supply the full information, so that a reader can actually compile a problem function without having to guess.

    We especially do not know what ILI_DAT_CLR, ILI_DAT_SET, ILI_WR_0 and ILI_WR_1 actually look like.

    6) No, you can't? Care to explain?

    If you have the clock bit on the same port as the data:

    set_chipsel();
    set_port_mask_for_clock_and_data();  /* only data + clock pins follow FIOPIN writes */
    data1 = (data & 0xff00) << 16;       /* high color byte moved to bits 24..31 */
    data1_b = data1 | clock_bit;
    data2 = (data & 0xff) << 24;         /* low color byte moved to bits 24..31 */
    data2_b = data2 | clock_bit;
    fiopin = addr_fpin_port;
    for (i = 0; i < 320*240; i++) {      /* one iteration per pixel */
        *fiopin = data1;
        *fiopin = data1_b;
        *fiopin = data1;
        *fiopin = data2;
        *fiopin = data2_b;
        *fiopin = data2;
    }
    clear_port_mask();
    clear_chipsel();
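
A host-side sketch of why the mask makes the single-register writes safe: with FIOMASK set, a write to FIOPIN only changes the unmasked pins, so CS and everything else on port 1 keep their state. It assumes WR (P1.19) is an active-low strobe that idles high, which matches the original post (WR is set to 0, then 1). The struct and function names are hypothetical; on the target `mask` and `pin` would be the FIO1MASK and FIO1PIN registers (0x3FFFC030 and 0x3FFFC034 per the user manual's address map). On the LPC23xx, ports 0 and 1 also have to be switched to fast mode (the GPIOM bit in the SCS register) before the FIO registers drive the pins; please confirm that in the manual.

```c
#include <stdint.h>
#include <stddef.h>

#define WR_BIT   0x00080000u   /* P1.19 (from the disassembly) */
#define DAT_MASK 0xFF000000u   /* P1.24-31 */

/* Host-side model of one fast GPIO port. A 1 in 'mask' protects that pin,
   so a write to 'pin' leaves it unchanged (FIOMASK semantics). */
typedef struct {
    uint32_t mask;       /* models FIO1MASK */
    uint32_t pin;        /* models the FIO1PIN output latch */
    unsigned wr_rises;   /* rising WR edges seen, i.e. bytes latched */
} fio_port;

static void fio_write_pin(fio_port *p, uint32_t value)
{
    uint32_t next = (p->pin & p->mask) | (value & ~p->mask);
    if (!(p->pin & WR_BIT) && (next & WR_BIT))
        p->wr_rises++;
    p->pin = next;
}

/* Six FIOPIN writes per pixel. WR is assumed active low: idle high,
   pulled low, then raised again so the panel latches the byte. */
static void fill_fiopin(fio_port *p, uint16_t color, size_t count)
{
    const uint32_t hi    = (uint32_t)(color >> 8) << 24;
    const uint32_t lo    = (uint32_t)(color & 0xFFu) << 24;
    const uint32_t hi_wr = hi | WR_BIT;
    const uint32_t lo_wr = lo | WR_BIT;

    p->mask = ~(DAT_MASK | WR_BIT);   /* only data and WR follow writes */
    for (size_t i = 0; i < count; i++) {
        fio_write_pin(p, hi_wr);
        fio_write_pin(p, hi);         /* WR low */
        fio_write_pin(p, hi_wr);      /* WR rising edge: high byte */
        fio_write_pin(p, lo_wr);
        fio_write_pin(p, lo);         /* WR low */
        fio_write_pin(p, lo_wr);      /* WR rising edge: low byte */
    }
    p->mask = 0;                      /* give the other pins back */
}
```

For a full screen the call would be `fill_fiopin(&port, color, 320*240)`, giving two latched bytes per pixel.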
    

    Alternatively - note that the LPC23xx also allows 8-bit and 16-bit writes. So you might be able to use FIOxPIN3 to write just the high 8 bits without using the mask, or use FIOxPINU to access the top 16 bits and leave the low 16 bits free from the mask operation.
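
A sketch of the byte-lane addressing behind FIOxPIN3 and FIOxPINU. The addresses are taken from the LPC23xx user manual's fast GPIO map (FIO1MASK at 0x3FFFC030 matches the quote above) and should be double-checked against your manual revision. The host function is a hypothetical model: it shows that storing one byte into lane 3 of a little-endian 32-bit word changes only bits 24..31, which is what a byte store to FIO1PIN3 does on the (little-endian) LPC23xx.

```c
#include <stdint.h>
#include <string.h>

/* Fast GPIO port 1 addresses (LPC23xx user manual, chapter 10). */
#define FIO1MASK_ADDR  0x3FFFC030u
#define FIO1PIN_ADDR   0x3FFFC034u
#define FIO1PINU_ADDR  (FIO1PIN_ADDR + 2u)  /* half-word: P1.16-31 */
#define FIO1PIN3_ADDR  (FIO1PIN_ADDR + 3u)  /* byte: P1.24-31 */

/* On the target, the data byte could be written without touching FIOMASK:
     *(volatile uint8_t *)FIO1PIN3_ADDR = (uint8_t)(color >> 8);
   Host model of that byte-lane store; assumes a little-endian host. */
static uint32_t store_top_byte(uint32_t word, uint8_t value)
{
    unsigned char bytes[4];
    memcpy(bytes, &word, sizeof word);
    bytes[3] = value;                 /* lane 3 = bits 24..31 (little-endian) */
    memcpy(&word, bytes, sizeof word);
    return word;
}
```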

    If the clock bit is on another port - or you use 8-bit or 16-bit operations to write the data - then you could have an inner loop like:

        *fiopin_1 = data1;
        *fiopin_2 = 1;
        *fiopin_2 = 0;
        *fiopin_1 = data2;
        *fiopin_2 = 1;
        *fiopin_2 = 0;
    


    or
    *fiopin_1 = data1; *fioset = clockbit; *fioclear = clockbit; *fiopin_1 = data2; *fioset = clockbit; *fioclear = clockbit;

  • Damn - the last one should have been:

        *fiopin_1 = data1;
        *fioset = clockbit;
        *fioclear = clockbit;
        *fiopin_1 = data2;
        *fioset = clockbit;
        *fioclear = clockbit;
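
A sketch of that last loop with the data written through the byte-wide FIO1PIN3 lane and WR driven through FIOxSET/FIOxCLR. The register pointers are struct fields (hypothetical names) so the loop can run on a PC; on the target `pin3` would point at FIO1PIN3 and `set`/`clr` at the SET/CLR registers of the port that carries WR. One deliberate difference from the lines above: since WR is an active-low strobe that idles high (the original post sets it to 0, then 1), the sketch clears it first and sets it second.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register bundle; each pointer stands in for one register. */
typedef struct {
    volatile uint8_t  *pin3;   /* FIO1PIN3: data bus P1.24-31 */
    volatile uint32_t *set;    /* FIOxSET of the port carrying WR */
    volatile uint32_t *clr;    /* FIOxCLR of the port carrying WR */
    uint32_t wr_bit;           /* WR pin mask, e.g. 0x00080000 for P1.19 */
} ili_bus;

/* Six stores per pixel, no FIOMASK needed because the byte store
   only reaches bits 24..31 and SET/CLR only touch the WR bit. */
static void fill_bytelane(const ili_bus *b, uint16_t color, size_t count)
{
    const uint8_t hi = (uint8_t)(color >> 8);
    const uint8_t lo = (uint8_t)(color & 0xFFu);
    for (size_t i = 0; i < count; i++) {
        *b->pin3 = hi;
        *b->clr  = b->wr_bit;   /* WR low (assumed active low) */
        *b->set  = b->wr_bit;   /* rising edge latches the high byte */
        *b->pin3 = lo;
        *b->clr  = b->wr_bit;
        *b->set  = b->wr_bit;   /* rising edge latches the low byte */
    }
}
```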
    

  • And a footnote - depending on the operation of the clock bit, you might be able to shrink 6 writes into 4 writes inside the loop.
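
One way the six writes could become four, assuming the panel only samples the data bus on the rising edge of WR and the ILI9341 setup/hold times are met at your bus speed (check the datasheet timing before relying on it): put the data out with WR already low, then raise WR with the same data. The host model below (hypothetical names) records the byte present at each rising edge, so the four-write sequence can be checked against the expected high-byte/low-byte order.

```c
#include <stdint.h>
#include <stddef.h>

#define WR_BIT   0x00080000u   /* P1.19, from the disassembly */
#define DAT_MASK 0xFF000000u   /* P1.24-31 */

/* Host model: assumes FIOMASK already hides every pin except data and WR,
   and records the byte on the bus at each rising WR edge. */
typedef struct {
    uint32_t pin;
    uint8_t  latched[8];
    size_t   n_latched;
} bus_model;

static void write_pin(bus_model *m, uint32_t value)
{
    int rising = !(m->pin & WR_BIT) && (value & WR_BIT);
    m->pin = value & (DAT_MASK | WR_BIT);
    if (rising && m->n_latched < sizeof m->latched)
        m->latched[m->n_latched++] = (uint8_t)(value >> 24);
}

/* Four writes per pixel: data with WR low, then the same data with WR high. */
static void fill_4writes(bus_model *m, uint16_t color, size_t count)
{
    const uint32_t hi = (uint32_t)(color >> 8) << 24;
    const uint32_t lo = (uint32_t)(color & 0xFFu) << 24;
    for (size_t i = 0; i < count; i++) {
        write_pin(m, hi);            /* high byte out, WR falls */
        write_pin(m, hi | WR_BIT);   /* WR rises: high byte latched */
        write_pin(m, lo);            /* low byte out, WR falls */
        write_pin(m, lo | WR_BIT);   /* WR rises: low byte latched */
    }
}
```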

  • Hi Per,
    You've opened my eyes to many new ideas...
    Let me try all your suggestions - I'm sure they will improve my code,
    Thanks,
    Doron

  • Hi Per,
    Many thanks for all the suggestions - now it looks great
    Doron

  • Nice. How much did you manage to reduce the assembler output of the inner loop?

  • Hi Per,
    The final code is reduced to 6 lines - as you recommended - using a pointer.
    The WR bit is on the same port:

    
                            *io_port=V_M_1;
                            *io_port=V_M_0;
                            *io_port=V_M_1;
    
                            *io_port=V_L_1;
                            *io_port=V_L_0;
                            *io_port=V_L_1;
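
A sketch of how the four port values in that final loop could be precomputed, combined with the unrolling from point 2 (two pixels per iteration). It assumes FIOMASK already protects every pin except P1.24-31 and WR, and that `io_port` points at FIO1PIN on the target; the struct and function names are hypothetical, only V_M_1/V_M_0/V_L_1/V_L_0 come from the post.

```c
#include <stdint.h>
#include <stddef.h>

#define WR_BIT 0x00080000u   /* P1.19, from the disassembly */

/* Hypothetical container for the four precomputed port values. */
typedef struct { uint32_t m1, m0, l1, l0; } pixel_words;

static pixel_words make_pixel_words(uint16_t color)
{
    pixel_words w;
    w.m0 = (uint32_t)(color >> 8) << 24;     /* V_M_0: high byte, WR low  */
    w.m1 = w.m0 | WR_BIT;                    /* V_M_1: high byte, WR high */
    w.l0 = (uint32_t)(color & 0xFFu) << 24;  /* V_L_0: low byte, WR low   */
    w.l1 = w.l0 | WR_BIT;                    /* V_L_1: low byte, WR high  */
    return w;
}

/* Final loop shape, unrolled to two pixels per iteration.
   'count' is assumed even (320*240 is). */
static void fill_final(volatile uint32_t *io_port, pixel_words w, size_t count)
{
    for (size_t i = 0; i < count; i += 2) {
        *io_port = w.m1; *io_port = w.m0; *io_port = w.m1;
        *io_port = w.l1; *io_port = w.l0; *io_port = w.l1;
        *io_port = w.m1; *io_port = w.m0; *io_port = w.m1;
        *io_port = w.l1; *io_port = w.l0; *io_port = w.l1;
    }
}
```

Unrolling halves the loop-counter and branch overhead per pixel; whether it is measurable next to the twelve stores depends on the bus timing.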
    


    The result is very fast.
    Thanks,
    Doron