Hi, In my project I'm using an LPC2378. I connected a display (DLC0283) to the I/O ports. This display uses an ILI9341 TFT driver.
It has an 8-bit data bus, which I connected to P1.24-31, plus a WR bit and a CS bit. The display data is arranged as 16 bits per pixel.
To fill the whole screen with one color (to clear the screen; the lower and upper bytes of the 16-bit color are not the same):
I create 2 integers that hold the upper and lower parts of the color (col_M32 & col_L32).
1. Set the frame addr (not shown here)
2. Set the CS bit to 0
3. Write the most significant 8 bits of the color
4. Set the WR bit to 0 - then 1
5. Write the least significant 8 bits of the color
6. Set the WR bit to 0 - then 1
7. Repeat steps 3-6 320*240 times.
8. Set the CS bit to 1
ILI_CS_0;
for (row=0;row<TFT_Y;row++){
    for (col=0;col<TFT_X;col++){
        ILI_DAT_CLR=0xff000000;
        ILI_DAT_SET=col_M32;
        ILI_WR_0;
        ILI_WR_1;
        ILI_DAT_CLR=0xff000000;
        ILI_DAT_SET=col_L32;
        ILI_WR_0;
        ILI_WR_1;
    }
}
ILI_CS_1;
The problem is that it works perfectly - but it takes about 1 second to finish, and the visual effect of clearing the screen is very bad. In a perfect world the driver would have a command to do that - but that's life.
So - I need this code to run as fast as it can.
When I look at the assembly generated for the above code, it seems to me that many assembly instructions could be removed. But I'm not good at assembly - I think the above code could be written with fewer instructions in assembly that I can insert in my code.
Here is the assembly for the above code:
467: ILI_CS_0;
0x0003C908 E3A00701 MOV R0,#0x00040000
0x0003C90C E3A0190A MOV R1,#0x00028000
0x0003C910 E281120E ADD R1,R1,#WDMOD(0xE0000000)
0x0003C914 E581001C STR R0,[R1,#0x001C]
468: for (row=0;row<TFT_Y;row++){
0x0003C918 E3A05000 MOV R5,#last_energy_valsH(0x00000000)
0x0003C91C EA000019 B 0x0003C988
469: for (col=0;col<TFT_X;col++){
470:
0x0003C920 E3A04000 MOV R4,#last_energy_valsH(0x00000000)
0x0003C924 EA000013 B 0x0003C978
471: ILI_DAT_CLR=0xff000000;
0x0003C928 E3A004FF MOV R0,#0xFF000000
0x0003C92C E3A0190A MOV R1,#0x00028000
0x0003C930 E281120E ADD R1,R1,#WDMOD(0xE0000000)
0x0003C934 E581001C STR R0,[R1,#0x001C]
472: ILI_DAT_SET=col_M32;
0x0003C938 E59F006C LDR R0,[PC,#0x006C]
0x0003C93C E5900000 LDR R0,[R0]
0x0003C940 E5810014 STR R0,[R1,#0x0014]
473: ILI_WR_0;
0x0003C944 E3A00702 MOV R0,#0x00080000
0x0003C948 E581001C STR R0,[R1,#0x001C]
474: ILI_WR_1;
475:
0x0003C94C E5810014 STR R0,[R1,#0x0014]
476: ILI_DAT_CLR=0xff000000;
0x0003C950 E3A004FF MOV R0,#0xFF000000
0x0003C954 E581001C STR R0,[R1,#0x001C]
477: ILI_DAT_SET=col_L32;
0x0003C958 E59F0048 LDR R0,[PC,#0x0048]
0x0003C95C E5900000 LDR R0,[R0]
0x0003C960 E5810014 STR R0,[R1,#0x0014]
478: ILI_WR_0;
0x0003C964 E3A00702 MOV R0,#0x00080000
0x0003C968 E581001C STR R0,[R1,#0x001C]
479: ILI_WR_1;
480:
481: }
483: }
0x0003C96C E5810014 STR R0,[R1,#0x0014]
0x0003C970 E2840001 ADD R0,R4,#0x00000001
0x0003C974 E3C04801 BIC R4,R0,#0x00010000
0x0003C978 E3540D05 CMP R4,#0x00000140
0x0003C97C BAFFFFE9 BLT 0x0003C928
0x0003C980 E2850001 ADD R0,R5,#0x00000001
0x0003C984 E3C05801 BIC R5,R0,#0x00010000
0x0003C988 E35500F0 CMP R5,#0x000000F0
0x0003C98C BAFFFFE3 BLT 0x0003C920
484: ILI_CS_1;
485:
486: #endif
0x0003C990 E3A00701 MOV R0,#0x00040000
0x0003C994 E3A0190A MOV R1,#0x00028000
0x0003C998 E281120E ADD R1,R1,#WDMOD(0xE0000000)
0x0003C99C E5810014 STR R0,[R1,#0x0014]
487: }
Any help? Thanks, Doron
One - you have two nested loops when one loop would be enough.
Two - loop unrolling to 2 or maybe 4 pixels/iteration reduces the cost of the loop.
Three - the LPC23xx doesn't require you to first clear and then set pin values - you can do a direct assign of the 8 bits. Also remember that the LPC23xx supports a mask operation that allows a 32-bit wide write to the port registers to only affect your high 8 bits.
Four - make sure all your constants are in local variables and turn on max optimization.
Five - check what peripheral clock speed you have selected for GPIO. That affects how fast the processor can handle accesses to the GPIO subsystem.
Six - check that the LPC23xx port register can be direct-accessed as a pointer. Or consider using an explicit pointer.
Recompile and check the difference.
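As a rough illustration of points one and two above, the two nested loops can be collapsed into a single counter and the body repeated four times per iteration. This is only a sketch meant to run on a PC: fake_port, pixels_written, put_pixel and fill_screen are made-up names, the port is a plain variable instead of the real GPIO register, and the WR strobe is left out.

```c
#include <stdint.h>

#define TFT_X 320u
#define TFT_Y 240u

/* Hypothetical stand-ins so the sketch runs on a PC; on the target
   'port' would point at the real GPIO register instead. */
static volatile uint32_t fake_port;
static uint32_t pixels_written;

/* One pixel = upper byte, then lower byte (WR strobe omitted here). */
static inline void put_pixel(volatile uint32_t *port, uint32_t hi, uint32_t lo)
{
    *port = hi;
    *port = lo;
    pixels_written++;   /* bookkeeping for the sketch only */
}

void fill_screen(volatile uint32_t *port, uint32_t hi, uint32_t lo)
{
    /* Single loop; 320*240 = 76800 is divisible by 4, so no remainder. */
    uint32_t n = (TFT_X * TFT_Y) / 4u;

    while (n--) {
        put_pixel(port, hi, lo);
        put_pixel(port, hi, lo);
        put_pixel(port, hi, lo);
        put_pixel(port, hi, lo);
    }
}
```

The loop overhead (decrement, compare, branch) is then paid once per four pixels instead of once per pixel.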
Hi Per, Thanks for your reply,
One - The time change for the nested loops can be ignored.
Two - Can't be done in this hardware.
Three - the LPC23xx forces me to first clear and then set pin values; there is no command to change bits. The mask doesn't help here.
Four - all my constants are in local variables, and I turned on max optimization (-Ot).
Five - Good advice - I will check.
Six - No - it can't...
Thanks again, Doron
1) A nested loop consumes more registers which just might matter.
2) You have magic hardware that doesn't allow you to duplicate the inner loop block 2 or 4 times - even when, after a rewrite, the new code is way smaller than your current code? Sorry - not sure I believe you.
3) You haven't read the processor user manual. Now would be a very good time to read chapter 10 of the manual. Especially section 5.4 about FIOxPIN registers.
Important sentences to think about: "Writing to the IOPIN register stores the value in the port output register, bypassing the need to use both the IOSET and IOCLR registers to obtain the entire written value."
"Access to a port pin via the FIOPIN register is conditioned by the corresponding bit of the FIOMASK register (see Section 10–5.5 “Fast GPIO port Mask register FIOMASK(FIO[0/1/2/3/4]MASK - 0x3FFF C0[1/3/5/7/9]0)”)."
4) You haven't shown us what exactly your constants look like. It is always (!) a good idea to supply the full information to let a reader actually be able to compile a problem function without having to guess.
We especially do not know what ILI_DAT_CLR, ILI_DAT_SET, ILI_WR_0, ILI_WR_1 actually look like.
6) No you can't? Care to explain?
If you have the clock bit on the same port as the data:
set_chipsel();
set_port_mask_for_clock_and_data();
data1 = (data & 0xff00) << 16;
data1_b = data1 | clock_bit;
data2 = (data & 0xff) << 24;
data2_b = data2 | clock_bit;
fiopin = addr_fpin_port;
for (i = 0; i < 320*240; i++) {
    *fiopin = data1;
    *fiopin = data1_b;
    *fiopin = data1;
    *fiopin = data2;
    *fiopin = data2_b;
    *fiopin = data2;
}
clear_port_mask();
clear_chipsel();
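For reference, the effect of the mask on a 32-bit FIOPIN write can be modelled on a PC as below. This is a sketch, not target code: masked_pin_write is a made-up helper, and the pin positions (data on P1.24-31, WR on bit 19 as the 0x00080000 constant in the disassembly suggests) are assumptions taken from the question. In FIOMASK a 1 protects the pin and a 0 lets the write through.

```c
#include <stdint.h>

#define DATA_BITS 0xFF000000u   /* P1.24-31, from the question */
#define WR_BIT    0x00080000u   /* assumed from the disassembly */

/* FIOMASK value: 0 = pin follows the write, 1 = pin is left alone. */
#define PORT_MASK (~(DATA_BITS | WR_BIT))

/* Model of what the pins look like after 'value' is written to FIOPIN
   while 'mask' is in FIOMASK. Hypothetical helper, PC-side only. */
uint32_t masked_pin_write(uint32_t current, uint32_t value, uint32_t mask)
{
    return (current & mask) | (value & ~mask);
}
```

With this mask, writing 0xABFFFFFF over a port that currently reads 0x00123456 changes only the data byte and WR, giving 0xAB1A3456.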
Alternatively - note that the LPC23xx also allows 8-bit and 16-bit writes. So you might be able to use FIOxPIN3 to write just the high 8 bits without use of mask. Or use FIOxPINU to access top 16 bits and leave low 16 bits free from mask operation.
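The byte-wide access can be pictured like this. Again a PC-side sketch under assumptions: port_reg_t and write_data_byte are made-up names, the union stands in for the FIOxPIN register, and it relies on little-endian byte order so that byte 3 holds bits 24-31.

```c
#include <stdint.h>

/* Stand-in for a 32-bit port register that can also be accessed
   one byte at a time. Assumes a little-endian machine. */
typedef union {
    volatile uint32_t word;
    volatile uint8_t  byte[4];
} port_reg_t;

/* Write only bits 24-31; the other 24 pins are not touched,
   so no mask setup is needed. Hypothetical helper. */
void write_data_byte(port_reg_t *reg, uint8_t value)
{
    reg->byte[3] = value;
}
```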
If the clock bit is on another port - or you use an 8-bit or 16-bit operation to write the data - then you could have an inner loop like:
*fiopin_1 = data1;
*fiopin_2 = 1;
*fiopin_2 = 0;
*fiopin_1 = data2;
*fiopin_2 = 1;
*fiopin_2 = 0;
or

*fiopin_1 = data1;
*fioset = clockbit;
*fioclear = clockbit;
*fiopin_1 = data2;
*fioset = clockbit;
*fioclear = clockbit;
And a footnote - depending on the operation of the clock bit, you might be able to shrink 6 writes into 4 writes inside the loop.
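One way this could work, assuming the display latches the data on the rising edge of WR (worth checking against the ILI9341 timing diagrams) and WR shares the port with the data: put the new byte on the bus in the same write that pulls WR low, then raise WR. The sketch below logs the writes instead of touching hardware; port_write, send_pixel, the log buffer and the WR bit position are assumptions for illustration only.

```c
#include <stdint.h>

#define WR_BIT 0x00080000u      /* assumed WR position on the port */

/* PC-side log standing in for the port register. */
static uint32_t write_log[8];
static unsigned write_count;

static void port_write(uint32_t v)
{
    if (write_count < 8u)
        write_log[write_count] = v;
    write_count++;
}

/* hi and lo are the color bytes already shifted to the data pins,
   with the WR bit clear. */
void send_pixel(uint32_t hi, uint32_t lo)
{
    port_write(hi);            /* upper byte on the bus, WR low */
    port_write(hi | WR_BIT);   /* WR rising edge latches it     */
    port_write(lo);            /* lower byte on the bus, WR low */
    port_write(lo | WR_BIT);   /* WR rising edge latches it     */
}
```

Whether the data setup time before the rising edge is still met depends on the GPIO clock, so this needs checking on the real hardware.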
Hi Per, You opened my eyes to many new ideas... Let me try all your suggestions - I'm sure they will improve my code. Thanks, Doron
Hi Per, Many thanks for all the suggestions - now it looks great. Doron
Nice. How much did you manage to reduce the assembler output of the inner loop?
Hi Per, The final code is reduced to 6 lines - as you recommended - with a pointer. The WR bit is on the same port.
*io_port=V_M_1;
*io_port=V_M_0;
*io_port=V_M_1;
*io_port=V_L_1;
*io_port=V_L_0;
*io_port=V_L_1;
The result is so fast. Thanks, Doron