Questions about on-chip Arithmetic capabilities of F132 series

Howdy all,
Recently I've been working on a Silicon Labs C8051F132 and am trying to implement a simple averaging filter using it. Unfortunately the time the cpu takes making the necessary calculations seems excessive, and enabling the on-chip arithmetic gains me no performance improvement. I was wondering if perhaps something else had to be initialized for the MAC to work.

I am using the Keil Compiler, uVision 3, and checking the box under the device settings, and see it add MDU_F120 to the Compiler control string. Unfortunately during the chip's operation from debug mode I can't witness the MAC doing anything. Thank you for any insight you can give me.

As I am new to this line of micro, the compiler, and all this stuff in general heh, please let me know if you need some more information. Thanks!

  • After thinking this over a bit, I was wondering if I am way off base. Does Keil automatically optimize my code to utilize the enhanced speed the MAC provides? Or, do I need to utilize the SFRs in my code?

    For example in part of my code I have :

    for (s = 0; s < DEPTH_DataBuffer; s++)
    {
        dw += *(ch_ptr[gl_nADC_Channel] + s); // Sum the buffer
    }
    dw = dw / DEPTH_DataBuffer; // Divide by the number of samples

    gl_wADC_OutData[gl_nADC_Channel] = dw; // Store in the output buffer


    This scans the 16-bit buffer, DataBuffer, and sums the contents. After the summation it divides the result by 16, the number of samples. It is simply an averaging filter. Using a pointer into the array of input data was recommended to me as faster than array indices.

    Overall this function takes 36 us to complete with a 96 MHz clock, and I need to trim it down to under 10 us if possible. You can probably tell I'm a little green in my understanding of whether optimizing this is the compiler's job or the programmer's. Thank you again for any help you can give me.
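A minimal sketch of the averaging step described above, assuming the buffer depth is a power of two (16, as stated in the post) so that the divide can be written as a right shift. `DEPTH`, `DEPTH_SHIFT` and `average16` are illustrative names, not taken from the original code, and whether this is actually faster on the target depends on the code the compiler generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEPTH       16u  /* assumed buffer depth (a power of two) */
#define DEPTH_SHIFT 4u   /* log2(DEPTH) */

/* Average DEPTH unsigned 16-bit samples. */
uint16_t average16(const uint16_t *buf)
{
    uint32_t sum = 0;    /* 16 samples of 16 bits need at most 20 bits */
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < DEPTH; i++) {
        sum += buf[i];
    }
    return (uint16_t)(sum >> DEPTH_SHIFT);  /* same result as sum / 16 */
}
```

The 32-bit accumulator is wide enough that the sum cannot overflow before the shift.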

  • "Overall this function takes 36 us to complete with a 96 MHz clock, and I need to trim it down to under 10 us if possible"
    For such a dramatic "trim" you will need to go to assembler. There you can use the MAC where appropriate and leave it out where it is not.

    Erik

  • "36us ... trim (sic?!) down to under 10us"

    I don't think a reduction of 4:1 really counts as a "trim," does it?!
    That's major surgery!

    As Erik says, you're probably going to have to do this in assembler - and it's probably going to be hard work!

  • Actually after working with it all day with people more experienced than myself, here is the deal.

    The CPU is capable of 100 MIPS, or 1 instruction every 10.4 ns while running at 96 MHz, assuming 1 instruction per cycle. Doing a simple multiply, A = WORD * WORD, takes 800 or so clock cycles, which is about 8 us. I cannot understand why the compiler won't shove this operation into the MAC, which does 16-bit multiplication in 1 clock cycle.

    After speaking with tech support at Keil, they said their compiler should automatically use the MAC when it's faster, and that they had to "look into" the problem. What shows up in the disassembly is a few movs, pushes and pops, and a pointer increment, then a call to an external library routine that's just gigantic.

    Any ideas, or possibly some sample code that DOES use the MAC? Hell, I can't seem to get the compiler to use the MAC even when I write functions designed to force its use. Heh. I appreciate the continued feedback!

  • That should read 0.8 us on the multiply of WORD * WORD. Darn decimals ;P

    That snippet of code above probably won't involve the MAC, seeing as how it's primarily an ADD operation. However, the rest of the function has several WORD * WORD operations that I think the MAC would help a ton on. I just don't know why the compiler won't use it, seeing as how there is a check box under the debugger options to enable its use. I'll see what Keil support can tell me and post back tomorrow, depending on how it goes.
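As a rough check of the figures quoted in this thread, the sketch below converts between time and SYSCLK cycles at 96 MHz. The function names are illustrative, and it assumes one clock period per cycle with no wait states or stalls.

```c
#include <stdint.h>

#define SYSCLK_HZ 96000000u  /* clock rate quoted in the thread */

/* Number of SYSCLK cycles in a time given in whole microseconds. */
uint32_t cycles_in_us(uint32_t us)
{
    return us * (SYSCLK_HZ / 1000000u);
}

/* Time in nanoseconds taken by a number of cycles, rounded down. */
uint32_t cycles_to_ns(uint32_t cycles)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / SYSCLK_HZ);
}
```

Under these assumptions 36 us is 3456 cycles and the 10 us target is 960 cycles, or 60 cycles per sample for a 16-sample buffer. 800 cycles would be about 8.3 us, while 0.8 us is roughly 77 cycles.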

  • "Any ideas, or possibly some sample code that DOES use the MAC. Hell I can't seem to get the compiler to use the MAC even when I write functions designed to force its use."
    I do not know what you mean by "force its use", but writing a 16-bit * 16-bit multiply subroutine in assembler, callable from C, should be a matter of half an hour.

    Erik

  • The code is right there in the datasheet page 171

    The example below implements the equation:
    (0.5 × 0.25) + (0.5 × -0.25) = 0.125 - 0.125 = 0.0
    MOV MAC0CF, #0Ah ; Set to Clear Accumulator, Use fractional numbers
    MOV MAC0AH, #40h ; Load MAC0A register with 4000 hex = 0.5 decimal
    MOV MAC0AL, #00h
    MOV MAC0BH, #20h ; Load MAC0B register with 2000 hex = 0.25 decimal
    MOV MAC0BL, #00h ; This line initiates the first MAC operation
    MOV MAC0BH, #0E0h ; Load MAC0B register with E000 hex = -0.25 decimal
    MOV MAC0BL, #00h ; This line initiates the second MAC operation
    NOP
    NOP ; After this instruction, the Accumulator should be equal to 0,
    ; and the MAC0STA register should be 0x04, indicating a zero
    NOP ; After this instruction, the Rounding register is updated
    The example below implements the equation:
    4660 × -292 = -1360720
    MOV MAC0CF, #01h ; Use integer numbers, and multiply only mode (add to zero)
    MOV MAC0AH, #12h ; Load MAC0A register with 1234 hex = 4660 decimal
    MOV MAC0AL, #34h
    MOV MAC0BH, #0FEh ; Load MAC0B register with FEDC hex = -292 decimal
    MOV MAC0BL, #0DCh ; This line initiates the Multiply operation
    NOP
    NOP ; After this instruction, the Accumulator should be equal to
    ; FFFFEB3CB0 hex = -1360720 decimal. The MAC0STA register should
    ; be 0x01, indicating a negative result.
    NOP ; After this instruction, the Rounding register is updated
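To sanity-check the numbers in the examples above on a PC, the following sketch models a signed 16 x 16 multiply and the two's-complement pattern the product would have in a 40-bit accumulator. It is a host-side model only, not code for the target, and `mac_multiply_model` and `to_acc40` are hypothetical names. It does not model any scaling the hardware applies in fractional mode.

```c
#include <stdint.h>

/* Interpret a raw 16-bit register value as a two's-complement number. */
static int32_t as_signed16(uint16_t raw)
{
    return (raw & 0x8000u) ? (int32_t)raw - 65536 : (int32_t)raw;
}

/* Signed 16 x 16 multiply; the operands are the raw register values. */
int32_t mac_multiply_model(uint16_t a_raw, uint16_t b_raw)
{
    /* The magnitude never exceeds 2^30, so it always fits in an int32_t. */
    return as_signed16(a_raw) * as_signed16(b_raw);
}

/* Bit pattern of a product when sign-extended into 40 bits. */
uint64_t to_acc40(int32_t product)
{
    return (uint64_t)(int64_t)product & 0xFFFFFFFFFFull;
}
```

For the integer example, `mac_multiply_model(0x1234, 0xFEDC)` gives -1360720 and `to_acc40` of that value gives FFFFEB3CB0 hex, matching the comments. For the fractional example, the two products cancel and the sum is zero.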

  • Just another thought: the F13x chips are fairly new. If Keil has an implementation, it may only work if you specify the "equivalent" F12x as your chip.

    Erik

  • That code is in assembly though, while with Keil I am programming in C. The benefit of using Keil is supposed to be its ability to generate those calls to the MAC without me having to find a way to add assembly code by hand. At no time does the Keil compiler output directly in assembly that I have seen so far. It goes directly from C to a file I burn onto the chip. I even spoke with our production group about somehow adding the correct assembly code in post-compilation, and they shook their heads and said there will be a better way, we can't do that. :shrug:

    Seeing as how Keil has the checkbox for implementing the Arithmetic unit, and has libraries that are designed to access the MAC listed on their site that would do assembly routines like the one posted - I'd think everything would be taken care of for me. Perhaps Keil has a bug or erroneous declaration of MAC support on SiLabs F13x line of chips. Perhaps it would be possible to write my own assembly routine that Keil calls from a C function. I'm still pretty new to the ins and outs of Keil, so if you could tell me how to modify the compiled code to manually add in those assembly routines I'd appreciate it.

  • "so if you could tell me how to modify the compiled code to manually add in those assembly routines I'd appreciate it"

    There's a whole section in the manual devoted to interfacing C and assembler. I do agree, though, that if Keil supports use of the MAC directly from 'C', it would be worth persevering until you find out how to do it. In general the technical support is very good, so don't give up!

  • The easy way would be to make a separate C module with just one non-optimized function:
    unsigned short MACmul (unsigned short val1, unsigned short val2)
    {
        return (val1 * val2);
    }

    Then use the generated assembler ("At no time does the Keil compiler output directly in assembly" is incorrect) as a base for the assembly routine, and just replace "the guts".

    Erik
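One detail worth checking before replacing "the guts": on a target where `int` is 16 bits, as in C51, a 16 x 16 product returned as `unsigned short` keeps only the low 16 bits, whereas a hardware multiplier produces the full 32-bit product. A host-side sketch of the difference, using illustrative function names and fixed-width types:

```c
#include <stdint.h>

/* Low 16 bits of the product, as a 16-bit return type would give. */
uint16_t mul16_low(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * b);
}

/* Full 32-bit unsigned product of two 16-bit operands. */
uint32_t mul16_full(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}
```

For example, 0x1234 * 0x0100 is 0x123400 in full but 0x3400 when truncated to 16 bits, so the return type of the assembly routine decides how much of the MAC result is usable from C.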

  • I'll give it a try. I've only had the team's insight to go by, and of course assumptions about the compiler can cause confusion. I'm sorry if anything I've believed to be true has been in error.

    My main concern was - the design team picked this CPU as a result of it having the capability of fast multiply and register shifting, and made the assumption that Keil would support these routines by default using their compiler. If Keil isn't working properly I hope to let them know, so it can be fixed in a future release. It's only a half hour of coding I suppose, but that one check box could save a lot of people a lot of time!

    As for Keil support..
    This morning I received an email telling me that a solution to my problem is checking the box under the debug options for using the on-chip arithmetic accelerator... duh! That's what's not working lol. Well, hopefully round 2 is better. Thank you again to everyone for the ideas.

  • If Keil isn't working properly I hope to let them know, so it can be fixed in a future release. It's only a half hour of coding I suppose, but that one check box could save a lot of people a lot of time!
    Keil IS "working properly". The Keil toolset is a generic '51 toolset and the MAC is unique to the SiLabs F12x and F13x series. Keil has some adaptations to some of the unique features in some chips. Lambasting Keil for not supporting all features in all 5234 derivatives of the '51 is not reasonable.

    As for Keil support..
    This morning I received an email telling me that a solution to my problem is checking the box under the debug options for using the on-chip arithmetic accelerator... duh! That's what's not working lol. Well, hopefully round 2 is better.

    Not surprised. I called about watches of local variables showing up in C, but not in assembler. The answer was "to watch, you must open a watch window".

    Erik

  • "At no time does the Keil compiler output directly in assembly that I have seen so far."

    See it now: http://www.keil.com/support/man/docs/c51/c51_src.htm

    You could also enable the assembler listing in the Listing Options.

    "I even spoke with our production group about somehow adding the correct assembly code in post-compilation, and they shook their heads and said there will be a better way"

    Yes, they're definitely right on that one!

    "Perhaps it would be possible to write my own assembly routine that Keil calls from a C function."

    Easy - see: http://www.keil.com/support/man/docs/c51/c51_ap_ctoasm.htm

  • Keil IS "working properly". The Keil toolset is a generic '51 toolset and the MAC is unique to the SiLabs F12x and F13x series. Keil has some adaptations to some of the unique features in some chips. Lambasting Keil for not supporting all features in all 5234 derivatives of the '51 is not reasonable.

    Again, I hadn't said there was in fact a true problem with the compiler. First and foremost, I acknowledged that my own ignorance might be to blame. Secondly, I've tried contacting them for help, and have been unable to get any information about what those "some adaptations" might be. In no situation I've been able to dream up does the compiler generate code that uses the MAC. Therefore it is my opinion that something might be amiss. Maybe an option somewhere, or the syntax of my code, or a true bug... who knows.



    To resolve the problem, I've gone ahead and written my own routines to invoke the MAC for one of my functions. It reduced the time needed for a multiply and divide by more than half. With some additional tweaking I can probably get that down even more as I dig into the assembly.

    The biggest problem will be having to treat the MAC like a peripheral complete with its own module and learning curve, instead of a performance enhancing behind the scenes feature that was hoped for - ie. Click check box, get speed boost :P

    I greatly appreciate everyone's responses and help!!! It is apparent the problem was a combination of mis-perception and assumption. Some people believe the compiler should do more for the user, some people think it does more than enough. The line has to be drawn somewhere I suppose and that's fair enough. Heh, now to go tackle some other "challenges" ;)

  • Some people believe the compiler should do more for the user, some people think it does more than enough. The line has to be drawn somewhere I suppose and that's fair enough

    This is not a question of whether the compiler should "do more for the user", but whether the compiler should adapt to whatever "exotic" feature some derivative might include. I suggest you try the Keil device database at
    http://www.keil.com/dd/parm_search.asp and just enter '51, nothing else, and you will see how many derivatives Keil must support with the '51 toolset.

    Erik