Cortex M4 (SIMD) - Fastest way to un-pack 1 (one) uint32 to 4 (four) uint8

Hi to you all,
In my current project I need to send an array of integers over a serial bus:

  • type = unsigned 32 bit integers
  • length = 4096

The driver I'm using (actually the USB CDC VCOM from NXP, which is embedded in LPCOpen) takes a pointer to uint8_t and does a bulk transfer using DMA. I can output long strings with no problem, but of course I need to output data, i.e. 32-bit integers.

Here's the function I'm trying to call:

   /** \fn uint32_t WriteEP(USBD_HANDLE_T hUsb, uint32_t EPNum, uint8_t *pData, uint32_t cnt)
   *  Function to write data to be sent on the requested endpoint.
   *
   *  This function is called by USB stack and the application layer to send data
   *  on the requested endpoint.
   *  
   *  \param[in] hUsb Handle to the USB device stack. 
   *  \param[in] EPNum  Endpoint number as per USB specification. 
   *                    ie. An EP1_IN is represented by 0x81 number.
   *  \param[in] pData Pointer to the data buffer from where data is to be copied. 
   *  \param[in] cnt  Number of bytes to write. 
   *  \return Returns the number of bytes written.
   */
 uint32_t (*WriteEP)(USBD_HANDLE_T hUsb, uint32_t EPNum, uint8_t *pData, uint32_t cnt);

I need to split the data into 8-bit tokens as fast as possible because the project has some serious time constraints. Is anyone aware of a SIMD instruction that un-packs data for this purpose?

Any help would be highly appreciated.
Thanks in advance,
Andrea

  • Hi, thanks a lot for the reply.
    I looked at the doc for UXTAB:

    UXTAB{cond} {Rd}, Rn, Rm {,rotation}

    This instruction does the following:

    1. Rotate the value from Rm right by 0, 8, 16 or 24 bits.

    2. Extract bits[7:0] from the value obtained.

    3. Zero extend to 32 bits.

    4. Add the value from Rn.
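
The four steps above can be modelled in plain C. This is only a sketch for reasoning about the instruction, not what the compiler or the hardware does internally; the names `ror32` and `uxtab_model` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is reduced modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical plain-C model of: UXTAB Rd, Rn, Rm, ROR #rot */
static uint32_t uxtab_model(uint32_t rn, uint32_t rm, unsigned rot)
{
    /* The instruction only accepts these four rotation amounts. */
    assert(rot == 0 || rot == 8 || rot == 16 || rot == 24);

    uint32_t rotated = ror32(rm, rot);  /* step 1: rotate Rm right        */
    uint32_t byte = rotated & 0xFFu;    /* steps 2-3: bits[7:0], zero-ext */
    return rn + byte;                   /* step 4: add Rn                 */
}
```

With Rn set to 0, rotations of 0, 8, 16 and 24 pick out the four bytes of Rm one after another, starting from the least significant one.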

    As far as I understand, your suggestion is to use it for the "Rotate" feature, right? But how does this rotation work?
    Let's see how confused I am:
    say we have the number 0x12-34-56-78, then

    • 8 bit rotation > 0x12-34-78-56
    • 16 bit rotation > 0x56-78-12-34
    • 24 bit rotation > 0x34-56-78-12

    I guess that's not the case...
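
For reference, a rotate right moves the bits that fall off the low end back in at the top. A small sketch, with `rotate_right` as an illustrative name rather than anything from the toolchain:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x right by n bits; bits shifted out at the bottom re-enter at the top. */
static uint32_t rotate_right(uint32_t x, unsigned n)
{
    assert(n < 32u);
    if (n == 0u) {
        return x;  /* avoid shifting by 32, which is undefined in C */
    }
    return (x >> n) | (x << (32u - n));
}
```

Applied to 0x12345678, this gives 0x78123456 for 8 bits, 0x56781234 for 16 bits and 0x34567812 for 24 bits. So the 16-bit and 24-bit guesses in the list match a rotate right, while the 8-bit one would be 0x78-12-34-56.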

    "there's no need to do this at all, you can just cast your uint32_t pointer to uint8_t": yes, the numbers are already in the right order and need no sorting, so a cast should really be enough? I don't have the board here (I study and work in different places, so it is sometimes difficult to test right away), so unfortunately I cannot check, but I think I tried this a week ago and ran into some trouble. If the cast works, then my bad, and I'll fix it as soon as possible. In any case, the UXTAB solution seems interesting.
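
A minimal sketch of the cast approach quoted above. The helper name `as_bytes` is hypothetical and not part of LPCOpen; the sketch also assumes the usual little-endian configuration, in which the least significant byte of each word is the first one in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: view an array of uint32_t as raw bytes.
 * Nothing is copied or unpacked; the returned pointer aliases the
 * same memory, and *byte_count receives the length in bytes.
 */
static uint8_t *as_bytes(uint32_t *words, size_t word_count, size_t *byte_count)
{
    assert(words != NULL && byte_count != NULL);
    *byte_count = word_count * sizeof(uint32_t);
    return (uint8_t *)words;  /* accessing any object as bytes is allowed in C */
}
```

For the 4096-element array from the question, the byte count would be 16384, and the returned pointer and count could then be passed as `pData` and `cnt` to `WriteEP`. Whether the driver accepts that many bytes in a single call is something to check against the LPCOpen documentation.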
