Cortex M4 (SIMD) - Fastest way to un-pack 1 (one) uint32 to 4 (four) uint8

Hi to you all,
In my current project I need to send over a serial bus an array of integers:

  • type = unsigned 32 bit integers
  • length = 4096

The driver I'm using (actually USB CDC VCOM from NXP, which is embedded in LPCOpen) takes a pointer to uint8_t and does a bulk transfer using DMA. I can output long strings, no problem. But of course I need to output data, which here means 32-bit integers.

Here's the function I'm trying to call:

   /** \fn uint32_t WriteEP(USBD_HANDLE_T hUsb, uint32_t EPNum, uint8_t *pData, uint32_t cnt)
   *  Function to write data to be sent on the requested endpoint.
   *
   *  This function is called by USB stack and the application layer to send data
   *  on the requested endpoint.
   *  
   *  \param[in] hUsb Handle to the USB device stack. 
   *  \param[in] EPNum  Endpoint number as per USB specification. 
   *                    ie. An EP1_IN is represented by 0x81 number.
   *  \param[in] pData Pointer to the data buffer from where data is to be copied. 
   *  \param[in] cnt  Number of bytes to write. 
   *  \return Returns the number of bytes written.
   */
 uint32_t (*WriteEP)(USBD_HANDLE_T hUsb, uint32_t EPNum, uint8_t *pData, uint32_t cnt);

I need to split the integers into 8-bit tokens as fast as possible because the project has some serious time constraints. Is anyone aware of a SIMD instruction that unpacks data for this purpose?

Any help would be highly appreciated.
Thanks in advance,
Andrea

  • UXTAB or some clever usage of UXTB16 would probably be the instructions you want. They swizzle 8-bit values out of a 32-bit register (or two halfwords in a 32-bit register) into a destination register or two (so you'll need four 32-bit destinations to unpack four bytes from a 32-bit register). Is that kind of what you're going for?

    However, I think you'll find that if the byte order in memory (from lowest address to highest) is how you want them output then there's no need to do this at all: you can just cast your uint32_t pointer to uint8_t. If you need to do some reordering of those bytes then REV, REV16 and maybe some ROR usage might be your best bet. There isn't really a single SIMD/DSP instruction that will do it for you.
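A minimal sketch of the cast approach, assuming the usual little-endian configuration of the Cortex-M4. The names `byte_writer_fn`, `send_words`, `capture_writer` and `captured` are hypothetical and only stand in for a byte-oriented driver call such as WriteEP; they are not part of LPCOpen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a byte-oriented driver call such as WriteEP:
 * takes a byte pointer and a byte count, returns the number of bytes accepted. */
typedef uint32_t (*byte_writer_fn)(const uint8_t *data, uint32_t byte_count);

/* Send an array of 32-bit words with no per-word unpacking.
 * The words are simply viewed as bytes in their existing memory order. */
uint32_t send_words(byte_writer_fn write, const uint32_t *words, size_t word_count)
{
    assert(write != NULL && words != NULL);
    const uint8_t *bytes = (const uint8_t *)words;            /* same memory, byte view */
    uint32_t byte_count = (uint32_t)(word_count * sizeof words[0]);
    return write(bytes, byte_count);
}

/* Illustrative writer that only records what it was handed. */
uint8_t captured[64];
uint32_t captured_len;

uint32_t capture_writer(const uint8_t *data, uint32_t byte_count)
{
    if (byte_count > sizeof captured) {
        byte_count = (uint32_t)sizeof captured;               /* clamp to buffer size */
    }
    memcpy(captured, data, byte_count);
    captured_len = byte_count;
    return byte_count;
}
```

On a little-endian target the word 0x12345678 goes out as the bytes 78 56 34 12. If the receiver expects the most significant byte first, each word would need a byte reversal (REV) before sending. Note also that WriteEP takes a non-const `uint8_t *`, so a real call would have to drop the `const` used in this sketch.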

  • Hi, thanks a lot for the reply.
    I looked at the doc for UXTAB:

    UXTAB{cond} {Rd}, Rn, Rm {,rotation}

    This instruction does the following:

    1. Rotate the value from Rm right by 0, 8, 16 or 24 bits.

    2. Extract bits[7:0] from the value obtained.

    3. Zero extend to 32 bits.

    4. Add the value from Rn.

    As far as I understand, your suggestion is to use it for the "Rotate" feature, right? But how does this rotation work?
    Let's see how confused I am:
    say we have the number 0x12-34-56-78, then

    • 8 bit rotation > 0x12-34-78-56
    • 16 bit rotation > 0x56-78-12-34
    • 24 bit rotation > 0x34-56-78-12
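For reference, the rotation is a plain rotate right (ROR): bits that fall off the low end re-enter at the top. Below is a minimal portable C model of the four UXTAB steps, as a sketch only. `ror32` and `uxtab_model` are hypothetical helper names, not CMSIS functions or compiler intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right: bits shifted out of bit 0 re-enter at bit 31. */
uint32_t ror32(uint32_t value, unsigned amount)
{
    amount &= 31u;
    if (amount == 0u) {
        return value;                     /* avoid an undefined 32-bit shift */
    }
    return (value >> amount) | (value << (32u - amount));
}

/* Model of: UXTAB Rd, Rn, Rm, ROR #rotation */
uint32_t uxtab_model(uint32_t rn, uint32_t rm, unsigned rotation)
{
    /* The instruction only encodes these four rotation amounts. */
    assert(rotation == 0u || rotation == 8u || rotation == 16u || rotation == 24u);

    uint32_t rotated = ror32(rm, rotation);   /* step 1: rotate Rm right      */
    uint32_t byte = rotated & 0xFFu;          /* steps 2-3: take bits[7:0],
                                                 zero extended to 32 bits     */
    return rn + byte;                         /* step 4: add Rn               */
}
```

For 0x12345678 this model gives 0x78123456 for an 8-bit rotation, 0x56781234 for 16 bits and 0x34567812 for 24 bits, so the 16- and 24-bit guesses above match and only the 8-bit one differs. With Rn set to zero, rotations of 0, 8, 16 and 24 pick out 0x78, 0x56, 0x34 and 0x12 respectively.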

    I guess that's not the case...

    "there's no need to do this at all, you can just cast your uint32_t pointer to uint8_t": yes, the numbers are in the right order, no need to sort, so a cast should really work here? I don't have the board here (I study and work in different places, so sometimes it is difficult to test in real time), so unfortunately I cannot check, but I think I tried a week ago and ran into some trouble. Anyway, if that's true then my bad, I'll try to fix it as soon as possible. The UXTAB solution still seems interesting.
