Cortex M4 (SIMD) - Fastest way to un-pack 1 (one) uint32 to 4 (four) uint8

Hi to you all,
In my current project I need to send an array of integers over a serial bus:

  • type = unsigned 32 bit integers
  • length = 4096

The driver I'm using (USB CDC VCOM from NXP, which is part of LPCOpen) takes a pointer to uint8_t and does a bulk transfer using DMA. I can output long strings with no problem, but of course I need to output data, i.e. 32-bit integers.

Here's the function I'm trying to call:

   /** \fn uint32_t WriteEP(USBD_HANDLE_T hUsb, uint32_t EPNum, uint8_t *pData, uint32_t cnt)
   *  Function to write data to be sent on the requested endpoint.
   *
   *  This function is called by USB stack and the application layer to send data
   *  on the requested endpoint.
   *  
   *  \param[in] hUsb Handle to the USB device stack. 
   *  \param[in] EPNum  Endpoint number as per USB specification. 
   *                    ie. An EP1_IN is represented by 0x81 number.
   *  \param[in] pData Pointer to the data buffer from where data is to be copied. 
   *  \param[in] cnt  Number of bytes to write. 
   *  \return Returns the number of bytes written.
   */
 uint32_t (*WriteEP)(USBD_HANDLE_T hUsb, uint32_t EPNum, uint8_t *pData, uint32_t cnt);

I need to split the data into 8-bit tokens as fast as possible because the project has some serious time constraints. Is anyone aware of a SIMD instruction that unpacks data for this purpose?

Any help would be highly appreciated.
Thanks in advance,
Andrea

  • UXTB (or UXTAB, which also adds the extracted byte to another register) or some clever usage of UXTB16 would probably be the instructions you want. They swizzle 8-bit values out of a 32-bit register into a full destination register, or, with UXTB16, into the two halfwords of a destination register. With the single-byte forms you'll need four 32-bit destinations to unpack four bytes from one 32-bit register. Is that kind of what you're going for?

    However, I think you'll find that if the byte order in memory (from lowest address to highest) is already the order you want on the wire, then there's no need to do this at all: you can just cast your uint32_t pointer to a uint8_t pointer. If you need to reorder those bytes, then REV, REV16 and maybe some ROR usage might be your best bet. There isn't really a single SIMD/DSP instruction that will do it all for you.
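
To make the cases above a bit more concrete, here is a minimal sketch in portable C. The helper names (`as_bytes`, `swap_bytes32`, `swap_buffer`, `unpack_uxtb16`) are made up for illustration and are not part of LPCOpen or CMSIS. On a Cortex-M4 build you would most likely use the CMSIS intrinsics `__REV` and `__UXTB16` rather than the hand-written versions, and a decent compiler will often reduce the portable code to the same instructions anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The whole idea relies on one word being exactly four bytes. */
static_assert(sizeof(uint32_t) == 4, "uint32_t must be four bytes");

/* Case 1: memory order is already the wire order.
 * Nothing is unpacked; the same buffer is simply viewed as bytes. */
static const uint8_t *as_bytes(const uint32_t *words)
{
    return (const uint8_t *)words;
}

/* Case 2: the receiver expects the opposite byte order.
 * Portable equivalent of what a single REV instruction does. */
static uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Reverse the bytes of every word in place before sending. */
static void swap_buffer(uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        words[i] = swap_bytes32(words[i]);
    }
}

/* Case 3: bytes really have to be widened for further arithmetic.
 * Portable equivalent of a UXTB16 pair: bytes 0 and 2 land in the two
 * halfwords of *even, bytes 1 and 3 in the two halfwords of *odd. */
static void unpack_uxtb16(uint32_t x, uint32_t *even, uint32_t *odd)
{
    *even = x & 0x00FF00FFu;         /* like UXTB16 Rd, Rm         */
    *odd  = (x >> 8) & 0x00FF00FFu;  /* like UXTB16 Rd, Rm, ROR #8 */
}
```

With the plain cast, the call from the question would look something like `WriteEP(hUsb, EPNum, (uint8_t *)data, 4096 * sizeof(uint32_t))`, where `data` is a placeholder for your array. On a little-endian build, which is the usual Cortex-M4 configuration, the least significant byte of each word goes out first, so the swap is only needed if the receiving side expects the opposite order.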
