Can the Cortex-M3 stream videos?

I'm (trying to) build a device that can receive and play/stream videos wirelessly on an embedded system. What would be the best course to take? If the ARM Cortex-M3 is a viable option for this, do I need a separate video controller with it, or can I use an alternative solution in the ARM family? (I understand that I am targeting a wide range of chips by simply saying Cortex-M3. I would like to know, if anyone knows, whether I can cast such a wide net or whether it has to be certain models.)

Thanks!

Parents
  • Cortex-M3 comes in many colours and shapes.

    Put differently: the maximum clock speed of a Cortex-M3 device can be 16 MHz, or it can be 180 MHz, depending on which model you choose.

    If you're going to receive video and output it to some kind of display, you should know that display types differ as well.

    It also depends on how many pixels you need to output, as well as the colour-depth.

    And finally, it depends on the video format you're receiving.

    It would be a fairly easy task to output a live picture on a 320x240 display in 16-bit colour. However, if you want to handle high resolutions, such as 1024x768 at 60 Hz, and do H.264 decoding while you're also receiving data wirelessly, I am very sure that you'll see a lot of choppy playback.
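    To put rough numbers on the two cases above, here is a minimal sketch in C that computes the size of one uncompressed frame and the resulting data rate. The function names are made up for illustration, and the 30 frames per second used for the small display is an assumption, not a figure from this post.

    ```c
    #include <stdint.h>

    /* Bytes needed to hold one uncompressed frame.
       Assumes bits_per_pixel is a multiple of 8 (e.g. 16 for RGB565). */
    static uint32_t frame_bytes(uint32_t width, uint32_t height,
                                uint32_t bits_per_pixel)
    {
        return width * height * (bits_per_pixel / 8u);
    }

    /* Bytes per second of uncompressed video at the given frame rate. */
    static uint32_t raw_bytes_per_second(uint32_t width, uint32_t height,
                                         uint32_t bits_per_pixel, uint32_t fps)
    {
        return frame_bytes(width, height, bits_per_pixel) * fps;
    }
    ```

    With these, `frame_bytes(320, 240, 16)` is 153600 bytes and `raw_bytes_per_second(320, 240, 16, 30)` is 4608000, while `raw_bytes_per_second(1024, 768, 16, 60)` is 94371840. The large case needs roughly twenty times the bandwidth before any decoding work is counted.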

    Since you're building the device that is going to send the stream to the microcontroller, you're in charge of the compression format. You should know that the Cortex-M4 has a better chance of decoding video, as it has built-in DSP instructions, and you also get the CMSIS DSP library for free, which you can start using right after unboxing the Cortex-M4.

    In addition, NXP makes the LPC43xx, a hybrid Cortex-M4/Cortex-M0 running at 204 MHz. The LPC4357 has a built-in LCD controller, and the large versions (BGA256) can also interface with SDRAM and SRAM at the same time as they drive the LCD display. It supports 100 Mbit Ethernet as well, so it might be a good device to try things out on. The LPC43xx also has a very strong feature called the State Configurable Timer (SCT). It's not just another timer; it's a feature that enables this device to handle data more quickly than usual.

    If crazy solutions are one of your strong points, you might be able to combine the SCT and the DMA, so that those two units do their part of decompressing the data while it is being received. (Note: This is challenging and requires some skill! The states of the SCT can be used for data decompression, and the SCT can control two DMAs at the same time. The DMAs can be set up in a crazy way, so that they modify each other's variables, including source and destination addresses, number of bytes to transfer and transfer modes. The DMAs can also modify the timer registers, so they are able to modify the SCT. Make sure you're not sleepy while writing this kind of code, though.)

    You should know that the code that outputs video should most likely be run from the internal SRAM instead of the flash-memory, because it runs much faster from SRAM.

    Two other strong devices are from STMicroelectronics. First, the STM32F439. This one runs at up to 180 MHz; it does not have the SCT, but it has another interesting feature called Chrom-ART. This is a 2D graphics accelerator, which can be used for a different kind of decompression.

    Basically, it's a feature that copies an image portion of any size from one location to another, and it can also crop the image if necessary.

    In addition to just copying, it can actually blend two images into a third. As it supports indexed colours, this feature can also be used in the decompression.
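    To show why indexed colours help with decompression, here is a plain-CPU sketch in C of a palette lookup: the stream carries one 8-bit index per pixel instead of a 16-bit pixel, which halves the data to receive, and the indices are expanded on output. This is only an illustration of the idea, not the Chrom-ART register interface; the function name and the RGB565 palette layout are assumptions.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Expand 8-bit palette indices into 16-bit RGB565 pixels.
       'palette' holds one RGB565 colour per possible index value. */
    static void expand_indexed(const uint8_t *src, uint16_t *dst, size_t count,
                               const uint16_t palette[256])
    {
        for (size_t i = 0; i < count; i++) {
            dst[i] = palette[src[i]];
        }
    }
    ```

    On a device with a 2D accelerator, the intent would be to let the hardware do this expansion while the CPU handles reception, but whether that fits depends on the specific part and display setup.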

    The STM32F439 has a flash accelerator called the ART Accelerator (Adaptive Real-Time accelerator). This makes your code run just as fast from flash memory as it would from SRAM, so in this case you will not have to copy your code to SRAM before executing it. The STM32F439 is also a Cortex-M4 based device, so it includes the DSP instructions.

    Finally, there's the Cortex-M7, which has not really been released yet. I believe it will start becoming available at Farnell, RS, DigiKey, Mouser, Avnet and the like at the beginning of next year, perhaps already in January (this is my personal impression; I do not know whether it is actually the case).

    The Cortex-M7 is, by my estimation, approximately 1.65 times faster than the Cortex-M4. ST's Cortex-M7 will run at 200 MHz, and decoding plus displaying received video streams should not be a real problem for this microcontroller.

    Now, let's get back to the original question. Will it be possible to use a Cortex-M3 for video streaming?

    Yes, but it depends on many things, as mentioned earlier.

    But you can get small 320x240 displays that connect to your microcontroller via SPI, for instance. One of NXP's smaller devices would be able to transmit data via an SPI interface (the specific implementation is actually called SSP); the LPC1751 can clock the SSP at 50 MHz. That is 50 Mbit per second, or 6.25 MB per second of raw bits, so 5 MB per second is a conservative working figure. That gives you 5000000 bytes per second. If you are using 16-bit pixels, you get 2500000 pixels per second.

    Since you have 320x240 pixels, your maximum frame rate for uncompressed video would be 2500000 / 320 / 240 = approximately 32.5 FPS.
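    The frame-rate arithmetic above can be sketched in C as follows. The function names are hypothetical, and the result is returned in hundredths of a frame so that only integer math is needed, which suits a Cortex-M3 without an FPU.

    ```c
    #include <stdint.h>

    /* Pixels per second for a given payload rate in bytes per second.
       Assumes bits_per_pixel is a multiple of 8. */
    static uint32_t pixels_per_second(uint32_t bytes_per_second,
                                      uint32_t bits_per_pixel)
    {
        return bytes_per_second / (bits_per_pixel / 8u);
    }

    /* Upper bound on frames per second, scaled by 100.
       The result is truncated, not rounded. */
    static uint32_t max_fps_x100(uint32_t bytes_per_second, uint32_t width,
                                 uint32_t height, uint32_t bits_per_pixel)
    {
        uint64_t pixels = pixels_per_second(bytes_per_second, bits_per_pixel);
        return (uint32_t)((pixels * 100u) / ((uint64_t)width * height));
    }
    ```

    `max_fps_x100(5000000, 320, 240, 16)` gives 3255, i.e. about 32.55 FPS. Feeding in the raw bit rate of a 50 MHz clock instead (50 Mbit/s divided by 8 is 6250000 bytes per second) gives 4069, about 40.69 FPS, which is a ceiling that ignores any gaps between transfers.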

    You will of course also have to receive the video stream and decompress it. The microcontroller is running at 100MHz.

    Depending on how complex your video compression format is, you may not be able to get the full 32 FPS, and it might be a bit tricky to design a suitable compression format, because you don't have the luxury of the DSP instructions from the Cortex-M4.
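    As one example of a format that needs no DSP instructions, here is a sketch of a run-length decoder in C. The record layout (a count byte followed by one RGB565 pixel, low byte first) is invented for this illustration and is not a format proposed in this post; a real design would depend on the kind of video being sent.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Decode (run length, pixel) records into dst. Each record is 3 bytes:
       a count of 1..255, then one RGB565 pixel, low byte first.
       Returns the number of pixels written. Decoding stops at a zero-length
       run, at a run that would overflow dst, or at the end of the input. */
    static size_t rle16_decode(const uint8_t *src, size_t src_len,
                               uint16_t *dst, size_t dst_cap)
    {
        size_t in = 0, out = 0;

        while (in + 3 <= src_len) {
            uint8_t run = src[in];
            uint16_t pixel = (uint16_t)(src[in + 1] | ((uint16_t)src[in + 2] << 8));
            in += 3;

            if (run == 0 || run > dst_cap - out) {
                break; /* malformed record or no room left in dst */
            }
            for (uint8_t i = 0; i < run; i++) {
                dst[out++] = pixel;
            }
        }
        return out;
    }
    ```

    Run-length coding only pays off for frames with flat areas of colour, so it is a starting point for experiments rather than a recommendation.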

    Still, the LPC1751 has a DMA controller, and you can do tricks with this as well. (These tricks are not documented anywhere; I speak from personal experience.)

    The LPC175x does not have any support for external RAM, so you'd have to make do with a maximum of 32 KB of local SRAM. That's not much.

    A better choice would be the LPC1769; it also has support for Ethernet, and this device runs at 120 MHz. The LPC1788 is also a Cortex-M3, and it has an LCD controller; this device runs at 120 MHz and has Ethernet support as well. The LPC4088 is more or less identical to the LPC1788, except that it is a Cortex-M4, so it has the DSP instruction set.

    I am no expert on other brands; perhaps someone else can tell you about features in devices from Texas Instruments, Silicon Labs, Microchip, Freescale, Infineon Technologies, Analog Devices Inc, Nordic Semiconductor, Spansion, Fujitsu Semiconductor, Cypress and all those I've forgotten to mention (if I did forget any, it was not intended).

    But my recommendations are:

    Go for the highest possible speed; 72 MHz is probably too low. I suggest 160 MHz or more.

    Choose a device with Ethernet.

    Consider choosing a device that has a built-in LCD controller.

    Look at as many datasheets as you can and find the benefits of each device. Consider whether they can be used for video acceleration in one way or another (decompression, especially).

    If the device you've chosen also supports I2S, you might want to use this to interface with a low-cost, fairly good quality audio DAC (the WM8523, for instance).

    An alternative to using I2S and a DAC is to use the PWM timers to generate PWM audio (if you're using the LPC series, you should consider using the Motor Control PWM for this).

    If you can disclose further details on your requirements, I'll be happy to narrow down your search and give more hints.

    Note: The LCD controllers allow you to use, for instance, external SDRAM for video frame buffers. This makes it easier to decompress video, but not necessarily faster. It all depends on the kind of video stream we're dealing with.
