1-cycle multiply, 64-bit result, reciprocal?

Can someone tell me how many extra gates the 1-cycle multiply uses? If it produced a 64-bit result, how many more gates would be needed? Could those gates also be used to find the reciprocal of a number, so that instead of dividing, the coder multiplies by the reciprocal? In the 90s, Nvidia sent a coder to optimize Tomb Raider for their video cards. He explained that they didn't use a Z-buffer but rather a W-buffer, which held the reciprocal of Z. This meant that the draw engine used multiplies rather than divides when calculating texture coordinates, lighting and other vertex controls. As far as I know, they still do.

MP3 on the M0 (or 2 x M0s) uses a LOT of multiplies for the FFTs. The BBC Microbit has a Nordic Semiconductor nRF51822 Bluetooth chip which, like the CPU, is clocked at 48MHz. If the RAM of that chip could be mapped into the CPU's address space, it would be possible to decode one channel on the Nordic chip and the other on the CPU.

I'm looking to use 16:16 fixed-point by modifying the Minimp3 player, with some extra speed/space tradeoffs, so that the player can go right up to 320 kbit/s.
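
For anyone following along, here is a minimal sketch of what a 16:16 fixed-point multiply could look like in C. It is not taken from Minimp3; the names `q16_t`, `Q16_ONE`, `q16_mul` and `q16_mul_fast` are made up for illustration. The first version needs a 64-bit intermediate product, which a core with only a 32x32->32 multiply has to build from several multiplies. The second trades precision for speed by pre-shifting both operands so that a single 32-bit multiply is enough.

```c
#include <stdint.h>

/* Q16.16: 16 integer bits and 16 fractional bits in an int32_t.
   All names here are hypothetical, not part of any library. */
typedef int32_t q16_t;

#define Q16_ONE ((q16_t)1 << 16)

/* Full-precision multiply. The exact product has 32 fractional bits,
   so it is formed in 64 bits and shifted back down by 16.
   Assumes >> on a negative value is an arithmetic shift (true for
   GCC and Clang, implementation-defined in the C standard). */
static inline q16_t q16_mul(q16_t a, q16_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return (q16_t)(wide >> 16);
}

/* Cheaper variant that needs only one 32x32->32 multiply.
   Each operand loses its low 8 fractional bits before the multiply,
   and the caller must keep the result within the Q16.16 range,
   because the 32-bit product is not checked for overflow. */
static inline q16_t q16_mul_fast(q16_t a, q16_t b)
{
    return (a >> 8) * (b >> 8);
}
```

With these, `q16_mul(3 * Q16_ONE / 2, 2 * Q16_ONE)` represents 1.5 x 2.0. Whether the precision loss in the fast variant is acceptable for audio would have to be checked by listening tests or by comparing against a reference decoder.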

Thanks in advance.

Sean

  • Have a look on Google at some images of carry look-ahead adders, which are practical, and Wallace tree multipliers, which aren't, though you might see some others which are slower but practical. These work in time proportional to the log of the number of bits, so that has to fit into one cycle to do what you want. And whilst one can get a reciprocal using an approximation and some multiplies, you'd need yet another circuit to do it much faster. Speeding up that sort of thing uses lots of gates and is left to bigger systems than an M0 core!

    I just looked at Wikipedia about the Microbit and it said the main processor runs at 16MHz and the Bluetooth one at 48MHz, which seems a bit strange to me. From a quick look on the web at what other MP3 decoders achieve, I'd guess you'd have real trouble at 16MHz doing the highest bitrate, so I can see why you want to hack the Bluetooth chip. It may well be possible, but really that would be up to you, I'm afraid.
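
To make the "approximation and some multiplies" route concrete, here is a rough software sketch of a reciprocal by Newton-Raphson iteration, x <- x * (2 - m * x), on an unsigned 16:16 value. This is one common way to do it, not something either core provides in hardware. The function name `q16_recip`, the linear starting estimate 48/17 - (32/17) * m and the choice of three iterations are all assumptions made for the example. The 64-bit intermediate products would themselves become several 32-bit multiplies on a core without a long multiply.

```c
#include <stdint.h>

/* Approximate 1/d for an unsigned Q16.16 value d using only shifts,
   subtracts and multiplies. Hypothetical helper, for illustration.
   Returns UINT32_MAX when the result cannot be represented
   (d == 0, or d == 1 raw, whose reciprocal is 65536.0). */
static uint32_t q16_recip(uint32_t d)
{
    if (d <= 1u)
        return UINT32_MAX;

    /* Normalise so the top bit is set: m = n / 2^32 lies in [0.5, 1). */
    uint32_t n = d;
    unsigned shift = 0;
    while ((n & 0x80000000u) == 0) {
        n <<= 1;
        shift++;
    }

    /* Starting estimate x0 = 48/17 - (32/17) * m, held in Q2.30.
       The two constants are 48/17 and 32/17 scaled by 2^30. */
    uint32_t x = 3031741621u - (uint32_t)((2021161080ull * n) >> 32);

    /* Each pass roughly doubles the number of correct bits. */
    for (int i = 0; i < 3; i++) {
        uint32_t t = (uint32_t)(((uint64_t)n * x) >> 32); /* m * x in Q2.30 */
        uint32_t e = 0x80000000u - t;                     /* 2 - m * x      */
        x = (uint32_t)(((uint64_t)x * e) >> 30);          /* x * (2 - m*x)  */
    }

    /* Undo the normalisation: result = x * 2^(shift - 30), rounded. */
    unsigned s = 30u - shift;
    return s ? (x + (1u << (s - 1))) >> s : x;
}
```

A divide `a / b` then becomes one call to `q16_recip(b)` followed by a fixed-point multiply, which only pays off when the same divisor is reused many times. The result can be off by a unit in the last place, so it should be treated as an approximation.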

  • Yep - if the CPU is 16MHz while the Bluetooth is 48MHz, I was considering asking for a small amount of the program RAM on the Bluetooth CPU to do all of those FFT calculations. My other option is not good: specify SanDisk USB sticks and reprogram the program RAM in them to place the MP3 decode inside the memory stick. Of course, nobody has specified the speed the memory stick runs at. I know that on a PC they run at 100MHz, so plenty of power for MP3. If all else fails, ADPCM is simple but not so compact. I really need to find a codec that is designed to output 8-bit data...

    I wrote a multiply for the 8080 in the Gameboy in which I split the numbers into 4-bit blocks and used a lookup table. I wonder if a similar trick would beat the usual 32x32=64-bit software multiply, which itself uses 31 cycles IF the core has the 1-cycle multiply. Without it, it's about 250 cycles!
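
For reference, the usual 32x32=64 software multiply mentioned above can be sketched in C roughly as follows. It splits each operand into 16-bit halves and uses four 32x32->32 multiplies, which is the general shape a compiler's helper takes on a core with no long-multiply instruction. The function name `umul32x32_64` is made up for the example, and no cycle counts are implied.

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64 multiply built from four 32x32 -> 32 multiplies.
   Each partial product of two 16-bit halves fits in 32 bits. */
static uint64_t umul32x32_64(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t ll = a_lo * b_lo;   /* contributes to bits  0..31 */
    uint32_t lh = a_lo * b_hi;   /* contributes to bits 16..47 */
    uint32_t hl = a_hi * b_lo;   /* contributes to bits 16..47 */
    uint32_t hh = a_hi * b_hi;   /* contributes to bits 32..63 */

    /* Add the three values that land in bits 16..31. The sum is at
       most 3 * 0xFFFF, so it cannot overflow 32 bits, and its upper
       part is the carry into the high word. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    uint32_t lo = (ll & 0xFFFFu) | (mid << 16);
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}
```

A lookup-table version would replace the four multiplies with table reads and extra adds, so whether it wins depends on how fast the hardware multiply and the memory are on the particular part.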

  • I believe both of the cores implement the single cycle multiply option but you'd better check that.