
The BBC Micro:bit - 2 totally different computers... how convenient

I have been asked to write an audio driver for the BBC Micro:bit. The problem is that the Mark 1 and Mark 2 versions of this computer are utterly different. To access the full power of these computers, I intend to develop the code in 100% assembly language. Luckily I have quite a lot of experience programming the Cortex-M0/M0+ thanks to the kind help of Jens Bauer, who is a truly great guy and has produced fragments of truly amazing code, e.g. a 32-bit x 32-bit --> 64-bit multiply routine that takes only 17 instructions and so only 17 cycles. He has provided many other tricks for things like audio drivers and area fills; a trick may only knock off 1 cycle per loop iteration, BUT if that takes the loop from 6 cycles down to 5, it's a BIG speed-up. I say this as someone who has programmed 30+ instruction sets in anger on commercial games. Anyway, here are the specifications of the two different versions of the Micro:bit:

v1:
Nordic nRF51822 (contains 16MHz M0)
16 MHz ARM Cortex-M0 core
256 KB Flash
16 KB RAM


v2:
Nordic nRF52833 (contains 64MHz M4)
64 MHz ARM Cortex-M4 core
512 KB Flash
128 KB RAM

Now, the Cortex-M4 supports the Thumb-2 instruction set, which means it has more or less all of the original 32-bit ARM instruction set. ARM states that the M0 achieves around 0.9 MIPS/MHz, so 14.4 MIPS, whereas the M4 achieves around 1.25 MIPS/MHz, so 80 MIPS; in other words the v2 processes around 5½ times faster. Those figures don't even account for Thumb-2 reducing the number of instructions needed. Both processors have a 3-stage pipeline, although it appears that the M4 is able to read 2 x 16-bit instructions in one cycle, and the M4 can fetch instructions early so that instructions that access memory do not stall the pipeline.

I have asked the people behind the Micro:bit if they intend to replace all of the v1 machines with the v2 but they have yet to reply.

Now, I for one was impressed by the use of PWM to produce a 4-channel tracker with audio quality similar to 8-bit samples. It would be nice to do something using the audio. I've already reverse-engineered the C64 version of SAM (the speech utility provided with the Micro:bit). The quality is pretty poor because SAM was originally developed for the Apple ][, which only had a 1-bit beeper. I don't know exactly how that hardware works, but I suspect that when the bit is set to 1, a positive DC voltage is sent to the speaker for as long as the bit stays 1, and when the bit is set to 0, a negative DC voltage is sent through the speaker. The difference is that the Micro:bit is sufficiently powerful to use PWM to increase the perceived bit depth. I would like to apply PWM to SAM.
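As I said, PWM is not my area, but my understanding of the usual scheme is: run the PWM carrier well above audible frequency and use each audio sample as the duty cycle, so the speaker averages the carrier away. A minimal sketch in C (PWM_TOP and the mapping are my own assumptions, not the actual micro:bit hardware layout):

```c
#include <stdint.h>

/* Assumed carrier: a 16 MHz timer counting 0..255 gives a ~62.5 kHz
   PWM frequency, comfortably above the audible range. */
#define PWM_TOP 255

/* Map a signed 8-bit audio sample onto a PWM compare (duty) value.
   The duty cycle 0..PWM_TOP encodes the instantaneous amplitude;
   the speaker acts as a low-pass filter and removes the carrier. */
static inline uint16_t sample_to_duty(int8_t s) {
    return (uint16_t)(s + 128);   /* -128..127  ->  0..255 */
}
```

Each output tick you would write `sample_to_duty(next_sample)` into the timer's compare register (whatever that register turns out to be on the nRF parts).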

I have already identified the table of phonemes stored by SAM and wondered if it would be possible to replace the samples with 2-bit ADPCM format. I found a routine on GitHub which supports 2-bit ADPCM. Of course, the only problem with this routine is that it converts 2 --> 16 bits, whereas I think 2 --> 8 is more appropriate, ALTHOUGH I am quite willing to be guided by someone more expert than myself (that's most of you). Of course, IF I write the audio player with the v2 hardware in mind, I may well be able to mix more than 4 channels, in which case having 16-bit values to combine will be more accurate.
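For anyone who hasn't met the technique: an IMA-style 2-bit decoder is tiny. This is a sketch of the general shape only; the step and index tables below are toy values I made up for illustration, not the real StepSizeTable from the GitHub routine:

```c
#include <stdint.h>

/* Decoder state for an illustrative IMA-style 2-bit ADPCM scheme. */
typedef struct {
    int16_t predictor;   /* last decoded sample */
    int8_t  index;       /* current position in the step table */
} adpcm2_t;

/* Toy tables for demonstration - the real codec's tables differ. */
static const int16_t step_table[16] = {
    7, 9, 11, 14, 17, 21, 26, 32, 40, 50, 62, 77, 96, 120, 150, 187
};
static const int8_t index_table[4] = { -1, 2, -1, 2 };  /* per 2-bit code */

/* Decode one 2-bit code: bit 1 = sign, bit 0 = magnitude. */
int16_t adpcm2_decode(adpcm2_t *st, uint8_t code) {
    int step = step_table[st->index];
    int diff = step >> 1;            /* always move at least half a step */
    if (code & 1) diff += step;      /* magnitude bit: bigger jump */
    if (code & 2) diff = -diff;      /* sign bit */

    int pred = st->predictor + diff;
    if (pred > 32767)  pred = 32767; /* clamp to 16-bit range */
    if (pred < -32768) pred = -32768;
    st->predictor = (int16_t)pred;

    st->index += index_table[code & 3];
    if (st->index < 0)  st->index = 0;
    if (st->index > 15) st->index = 15;
    return st->predictor;
}
```

Dropping to 2 --> 8 output would simply mean returning `predictor >> 8` at the end; the state arithmetic can stay 16-bit internally for accuracy.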

Long ago I found a drum machine for the Commodore 64. The way it worked was quite simple. As people might know, the C64 does not have a dedicated DAC, but writing values between $00 and $0F to the volume register acts as one. The clever trick here was that the drum machine had 3 tables: one used when just 1 channel was in use, a second when 2 channels were in use and a third when all 3 channels were in use. I'm sure you can imagine their contents.
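My guess at how those tables were built, sketched in C (the exact scaling scheme is an assumption on my part):

```c
#include <stdint.h>

/* mix_tab[n-1][s] scales an 8-bit sample s so that n channels summed
   together still fit the 4-bit (0..15) volume-register "DAC" range. */
static uint8_t mix_tab[3][256];

void build_mix_tables(void) {
    for (int n = 1; n <= 3; n++)
        for (int s = 0; s < 256; s++)
            /* each of the n channels may contribute at most 15/n */
            mix_tab[n - 1][s] = (uint8_t)((s * 15) / (n * 255));
}
```

Mixing three active channels then becomes `dac = mix_tab[2][a] + mix_tab[2][b] + mix_tab[2][c]`, which by construction can never exceed 15, so no per-sample clipping test is needed in the inner loop.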

Ideally I would love to support 4 drum channels & 4 'real' sample channels, i.e. channels that support frequency & amplitude with vibrato, glissando, trill and all those other tricks that allow a limited number of channels to act like a LOT of channels. Of course, this would require the 4 harmonic channels to be decompressed from 2 to 16 bits, mixed, and then the drum channels added.

I do realise that this is quite a complex task but over the 14 years I was a professional computer games programmer, I got to write an awful lot of music/SFX drivers:

- Commodore 64
- Apple ][
- ZX Spectrum (including a single channel of sample sound)
- Sega Master System
- Sega Megadrive (including a single channel of sample sound)
- Sega 32X
- Sega Saturn
- Nintendo Entertainment System (1 channel of Δ samples)
- Super Nintendo Entertainment System
- Nintendo Virtual Boy (waveform was 32 6-bit values, so rewrites allowed 5 channels of samples; channel 6 was noise)
- Nintendo 64
- Nintendo Game Boy Color (including a single channel of sample sound)
- Neo Geo Pocket (Color)
- Nintendo Game Boy Advance
- PSX (PlayStation)
- Nintendo DS (16 sample channels)

I'm not showing off - I learnt the hard way that a different technique is needed for each and every platform. Since these drivers had to use a minimal amount of bus bandwidth so that they didn't slow the games down, many potential techniques were not possible. If I'd had all of the NES CPU time, I could have mixed and written 7-bit values directly to the DAC with its 2 MHz 6502 (well, a 2A03, which is a 6502 with all of the BCD removed).

In this case I'm intending to write a stand-alone synthesizer. I have experience writing a drum machine: all of the samples are played at a fixed rate, so it's merely a case of reading the sample data for each of the channels, mixing them using simple tables and writing to the DAC. The issue is the variable frequency of the harmonic channels (4, I hope), which first need decompressing from 2-bit ADPCM to 16-bit, then mixing with each other and with the drum tracks, and finally outputting the whole lot to the DAC. And in this case the quality relies on PWM, a methodology that I am not familiar with.
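For the variable-frequency part (skipping or repeating source samples to change pitch), the standard trick is a fixed-point phase accumulator. A minimal C sketch; the struct layout and names are my own, not from any existing driver:

```c
#include <stdint.h>
#include <stddef.h>

/* 16.16 fixed-point phase accumulator for one melodic channel.
   step = (source_rate / output_rate) << 16; each output tick the
   phase advances by step, and its integer part selects the source
   sample, skipping or repeating samples as the pitch demands. */
typedef struct {
    const int16_t *data;   /* decoded 16-bit sample data */
    size_t len;            /* number of source samples */
    uint32_t phase;        /* 16.16 position within the sample */
    uint32_t step;         /* 16.16 increment per output sample */
} channel_t;

/* Fetch the next output sample for one channel (0 after the end). */
int16_t channel_next(channel_t *ch) {
    size_t i = ch->phase >> 16;   /* integer part = source index */
    if (i >= ch->len) return 0;   /* sample finished: silence */
    ch->phase += ch->step;
    return ch->data[i];
}
```

A step of `2 << 16` plays the sample an octave up (every other source sample), `0x8000` an octave down (each sample twice). On the M4 this whole routine should compile to a handful of Thumb-2 instructions.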

The Micro:bit only has very simple input & output, so I am wondering if I will need to add a touch-screen display, or a plain screen plus a selection of other input methods. Waveshare and a number of other vendors offer 1.8" TFT screens (but I cannot find anything larger than 1.8"), and none of them are touch screens. A few simply offer 256 x 16 pixels, i.e. two rows of 32 characters. On reflection, I think the TFT screens look the better option.

But that leaves me with the problem of input. I have found just 1 solution:

https://www.youtube.com/watch?v=6EP4AaF8HHE

I don't know if anyone has got one of those 1.8" TFT screens and/or a PS2 keyboard adaptor. If so, I really would appreciate your input.

It would be lovely if the whole setup would work on the Mark 1 hardware, but 4 drum channels & 4 harmonic channels are going to use a fair amount of processing power. I do not think 14.4 MIPS of Thumb will be reliably sufficient. It would be dreadful to discover that the whole thing works UNLESS the user plays a high frequency on all four of the melody channels! I've run into such bugs - the ones you only find on the last day and for which you have no answers. I think the 80 MIPS with the Thumb-2 instruction set offers the programmer the chance to optimise, and optimise, and optimise.

I've not coded in Thumb-2 yet, but I am keen to explore the fetch behaviour, because the code is a mixture of 16- and 32-bit instructions. I PRESUME this means that if the PC is on a 32-bit boundary & the next 2 instructions are 16-bit, it reads both instructions at the same time. Does this then mean that the bus goes unused for a cycle? In short, can DMA be set to a lower priority than the CPU so that it uses up those unused instruction-fetch slots? I have read that some M4 processors come equipped with a 4K mixed cache (16-byte lines), i.e. 256 cache lines. With that in mind, I will design the mixer to keep the code within the cache. Since the drum patterns are played at a fixed rate, they might benefit from being cached, i.e. 16 samples can be loaded at once, BUT the melodic channels may have to skip samples (after 2 --> 16 bit decompression).

Depending on the behaviour of the cache, the lookup tables for decoding the ADPCM only take StepSizeTable (89 x 2 = 178 bytes) + StepSizeTable2 (11 x 2 = 22 bytes) + IndexTable (16 x 1 = 16 bytes), i.e. 216 bytes in total, or 14 of the 256 cache lines. If I place the ADPCM decoder & the drum mixer into the cache, it SHOULD just about fit. I am wondering if I should place some of the resulting 1-bit beeper values in the cache, although if they're outside the cache, I can simply DMA the 1-bit beeper value to the appropriate hardware register.

On that front, I have looked at many documents, but nowhere do they mention where that 1-bit beeper register IS. Can someone enlighten me as to its address?

Now, I have covered a lot of ground. There are a lot of ideas here that may or may not work, and I'm certainly going to need to listen to you experts. I would just like to make it clear that this is a not-for-profit project, and anyone who helps out, even in the smallest way, WILL receive an equal credit. I have always believed that the product is the important thing. I am MORE than happy to take advice, and however you help, you will not find yourself forgotten.

So, many thanks for reading this, many thanks for your time and effort.
