I'm looking to emulate a 6502 on the ARM but I would like to make it cycle accurate so I need some way to interface to an external clock. I can't rely on an internal clock as there are external components that will rely on the external clock as well and the emulator needs to be able to run in lockstep with everything else. One thing to keep in mind is once everything is in working order I will want to increase my clock to several hundred MHz so the mechanism with which I interface with the external clock has to be extremely fast, preferably a pin on the ARM itself that I can poll and sync with. I don't want to utilize interrupts either since the code is performance sensitive. Any ideas on whether this is possible on the ARMv8 such as the A53/A57? From what I understand, the Raspberry Pi 3 uses a low speed protocol for communication and is interrupt based, and for my purposes those two are big no-no's for what I'm doing so that won't work. I'd appreciate some expert advice as this is starting to stretch the limits of what may be possible.
It is going to be difficult to get a software-emulated core running at several hundred MHz (I haven't seen 6502 microcontrollers running at that speed anyway, excluding FPGAs/ASICs built on modern process technologies :-) ). One thing you can consider is generating a software-controlled clock for the external components.
Most ARM processors have an external event input signal (for Cortex-M processors, this is called RXEV, receive event). You can use it together with the WFE instruction to provide wait control. However, WFE enters sleep only conditionally and might not fit your application: other sources could also generate events, and the pin may not be accessible externally on a given chip. Page 10 of the paper below explains how this can be used in FSM (Finite State Machine) design.
http://www2.keil.com/docs/default-source/default-document-library/software-based-finite-state-machine-(fsm)-with-general-purpose-processorsf4f837f788736c26abc1ff00001d2c02.pdf
I doubt that any SoC has an input which can sample "several hundred MHz" signals. I do not know the limits of AHB or AXI by heart, but as far as I remember AHB runs at something like 100 or 133 MHz. I wonder, have you started emulating the 6502 already? I doubt you can achieve several hundred MHz of simulated clock. Even a simple instruction like "nop" needs a dozen ARM instructions. So for simplicity, if you need "only" 10 instructions and run the ARM at 1.5 GHz, you cannot do better than 150 MHz, unless you do a JIT compilation of the 6502 code. You might want to go for an FPGA, doing the 6502 in HW and only the peripherals in SW.
You may be right. I have emulated it in software and I'm now looking to emulate it bare-metal on an ARM core, but I'm going in blind without any ARM experience. The A75 will run at ~3 GHz, so I'm counting on that boost, plus (hopefully) some cleverness on my part. JIT compilation (or any form of dynamic translation) will not work because Apple II programs can modify themselves and there is no way to know what is an instruction and what is data. I'm curious that you mention the NOP instruction, however; what makes you think that alone would cost a dozen instructions?
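The arithmetic behind that bound can be written out as a small C helper. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes the host retires exactly one ARM instruction per clock cycle, which real cores will not match exactly.

```c
#include <stdint.h>

/* Upper bound on emulated 6502 instructions per second.
   Assumption (idealised): the host retires one ARM instruction per cycle.
   host_hz              - host core clock in Hz
   host_insns_per_insn  - ARM instructions spent per emulated 6502 instruction */
uint64_t max_emulated_rate_hz(uint64_t host_hz, uint64_t host_insns_per_insn)
{
    if (host_insns_per_insn == 0)
        return 0;                      /* avoid dividing by zero */
    return host_hz / host_insns_per_insn;
}
```

With a 1.5 GHz host and 10 ARM instructions per emulated instruction this gives the 150 MHz figure quoted above; with 12 instructions it drops to 125 MHz.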
Right, 6502 code is full of "dirty tricks" :-) I wrote a lot of them myself.
"The dozen instructions for NOP" comes from experience. I wrote a 6502 simulator for the Atari Jaguar myself.
You need to pick the opcode, update PC, call the emulation function, increment cycles. If you do this in C then you quickly get 12 instructions.
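Those four steps can be sketched in C roughly as follows. This is a minimal illustration, not the Jaguar simulator mentioned above: the struct layout and function names are invented here, only NOP (0xEA) and LDA immediate (0xA9) are handled, and status flags are left out.

```c
#include <stdint.h>

typedef struct {
    uint16_t pc;          /* program counter */
    uint8_t  a;           /* accumulator */
    uint64_t cycles;      /* emulated clock cycles executed so far */
    uint8_t  mem[65536];  /* flat 64 KiB address space */
} cpu6502;

/* NOP: one byte, two cycles, no other effect. */
void op_nop(cpu6502 *c)
{
    c->cycles += 2;
}

/* LDA #imm: load the operand byte into A (N/Z flag updates omitted). */
void op_lda_imm(cpu6502 *c)
{
    c->a = c->mem[c->pc];
    c->pc++;              /* skip the operand byte; uint16_t wraps at 64 KiB */
    c->cycles += 2;
}

/* Execute one instruction. Returns 0 on success, -1 for an opcode
   this sketch does not implement. */
int cpu_step(cpu6502 *c)
{
    uint8_t opcode = c->mem[c->pc];   /* 1. pick the opcode */
    c->pc++;                          /* 2. update PC */
    switch (opcode) {                 /* 3. call the emulation function */
    case 0xEA: op_nop(c);     return 0;   /* 4. each handler adds its cycles */
    case 0xA9: op_lda_imm(c); return 0;
    default:   return -1;
    }
}
```

Even for NOP the compiler has to emit a load, an increment, a compare or table lookup, a branch, and a read-modify-write of the cycle counter, which is how the count climbs towards a dozen instructions.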
I saw one emulator in ARM Thumb-2 on GitHub (link below), out of 235 6502 emulators, which needs 9 ARM instructions for one 6502 instruction.
But it does not emulate any special hardware yet. That also takes time, though it can be delegated to other cores.
BigEd/a6502: https://github.com/BigEd/a6502
Hi Anthony,
There's nothing that's defined by the Arm Architecture or the implementation of the Cortex-A processors that fits your needs.
Just taking your RPi mention, though, there are two input clocks to the Generic Timer on the system, one is the "APB" clock (which is half the CPU input clock, and varies dependent on that) and the other is the 19.2MHz oscillator. Since the Generic Timer architecture dictates that the clock must tick at a constant rate (although it may update more slowly, it must still keep time at the same fundamental rate) the platform firmware moves the clock to the 19.2MHz oscillator.
Your problem using some input clock and sampling it is interrupt latency. Being cycle accurate to a particular input clock relies on sampling that data, getting an interrupt, and servicing it in a timely manner. There could be anything from tens to thousands of cycles between the event generated by the timer and actually taking an interrupt and executing the handler. The other issue of using clock inputs and sampling data is that the most convenient way of sampling the Generic Timer clock architecturally requires ISB barriers (an interrupt will give you the same effect here) to prevent the timer value from being speculatively read some time ahead of where you'd want to read it in terms of program flow.
Anyway, if you need a clock in the 100s of MHz on an RPi, just change the core clock and don't switch over to the 19.2MHz clock. You can read CNTPCT_EL0 (or CNTVCT_EL0, preferably) to get the current count value and use the matching TVAL register (CNTP_TVAL_EL0 or CNTV_TVAL_EL0) as a downcounter so you know how much time you're spending missing events. If you set it to '19200000' you'll see it count down towards 0 over the course of a second. After that it keeps counting down into negative values, so after another second it'll reflect -19200000. Obviously a faster input clock will count faster. This will allow you to set the downcounter the next time to get a constant rate independent of how long the work took, i.e. if you see the value -350000 in TVAL at the point you execute the code you want, then to get a strict interval you can simply add it to the interval you would have used in the first place. That is actually quite efficient to do.
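The compensation step described above is just one addition, sketched here in portable C. The function name is invented for illustration, and reading or writing the actual TVAL register (via MRS/MSR at a suitable exception level) is deliberately left out so that only the arithmetic is shown.

```c
#include <stdint.h>

/* Value to program into the downcounter for the next period.
   interval_ticks - nominal period in timer ticks
   tval_now       - signed value read back from TVAL at the point the work
                    runs; negative means the deadline passed that many
                    ticks ago
   Adding the (negative) overshoot shortens the next period by exactly the
   amount we were late, so deadlines stay on a fixed grid. A result of zero
   or less means we are already late for the next deadline as well. */
int64_t next_tval(int32_t interval_ticks, int32_t tval_now)
{
    return (int64_t)interval_ticks + (int64_t)tval_now;
}
```

Using the numbers from the post, a nominal interval of 19200000 ticks and a read-back of -350000 gives 18850000 as the next value to program.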
If you need to change the clock rate then the Generic Timer isn't for you, but most SoCs have other timers in the system IP. They usually have clock roots on high speed PLLs, divided down to sub-100MHz speeds. The one in the RPi, I have no idea what rate it ticks at since I've never bothered to use it - and it has no way of telling you what the input clock speed is. You could measure it with the known 19.2MHz rate of the Generic Timer, though :D
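Measuring an unknown timer against the Generic Timer comes down to a ratio of tick counts taken over the same window. The sketch below shows only that calculation; the function name is hypothetical, and how the two counters are sampled (register reads, memory-mapped I/O) depends on the SoC and is not shown.

```c
#include <stdint.h>

#define GENERIC_TIMER_HZ 19200000ULL  /* reference rate quoted in the post */

/* Estimate the frequency of an unknown timer from two pairs of samples
   taken at the same two moments: one pair from the Generic Timer (ref_*)
   and one from the unknown timer (unk_*).
   Assumes both counters count upwards and neither wrapped in between.
   Note: unk_ticks * GENERIC_TIMER_HZ can overflow 64 bits for very long
   windows, so keep the measurement window short (a second or so). */
uint64_t estimate_timer_hz(uint64_t ref_start, uint64_t ref_end,
                           uint64_t unk_start, uint64_t unk_end)
{
    uint64_t ref_ticks = ref_end - ref_start;
    uint64_t unk_ticks = unk_end - unk_start;
    if (ref_ticks == 0)
        return 0;                     /* no elapsed time, no estimate */
    return unk_ticks * GENERIC_TIMER_HZ / ref_ticks;
}
```

For example, if the Generic Timer advances by 19200000 ticks (one second) while the other timer advances by 250000000, the estimate is 250 MHz.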
The code for the ARMv8 stub loader is in the 'tools' repository of the raspberrypi GitHub organisation; the counter configuration is relatively easy to find there, and the code documents what it is doing. It's also documented in the peripherals datasheet on the Raspberry Pi website.
Ta,
Matt