Can we convert HEX code to C code? Are there any tools available?
with thanks, Karthik
"There are no tools to convert a hex file to C."
There actually are tools to produce C from machine code.
Not readable C, but C that can be compiled for another processor. You will get the logic of the original program in a C file that is totally impossible to read, and it will compile into a much larger and slower application. It can still run on a different architecture at a higher speed than a normal simulator can manage. This is quite similar to the JIT compilation used in some simulators, where the simulator analyzes the program running in the simulated environment on the fly and then automatically replaces some of the simulated code with auto-generated native functions.
Since C has no natural way to handle the carry, parity, zero and similar flags, a number of trivial constructs in the machine code will have to be translated into function calls in the generated C gibberish.
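As a rough illustration of why a one-byte add turns into a function call, here is a minimal sketch of the kind of helper a generated file might lean on. The names `cpu_flags` and `sim_add8`, and the choice of an 8-bit CPU with exactly these three flags, are assumptions made up for this example and are not taken from any particular tool.

```c
#include <stdint.h>

/* Hypothetical flag register for an 8-bit CPU; field names are assumptions. */
struct cpu_flags {
    uint8_t carry;   /* 1 if the add overflowed 8 bits */
    uint8_t zero;    /* 1 if the 8-bit result is 0 */
    uint8_t parity;  /* 1 if the result has an even number of set bits */
};

/* Add two bytes the way the simulated CPU would, updating every flag. */
uint8_t sim_add8(uint8_t a, uint8_t b, struct cpu_flags *f)
{
    unsigned wide = (unsigned)a + (unsigned)b;  /* keep the ninth bit */
    uint8_t result = (uint8_t)wide;
    uint8_t p = result;

    /* Fold the byte so that bit 0 becomes the XOR of all eight bits. */
    p ^= (uint8_t)(p >> 4);
    p ^= (uint8_t)(p >> 2);
    p ^= (uint8_t)(p >> 1);

    f->carry  = (uint8_t)((wide >> 8) & 1u);
    f->zero   = (uint8_t)(result == 0);
    f->parity = (uint8_t)(~p & 1u);             /* even parity gives 1 */
    return result;
}
```

A single machine instruction that costs one cycle on the original processor becomes a dozen C operations here, which is where much of the size and speed penalty comes from.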
Methods to do a machine-controlled move of a binary from one architecture to another are important.
Trying to go from machine code to C just to read the logic is on the other hand futile. Even the original developer will fail to read the recreated code.
"Trying to go from machine code to C just to read the logic is on the other hand futile. Even the original developer will fail to read the recreated code."
Please be patient.
I'm working on it.
So far, I have managed to read the hex file into memory and determined how big the original file was. (I know, sounds quite tough, but I'm a professional.)
Not much more now.
Still on target to have it finished by tomorrow morning.
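For anyone who wants to try the "read the hex file and find its size" step, here is a minimal sketch. It assumes the file is in Intel HEX format (the thread does not say which HEX format is meant) and that the image fits in 64 KiB, so extended address records are ignored. The names `hex_record`, `hex_parse_record` and `hex_image_end` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded Intel HEX record (hypothetical struct for this sketch). */
struct hex_record {
    uint8_t  length;     /* number of data bytes */
    uint16_t address;    /* 16-bit load offset */
    uint8_t  type;       /* 0 = data, 1 = end of file */
    uint8_t  data[255];
};

static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode two hex digits; returns -1 on a non-hex character. */
static int hex_byte(const char *s)
{
    int hi = hex_digit(s[0]);
    if (hi < 0) return -1;          /* stop before reading past the end */
    int lo = hex_digit(s[1]);
    if (lo < 0) return -1;
    return hi * 16 + lo;
}

/* Parse one ":LLAAAATT<data>CC" line. Returns 0 on success, -1 on error. */
int hex_parse_record(const char *line, struct hex_record *rec)
{
    uint8_t raw[260];               /* 4 header bytes + 255 data + checksum */
    size_t n = 0;

    if (line[0] != ':') return -1;
    for (const char *p = line + 1; *p && *p != '\r' && *p != '\n'; p += 2) {
        int b = hex_byte(p);
        if (b < 0 || n >= sizeof raw) return -1;
        raw[n++] = (uint8_t)b;
    }
    if (n < 5 || n != (size_t)raw[0] + 5) return -1;

    uint8_t sum = 0;                /* all bytes including checksum sum to 0 */
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + raw[i]);
    if (sum != 0) return -1;

    rec->length  = raw[0];
    rec->address = (uint16_t)((raw[1] << 8) | raw[2]);
    rec->type    = raw[3];
    memcpy(rec->data, raw + 4, rec->length);
    return 0;
}

/* Highest address + 1 over all data records, or -1 on a bad line. */
long hex_image_end(const char *const lines[], size_t count)
{
    long end = 0;
    for (size_t i = 0; i < count; i++) {
        struct hex_record r;
        if (hex_parse_record(lines[i], &r) != 0) return -1;
        if (r.type == 0 && (long)r.address + r.length > end)
            end = (long)r.address + r.length;
    }
    return end;
}
```

Feeding it the lines of a file gives the end address of the image, which is about as far as the "size of the original file" can be recovered from the hex file alone.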
What is the point of the conversion? Would you try to maintain and expand on it? I would imagine that if you recompiled it, it might not work the same. Things would be bigger or smaller, faster or slower. I once tried to make a change to C code written by an ASM programmer. I had the source, but not the flow chart. I knew what to change, but could not find the right place among the jump target labels. If you were going to simulate it, you could simulate the whole CPU. For a quick tweak you could just try ASM.
Other than the "science project" of it what would be a real use for it? It seems like the hamburger cow, interesting, but now what do you do with it?
"...to C code written by an ASM programmer."
Being a programmer of embedded systems for more years than I care to mention, I find it strange that a distinction is made.
I would not employ someone for the position of embedded programmer if they wanted to classify themselves as specifically a C programmer or specifically an ASM programmer.
"Other than the "science project" of it what would be a real use for it? It seems like the hamburger cow, interesting, but now what do you do with it?"
I have seen references to this concept outside the "science project" world, as an alternative to use of a free-standing simulator of the original processor for use on many different target architectures. The conversion basically results in a simulator being integrated with the found machine code.
One problem with simulating a processor is that a very large percentage of the time is spent always making sure that the simulator produces identical processor flags since the simulator does not know if any of the following instructions will care about the carry, parity, ... By flow-analyzing the machine instructions, you can identify flags that are not relevant and in some situations perform a native a+b instead of a simulate_add(a,b).
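The flag analysis can be sketched for the simplest case, a single straight-line block of instructions. This is only an illustration under loud assumptions: the `insn` record and `mark_needed_flags` are invented for this example, all flags are lumped together as one resource, and a real translator would have to follow branches as well.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record: one combined "flags" resource. */
struct insn {
    bool reads_flags;   /* e.g. add-with-carry, conditional jump */
    bool writes_flags;  /* e.g. add, subtract, compare */
    bool need_flags;    /* output: must the translator compute flags here? */
};

/* Backward scan over one straight-line block.
 * live_out says whether code after the block may read the flags;
 * pass true when that is unknown, which is the safe choice. */
void mark_needed_flags(struct insn *block, size_t n, bool live_out)
{
    bool live = live_out;

    for (size_t i = n; i-- > 0; ) {
        /* Flags written here matter only if something later reads them. */
        block[i].need_flags = block[i].writes_flags && live;

        if (block[i].writes_flags)
            live = false;   /* earlier flag values are overwritten here */
        if (block[i].reads_flags)
            live = true;    /* this instruction consumes earlier flags */
    }
}
```

Every instruction that writes flags but ends up with `need_flags == false` can be emitted as a plain a+b; the rest still need the simulate_add(a,b) style of call.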
As I mentioned, this concept does exist in the form of just-in-time compilation of partial code blocks in a number of simulators. This is for example how the better x86 simulators manage to get a reasonable speed when running Win32 applications on non-x86 processors.
But a simulator with just-in-time compilation has one big portability problem, since such a simulator is bound to both host and target. It must know the instructions it is simulating, and it must know the instructions it should generate.
If you instead analyze the machine code and generate a sequence of C source lines, you can produce an output file that can be compiled and run on more than one host machine. You can get an application that runs significantly faster than if 100% simulated. That the binary produced from compiling the generated C code and linking with the support library will be significantly larger than the original binary often doesn't matter, since you are probably moving the application to a way bigger machine.
But the big thing to note is that the produced C code is not C code that you or I could use as a basis for maintaining the original application. It is just low-level "intermediate" data: a huge number of gotos and a very small (possibly zero) number of detected higher-level constructs such as loops, switch statements etc.
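To give a feel for what such output looks like, here is a hand-written sketch of how a five-instruction countdown loop might come out. The instruction sequence, the addresses, the 8051-style mnemonics in the comments and the names `reg_a`, `reg_r0` and `translated_sub_0100` are all invented for illustration; no actual tool produced this.

```c
#include <stdint.h>

/* Hypothetical register file of the original CPU, kept as globals. */
static uint8_t reg_a, reg_r0;

/* Straight translation of an invented routine at address 0x0100. */
uint8_t translated_sub_0100(void)
{
    reg_a = 0;                          /* 0100: MOV  A,#0               */
    reg_r0 = 5;                         /* 0102: MOV  R0,#5              */
L_0104:
    reg_a = (uint8_t)(reg_a + reg_r0);  /* 0104: ADD  A,R0 (flags unused) */
    if (--reg_r0 != 0)                  /* 0105: DJNZ R0,0104            */
        goto L_0104;
    return reg_a;                       /* 0107: RET                     */
}
```

A human would have written a for loop summing 5 down to 1. The translator keeps the jump target as a label and the registers as globals, and a real program has thousands of such labels with no names to hint at their purpose.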
Converting assembler to a "real" C program is still a research-class problem, and not one worth spending time on other than possibly as a stepping stone (input data) for the general development of new pattern-matching and search algorithms.
Ok, this is a little more tricky than the OP might have thought.
But I can now read in a hex file and produce an output of:
int main(int argc, char *argv[]) { return(0); }
I've done the complicated part, but I'll have to bow out due to me having too many other commitments.
Anyone want to take the project over?