Try calling the timing function from inside your program binary, and measure a relatively large block of instructions so that the measurement overhead is small relative to the time being measured.
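As a rough illustration of that advice, here is a minimal sketch in C. It assumes a POSIX system with `clock_gettime` and `CLOCK_MONOTONIC`. The names `now_ns`, `work`, `time_work_ns`, and `timer_overhead_ns` are hypothetical helpers made up for this example, and `work` is only a stand-in for whatever code you actually want to time.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds (assumes POSIX clock_gettime). */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical workload standing in for the code under test:
 * sums i*i for i in [0, n). */
static uint64_t work(uint64_t n) {
    volatile uint64_t acc = 0;  /* volatile keeps the loop from being optimized away */
    for (uint64_t i = 0; i < n; i++) {
        acc += i * i;
    }
    return acc;
}

/* Time `reps` calls of work(n) with a single pair of clock reads,
 * so the timer cost is paid once for the whole block.
 * Returns the average nanoseconds per call. */
static double time_work_ns(uint64_t n, uint64_t reps) {
    uint64_t start = now_ns();
    for (uint64_t r = 0; r < reps; r++) {
        work(n);
    }
    uint64_t end = now_ns();
    return (double)(end - start) / (double)reps;
}

/* Rough estimate of the timer's own cost: `reps` consecutive clock reads,
 * timed by one outer pair. Returns the average nanoseconds per read. */
static double timer_overhead_ns(uint64_t reps) {
    uint64_t start = now_ns();
    for (uint64_t r = 0; r < reps; r++) {
        (void)now_ns();
    }
    uint64_t end = now_ns();
    return (double)(end - start) / (double)reps;
}
```

From your own `main`, you could call something like `time_work_ns(1000000, 100)` and compare the total elapsed time against `timer_overhead_ns(1000)`. If the timed block is not at least a few orders of magnitude longer than one clock read, increase `n` or `reps` until it is.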