Yesterday we released version 3.10.0 of Valgrind, a GPL'd framework for building simulation-based debugging and profiling tools. 3.10.0 is the first official release to support 64-bit ARMv8. The release is available from http://www.valgrind.org, and the release notes are at http://www.valgrind.org/docs/manual/dist.news.html.
Porting the framework to the 64-bit ARM instruction set has been relatively straightforward. The main challenge has been the large number of SIMD instructions, some of which involve significant arithmetic complexity: saturation, rounding, doubling and lane-width changes. On the whole, the 64-bit instruction set is easier to simulate efficiently than the 32-bit ARMv7 instruction set, because it lacks dynamically conditionalised instructions (à la Thumb's IT blocks) and partial condition code updates, both of which hinder fast simulation. As the port matures, I expect its performance to become comparable to that on other Valgrind-supported architectures.
Porting the tools built on the framework required almost no effort, because the framework is specifically designed to insulate tools from the details of the underlying instruction sets. The following tools currently work well enough for serious use: Memcheck (memory checking), Helgrind and DRD (thread checking), and Cachegrind and Massif (time and space profiling).
Initial development was done by cross-compiling and running on the ARM Foundation model, which proved to be a reliable starting point. Further development was done on an ARM Juno board running a Fedora snapshot. The Juno board made a big difference: it allowed us to build Valgrind "natively" and to build and run the regression tests in a reasonable time.
We look forward to feedback from developers using the port to debug and profile serious workloads, on the order of millions to tens of millions of lines of C++.