Google's V8 on ARM: Five Times Better

Martyn
September 11, 2013
7 minute read time.

The modern web is built primarily from three technologies: HTML, CSS and JavaScript. It is JavaScript that drives the interactive web; slow JavaScript means slow web pages. So today, a huge amount of effort is being put into improving the performance of JavaScript, giving us access to powerful web applications with desktop-class features, available wherever you are.

Web applications like Gmail, Google Maps and Google Docs use JavaScript extensively, and the user experience is greatly improved on systems with fast, efficient JavaScript engines. In 2008, this motivated Google to create the V8 JavaScript engine project.


V8 is now, on modern benchmarks, the fastest JavaScript engine available. Rather than interpreting JavaScript as older engines did, V8 uses a Just-In-Time compiler to produce and execute native instructions tailored to the processor on which it is running. The generated instructions are cached, avoiding the overhead of repeated code generation, and deleted when no longer needed.

V8 is now the core technology used in a number of important applications. It is the JavaScript engine used in Google's super-fast Chrome browser and in the Android mobile OS. It is used in HP's mobile OS, webOS. And it is at the heart of cool new server applications based on the node.js framework.


The web is increasingly mobile. With iPhones, Android phones, tablets and other devices, we can leave our desktops behind. Powerful web applications running on fast JavaScript engines provide the way to cut the ties with your desk, and work or play on the move. It is therefore essential that JavaScript is quick on mobile devices. Which means quick on ARM.


Google's V8 engine is an open source project, driven by the contributions of hundreds of coders. Its development is rapid, with features added and performance improved every day. Over the last year, ARM has been contributing to this effort, helping to make V8 on ARM super fast.

What has ARM added?

ARM has pushed many large and small patches to the V8 project. Here are some of the more interesting changes.

Return stack

The return stack has been a part of ARM processors since the ARM11. It is a small stack of addresses and ARM/Thumb state information used to accelerate returning from function calls. It works by pushing addresses onto the stack when a function call is recognised, and popping them off again on return from the function. It saves valuable cycles when code calls lots of functions.


However, only certain instructions activate the return stack's push and pop behaviour, and these are listed in the processor's Technical Reference Manual. For example, on Cortex-A9, the following are recognised as calls and returns in ARM and Thumb state:

    • Call
      • BL (immediate)
      • BLX (immediate)
      • BLX (register)
    • Return
      • BX lr
      • MOV pc, lr
      • LDM sp, {... pc}
      • LDR pc, [sp] (any addressing mode)


Only these instructions cause the return stack to be used. ARM's first patch committed to V8 made the generated call and return instructions consistent with these recognised forms, so that returns are predicted correctly, giving a big performance boost.
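
As an illustration, here is a minimal GNU-assembler sketch (not code that V8 actually emits; the register holding the call target is an invented assumption) contrasting a call sequence the return stack does not recognise with an equivalent recognised one:

        .syntax unified
        .arch   armv7-a
        .arm

        @ A call sequence that the return stack does NOT recognise as a call,
        @ so the matching return cannot be predicted:
        mov     lr, pc          @ manually set up the return address (legacy pattern)
        mov     pc, r2          @ branch to the target held in r2

        @ An equivalent, recognised call (BLX register); the later "BX lr"
        @ in the callee is then predicted by the return stack:
        blx     r2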

Floating point


Modern ARM cores provide hardware support for floating point operations in two ways.

  • For scalar calculations, the VFP unit handles single and double precision floating point numbers, with support for operations such as division and square root.
  • For vector calculations, the NEON unit handles single precision floating point numbers (and integers), with support for operations useful in vector processing, such as reciprocal and reciprocal square root.


JavaScript's native numeric type is double precision floating point. So, where V8 cannot optimize operations to use integers, the natural choice is to use VFP to speed up calculations. But to do this efficiently, V8 has to support VFP code generation directly, rather than suffer the costly overhead of repeated calls into library code.
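
As a rough sketch of the kind of code this enables (hand-written for illustration; not what V8 actually generates, and the register assignments are assumptions), the following adds two double precision values entirely in the VFP unit, with no library call:

        .syntax unified
        .arch   armv7-a
        .fpu    vfpv3
        .arm

        @ Assumes r0 and r1 point to the two operands and r2 to the result.
add_doubles:
        vldr    d0, [r0]        @ load the first double into a VFP register
        vldr    d1, [r1]        @ load the second double
        vadd.f64 d0, d0, d1     @ double precision add performed in hardware
        vstr    d0, [r2]        @ store the result
        bx      lr              @ recognised return (see the return stack section)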


ARM has provided a number of patches to broaden the use of VFP in V8, such as adding some of the new features found in VFPv3, and adding support for these new features in V8's built-in ARM simulator.

Bitfields


ARM architecture version 7 introduced new instructions to manipulate bitfields, useful when operating on space-efficient packed data structures.

    • UBFX - Unsigned bitfield extract.
      • Copies a number of consecutive bits from a given position in the source register, and places them into the least-significant bits of the destination register.
    • SBFX - Signed bitfield extract.
      • Like UBFX, but sign-extends the copied bits before writing them to the destination register.
    • BFI - Bitfield insert.
      • Inserts consecutive bits from the source register into a given position in the destination register.
    • BFC - Bitfield clear.
      • Clears a number of consecutive bits in the destination register.


These operations would previously have been implemented using masking (BIC) and bitwise-or (ORR), so one bitfield instruction can often replace two or three traditional instructions. As less code is required to achieve the same effect, the processor's instruction cache is used more efficiently.
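
For example, this sketch (field positions chosen arbitrarily for illustration) shows an extract and an insert with and without the ARMv7 instructions:

        .syntax unified
        .arch   armv7-a
        .arm

        @ Extract bits [11:4] of r1 into r0, zero-extended:
        ubfx    r0, r1, #4, #8      @ one ARMv7 instruction
        @ Pre-ARMv7 equivalent:
        lsr     r0, r1, #4          @ shift the field down to bit 0
        and     r0, r0, #0xFF       @ mask off everything above it

        @ Insert the low 8 bits of r2 into bits [11:4] of r0:
        bfi     r0, r2, #4, #8      @ one ARMv7 instruction
        @ Pre-ARMv7 equivalent (assuming r2 already fits in 8 bits):
        bic     r0, r0, #0xFF0      @ clear the destination field
        orr     r0, r0, r2, lsl #4  @ merge in the shifted value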

There is a further benefit. Developing a JIT requires balancing the amount and quality of generated code against the time taken to generate it. Users experience this time as an annoying latency: the delay between loading a web page and being able to use it. The bitfield instructions make a small contribution towards reducing this latency, by allowing the JIT to generate less code for the same operation.

Crankshaft

At the end of 2010, Google introduced a new technology to V8, called Crankshaft.

It combines a fast, simple baseline compiler with a slower, profile-guided optimizing compiler that recompiles hot code. We have contributed a number of patches that helped to complete support for Crankshaft on ARM, and in March 2011, Crankshaft became the default code generator in V8. It gives a huge performance boost on many benchmarks.


Crankshaft needs a modern processor, which for ARM means architecture version 7, with VFP support; an ARM Cortex-A class processor is required.

How has V8 improved on ARM?


The many contributions of the V8 coders, including the patches provided by ARM, have resulted in huge performance gains. We have benchmarked the latest development version of the V8 engine on an ARM Cortex-A9 system, and compared the results to those produced by the V8 engine from a year ago. The results are striking.

V8 Benchmark Suite


The V8 benchmark suite (version 6) contains seven benchmarks that are used to tune the V8 engine. These include ray tracing, regular expression, cryptography and OS simulation tests. On the same hardware, performance has increased by up to 500%.

Sunspider

Other benchmarks tell a similar story. Sunspider, a suite containing a set of very simple operations, is over 50% faster than a year ago.

Sunspider was designed before the creation of modern, high-performance JavaScript engines, and it is often difficult to make performance gains here that are relevant to today's JavaScript-heavy web applications.

Kraken

Kraken is a recent benchmark from Mozilla that focuses on the more iterative tasks that you would encounter in real web applications, using workloads much larger than those present in Sunspider; in terms of execution time, Kraken is approximately 20 times larger than Sunspider.


V8 on ARM has also seen an impressive performance gain here. The benchmark is over four times faster on today's engine, compared to that from a year ago. Crankshaft is particularly important in delivering this result, as it is most suited to the tight, iterative loops seen in the Kraken suite.

Where is V8 on ARM going?


It takes a few months of work for Google to integrate and test the latest V8 engine with new devices, so you will not be able to see these performance improvements appearing in products until the second half of 2011. But the V8 developers continue to increase the speed of the engine, so you can expect even higher performance in 2012.


Further out, you will see the introduction of devices based on the latest ARM core, Cortex-A15, with advanced features that will push the speed of JavaScript to new heights.


ARM will continue to contribute to the V8 project, with both optimizations and support for new processors. However, as V8 is an open source project, good patches are welcome from any interested ARM coders. So, if you want to be part of the evolution of the web in mobile devices, check out the code from the Google repository and start hacking!

Comment
  • Martyn over 12 years ago

    Nice article. It would be interesting to know the percentage increase due to the bitfield assembly operations. Can you share the stats? Is there a library with a C API for the GCC toolchain that I can use in driver code?

    The effect of bitfields on V8 is quite small, but it seemed to be a good example of using new ARMv7 features in real-world applications. I don't think GCC provides intrinsics for the bitfield instructions, so if you need to use them directly, assembly language is the way to do it.