Arm Community

BOLT instrumentation brings 52% performance uplift for MongoDB on Neoverse N2

Bolt Liu
June 3, 2024

BOLT is a post-link optimization technology that improves performance for various workloads. Previously, BOLT was enabled through CoreSight and perf, which improved performance for some typical workloads; an earlier blog covers that approach. However, that method requires CoreSight to capture branch profile data, which is not convenient to deploy in a production environment.

BOLT instrumentation is an alternative method that optimizes the executable based on profile data collected by instrumenting and running the binary. Only the llvm-bolt utility is required, as there is no dependency on CoreSight or perf.

This blog illustrates the steps to enable BOLT instrumentation and presents benchmark results on MongoDB.

Test environment

Two Alibaba ECS instances are reserved for the benchmark. The client runs ycsb while the server runs MongoDB. A 200G AutoPL ESSD, which offers higher bandwidth, is attached to the server so that the drive does not become a bottleneck.

MongoDB BOLT instrumentation test environment

Running ycsb involves two steps: load and run. The load step inserts 40000000 records and the run step performs 5000000 operations. Run the following commands, where $1 is the address of the MongoDB server:

REC_CNT=40000000
OP_CNT=5000000
./bin/ycsb.sh load mongodb -s -P workloads/workloada -p recordcount=$REC_CNT -p operationcount=$OP_CNT -threads 64 -p mongodb.url="mongodb://$1:27017/ali"
./bin/ycsb.sh run mongodb -s -P workloads/workloada -p recordcount=$REC_CNT -p operationcount=$OP_CNT -threads 64 -p mongodb.url="mongodb://$1:27017/ali"
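To compare runs later, the summary lines that ycsb prints at the end of each step can be pulled out of a saved log. The sketch below assumes the usual ycsb summary format of "[SECTION], Metric, value" lines; the helper name ycsb_metric and the log file name are hypothetical and not part of the original setup.

```shell
# Hypothetical helper: print one metric from ycsb summary output on stdin.
# Assumes summary lines look like: [OVERALL], Throughput(ops/sec), 28687.5
ycsb_metric() {
    section=$1   # e.g. OVERALL, READ, UPDATE, INSERT
    metric=$2    # e.g. "Throughput(ops/sec)" or "AverageLatency(us)"
    awk -F', ' -v s="[$section]" -v m="$metric" '$1 == s && $2 == m { print $3 }'
}

# Example usage (run.log is an assumed file holding the saved ycsb output):
#   ycsb_metric OVERALL 'Throughput(ops/sec)' < run.log
#   ycsb_metric READ 'AverageLatency(us)' < run.log
```

Matching on both the section and the metric name keeps the helper from picking up per-operation lines when only the overall figure is wanted.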

Steps to enable BOLT instrumentation

Build default MongoDB

  1. Download the MongoDB source code and check out version 7.0.5.
  2. Upgrade gcc to version 11.4.0, which is required to build MongoDB 7.0.5.
  3. Build mongod (name it mongod.orig) with the following options:

    python3 buildscripts/scons.py DESTDIR=$WORKSPACE/install/mongo install-mongod \
            CCFLAGS="-fno-reorder-blocks-and-partition -mcpu=native -O3 -w" \
            LINKFLAGS="-Wl,--emit-relocs" --disable-warnings-as-errors
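Of the options above, --emit-relocs matters most for the later BOLT steps, because llvm-bolt uses the relocations kept in the linked binary to get the most out of function reordering. One way to sanity-check the build, sketched here on the assumption that readelf from binutils is installed, is to look for a .rela.text section in the section headers. The helper name has_text_relocs is made up for this example.

```shell
# Hypothetical helper: succeed if `readelf -S` output on stdin lists .rela.text,
# which indicates the binary was linked with --emit-relocs.
has_text_relocs() {
    grep -q '\.rela\.text'
}

# Example usage:
#   readelf -S mongod.orig | has_text_relocs && echo "relocations present"
```

If the check fails, the link flags probably did not reach the final link step, and it is worth fixing that before collecting a profile.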

Collect profile data

  1. Build llvm-bolt with version 6841395
  2. Instrument mongod.orig to produce mongod.inst:

llvm-bolt mongod.orig -instrument -o mongod.inst --instrumentation-file=`pwd`/prof.fdata --instrumentation-sleep-time=60

  3. Start mongod.inst and run ycsb to collect profile data, using the following commands:

OP_CNT=5000000

./bin/ycsb.sh load mongodb -s -P workloads/workloada -p operationcount=$OP_CNT -threads 64 -p mongodb.url="mongodb://$1:27017/ali"
./bin/ycsb.sh run mongodb -s -P workloads/workloada -p operationcount=$OP_CNT -threads 64 -p mongodb.url="mongodb://$1:27017/ali"

  4. Stop mongod.inst
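Before stopping mongod.inst, it is worth confirming that the profile has actually been written. The sketch below assumes that --instrumentation-sleep-time=60 makes the instrumented binary write prof.fdata periodically, roughly every 60 seconds, rather than only at exit; the helper name wait_for_profile and the timeout values are hypothetical.

```shell
# Hypothetical helper: wait until the profile file exists and is non-empty.
# Returns 0 once the file has content, 1 if the timeout (seconds) expires.
wait_for_profile() {
    path=$1
    timeout=${2:-120}
    waited=0
    while [ ! -s "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example usage, run in the directory where mongod.inst was created:
#   wait_for_profile "$(pwd)/prof.fdata" 180 && echo "profile ready"
```

An empty or missing prof.fdata at this point usually means the instrumented binary was not the one serving the ycsb traffic.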

Optimize executable

  1. Convert mongod.orig to an optimized executable (name it mongod.bolt):

llvm-bolt mongod.orig -o mongod.bolt -data=prof.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions=2 -split-all-cold -split-eh -dyno-stats
    

  2. Run mongod.orig and mongod.bolt, and compare their results.

Test results

The benchmark shows that MongoDB throughput improved by 58% for INSERT and by 52% for READ and UPDATE. Latencies also dropped significantly with BOLT enabled.

INSERT:

| Metric | Default | BOLT enhanced | Improvement (%) |
|---|---|---|---|
| Total time (ms) | 1394331 | 879745 | 36.90 |
| Throughput (ops/sec) | 28687 | 45467 | 58.49 |
| INSERT average latency (us) | 2211 | 1390 | 37.13 |
| INSERT 95th percentile latency (us) | 4103 | 2739 | 33.24 |
| INSERT 99th percentile latency (us) | 7679 | 5595 | 27.13 |

READ and UPDATE (with ratio 1:1):

| Metric | Default | BOLT enhanced | Improvement (%) |
|---|---|---|---|
| Total time (ms) | 249593 | 164264 | 34.18 |
| Throughput (ops/sec) | 20032 | 30438 | 51.94 |
| READ average latency (us) | 3146 | 2051 | 34.80 |
| READ 95th percentile latency (us) | 7527 | 6571 | 12.70 |
| READ 99th percentile latency (us) | 12863 | 10831 | 15.79 |
| UPDATE average latency (us) | 3211 | 2122 | 33.91 |
| UPDATE 95th percentile latency (us) | 7659 | 6771 | 11.59 |
| UPDATE 99th percentile latency (us) | 13119 | 11111 | 15.30 |
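The total time and throughput rows are consistent with each other if total time is read as milliseconds and throughput as operations per second: dividing the record or operation count by the elapsed seconds reproduces the throughput figures. The unit reading is an inference from the numbers, and the helper name below is hypothetical.

```shell
# Hypothetical helper: operations per second from an operation count and a
# total time in milliseconds, truncated to a whole number.
throughput_ops_per_sec() {
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%d\n", n / (ms / 1000) }'
}

# Figures taken from the tables above:
#   throughput_ops_per_sec 40000000 1394331   # INSERT, default build
#   throughput_ops_per_sec 5000000 249593     # READ and UPDATE, default build
```

With the default-build figures this gives 28687 for INSERT and 20032 for READ and UPDATE, matching the throughput rows.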

Throughput improvement

Throughput increased by 58% for INSERT and by 52% for READ and UPDATE after using BOLT:

Throughput improvement report for BOLT
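The throughput percentages follow directly from the measured values, as a relative increase over the default build. A minimal sketch of that arithmetic, with a hypothetical helper name and rounding to one decimal place:

```shell
# Hypothetical helper: relative increase from a baseline value to a new value,
# in percent, rounded to one decimal place.
pct_increase() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Throughput figures from the test results:
#   pct_increase 28687 45467   # INSERT
#   pct_increase 20032 30438   # READ and UPDATE
```

These calls give 58.5 and 51.9, in line with the 58% and 52% quoted for the two workloads.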

Latency improvement

Average latency dropped by 37% for INSERT, 35% for READ, and 34% for UPDATE after using BOLT:

Latency improvement report for BOLT
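For latency, lower is better, so the improvement is a relative reduction from the default build. The same kind of arithmetic applies, sketched here with a hypothetical helper name and one-decimal rounding:

```shell
# Hypothetical helper: relative reduction from a baseline value to a new value,
# in percent, rounded to one decimal place.
pct_decrease() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Average latencies (us) from the test results:
#   pct_decrease 2211 1390   # INSERT
#   pct_decrease 3146 2051   # READ
#   pct_decrease 3211 2122   # UPDATE
```

The results are 37.1, 34.8, and 33.9. A reduction in time and an increase in throughput are not the same percentage: a 37% shorter time corresponds to roughly 1 / (1 - 0.37), or about 1.59 times the throughput, which is why the throughput gains look larger than the latency gains.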

Perf data

The perf data shows that L1-icache-misses, branch-misses, and iTLB-load-misses dropped significantly with BOLT. Use the following command to capture perf data while the benchmark is running:

perf stat -e instructions,L1-icache-misses,branches,branch-misses,iTLB-load,iTLB-load-misses -p `pgrep mongo` -- sleep 60

Perf data report for BOLT
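When comparing the two builds, raw counts over a fixed 60-second window can be misleading, since the optimized binary may execute a different number of instructions in that time. One common normalization, shown here as a sketch, is misses per thousand instructions (MPKI). The helper name and the sample counts in the usage comment are hypothetical and are not measurements from this benchmark.

```shell
# Hypothetical helper: misses per thousand instructions (MPKI),
# rounded to two decimal places.
mpki() {
    awk -v misses="$1" -v insns="$2" 'BEGIN { printf "%.2f\n", misses / insns * 1000 }'
}

# Example with made-up counts (substitute the values reported by perf stat):
#   mpki 2500 1000000
```

Applying the same helper to the L1-icache-misses, branch-misses, and iTLB-load-misses counts from each build puts the two runs on a comparable footing.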

Summary

BOLT instrumentation delivers a 52% throughput uplift for MongoDB READ and UPDATE tests and 58% for INSERT, and latencies drop significantly. The instrumentation method is also easy to deploy, as it has no dependency on CoreSight or perf.
