
I tried to debug virtualization code on the Neoverse N2 FVP platform, but found that the FVP's performance was so poor that it made interactive use difficult. How can I debug virtualization code efficiently?

I followed the steps in the link below to start the host and the guest:

docs/infra/common/gicv4_1-vlpi-vsgi.rst · master · Arm Reference Solutions / arm-reference-solutions-docs · GitLab

There are a few problems here:

Firstly, it takes an hour and a half to start the host, then a long time to start the guest.

Secondly, copying the kernel source code to the host takes several hours.

Thirdly, to keep vmlinux and the kernel source consistent, I need to update the host's kernel, but I cannot compile and install the kernel on it because the FVP's performance is too poor.
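
Presumably I could cross-compile the kernel for the FVP on the x86 host instead, along the lines of the sketch below (the aarch64-linux-gnu- toolchain prefix is just an example for my setup), but installing the result and rebooting still has to happen inside the slow FVP:

    # Cross-compile the arm64 kernel image and modules on the fast x86 host,
    # rather than building natively inside the simulated machine.
    make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j"$(nproc)" Image modules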

How can I debug virtualization code efficiently?

  • I've moved your question to the Arm Virtual Hardware forum.

  • I'm sorry that you are experiencing such poor performance, though with a simulation of this complexity some slowness is to be expected. What host machine are you running on?

    Regards, Ronan

  • CPU info:

    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Address sizes: 46 bits physical, 48 bits virtual
    Byte Order: Little Endian
    CPU(s): 32
    On-line CPU(s) list: 0-31
    Vendor ID: GenuineIntel
    Model name: Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz
    CPU family: 6
    Model: 62
    Thread(s) per core: 2
    Core(s) per socket: 8
    Socket(s): 2
    Stepping: 4
    CPU max MHz: 4000.0000
    CPU min MHz: 1200.0000

    OS info:

    Linux version 5.15.0-60-generic (buildd@lcy02-amd64-054) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023

    Best Regards

  • The engineers who develop the FVP must have done virtualization testing, so I would expect the RD-N2 FVP's performance to be sufficient for virtualization debugging. Having seen the hardware and OS information above, do you have any suggestions?

    Best Regards

  • I'm checking with my Arm colleagues, and will let you know when I have any further information.

    How much RAM do you have on the host machine?  We'd recommend 32GB.

    Stephen

  • First of all, thank you very much for your help!
    The RDN2 FVP is deployed on a server whose hardware resources are ample; its memory exceeds 300 GB.

    However, I observed with the htop command that one core's utilization was 100% while the utilization of the other cores was very low.

    Best Regards, Mikhail

  • Hi Mikhail

    My colleagues have done some testing, on Ubuntu 22.04 and Debian 11, and in both cases the Arm ecosystem RD-N2 FVP (11.20.18) booted successfully in under 30 minutes. htop showed that the load on the host cores was moderate and distributed across all of them, and the model used ~11GB of host RAM.

    I'm puzzled why you are seeing "the workload of one core was 100%, while the utilization of other cores was very low". That might point to a memory issue. You said "The RDN2 FVP is deployed on the server. The hardware resources are sufficient, and the memory exceeds 300 GB."
    Do you mean 300GB of hard disk space or 300GB of DDR memory? Disk size isn't so important here, but a minimum of 32GB of DDR memory is recommended.

    On your second point (copying the kernel source code to the host), are you trying to do a git clone of the Linux kernel to build and install on the host machine? The clone will be limited by network speed, and may be slower still because the FVP shares the host machine's network over a TAP interface.
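
    If the clone must happen from inside the model, a shallow, single-branch clone cuts the amount of data pulled over that slow link considerably. A minimal sketch (the stable-tree URL is just an example; substitute whichever tree and branch you actually need):

        # Fetch only the latest commit of one branch instead of the full history.
        git clone --depth 1 --single-branch \
            https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git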

    Hope this helps

    Stephen

  • 1. The memory on my server is 397GB of DDR, not hard disk space.

    2. I run rsync to copy the Linux kernel to the host (the FVP) from the machine where the RDN2 FVP is deployed, without going through an external network.

    3. I tested again. Both during and after the boot of the host kernel, one particular core is always at 100% utilization while the utilization of the other cores is very low.

    4. There are two hyperlinks below. Following the first, the FVP performs acceptably; following the second, it does not, which is why I raised this question. What is the essential difference between the two? My guess is that the buildroot boot generates more workload than the busybox boot.

    docs/infra/common/busybox-boot.rst · master · Arm Reference Solutions / arm-reference-solutions-docs · GitLab

    docs/infra/common/gicv4_1-vlpi-vsgi.rst · master · Arm Reference Solutions / arm-reference-solutions-docs · GitLab

  • Hi Stephen

    Thank you for your help. My reply is above; please look in particular at the difference between the two hyperlinks.

    Best Regards, Mikhail

  • Hi Mikhail

    We've investigated further and have some more information for you.

    Fast Models are currently single-threaded for the main simulation.  There are a few utility threads, but most of the simulation happens in a single thread, which is why you are seeing high utilization on a single host core.
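
    You can confirm this from the host side, and optionally pin the model to particular cores. A rough sketch (the core numbers and the FVP_RD_N2 binary name are placeholders for your setup; mpstat is part of the sysstat package):

        # Per-core statistics: expect one core near 100% for a Fast Models run.
        mpstat -P ALL 2

        # Optionally pin the simulation and its utility threads to fixed cores.
        taskset -c 0-3 ./FVP_RD_N2 <your usual model options>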

    On your second point (copying the kernel source code to the host), you should be able to mount the model's filesystem image, e.g. an ext4 image, directly on the host machine and copy the files in that way. There's no need to involve the model in the copying process at all.
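
    As a rough sketch, assuming the guest's root filesystem is a raw ext4 image called rootfs.img (for a partitioned disk image you would add an offset= option for the partition start), and with the model stopped so the image isn't in use from both sides:

        # Loop-mount the guest image on the host and copy the sources in directly.
        sudo mkdir -p /mnt/fvp-rootfs
        sudo mount -o loop rootfs.img /mnt/fvp-rootfs
        sudo rsync -a --info=progress2 ~/linux/ /mnt/fvp-rootfs/root/linux/
        sudo umount /mnt/fvp-rootfs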

    You asked about the difference between the two GitLab links.
    In the first, the busybox boot is comparatively lightweight because the filesystem is RAM-based.
    In the second, the GICv4.1 testing, a distribution-based boot is used, which involves PCIe I/O, disk I/O, and the kernel storage stack; booting full distributions on models is quite slow for these reasons. Using a Debian distribution, which is lighter than Ubuntu, may help to some extent.
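
    To illustrate why the busybox boot is so much lighter: the whole root filesystem can be an initramfs, a compressed cpio archive that the kernel unpacks straight into RAM, so no simulated block device or storage stack is exercised at all. A minimal sketch of building such an image (assuming a statically linked busybox on your PATH):

        # Assemble a tiny RAM-based root filesystem around a static busybox.
        mkdir -p initramfs/bin initramfs/proc
        cp "$(command -v busybox)" initramfs/bin/
        ln -sf busybox initramfs/bin/sh
        printf '#!/bin/sh\nexec /bin/sh\n' > initramfs/init
        chmod +x initramfs/init
        ( cd initramfs && find . | cpio -o -H newc | gzip ) > initramfs.cpio.gz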

    Hope this helps

    Stephen