Some questions about the Arm ML Embedded Evaluation Kit

I have some questions about the Arm ML Embedded Evaluation Kit (review.mlplatform.org/.../documentation.md).

1. The "Memory considerations" section of this document mentions three memory modes that can be set. Why can the Dedicated_Sram mode only be used on Ethos-U65?

Is it because of a hardware limitation?

2. Why does activate_buf_sz in usecase.cmake refer to different memory under Shared_Sram and Dedicated_Sram? Under Shared_Sram it refers to the size of SRAM, and under Dedicated_Sram it refers to the size of DRAM?

As I understand it, the activation buffer holds the tensor arena and should be placed in the cache. I am not sure why this value means different things in different modes.

 (Dedicated_Sram mode) (Shared_Sram mode)

3. I took a model (FSRCNN, github.com/.../FSRCNN_Tensorflow), converted it to tflite, used the Vela compiler to turn the tflite file into an optimized model, and ran it on the Ethos-U65 FVP. It failed with:
"tensor allocation failed!"

I have seen this before when the value of activate_buf_sz in usecase.cmake was set too small. In that case, I check the memory usage in the Vela compiler report and adjust the size.

I am using Dedicated_Sram mode this time, so I adjust the value according to the DRAM used figure under Memory used in the report.
But this time the situation is different: the value I set already exceeds the required value.

I still get "tensor allocation failed!".
Is there any reason why it would fail?

 (vela report)

 (activate_buf_sz in usecase.cmake)

Thanks in advance for your answers

  • 1. Dedicated SRAM is a memory mode in which the tensor arena and the model live in DRAM, and SRAM is used only as a cache. The U55 is not designed to use DRAM, as its memory interface gives a lower bandwidth than that of the U65. So the Dedicated SRAM memory mode is specific to the U65.

    2. activate_buf_sz represents the size of the arena. As you rightly said, that means SRAM for the U55 and DRAM for the U65.

    In Dedicated SRAM mode on the U65, both the model and the arena are placed in DRAM, but the arena spills into SRAM (the arena gets cached in SRAM).

    In the eval kit, the size of the static cache arena is given by get_cache_arena_size().

    We recently merged a change, https://review.mlplatform.org/c/ml/ethos-u/ml-embedded-evaluation-kit/+/7227, that makes this configurable from the command line through `ETHOS_U_NPU_CACHE_SIZE`, which represents the arena cache size, so you no longer need to change the value in the eval kit manually. As said, this parameter only has an effect when building for the U65.

    3. To answer this, I need more details: the FVP version, the arena_cache_size given in Vela.ini, the full command, and the compiler used. The arena_cache_size is currently fixed at 2MB, matching the default 2MB SRAM size in the FVP, so you need to allocate the memory accordingly. You can reach out to us at support-ml <support-ml@arm.com> if you need a specific investigation of your use case.
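
    As a rough aid for the cache size point above, here is a minimal sketch that reads arena_cache_size from a Vela config file and compares it with the cache size passed to the build. It assumes the two values are meant to agree and that the file holds one key=value pair per line. The function names are hypothetical helpers, not part of the eval kit, and only the first arena_cache_size entry is read, which may be the wrong one if your file defines several memory modes.

    ```shell
    # Print the value of the first "arena_cache_size=" entry in a Vela config file.
    # Assumption: one "key=value" pair per line, as quoted in this thread.
    get_arena_cache_size() {
        sed -n 's/^[[:space:]]*arena_cache_size[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
    }

    # Succeed only if the Vela value equals the cache size given to the build.
    # Arguments: path to the Vela config file, cache size in bytes.
    cache_sizes_match() {
        [ "$(get_arena_cache_size "$1")" = "$2" ]
    }
    ```

    For example, `cache_sizes_match Vela.ini 393216` succeeds when the file contains `arena_cache_size=393216`.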

     

  • Thank you, that is a very helpful answer.

    I understand the answers to the first and second questions.

    But I have another question.

    In Dedicated_Sram mode, DRAM is accessed through the AXI1 port, but this port is read-only,

    as shown below.

     (Arm Ethos-U55 NPU Technical Reference Manual r2p0)

    So how can the tensor arena be placed in DRAM?

     Difference between Memory Arena and Tensor Arena (arm.com)

    Doesn't the tensor arena contain the intermediate arrays?

    Intermediate arrays hold values that are only generated while inference is in progress, so shouldn't they need to be written out?

    So how can the arena be placed in DRAM?

    Is the tensor arena written to DRAM through the cache?

    FVP version:

    Corstone_SSE-300_Ethos-U65 --version
    Corstone_SSE-300_Ethos-U55 --version

    Fast Models [11.15.24 (Aug 17 2021)] Copyright 2000-2021 ARM Limited.
    All Rights Reserved.

    arena_cache_size given in Vela.ini:
    arena_cache_size=393216

    Cross compiler:
    arm-none-eabi-gcc
    gcc-arm-none-eabi-10-2020


    What do you mean by the full command?

    The cmake options?
    The cross-compiler options?
    The use_case file?

  • Are you using the U55 or the U65? Let me reiterate that DRAM is specific to the U65 with the "Dedicated SRAM" memory mode. On the U65, both buses, AXI0 and AXI1, are read/write. Intermediate arrays are part of the arena in DRAM but eventually get cached in SRAM on the U65. To run the model, can you please try providing sufficient DRAM by setting ACTIVATION_BUF_SZ to ~19 MB? I quickly tried this and it works for me. If it still fails for you, please let me know your full cmake command.
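
    To turn a size in MB into the hex byte count that ACTIVATION_BUF_SZ takes, a small sketch like the following may help. It treats 1 MB as 1024 * 1024 bytes, which is an assumption, and both function names are hypothetical helpers rather than anything shipped with the eval kit.

    ```shell
    # Convert a size in MiB to a 0x-prefixed byte count, the style used for
    # ACTIVATION_BUF_SZ in usecase.cmake.
    mib_to_hex() {
        printf '0x%08X\n' "$(( $1 * 1024 * 1024 ))"
    }

    # Succeed if a buffer of the given size (hex or decimal) can hold
    # the required number of bytes.
    # Arguments: buffer size, required bytes.
    buffer_fits() {
        [ "$(( $1 ))" -ge "$2" ]
    }

    mib_to_hex 19   # prints 0x01300000
    mib_to_hex 23   # prints 0x01700000
    ```

    The required byte count passed to `buffer_fits` would come from the DRAM figure in your Vela report.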

  • Thank you for your response!

    This is my cmake command.

    cmake .. -DTARGET_PLATFORM=mps3 -DTARGET_SUBSYSTEM=sse-300 -DCMAKE_TOOLCHAIN_FILE=/ml-embedded-evaluation-kit/scripts/cmake/toolchains/bare-metal-gcc.cmake -DETHOS_U_NPU_ID=U65 -DETHOS_U_NPU_CONFIG_ID=Y256

    (cmake option)

    What more information do I need to give you?

    I am using the Ethos-U65.

    I have set my ACTIVATION_BUF_SZ to 23MB.

    But I still get "tensor allocation failed".

    Thanks in advance for your answers

  • The issue seems to be that your cmake changes are not getting built. Are you building a specific use case?
    For example, if you build the inference runner, make the change in <path of eval kit>/source/use_case/inference_runner/usecase.cmake:
    <snip>

    USER_OPTION(${use_case}_ACTIVATION_BUF_SZ "Activation buffer size for the chosen model"
    0x01700000
    STRING)

    <snip>

    Now, in the eval kit, follow these steps:

    1. mkdir build;cd build

    2. cmake .. -DUSE_CASE_BUILD=inference_runner -DETHOS_U_NPU_ID=U65 -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/toolchains/bare-metal-gcc.cmake -DETHOS_U_NPU_MEMORY_MODE=Dedicated_Sram -DCPU_PROFILE_ENABLED=1 -DLOG_LEVEL=LOG_LEVEL_TRACE -Dinference_runner_MODEL_TFLITE_PATH=./fscrn/fsrcnn_720p_vela.tflite
     

    You must confirm that your ACTIVATION_BUF_SZ change shows up in the cmake logs.
    <snip>
    -- ETHOS_U_NPU_CACHE_SIZE=393216
    -- ETHOS_U_NPU_MEMORY_MODE=Dedicated_Sram
    -- ETHOS_U_NPU_CONFIG_ID=Y256
    -- ETHOS_U_NPU_TIMING_ADAPTER_ENABLED=ON
    -- TA_CONFIG_FILE=./cmake/timing_adapter/ta_config_u65_high_end.cmake
    -- inference_runner_ACTIVATION_BUF_SZ=0x01700000
    -- inference_runner_DYNAMIC_MEM_LOAD_ENABLED=OFF
    -- inference_runner_MODEL_TFLITE_PATH=./fscrn/fsrcnn_720p_vela.tflite

    <snip>

    3. make // your application will be built.

    You have to do the same if you are using a different use case. Refer to: review.mlplatform.org/.../building.md
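
    The log check in step 2 can be scripted. This is a minimal sketch, assuming you saved the configure output to a file and that the line looks exactly like the one in the log excerpt above. The function name is a hypothetical helper, not part of the eval kit.

    ```shell
    # Succeed only if the saved configure output contains the exact line
    # "-- <use_case>_ACTIVATION_BUF_SZ=<value>".
    # Arguments: log file, use-case name, expected value.
    activation_buf_sz_in_log() {
        grep -qxF -- "-- ${2}_ACTIVATION_BUF_SZ=${3}" "$1"
    }
    ```

    For example, after saving the output of your cmake command to a file such as configure.log (a name chosen here for illustration), `activation_buf_sz_in_log configure.log inference_runner 0x01700000` succeeds only if the new value was picked up.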