Some questions about the Arm ML Embedded Evaluation Kit

A question about the Arm ML Embedded Evaluation Kit (review.mlplatform.org/.../documentation.md)

1. The "Memory considerations" section of this document says that three memory modes can be set. Why can the Dedicated_Sram mode only be used on Ethos-U65?

Is it because of a hardware limitation?

2. Why does ACTIVATION_BUF_SZ in usecase.cmake refer to different memory under Shared_Sram and Dedicated_Sram? Under Shared_Sram it refers to the size of SRAM, while under Dedicated_Sram it refers to the size of DRAM?

As I understand it, the activation buffer holds the tensor arena and should be placed in the cache, so I am not sure why this value refers to different things in different modes.

 (Dedicated_Sram_mode) (Shared_Sram_mode)

3. I converted a model (FSRCNN, github.com/.../FSRCNN_Tensorflow) to tflite, used the Vela compiler to turn the tflite file into an optimized model, and ran it on the Ethos-U65 FVP. I hit this error:
"tensor allocation failed!"

I have seen this before when the value of ACTIVATION_BUF_SZ in usecase.cmake was set too small. In that case, the fix is to check the memory figures in the Vela compiler report and adjust the size accordingly.

This time I am using Dedicated_Sram mode, so I checked the DRAM used value under "Memory used" in the report to adjust it.
But the situation is different: the value I set already exceeds the required value.

I still get "tensor allocation failed!"
Is there any reason why it would fail?

 (vela report)

 (ACTIVATION_BUF_SZ in usecase.cmake)

Thanks in advance for your answers

Parents
  • Thank you very much for your answer.

    I understand the first and second questions now.

    But I have another question.

    In Dedicated_Sram mode, DRAM is accessed through the AXI1 port, but this port is read-only.

    As shown below:

     (Arm Ethos-U55 NPU Technical Reference Manual r2p0)

    So why can the tensor arena be placed in DRAM?

     Difference between Memory Arena and Tensor Arena (arm.com)

    Doesn't the tensor arena contain the intermediate arrays?

    Intermediate arrays hold values that are only generated while inference is running, so shouldn't they need to be written out?

    So why can the arena be placed in DRAM?

    Is the tensor arena written to DRAM through the cache?

    FVP version:

    Corstone_SSE-300_Ethos-U65 --version
    Corstone_SSE-300_Ethos-U55 --version

    Fast Models [11.15.24 (Aug 17 2021)] Copyright 2000-2021 ARM Limited.
    All Rights Reserved.

    arena_cache_size given in vela.ini:
    arena_cache_size=393216

    Cross compiler:
    arm-none-eabi-gcc
    gcc-arm-none-eabi-10-2020


    What is the full command?

    cmake options?
    cross compiler options?
    use case file?

Children
  • Are you using U55 or U65? Let me reiterate that DRAM is specific to U65 with the "Dedicated SRAM" memory mode option. On U65, both buses, AXI0 and AXI1, are read/write. Intermediate arrays are part of the arena in DRAM but eventually get cached in SRAM on U65. To run the model, can you please try providing sufficient DRAM by setting ACTIVATION_BUF_SZ to ~19 MB? I quickly tried this and it works for me. If it still fails for you, please let me know your full cmake command.
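
For reference, a size such as the ~19 MB suggested above has to be written as a byte count when it goes into ACTIVATION_BUF_SZ. The sketch below is only an illustration: `mib_to_hex` is a hypothetical helper, not part of the evaluation kit, and it assumes "MB" here means MiB (1024 * 1024 bytes), which is consistent with 0x01700000 corresponding to 23 MB elsewhere in this thread.

```shell
# Hypothetical helper (not part of the evaluation kit): convert a size in MiB
# into the 8-digit hexadecimal byte count format used for ACTIVATION_BUF_SZ.
# Assumption: "MB" in this thread means MiB (1024 * 1024 bytes).
mib_to_hex() {
    printf '0x%08X\n' "$(( $1 * 1024 * 1024 ))"
}

mib_to_hex 19   # prints 0x01300000
mib_to_hex 23   # prints 0x01700000
```

The printed value can then be compared against the DRAM figure in the Vela report before it is put into the use case's cmake file.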

  • Thank you for your response!

    This is my cmake command.

    cmake .. -DTARGET_PLATFORM=mps3 -DTARGET_SUBSYSTEM=sse-300 -DCMAKE_TOOLCHAIN_FILE=/ml-embedded-evaluation-kit/scripts/cmake/toolchains/bare-metal-gcc.cmake -DETHOS_U_NPU_ID=U65 -DETHOS_U_NPU_CONFIG_ID=Y256

    (cmake option)

    What more information do I need to give you?

    I am using Ethos-U65.

    I have set my ACTIVATION_BUF_SZ to 23 MB.

    But I still get "tensor allocation failed".

    Thanks in advance for your answers

  • The issue seems to be that your cmake changes are not getting built. Are you building a specific use case?
    For example, if you build the inference runner, make the changes in <path of eval kit>/source/use_case/inference_runner/usecase.cmake
    <snip>

    USER_OPTION(${use_case}_ACTIVATION_BUF_SZ "Activation buffer size for the chosen model"
    0x01700000
    STRING)

    <snip>

    Now, in the eval kit, follow these steps:

    1. mkdir build;cd build

    2. cmake .. -DUSE_CASE_BUILD=inference_runner -DETHOS_U_NPU_ID=U65 -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/toolchains/bare-metal-gcc.cmake -DETHOS_U_NPU_MEMORY_MODE=Dedicated_Sram -DCPU_PROFILE_ENABLED=1 -DLOG_LEVEL=LOG_LEVEL_TRACE -Dinference_runner_MODEL_TFLITE_PATH=./fscrn/fsrcnn_720p_vela.tflite
     

    You must confirm that your ACTIVATION_BUF_SZ change shows up in the cmake logs.
    <snip>
    -- ETHOS_U_NPU_CACHE_SIZE=393216
    -- ETHOS_U_NPU_MEMORY_MODE=Dedicated_Sram
    -- ETHOS_U_NPU_CONFIG_ID=Y256
    -- ETHOS_U_NPU_TIMING_ADAPTER_ENABLED=ON
    -- TA_CONFIG_FILE=./cmake/timing_adapter/ta_config_u65_high_end.cmake
    -- inference_runner_ACTIVATION_BUF_SZ=0x01700000
    -- inference_runner_DYNAMIC_MEM_LOAD_ENABLED=OFF
    -- inference_runner_MODEL_TFLITE_PATH=./fscrn/fsrcnn_720p_vela.tflite

    <snip>

    3. make // your application will be built.

    You have to do the same if you are using a different use case. Refer to: review.mlplatform.org/.../building.md
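
Besides reading the configure log, the value cmake actually picked up can also be checked after configuring. The sketch below is a rough illustration under one assumption: that the option is stored as a cache variable, so that it appears in the build directory's CMakeCache.txt as a `NAME:TYPE=VALUE` line. `cache_value` is a hypothetical helper, not part of the evaluation kit.

```shell
# Hypothetical helper (not part of the evaluation kit): print the value cmake
# has cached for a variable.
# Assumption: the variable is a cache variable, so CMakeCache.txt contains a
# line of the form NAME:TYPE=VALUE for it.
#   $1 = variable name
#   $2 = build directory (defaults to the current directory)
cache_value() {
    sed -n "s/^$1:[A-Z]*=//p" "${2:-.}/CMakeCache.txt"
}

# Example usage from the eval kit root, after running cmake in ./build:
#   cache_value inference_runner_ACTIVATION_BUF_SZ build
#   cache_value ETHOS_U_NPU_MEMORY_MODE build
```

If the printed size is not the one set in usecase.cmake, the edit did not reach the build, for example because a different use case was configured or an older cached value is still in use.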