Question about the Arm ML Embedded Evaluation Kit (review.mlplatform.org/.../documentation.md)
1. The "Memory considerations" section of this document mentions three memory modes that can be set. Why can the Dedicated_Sram mode only be used on Ethos-U65?
Is this due to a hardware limitation?
2. Why does activate_buf_sz in use_case.cmake refer to different memory under Shared_Sram and Dedicated_Sram? Under Shared_Sram it refers to the size of the SRAM, while under Dedicated_Sram it refers to the size of the DRAM?
As I understand it, the activation buffer holds the tensor arena and should be placed in the cache, so I am not sure why this value refers to different things in different modes.
(Dedicated_Sram_mode) (Shared_Sram_mode)
3. I converted a model (FSRCNN, github.com/.../FSRCNN_Tensorflow) to tflite, used the Vela compiler to turn the tflite file into an optimized model, and ran it on the Ethos-U65 FVP. The simulation failed with "tensor allocation failed!"
I have seen this error before, when the value of activate_buf_sz in use_case.cmake was set too small. In that case, the memory figures in the Vela compiler report show how large the value needs to be.
This time I am using Dedicated_Sram mode, so I adjusted the value based on the DRAM used figure under Memory used. But this time the situation is different: the value I set already exceeds the required value.
The tensor allocation still fails. Is there any reason why it fails?
(vela report)
(activate_buf_sz in use_case.cmake)
Thanks in advance for your answers
Are you using U55 or U65? Let me reiterate that DRAM is specific to U65 with the "Dedicated SRAM" memory mode option. On U65, both buses, AXI0 and AXI1, are read/write. Intermediate arrays are part of the arena in DRAM but eventually get cached in SRAM in the case of U65. To run the model, can you please try providing sufficient DRAM by setting ACTIVATION_BUF_SZ to ~19 MB? I quickly tried this and it works for me. If it still fails for you, please let me know your full cmake command.
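For reference, here is a small sketch of how a size in MB can be turned into the hexadecimal byte count that ACTIVATION_BUF_SZ is usually written as. The helper name mb_to_hex is made up for this example, and it assumes 1 MB means 1024 * 1024 bytes.

```shell
# Hypothetical helper (not part of the evaluation kit): convert a size in MB
# to a hexadecimal byte count. Assumes 1 MB = 1024 * 1024 bytes.
mb_to_hex() {
    printf '0x%08X\n' "$(( $1 * 1024 * 1024 ))"
}

mb_to_hex 19   # prints 0x01300000
mb_to_hex 23   # prints 0x01700000
```

The printed value is what you would place in the use case's cmake file or pass on the command line.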
Thank you for your response!
This is my cmake command.
cmake .. -DTARGET_PLATFORM=mps3 -DTARGET_SUBSYSTEM=sse-300 -DCMAKE_TOOLCHAIN_FILE=/ml-embedded-evaluation-kit/scripts/cmake/toolchains/bare-metal-gcc.cmake -DETHOS_U_NPU_ID=U65 -DETHOS_U_NPU_CONFIG_ID=Y256
(cmake option)
What other information do you need from me?
I am using Ethos-U65.
I have set my ACTIVATION_BUF_SZ to 23 MB.
But I still get the "tensor allocation failed" error.
The issue seems to be that your cmake changes are not getting built. Are you building a specific use case? For example, if you build the inference runner, make the changes in <path of eval kit>/source/use_case/inference_runner/usecase.cmake<snip>
<snip>
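As a quick sanity check that the file you edited is the one the build will read, something like the sketch below can print the relevant lines. The function name show_buf_sz_setting is made up for this example, and the first argument stands in for your own eval kit checkout path.

```shell
# Hypothetical helper: print the ACTIVATION_BUF_SZ lines (with line numbers)
# from a use case's usecase.cmake.
# $1: root of your eval kit checkout (placeholder), $2: use-case name.
show_buf_sz_setting() {
    grep -n "ACTIVATION_BUF_SZ" "$1/source/use_case/$2/usecase.cmake"
}

# Example call, with your own checkout path as the first argument:
#   show_buf_sz_setting <path of eval kit> inference_runner
```

If nothing is printed, the file probably does not contain the setting you expected to have changed.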
Now, in the eval kit, follow these steps:
1. mkdir build; cd build
2. cmake .. -DUSE_CASE_BUILD=inference_runner -DETHOS_U_NPU_ID=U65 -DCMAKE_TOOLCHAIN_FILE=./scripts/cmake/toolchains/bare-metal-gcc.cmake -DETHOS_U_NPU_MEMORY_MODE=Dedicated_Sram -DCPU_PROFILE_ENABLED=1 -DLOG_LEVEL=LOG_LEVEL_TRACE -Dinference_runner_MODEL_TFLITE_PATH=./fscrn/fsrcnn_720p_vela.tflite
You must confirm that your ACTIVATION_BUF_SZ change shows up in the cmake logs:
<snip>
-- ETHOS_U_NPU_CACHE_SIZE=393216
-- ETHOS_U_NPU_MEMORY_MODE=Dedicated_Sram
-- ETHOS_U_NPU_CONFIG_ID=Y256
-- ETHOS_U_NPU_TIMING_ADAPTER_ENABLED=ON
-- TA_CONFIG_FILE=./cmake/timing_adapter/ta_config_u65_high_end.cmake
-- inference_runner_ACTIVATION_BUF_SZ=0x01700000
-- inference_runner_DYNAMIC_MEM_LOAD_ENABLED=OFF
-- inference_runner_MODEL_TFLITE_PATH=./fscrn/fsrcnn_720p_vela.tflite
3. make // your application will be built.
You have to do this in a similar way if you are using a different use case. Refer to: review.mlplatform.org/.../building.md
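To make the log check in step 2 repeatable, the configure output can be saved to a file and searched. This is only a sketch: log_has_buf_sz and configure.log are names made up for this example, and it assumes the log prints the value in the same <use case>_ACTIVATION_BUF_SZ=<value> form shown above.

```shell
# Hypothetical helper: succeed if a saved cmake configure log reports the
# expected activation buffer size for a use case.
# $1: log file, $2: use-case name, $3: expected value (e.g. 0x01700000)
log_has_buf_sz() {
    grep -qF -e "${2}_ACTIVATION_BUF_SZ=${3}" "$1"
}

# Example (run from the build directory, with your own cmake options):
#   cmake .. <your options> 2>&1 | tee configure.log
#   log_has_buf_sz configure.log inference_runner 0x01700000 && echo "buffer size applied"
```

If the check fails, the value in the log is still the old one, and the edit has not reached the build you are configuring.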